Genome-wide markers associated with photosynthetic parameters
A subpanel of 102 diverse maize genotypes including stiff stalk, non-stiff stalk, and iodent sub-populations representing the Wisconsin Diversity Panel (WiDiv) in maize [31, 32] were selected for characterizing genotype responses to heat stress (Fig. 1a). A set of 1,132,322 variants (1,032,834 SNPs + 99,488 InDels) was employed for this population (see “Methods” for filtering details). The genotypes that we used for this study include 3 to 32 inbred lines in each of seven previously characterized sub-populations of maize (Fig. 1a). The selection criteria used to select the subset of inbreds from the larger WiDiv panel retains high levels of diversity while reducing the overall population structure. The first two principal components (PCs) did provide some separation based on these known sub-populations. In total, the first five PCs calculated from the genotype matrix explained 17.1% of population structure variation in the selected 102 genotypes and were used as covariates to control for population structure in subsequent analyses. Seedlings (14 days after sowing) for each of these genotypes were subjected to a 4-h heat stress at 40 °C. The third leaf for each plant was also assessed for three chlorophyll fluorescence-based parameters including (1) light-adapted effective quantum yield of PSII [Y(II)], and two non-photochemical dissipation routes, (2) non-regulated energy dissipation at PSII [Y(NO)], and (3) regulated energy dissipation at PSII [Y(NPQ)] in control and heat stress conditions. Higher Y(II) values indicated less photosynthetic stress, high Y(NPQ) values indicated ongoing photochemical stress that was being mitigated through carotenoid-based energy dissipation, and high Y(NO) values indicated ongoing photochemical stress that was not being mitigated. On average, plants exhibited lower Y(NO) values and higher Y(II) values under the heat stress condition compared to the control condition, while the center of recorded Y(NPQ) did not show a significant shift between control and heat conditions (Fig. 1b; Additional file 1: Fig. S1). All three traits [Y(II), Y(NPQ), and Y(NO)] exhibited higher broad-sense heritability (H2) in control (0.692, 0.682, and 0.716) compared to heat-stressed plants (0.538, 0.563, and 0.642), indicating more variation for these traits in heat stress condition compared to control plants.
The BLUP (best linear unbiased predictor) was applied to represent the phenotypic value of each genotype per trait in each condition in order to remove underlying variation. BLUPs from the chlorophyll fluorescence-based traits (Additional file 2: Table S1) were used to perform GWAS (genome-wide association study) using the same set of >1M variants in this population. We identified candidate association loci under two levels of thresholds (2.2e−6 and 1.1e−7, see “Methods”) using the FarmCPU model [33]. Distinct patterns of GWAS significant hits were observed for traits taken from control and heat stress conditions (Fig. 1c–e). In total, 20 candidate genes were identified to be associated with at least one of the significant loci (Additional file 2: Table S2). Several interesting candidate hits included SNPs located within introns of the farl8 (FAR1-like transcription factor 8, Zm00001d017164) gene that were associated with Y(NO) under the control condition. Orthologs of Zm00001d017164 in Arabidopsis—AT1G52520 and AT1G80010—were demonstrated to show visible phenotypes under abiotic stress [34]. A SNP located ~1kb downstream of the transcription factor—bbr3 (BBR/BPC-transcription factor 3, Zm00001d049800)—was uniquely identified to be significantly associated with Y(II) measured under the heat condition. Zm00001d049800 encoded as a member of the BASIC PENTACYSTEINE (BPC) proteins and its ortholog AT1G14685 in Arabidopsis was found to be involved in the salt stress response regulation [35].
Transcriptome responses to heat stress in a subset of WiDiv panel
We proceeded to evaluate the transcriptional response to a heat stress event in the same panel of genotypes. Seedlings (14 days after sowing) were subjected to control conditions or 4 h of heat stress (40 °C) and leaf tissue collected from a pool of 2–3 individuals for each treatment / genotype combination was used for RNA-seq. The RNA-seq data was aligned to the maize B73 AGPv4 reference genome and all samples exhibited high overall alignment rates, ranging from 84.5 to 98.6% (Additional file 2: Table S3). After removing genes with no read counts in any samples, we obtained a gene count matrix with reads assigned to a set of 39,511 maize AGPv4 gene models. In order to assess the consistency between samples during the collection, three biological replicates of B73 were separately collected at early, middle, and late collection time points in each condition. The average pairwise Spearman correlations between replicates were 0.987 and 0.972 in control and heat respectively (Additional file 1: Fig. S2), indicating a high repeatability of samples during collections. The overall variability in the transcriptomes of all samples were assessed using a principal component analysis (Fig. 2a). The first principal component separated samples based on treatment, suggesting a significant impact from the heat stress (Fig. 2a). The second principal component provided a modest separation based on genotypes but did not suggest major contributions of the known population structure (Fig. 2a). We hypothesized genotypes with less transcriptome shift (based on principal component 1) in response to heat stress might exhibit a more similar distribution of BLUPs for photosynthetic parameters between control and heat condition. The absolute difference of PC1 between control and heat conditions reflected the transcriptome plasticity of each genotype and could be used to classify genotypes as having large, medium, or small transcriptome responses. The BLUP for each genotype was determined as the genotypic effect after controlling experimental factors for the measured phenotype from either of conditions. We calculated correlations between BLUPs in both control and heat conditions for genotypes with large, medium, or small transcriptome responses to heat stress (Fig. 2b). Overall, for each measured phenotype, we observed an increased correlation coefficient with a larger transcriptome plasticity, suggesting a positive correlation between phenotypic and transcriptomic response in this study (Fig. 2b; Additional file 1: Fig. S3). A GO (Gene Ontology) enrichment analysis on genes with significant upregulation in response to heat in B73 revealed several significant GO enrichments that were associated with heat-responsive functions (i.e., GO:0009408 “response to heat”; GO:0010286 “heat acclimation”; GO:0031072 “heat shock protein binding”) (Additional file 2: Table S4), providing evidence of a strong heat stress response in the B73 samples. In addition to these enrichments for some terms expected for a heat response, there were also a number of other GO terms with significant enrichments, including some without apparent connections to heat stress responses (Additional file 2: Table S4).
The analysis of the full dataset revealed 2628 genes that exhibited consistent upregulation in response to heat in the majority of genotypes (see “Methods” for details). A comparison of the mean response to heat stress and the coefficient of variation (CV) for expression level in heat stress for the 2628 genes highlighted a subset of genes with strong expression response and limited variability in the response for different genotypes (Fig. 2c). Many of the genes with the highest expression response and relatively low variation among genotypes were previously identified as heat shock proteins (HSPs). There were 40 HSPs identified in the maizeGDB database (https://www.maizegdb.org) and 25 of these HSPs were detected to be consistently upregulated with high expression values and low CVs under the heat stress (Fig. 2c). Six additional genes (Zm00001d004243, Zm00001d022630, Zm00001d031436, Zm00001d033990, Zm00001d039933, and Zm00001d048592) that were not annotated as HSPs also exhibited strong and consistent responses to heat stress and might play important roles in response to heat stress. The heat shock factors (HSFs) were a class of transcription factors that were known to play important roles in gene expression responses to heat stress and activated HSPs. Of 31 previously reported non-redundant HSFs in maize [17], 17 of them exhibited detectable expression in our dataset. Eight of these HSFs (ZmHsf02, ZmHsf25, ZmHsf05, ZmHsf08, ZmHsf26, ZmHsf11, ZmHsf03, and ZmHsf13) were consistently upregulated in response to heat stress across a hundred of genotypes (Fig. 2d). Interestingly, ZmHsf06 was downregulated across all genotypes used in this study and also exhibited reduced expression in response to heat stress at multiple timepoints in B73 in a previous study [36] (Additional file 1: Fig. S4). The results suggested that the heat stress event resulted in a robust response to heat stress in this diverse panel and provided an opportunity to further examine variable responses in more diverse genotypes.
Characterization of variable gene expression responses to heat stress
To map regulatory variants that might influence gene expression levels or responses to heat stress in this panel, we retained genes with counts per million (CPM) > 1 in at least 10% genotypes, resulting in 20,255 and 20,306 genes expressed in control and heat conditions, respectively. Of these genes, 19,642 genes were commonly expressed in both conditions, while 613 and 664 genes were uniquely detected as expressed in control or heat conditions. Of 664 genes uniquely detected in heat condition, three were heat shock factors (ZmHsf03, ZmHsf11, and ZmHsf26). We performed eQTL mapping separately using control or heat expression data to map the genomic elements influencing transcript levels of expressed genes. Significantly associated SNPs located within 1Mb distance from the targeted gene were classified as cis-eQTL, while SNPs located >1Mb from the target gene were classified as trans-eQTL. The 1Mb cut-off for cis-eQTL has been used in prior studies of maize eQTL [37, 38] and is far enough to have quite low linkage disequilibrium (r2<0.05) in our population. Potential trans-eQTL hotspots were assessed by counting the number of significant trans-interactions for each 10-kb window in the genome (Additional file 1: Fig. S5). Several chromosomal regions with at least 10 trans-interactions were commonly detected for both heat and control conditions. Three 10-kb regions on chromosomal 9 were detected to be interacted with gene expression distantly under heat stress. However, the detected “hotspots” had somewhat limited numbers of targets (<20) and the power to detect trans-eQTL hotspots in this study might be limited by our sample size. Consistent with prior studies [39], identified cis-eQTLs in this study could explain more variance of gene expression than identified trans-eQTLs in both control and heat conditions (Additional file 1: Fig. S6). The cis-eQTL analysis identified 10,548 and 10,391 significant cis-eQTL regulated genes (eGenes) for control and heat condition, respectively. The cis-eQTLs detected for genes that were only expressed in heat stress conditions were particularly interesting, as they reflected variable levels of gene expression activation under heat stress. There were 518 of 664 heat expressed only (heo) genes that had significant cis-eQTL (heo-cis-eQTL) and we subsequently referred to these as heo-eGenes (Additional file 2: Table S5; Additional file 1: Fig. S7). The heo-eGenes included a number of transcription factors, such as zhd20, iaa37, abi33, gata17, sbp18, wrky35, bhlh110, dof27, and ereb196. In addition, two heat shock factors—ZmHsf03 and ZmHsf26—were detected as heo-eGenes.
To identify cis-regulatory variation associated with heat responsiveness per gene in 19,642 commonly expressed genes under both control and heat conditions, a linear mixed model was incorporated with covariates to detect the interaction effect between each pair of cis-eQTL and eGene. To avoid redundancy testing, only the most significant cis-eQTLs in either condition (control or heat) were retained for each gene (two different cis-eQTLs might be selected for testing if they had equivalently significant p-values; see “Methods”), resulting in 15,588 tested gene/cis-eQTL pairs. Of tested cis-eQTLs, the majority had positive correlations for effects between control and heat conditions (Fig. 3a). However, a set of 273 cis-eQTLs with significant differences in effects between control and heat conditions were defined as responsive eQTLs (reQTLs) (Fig. 3a). These reQTLs provided examples of genes with significant differences in response to heat stress due to cis-regulatory variation for responsiveness. Compared to all cis-eQTLs identified in either of conditions or selected top significant cis-eQTLs for reQTLs identification, reQTLs per se tend to be distributed around TSS (transcriptional start site) regions of regulated genes (Fig. 3b).
The reQTL regulated genes (reGenes) were further assessed and classified into two categories based on the strength of responsiveness under heat stress: reference alleles with greater response or alternative allele with greater response to heat stress. Both of these groups could be further subdivided based on whether the more responsive allele was up- or downregulated in response to heat stress. Of total 258 identified reGenes (Additional file 2: Table S6), 54.4% of these reGenes were examples with reference alleles that have consistently higher response to heat compared to the alternate allele while the other 45.6% exhibited consistently higher response for the alternate allele (Fig. 3c). Four genes related to heat responses (hsp1, hsp22, hsp101, and ZmHsf06) were identified, suggesting potential cis-regulatory elements involved in the regulation of heat responsiveness of these genes. Only limited number of cis-eQTL overlapped with GWAS hits and perhaps this was due to the distinct categories of SNPs could be enriched between GWAS and eQTL mapping [40]. However, the transcription factor—farl8 with SNPs inside its intron regions was significantly associated with Y(II) in control-specific condition and was also identified as a reGene to be regulated by a reQTL within its intron. Given the responsive difference per reGene between genotypes with reference alleles and alternative alleles, responsive patterns of reGenes could be classified into different scenarios according to the median expression value. There were 165 reGenes that exhibited stronger upregulation response for either the reference allele (78) or the alternate allele (87) and these included examples in which both alleles were upregulated (but to different levels) as well as examples in which only one allele was upregulated. For example, an altered InDel from ATA to TACTC in the putative 3′ UTR region of Zm00001d050304 was associated with higher expression under heat stress (Fig. 3d). Similarly, a SNP in the ~1kb promoter region of Zm00001d046322 (Fig. 3e) was strongly associated with higher heat response upregulation in genotypes with reference alleles compared to genotypes with alternative alleles. More distinct regulation patterns were observed as opposite regulations between genotypes with reference and alternative allele (Fig. 3f,g).
One of our goals was to identify sequence variation that might contribute to variable gene responsiveness under heat stress. However, only the most significant eQTL SNPs were selected for performing the analysis to detect the reQTL and its associated reGene. Although the multiple testing burden could be ameliorated this way, the tested reQTL might not reflect a direct causal relationship. Haplotype analysis revealed large haplotype blocks that could be associated with variable responses of most reGenes. For example, Zm00001d042183 was annotated as being involved in the triacylglycerol degradation pathway and we located a significant cis-eQTL on 1443 bp upstream from its TSS for reQTL mapping. The analysis of all SNPs near this gene revealed multiple highly associated SNPs that exhibited high levels of linkage disequilibrium (Fig. 3h). The reGene Zm00001d030549 that was putatively involved in the chlorophyll degradation pathway provides another example of multiple highly associated SNPs within a large haplotype block (Fig. 3i). Even though reGenes and heo-eGenes identified from our approach showed variable responsiveness between genotypes associated with different alleles, multiple SNPs exhibited similar correlations with gene expression responsiveness that fall in a large haplotype region that complicated the ability to determine the underlying causal variant.
Changes in chromatin structure associated with heat responsiveness in maize
Chromatin accessibility provided an additional avenue to document candidate cis-regulatory elements under heat stress. To document changes in accessible chromatin regions with evidence of DNA binding protein footprints in response to heat stress in maize, we generated MNase-defined cistrome-Occupancy Analysis (MOA-seq) data for three biological replicates of B73 plants grown in control and heat stress (4 h at 40 °C). We initially identified peaks of accessible regions using the full MOA-seq reads. Prior work found that using only the middle 20bp of each MOA-seq read provided likely TF footprints based on known binding sites [24]. The analysis of the high-resolution (center 20bp of each read only) MOA-seq data resulted in a set of smaller peaks (median size of 34bp compared to 179bp for full MOA-seq reads). Over 130,000 MOA-seq footprints were identified in each of the biological replicates in either control or heat condition (Additional file 2: Table S7 and Data S1). TF footprints were frequently identified in regions near the TSS of many genes (Additional file 1: Fig. S8). Three biological replicates of control and heat MOA-seq samples in B73 were compared to identify 13,792 TF footprints that exhibited significant (adjusted p-value <0.05) difference between conditions (Additional file 2: Data S1). These differential TF footprints likely revealed regions of the genome with differential occupancy of TFs or other DNA binding proteins in heat-stressed samples relative to the control samples. The analysis of genes that exhibited heat-stress-inducible expression revealed examples in which there was a distinct difference in MOA-seq coverage shape (Fig. 4a) as well as examples with quantitative differences in TF footprints (Fig. 4b), but there were relatively few truly novel peaks that were only present in heat-stressed samples. There were 4943 differential TF footprints located in putative promoter regions (−2000bp from the TSS per gene) and another 1486 located within gene regions. The differential TF footprints included more regions with increased accessibility in heat stress (11,140) compared to regions with reduced accessibility in heat stress compared to controls (2,652). The regions with increased MOA-seq coverage in heat stress compared to control might reflect increased binding or occupancy of transcription factors activated by heat stress. A metaplot of MOA-seq read depth at the heat-enriched TF footprints revealed that these regions already exhibited MOA-seq coverage in control samples but had substantially stronger signals in heat-stressed samples (Fig. 4c). Heat-enriched TF footprints were detected within 2000bp of the TSS in 8 HSFs and 18 HSPs. Interestingly, 1353 of 13,792 differential TF footprint regions were located within annotated transposable elements. Some transposable elements with heat-enriched TF footprints were also detected to be upregulated under heat stress in a previous study [30], such as RLG00007Zm00001d00242 (Gypsy retrotransposon), RLC00002Zm00001d10560 (Copia retrotransposon), and RLX06576Zm00001d00001 (unknown retrotransposon) (Additional file 2: Table S8). The regions that contained heat-enriched MOA-seq footprint peaks were used to perform motif enrichment analysis to identify potential TF binding sites. We identified 26 motifs with putative TF binding activities in the cis-BP database [41] (e.g. AP2, E2F, Myb/SANT, GATA, Dof, bHLH, NAC/NAM) and another 24 motifs that do not contain previously characterized TF binding sites (Additional file 2: Table S9). Prior work had shown that HSFs played important roles in regulation of gene expression in response to heat stress and variants of the HSF binding motif were often enriched near heat-responsive genes [14]. We identified an enriched sequence (GAAGCTTC) matching HSF motif—HSFB2A—in the heat-enriched TF footprints.
The same tissue samples that were used for B73 MOA-seq were also used to generate B73 RNA-seq data in the same conditions. To eliminate the experimental batch effect for comparing the changes in chromatin accessibility and potential variable regulatory elements in the panel of diverse genotypes, we focused on a subset of 793 genes that were commonly identified as upregulated genes and 11,501 genes that were commonly identified as expressed (mean CPM > 1) but not DEG in both RNA-seq datasets (the matched MOA-seq samples and the three biological replicates of B73 sampled as part of the population study). These 793 genes and 11,501 genes were classified as differentially expressed genes and control genes, respectively. We found that the differentially expressed genes exhibit significant enrichment (p-value < 2.2e−16) for heat-enriched TF footprints within their 2-kb flanking regions compared with control genes (Fig. 4d). We noted that the 2-kb flanking regions of six HSFs (ZmHsf03, ZmHsf05, ZmHsf08, ZmHsf11, ZmHsf25, ZmHsf26) that were consistently upregulated across genotypes contained heat-enriched TF footprints, suggesting potential regulatory regions of these HSFs. This suggested that the heat-enriched TF footprints were often associated with gene activation under heat stress. However, it was noteworthy that 42% of 793 genes that were upregulated in heat stress did not contain heat-enriched TF footprints. These included many examples of quite strong upregulation without evidence for altered TF footprints. The vast majority (>99%) of the 793 upregulated genes had TF footprints within 2kb. This revealed that differential TF footprints were enriched at upregulated genes but that some upregulated genes had consistent TF footprints even with altered expression levels. Together, these results suggested that identified chromatin accessibility changes near heat-upregulated genes could highlight particular cis-regulatory elements (CREs) that might be involved in transcriptional responses to heat stress.
Application of B73 TF footprints to identify candidate causal eQTL variants
We sought to utilize the MOA-seq data to prioritize candidate variants near the reGenes as potential causative changes. There were 258 reGenes that were expressed in many genotypes in both control and heat conditions that exhibited significant variation for heat responsiveness as well as another 518 heo-eGenes that were only expressed in heat-stressed samples that exhibited significant cis-eQTL suggesting variable activation of these genes in response to heat stress. The reGenes were classified into four groups as RefControl (n=42), RefHeat (n=78), AltControl (n=56), and AltHeat (n=87) based on which haplotype exhibited a strong response to heat stress (Alt / Ref) and whether the more responsive allele was up- (Heat) or down- (Control) regulated in response to heat stress (Fig. 3c). Similarly, the heo-eGenes could be classified as RefHeat (n=313) and AltHeat (n=205) based on whether the reference or alternate allele exhibited greater expression in heat-stressed samples. The RefHeat or AltHeat groups of reGenes or heo-eGenes all exhibited upregulation in response to heat and many of these genes (30–37%) have a heat-enriched MOA-seq footprint within 2kb of the gene (Additional file 2: Table S10). These were enriched relative to other expressed genes. In contrast, the RefControl and AltControl genes that did not necessarily show upregulation in response to heat stress have only 14–18% of genes with a heat-enriched MOA-seq footprint, which was similar to all expressed genes.
In most cases, there were multiple variants that were significantly associated with the heat response (for reGenes) (Fig. 3h,i) or expression levels in heat stress samples (for heo-eGenes). We assessed whether at least one of the most highly associated (top 3 p-values) variants was located within a heat-enriched MOA-seq footprint for the subset of genes that contained these footprints. Among the genes that had decreased expression in response to stress (RefControl or AltControl) with a heat-enriched MOA-seq footprint there were only 12.5% with a highly associated variant located within the small footprint region. In contrast, 24% of the RefHeat and 30% of the AltHeat reGenes with heat-enriched TF footprints had at least one highly associated variant that was located in the footprint region (Additional file 2: Table S10). There were 21.7% and 22.6% of the RefHeat and AltHeat heo-eGenes, respectively, that had a highly associated variant within the heat-enriched TF footprints. Through rerunning the reQTL mapping using all variants located in reGenes and their flanking regions, it revealed that significant difference of interaction effects for variants located inside of heat-enriched TF footprints and variants outside of the TF footprints for either RefHeat (Wilcox-test, p=8.67e−3) or AltHeat reGenes (Wilcox-test, p=2.07e−3; Fig. 4e; Additional file 2: Table S11). For both reGenes and heo-eGenes, our findings suggested a robust power of heat-enriched MOA-seq signals derived from a single genotype on predicting responsive genomic regions. It was worth noting that in many of the AltHeat genes there was a significant response to heat stress for both the reference and alternate allele, but the alternate allele had a significantly stronger response. In these cases, the B73 allele might still have a heat-enriched footprint and the alternate allele might have a variant that allows for enhanced TF binding. Together, our results suggested overlaps between heat-enriched TF footprints and identified responsive variants might pinpoint regulatory genomic regions in maize under heat stress.
Transient expression assays confirm allelic variation for heat response
We were interested in determining whether the 2-kb proximal regions of the reference and alternate alleles of reGenes would be sufficient to recapitulate the variable responses to heat stress. We selected three pairs of alleles—Zm00001d005114 (B73) / Zm00014a020759 (Mo17), Zm00001d017187 (B73) / Zm00035ab255850 (MS71), and Zm00001d042183 (B73) / Zm00039ab143500 (Oh43) for these experiments. Two of these examples exhibited significant differences for TF footprints in response to heat stress (Fig. 5a,b) while the other gene had ~19.4% increased signals in the TF footprints under heat stress but did not pass the significance filter (Fig. 5c). The two genes with significant differences in the footprints Zm00001d005114 and Zm00001d017187 showed differential response to heat stress for the reference and alternate alleles (Fig. 5d,e). For both of these genes, the reference allele had a much weaker response to heat stress as compared to the alternate allele. The third gene, Zm00001d042183, exhibited no significant response to heat stress in genotypes carrying the reference allele, but had significant upregulation in genotypes with the alternate allele (Fig. 5f). A comparison of the allele-specific expression in three biological replicates of B73xMo17 or B73xOh43 F1 hybrids was used to validate the allelic ratio in control and heat stress plants for Zm00001d005114 and Zm00001d042183 that were heterozygous for the reference and alternate alleles in these hybrids (Fig. 5g,h). Both of these genes showed a significant difference in the ratio of reference:alternate allele expression in heat stress samples compared to plants grown in control conditions, confirming the cis-allelic variation for heat responses for these alleles.
The 2-kb promoter sequences (including the transcription start site and 5′ UTR) were cloned and fused to a luciferase reporter construct for assessing segregations of their activation roles for reGenes under heat stress. Alignments of the two promoters revealed significant conservation in the region immediately upstream of the transcription start site, but in each case the more upstream regions were not alignable, often due to polymorphic transposon insertions (Additional file 1: Fig. S9). Two of the examples included a differential heat-enriched MOA-seq footprint that overlaps the transcription start site (Additional file 1: Fig. S9a-b). The third example had several MOA-seq footprints that were only detected in heat-stressed samples but these were not identified as significantly heat-enriched (Additional file 1: Fig. S9c). Each of the three pairs of promoters exhibited differences in the level of heat activation using the dual luciferase reporter assay that mirrored the RNA-seq data for the reference and alternate alleles (Fig. 5i–k). In each case, the reference allele had a much smaller response to heat relative to the alternate allele. It is worth noting that the level of activation in response to heat stress was highly variable for the three different promoters with Zm00001d017187 exhibiting much stronger heat response than the other two promoters.