Genomic basis underlying the metabolome-mediated drought adaptation of maize

Background Drought is a major environmental disaster that causes crop yield loss worldwide. Metabolites are involved in various environmental stress responses of plants. However, the genetic control of metabolomes underlying crop environmental stress adaptation remains elusive. Results Here, we perform non-targeted metabolic profiling of leaves for 385 maize natural inbred lines grown under well-watered as well as drought-stressed conditions. A total of 3890 metabolites are identified and 1035 of these are differentially produced between well-watered and drought-stressed conditions, representing effective indicators of maize drought response and tolerance. Genetic dissections reveal the associations between these metabolites and thousands of single-nucleotide polymorphisms (SNPs), which represented 3415 metabolite quantitative trait loci (mQTLs) and 2589 candidate genes. 78.6% of mQTLs (2684/3415) are novel drought-responsive QTLs. The regulatory variants that control the expression of the candidate genes are revealed by expression QTL (eQTL) analysis of the transcriptomes of leaves from 197 maize natural inbred lines. Integrated metabolic and transcriptomic assays identify dozens of environment-specific hub genes and their gene-metabolite regulatory networks. Comprehensive genetic and molecular studies reveal the roles and mechanisms of two hub genes, Bx12 and ZmGLK44, in regulating maize metabolite biosynthesis and drought tolerance. Conclusion Our studies reveal the first population-level metabolomes in crop drought response and uncover the natural variations and genetic control of these metabolomes underlying crop drought adaptation, demonstrating that multi-omics is a powerful strategy to dissect the genetic mechanisms of crop complex traits. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-021-02481-1.

maize plants after drought treatment, indicating negative roles of DIMBOA and DIMBOA-Glc in maize drought tolerance" This experiment does not show that DIMBOA has any direct relevance to drought tolerance. It only shows that DIMBOA has a negative effect on maize plants. This is not surprising because DIMBOA causes oxidative stress and also induces secondary defense responses. At a minimum, a control experiment showing that DIMBOA sprayed on non-drought-stressed plants is necessary. However, my prediction is that there will be negative effects on non-drought-stressed maize plants if they are sprayed with DIMBOA. 5. Line 337-339: "Together, these data suggested that Bx12 played positive roles in maize drought tolerance likely through down-regulation of DIMBOA-Glc accumulation upon drought stress." The authors assume that lower abundance of DIMBOA-Glc provides protection against drought stress. The opposite hypothesis, that higher abundance of HDMBOA-Glc provides protection against drought stress also needs to be considered. HDMBOA-Glc is produced from DIMBOA-Glc by BX12.
6. Lines 374-375 "Based on our data sets, ZmGLK44 was a hub gene correlated / associated with 30 drought-responsive metabolites ( Fig. 5a; Additional File 13, Table S9)." Figure 5a shows only 27 metabolites. What are the other three metabolites? 7. Line 517-519: "More than 60% of the phenotypic variance of survival rate could be predicted by combined 15 m-traits (Additional File 25, Figure S10; Additional File 26, Table S16)," This is a result. Results should be brought up first in the results section. The discussion should be limited to discussing the significance of the results, rather than presenting new results.
8. The authors only discuss measurements of DIMBOA-Glc was HDMBOA-Glc detected in these assays? For instance, Figure S6 should include analysis of HDMBOA-Glc. 9. Figure 4a is misleading because is suggests that only HDMBOA-Glc is broken down by a glucosidase. There would be similar glucosidase-mediated breakdown of DIBOA-Glc, DIMBOA-Glc, DIM2BOA-Glc, and HDM2BOA-Glc. 10. Table S15 lists about 1,500 genes that are regulated by ZmGLK44, but there is no way to determine which of these genes the authors placed in the different categories that are listed as overexpressed in Figure 5c. Without know this, it is not possible to determine the overall value of the gene categories listed in Figure 5c. Perhaps the genes in Table S15 could be marked to show which category they belong to in Figure 5c. 11. Table S15: At the bottom of this table there are some genes with an expression level of zero with or without ZmGLK44 expression. How can these be considered differentially expressed? 12. Figure 5c: Glycerophosphodiester degradation is listed as an enriched category. What genes are actually in this category? Is it possible that this is also tryptohan biosynthesis? A key step in tryptophan biosynthesis is indole-3-glycerolphosphate degradation.
13. Figure 5c: Both tryptophan biosynthesis and benzoxazinoid biosynthesis are differentially regulated by ZmGLK44. However, based on other information provided by the authors, this could have opposite effects on drought tolerance. Tryptophan increases drought tolerance and benzoxazinoids (DIMBOA-Glc a downstream product of DIBOA-Glc, a low-abundance intermediate) decreases drought tolerance. It would be good to consider more carefully which benzoxazinoid biosynthesis genes are induced by ZmGLK44 and not just focus on tryptophan biosynthesis. For instance, is benzoxazinoid biosynthesis up-regulated or down regulated by ZmGLK44? What about Bx12 specifically? 14. Figure 5d: It would be good to also connect this pathway to DIBOA-Glc biosynthesis, which is listed as enriched in Figure 5c. Indole is a branch point for the biosynthesis of tryptophan and benzoxazinoids. See for instance: Richter et al, Plant Journal, 2021, 106:245-257. doi: 10.1111 15. Line 403-404: "(h) Activation of TRP2p1::LUC, but not the other promoter fragments, by ZmGLK44 in maize protoplasts." In the figure TRP2p2::LUC is activated (not TRP2p1::LUC) 16. Figure 5e: There are at least two tryptophan synthase beta subunit (TRP2) genes in maize. Which one is shown in this figure?
17. Figure 5e,h: There are at least three genes that encode tryptophan synthase alpha subunit activity (indole synthase; TRP1 in the figure) in maize. The authors should specify which gene they are looking at here. Ideally, the authors should look at expression of all three indole synthase genes (TSA, BX1, and IGL).
18. Figure 5d. TRP1 and TRP2 (tryptophan synthase alpha and beta subunits) are more commonly called TSA and TSB in maize. See for instance: Kriechbaumer et al, BMC Plant Biology, 2008, 8:44. doi: 10.1186/1471-2229. Line 494,499 lepidopteran should not be italicized 20. Line 498-499: "breeding maize cultivars with both drought tolerance and insect (e.g. lepidopteran species) resistance." This would likely cause sensitivity to other insects (aphids). 21. Lines 687-680: "LC-MS data were obtained using a Waters Acquity UPLC 679 system (Waters, http://www.waters.com), coupled to a Thermo Fisher Q Extractive (Thermo Fisher, http://www.thermofisher.com)." This is not enough information. The authors need to specific what column was used, what gradient, what mass ranges were measured, etc. I should be able to repeat this experiment based on what is written in the methods section.
22. Lines 686-687: "The further processing of the MS data included isotope clustering, adduct detection, and library search." Again, not nearly enough information. I would not be able to repeat this analysis based on the information that is provided. 23. Methods section: Although there is a description of the exogenous tryptophan treatment, there is no description of the method that was used for exogenous DIMBOA treatment. Given the reactive nature of DIMBOA and its spontaneous degradation to other metabolites, it should also be noted where DIMBOA was purchased and what purity it had at the time that it was added to maize plants. The half-life of DIMBOA in an aqueous solution has been reported to be about 24 hours.
24. Table S3 lists only the 1035 metabolites that the authors deem to be drought-responsive. For the scientific record, negative evidence is also important. Also, some readers may have an interest in a particular mass feature that the authors did not conclude to be drought-responsive. Please include all 3,890 reliably detected mass features in this table.
25. Table S5. What about the mQTL for the metabolites that the authors did not deem to be drought-responsive? There is a lot of information that would results from a more complete analysis of the metabolomic data set that is just being ignored. The 1035 chosen metabolites are just based on some arbitrary cutoff that the authors have determined. Readers will also be interested in the mQTL affecting the other 2,855 metabolites that were detected in these assays. Please include this information in Table S4. 26. Table S14. A putative metabolite is described as "34N" (in cell C10 of the spreadsheet). That is not a very informative metabolite description.
27. Table S14: PP_Group_00384 is described as likely being tryptophan (line 7). However, in line 41 a different metabolites is described as being tryptophan.
28. Line 879-880: "All other reasonable requests for data and research materials are available by contacting the corresponding authors." All raw data for large data sets, e.g. RNAseq and metabolomics should be placed in a public database. RNAseq data should be in GenBank short read archive. Metabolomic raw data (.raw files) should be placed in Dryad, Cyverse or some other public database. If ten or twenty years from now someone wants to look at these data, there is no certainty that the authors will still have the data and will be able to respond to a "reasonable request." 29. Is the large SNP dataset for the inbred line population available from a public database? If so, which database? If not, it should be made available.
30. The English-language writing is generally good, but there are occasional lapses. I recommend that one of the native English-speaking co-authors should do some careful proofreading. For instance, a sentence like "Of the upregulated metabolites, there were tryptophan (Trp) and its relative metabolite Achillamide derivatives." is not quite grammatically correct.
Genomic basis underlying the metabolome-mediated drought adaptation of maize The Abstract should include which tissues or organs of the maize plant were profiled for metabolite and transcript abundances.
Line 110, change from principle to principal component analysis Lines 136-139, this is a far too sweeping statement especially given that this is an artificial "drought" study in pots. It needs to be couched as upregulated for this environment. Let's see the results across multiple field environments before concluding as such. Also, there is way to conclude that these shifts in metabolites are causal in conferring drought tolerance. They are responding. That is all that can be concluded.
Lines 151-154, these rrBLUP results with metabolites would be better served with the inclusion of a comparison to prediction of the dry mass, fresh weight, and survival rate phenotypes with genome-wide genetic markers. This population has been scored with SNP markers, so it is a straightforward comparison. The focus on significance here is misleading, as the focus should be on prediction ability/accuracy. How is this statistical test carried out anyway? It should not be a t-test. The rationale for not using a t-test is that the magnitude of the denominator of the t-statistic is dependent on the number of resampling runs (standard error of the difference in means). Therefore, if performing a sufficient number of resampling runs, the denominator would be very small, and it could then be possible to obtain a significant p-value for any comparison with an arbitrarily small difference. Another way of understanding the problem with the t-test is that because each fold is drawn from the same original dataset, the different folds are not independent of each other, which violates the t-test assumption that observations are independent. I suggest using a bootstrap significance testing approach, which I believe to be valid in this case. If properly conducted, it should be independent of the number of resampling runs.
Line 162, kinship not kinships Line 164, need to report what multiple test correction approach was used. If none was used, then either use Bonferroni or FDR. Same for eQTL mapping. The 1/X method is too permissive.
Lines 175-176, concluding many new putative drought-responsive QTL is a bit of a stretch. By no means was the maize drought linkage/GWAS mapping literature deeply surveyed.
Line 178, need to define "lead SNP". I assume this is a peak SNP -a SNP with smallest P-value at a locus, but it still needs to be defined.
Lines 179-180, a statistical test of enrichment needs to be conducted to say anything meaningful about finding 190 TFs. Line 192, I have never heard of it referred to as a permutation assay. Permutation test?
Lines 212-217, I do not agree with the candidate gene only analysis in this context. It is more important to see how these associated genes rank genome-wide rather than compared to a random subset. I am not convinced on reliability of results as stated in lines 217-219 based on the collectively results. It is easy to subsample data in different ways with biological hypotheses, but the presentation and argument thereof needs to be better conveyed and supported beyond correlative machinery. Otherwise, it is a weakened argument.
Line 258, 1Mb window really needs to be used for cis-eQTL and is more often the standard than 20kb despite the citation. This is resulting in inflated number of trans-eQTL.
Lines 272-273, static is misleading, as it implies that it is constitutive. Shared? Anyway, something else is needed.

Editor's comments
As you will see from the reports, both referees are broadly favorable and find the work of potential interest, but they raise important issues that we must ask you to address, in the form of a revised manuscript, before we reach a final decision on publication. In particular, it seems to us to be essential that both Referees' concerns regarding the methods and analyses resolved. Referee 2 has also requested additional statistical analyses, and Referee 1 also has raised concerns about the experiments with Bx12 and ZmGLK44. Please ensure that these and all other issues raised by the referees are addressed in full. Please make sure all data is deposited in a public repository and at least made available for reviewers on re-review.

Reply:
We really appreciate the insightful comments and valuable suggestions from the editor and the reviewers. We added more information of methods and analyses as suggested by both reviewers. We conducted more experiments or carefully addressed the concerns or suggestions about Bx12 and ZmGLK44 raised by reviewer #1. We also performed more statistical analyses as suggested by reviewer #2. As regarding the source data, we deposited them in the public repository, which can be reached easily.
Please see our point-by-point responses to the reviewers below.

Reviewer reports:
Reviewer #1: This manuscript describes a large-scale metabolomic analysis of a genotyped population of maize inbred lines, with and without drought stress. The results will provide an important resource for further research on metabolomic changes induced in maize by drought. Although this is useful information, there are some improvements that need to be made to the manuscript: A. The methods are not described adequately for others to repeat them. I mention a few examples below, but the authors should consider the methods section very carefully to determine whether a plant molecular biologist could repeat these experiments with the information that is provided.
B. Large data sets that underlie important conclusions in this manuscript are available "upon request". They should be placed in public databases.
C. Two genes, Bx12 and ZmGLK44, were examined in more detail. However, as described below, there are some problems with the methodology and conclusions.

Reply:
We really appreciate your careful reading, evaluation, and positive comments on our manuscript. We have tried our best to revise the manuscript according to your insightful comments and invaluable suggestions, and hopefully addressed all your concerns in our revised manuscript. Please see our point-by-point answers below.
Specific comments: 1. It would be helpful if the authors wrote out the name or the described the function of key genes that play a central role in their findings. For instance, ZmCS1 and ZmCS2 are never described as encoding citrate synthase, except in Figure S3. This is confusing because ZmCS1 can also mean chorismite synthase, e.g. Babu et al, Plant Breeding 131, 20-27 (2012).
Reply: Thanks for your suggestions, which are very helpful. Yes, we should give more details for the key functional genes in our previous manuscript. In the main manuscript, we mentioned several key functional genes, including ZmSKD1, ZmCS1, ZmCS2, Bx12, ZmGLK44 etc. We have added the information of these key genes to the revised manuscript, please see line 209: "…ZmSKD1 (suppressor of K + transport growth defect 1)…", lines 239-240: "…ZmCS1 (citrate synthase 1, GRMZM2G063909), ZmCS2 (citrate synthase 2, GRMZM2G064023)…", lines 260-262: "…Bx12 (GRMZM2G023325, encoding an enzyme for the benzoxazinoid metabolic pathway) and ZmGLK44 (GRMZM2G124540, encoding Goden 2-like TF)…". The annotation of other functional genes can be found in Table S5, Table S8 and Table S10. 2. GRMZM2G063909 and GRMZM2G064023 (CS1 and CS2) are described as citrate synthases in this study. Arabidopsis AT2G44350 is described as the closest Arabidopsis ortholog (Table S9). Is the converse also true, i.e. are GRMZM2G063909 and GRMZM2G064023 the most similar maize genes to AT2G44350, or are there other possible citrate synthases encoded in the maize genome? This is important if the authors are assigning citrate synthase as a function for these two genes. As far as I know, the actual enzymatic function has not been confirmed for GRMZM2G063909 and GRMZM2G064023 and previous research indicates that these two enzymes may have other activity.
Reply: Thanks for your valuable suggestions, which are helpful to clarify the roles of ZmCS1 and ZmCS2 in citrate metabolism. Based on your suggestions, we performed phylogenetic assays for the putative citrate synthases in maize and Arabidopsis. As shown in Figure S5, the putative citrate synthases could be divided into two subclasses, and AT2G44350 (AtCS4) belongs to subclass 1 and has three orthologs in maize, including GRMZM2G063909 (ZmCS1), GRMZM2G064023 (ZmCS2) and GRMZM2G063851. We reworded our statement in the revised manuscript in lines 239-242 as follows: "Three hub genes ZmCS1 (citrate synthase 1, GRMZM2G063909), ZmCS2 (citrate synthase 2, GRMZM2G064023), which are two putative citrate synthase genes (Additional File 1, Figure S6), and ZmMYBR88 (GRMZM2G064328) were statically detected under both WW and DS conditions."  Shoot of four plants were harvest, weighted (fresh weighted), dried at 65°C for constant weight to obtain dry mass." Next, we analyzed the plant heights, fresh mass, dry mass, H2O2 contents and malondialdehyde (MDA, indicator of oxidation stress) levels of the maize plants grown under normal conditions. The results showed that all these indices did not show significant difference between maize plants treated with or without DIMBOA ( Fig S9).
Thus, DIMBOA, at the concentration used here, might have no significant impact on maize growth under normal growth conditions. However, spray of DIMBOA caused sensitivity of maize plants to drought (Fig 4n and o), indicating a negative impact of DIMBOA on maize drought tolerance.  5. Line 337-339: "Together, these data suggested that Bx12 played positive roles in maize drought tolerance likely through down-regulation of DIMBOA-Glc accumulation upon drought stress." The authors assume that lower abundance of DIMBOA-Glc provides protection against drought stress. The opposite hypothesis, that higher abundance of HDMBOA-Glc provides protection against drought stress also needs to be considered. HDMBOA-Glc is produced from DIMBOA-Glc by BX12.
Reply: Thanks for pointing out this. We were focusing on DIMBOA-Glc and Bx12 because DIMBOA-Glc was a drought responsive metabolite in the maize population based on our standard shown in line135 "(|Fold Change| ≥ 2, false discovery rate [FDR] < 0.05)", while HDMBOA-Glc didn't show significant difference in the maize population treated with or without drought based on our standard (Reply Figure 1).
More importantly, Bx12 was induced by drought and showed significant association with DIMBOA-Glc only under drought conditions, while it was significantly associated with HDMBOA-Glc under both drought and well-watered conditions, indicating more specific roles of DIMBOA-Glc/Bx12 pair in maize drought responses ( Figure S7). Our analyses might indicate the functional differentiation of DIMBOA-Glc and HDMBOA-Glc, which was consistent with the previous studies showing that DIMBOA and DIMBOA-Glc treatments induced significant callose deposition in the leaves of wheat plants, while HDMBOA-Glc had no effect (Li et al., 2018). We agree with this reviewer that investigating how HDMBOA-Glc functions in environmental stress responses is an interesting topic, but the challenge is the chemical synthesis of HDMBOA-Glc, because it is not commercially available.
Reply: Thank you very much for your careful reading and checking on this. Yes, ZmGLK44 was a hub gene correlated / associated with 30 drought-responsive metabolites. We are sorry for missing those three metabolites, which are PP_Group_01992, PP_Group_18074 and PP_Group_02975. Now they were added into Figure 5a in the revised manuscript. 7. Line 517-519: "More than 60% of the phenotypic variance of survival rate could be predicted by combined 15 m-traits (Additional File 25, Figure S10; Additional File 26, Table S16)," This is a result. Results should be brought up first in the results section.
The discussion should be limited to discussing the significance of the results, rather than presenting new results.
Reply: Thanks for the suggestion. As suggested, we move the Figure S2 (previous Figure S10) and Table S4 (previous Table S16) to the result section. Please see revised manuscript in lines 155-157 as follows: "In addition, we found that a combination of 15 m-traits could explain more than 60% of the phenotypic variance of survival rate (Additional File 1, Figure S2; Additional File 2, Table S4)." 8. The authors only discuss measurements of DIMBOA-Glc. Was HDMBOA-Glc detected in these assays? For instance, Figure S6 should include analysis of HDMBOA-Glc.
Reply: Yes, we also detected HDMBOA-Glc in our study, but it was not a differentially-    Table S15 lists about 1,500 genes that are regulated by ZmGLK44, but there is no way to determine which of these genes the authors placed in the different categories that are listed as overexpressed in Figure 5c. Without know this, it is not possible to determine the overall value of the gene categories listed in Figure 5c. Perhaps the genes in Table   S15 could be marked to show which category they belong to in Figure 5c.
Reply: Thanks for this suggestion. We added the annotation information of the genes and the enriched pathways for the overexpressed genes in the revised Table S17 (previous Table S15). Reply: Thank you very much for the careful reading and checking on this. We did RNA-seq for ZmGLK44 transgenic plants driven by a drought-inducible promoter under wellwatered and drought-stressed conditions. Although most genes showing up-regulation were detected in the positive as compared to the negative transgenic plants under drought-stressed conditions, we still observed very few genes differentially expressed under well-watered conditions, which may show no difference under drought-stressed conditions, and these genes were not considered in this study. We are sorry to wrongly list them in the previous Table S15, which caused confusion. We deleted these few genes from the new revised Table S17 (previous Table S15). Figure 5c: Glycerophosphodiester degradation is listed as an enriched category.

11.
What genes are actually in this category? Is it possible that this is also tryptophan biosynthesis? A key step in tryptophan biosynthesis is indole-3-glycerolphosphate degradation.

Reply:
We listed the genes that were enriched in the cornCyc pathway "glycerophosphodiester degradation" in the revised Table S17. According to the functional annotation, these genes might have no roles in regulation of tryptophan biosynthesis. Please see the revised Table S17.  19. Line 498-499: "breeding maize cultivars with both drought tolerance and insect (e.g. lepidopteran species) resistance." This would likely cause sensitivity to other insects (aphids).

Reply:
We changed our statement to make it more precision in the revised manuscript lines 522-524 as follows: "Bx12 and its natural variation CACTA TE could be potential genetic resources or molecular markers in breeding maize cultivars with both drought tolerance and lepidopteran species resistance." 20. Lines 687-680: "LC-MS data were obtained using a Waters Acquity UPLC 679 system (Waters, http://www.waters.com), coupled to a Thermo Fisher Q Extractive  Table S3 lists only the 1035 metabolites that the authors deem to be droughtresponsive. For the scientific record, negative evidence is also important. Also, some readers may have an interest in a particular mass feature that the authors did not conclude to be drought-responsive. Please include all 3,890 reliably detected mass features in this table.
Reply: Thanks for this suggestion. We revised Table S4 and now it contains all 3,890 mass features detected in our study. The drought-responsive mass features were also indicated in the revised Table S3. 24. Table S5. What about the mQTL for the metabolites that the authors did not deem to be drought-responsive? There is a lot of information that would results from a more complete analysis of the metabolomic data set that is just being ignored. The 1035 chosen metabolites are just based on some arbitrary cutoff that the authors have determined. Readers will also be interested in the mQTL affecting the other 2,855 metabolites that were detected in these assays. Please include this information in Table   S4.
Reply: As suggested, the information of significant SNP markers associated with all 3890 metabolite traits were added to the revised Table S5 (previous Table S4).
25. Table S14. A putative metabolite is described as "34N" (in cell C10 of the spreadsheet). That is not a very informative metabolite description.
Reply: Thanks for this suggestion. This metabolite was annotated based on the database KNApSAcK. The information of this metabolite could be reached at http://www.knapsackfamily.com/knapsack_core/information.php?word=C00018512.
However, in line 41 a different metabolites is described as being tryptophan.  Table S16 lines 7-30. For the second-time analysis, we harvested new samples of ZmGLK44overexpression plants for metabolite profiling, and focused on investigating ZmGLK44mediated regulation of Trp and Trp derivatives as shown in Table S16 lines 31-44.

Reply
Based on these two experiments, we obtained a general trend for Trp accumulation, that is, Trp was more highly enhanced in production in ZmGLK44-overexpressing positive plants as compared to their negative non-transgenic siblings after drought stress (Fig   5b, f; Table S16), indicating a positive role of ZmGLK44 in Trp accumulation under drought conditions. To avoid unnecessary confusion, we removed the Trp data obtained from the second-time experiment.
27. Line 879-880: "All other reasonable requests for data and research materials are available by contacting the corresponding authors." All raw data for large data sets, e.g.
RNAseq and metabolomics should be placed in a public database. RNAseq data should be in GenBank short read archive. Metabolomic raw data (.raw files) should be placed in Dryad, Cyverse or some other public database. If ten or twenty years from now someone wants to look at these data, there is no certainty that the authors will still have the data and will be able to respond to a "reasonable request." Reply: Thanks for the reminder. We submitted the RNA-seq data to Genome Sequence Archive (GSA), which can be reached at https://ngdc.cncb.ac.cn/gsa with the accession number CRA004467, and deposited metabolomic raw data to Omix, which is available at https://ngdc.cncb.ac.cn/omix with the accession number OMIX419. Please see line 976-985 in revised manuscript Availability of data and materials section.
28. Is the large SNP dataset for the inbred line population available from a public database? If so, which database? If not, it should be made available. 29. The English-language writing is generally good, but there are occasional lapses. I recommend that one of the native English-speaking co-authors should do some careful proofreading. For instance, a sentence like "Of the upregulated metabolites, there were tryptophan (Trp) and its relative metabolite Achillamide derivatives." is not quite grammatically correct.

Reply
Reply: Thanks for this suggestion. The English-speaking co-authors carefully edited this manuscript (edited words / sentences were marked with red color), including the exampled sentence in lines 403-404: "The upregulated metabolites included tryptophan (Trp) and two tryptophan-related metabolites achillamide derivatives." Lines 136-139, this is a far too sweeping statement especially given that this is an artificial "drought" study in pots. It needs to be couched as upregulated for this environment. Let's see the results across multiple field environments before concluding as such. Also, there is way to conclude that these shifts in metabolites are causal in conferring drought tolerance. They are responding. That is all that can be concluded.

Reply:
The big challenge for drought experiments in the field is the control of the weather. Alternatively, we performed the drought experiments in the growth pools to maximally mimic the field conditions, where the growth conditions can be easily controlled. As shown in reply Figure 2 and Figure S1a, the square of each growth pool is 3 m x 7 m = 21 m 2 . Each pool consists of two layers: the top layer is for maize growth, while the bottom layer is for drainage. There are many holes equally distributed in the bottom of the top pool to allow for quick water drainage. A layer of geotextile was placed between the bottom of the top pool and the soil, to prevented soil from falling through the holes. However, we agree with this reviewer that these metabolites are drought responsive. Therefore, we changed our statement as suggested in the revised manuscript lines 138-141 as follows: "Notably, most (83.3%, or 862/1035) metabolites showed up-regulation patterns in response to drought (Fig. 1c), indicating that up-regulation of genome-wide metabolites Lines 151-154, these rrBLUP results with metabolites would be better served with the inclusion of a comparison to prediction of the dry mass, fresh weight, and survival rate phenotypes with genome-wide genetic markers. This population has been scored with SNP markers, so it is a straightforward comparison. The focus on significance here is misleading, as the focus should be on prediction ability/accuracy.

Reply:
Thanks for the suggestion. Drought is a very complex agronomic trait, which is control by many genes with small effects (Hu and Xiong, 2014). The prediction accuracies (r) by drought-responsive metabolites reached ~0.4 for survival rates and even higher for biomass (Fig 1e), indicating that drought-responsive metabolites could efficiently reflect the response and tolerance of maize to drought. As suggested, we performed rrBLUP with SNPs associated with all 3,890 metabolites (s2) and 1,035 drought-responsive metabolites (s1) to predict the survival rates, which have been widely used to monitor maize drought tolerance (Liu et al., 2013;mao et al., 2015;Xiang et al., 2017). Missing markers were imputed by the function A.mat() in R package rrBLUP. The results were shown in Reply figure 3a. Again, the prediction accuracy using SNPs associated with 1,035 drought-responsive metabolites is significantly higher than those using other SNPs. Interestingly, the prediction accuracy with drought-responsive metabolites is higher than that with SNPs associated with 1,035 drought-responsive metabolites (Reply figure 3b), which might because that metabolites play more direct roles than genes in drought response and tolerance. We did focus on the prediction accuracies, but used significance to compare the difference of prediction accuracies with different metabolites. Please see the manuscript in lines 153-155: "The results showed that the prediction accuracy for drought tolerance, especially for survival rate, using 1,035 drought-responsive metabolites, was significantly higher than those using the total or randomly selected metabolites." Reply Figure 3. Comparison of the prediction accuracies of survival rates by metabolites and SNPs associated the metabolites via the rrBLUP approach. a. The prediction accuracies of dry masses, fresh weights and survival rates by all, droughtresponsive, and randomly selected metabolites in the maize population after drought stress via the rrBLUP approach. b. The prediction accuracies of dry masses, fresh weights and survival rates by SNPs associated with all, drought-responsive, and randomly selected metabolites (as shown in a) in the maize population after drought stress via the rrBLUP approach. How is this statistical test carried out anyway? It should not be a t-test. The rationale for not using a t-test is that the magnitude of the denominator of the t-statistic is dependent on the number of resampling runs (standard error of the difference in means).

Response Element in a
Therefore, if performing a sufficient number of resampling runs, the denominator would be very small, and it could then be possible to obtain a significant p-value for any comparison with an arbitrarily small difference. Another way of understanding the problem with the t-test is that because each fold is drawn from the same original dataset, the different folds are not independent of each other, which violates the t-test assumption that observations are independent. I suggest using a bootstrap significance testing approach, which I believe to be valid in this case. If properly conducted, it should be independent of the number of resampling runs.
Reply: Thanks for the valuable suggestion. As suggested, we performed the bootstrap significance testing approach using the function boot.pval() in R package boot.pval to examine the difference of prediction accuracy by using different predictors (Canty and Ripley, 2020;Thulin, 2021). The origin hypothesis is the ratio of two groups of prediction accuracy is equal to 1. Bootstrap replicates were generated using the function boot() in R package boot and the bootstrap replicate number is 100. The comparison of t.test and boot.pval was shown in Reply Reply: Thank you very much for your careful reading and suggestions to our manuscript. It is true that stricter threshold (e.g. Bonferroni or FDR correction) could reduce false positive results. However, it could also cause the loss of false negative results, especially when there is linkage among many SNP markers, which results in the non-independence of these SNPs. Taking into consideration of the linkage among the SNP makers of the same maize populations used in our study, a previous GWAS study of these populations in drought response used a threshold of P < 1 × 10 -5 to control type I error (1/n, n is the total number of SNPs. . To balance the nonindependence of the SNP markers and the Bonferroni correction, we also used the threshold 1/n (n = total markers used) in GWAS in this study, which also has been used in previous GWAS and eQTL studies (Wen et al., 2014;Liu et al., 2017;Wu et al., 2021).
Notably, we proved the reliability of the candidate genes from metabolite-traitbased GWAS with multiple assays: 1) there were 35 previously-identified droughttolerant genes were detected in our candidate gene set ( Fig. 2a and Table S8), 2) association analysis with survival rate (SR) of this maize panel showed that there were more significant SNP-SR associations in candidate genes than random simulation (Fig.   2b), 3) we successfully validated the roles of two candidate genes ZmBx12 and ZmGLK44 in regulation of metabolite traits and maize drought tolerance via genetic and molecular experiments ( Fig. 4-6). Reply: Thanks for the reminder. We rephrased our statement as suggested. Please see lines 179-180 in our revised manuscript: "Therefore, our m-trait-based GWAS could detect many new putative drought-related QTLs in addition to these of known".
Line 178, need to define "lead SNP". I assume this is a peak SNP -a SNP with smallest P-value at a locus, but it still needs to be defined.

Reply:
Thanks for the suggestion, we mean the peak SNP here exactly. We change the description in the revised manuscript. Please see line 182-184: "Based on the peak SNPs (SNP with smallest P-value at a locus) and their linkage disequilibrium (LD), a total of 2,589 candidate genes were identified based on the 7,811 significant SNPs." Lines 179-180, a statistical test of enrichment needs to be conducted to say anything meaningful about finding 190 TFs.
Reply: Thanks for the valuable suggestion. We applied a Fisher's exact test to examine whether the TFs in our candidate gene set were enriched as compared to the genomewide TFs. There were 2,289 TF genes in all 39,625 genes (ratio = 0.058), while the TF gene number in our 2,589 candidate genes was 190 (ratio = 0.073). The P value of the Fisher's exact test is 6.6 x 10 -4 , indicating an enrichment of TFs in our candidate gene set. The transcription factor list of maize was accessible at http://planttfdb.gao-lab.org/.
The transcription factor annotated and total gene number are based on maize genome v3.31. Please see lines 184-187 in our revised manuscript: "190 genes among them were TFs (Additional File 2, Table S5), which showed a significantly enrichment as compared to the total genome-wide TFs (p = 6.6 x 10 -4 , Fisher's exact test). These results suggested the importance of transcriptional variations in maize drought tolerance regulation."  indicate the threshold of significance. The loci without significant SNPs were filtered out. All 35 candidate genes with known drought tolerance roles were annotated based on the significant GWAS signals and indicated in the panels.
Line 192, I have never heard of it referred to as a permutation assay. Permutation test?
Reply: We revised it as suggested. Please see lines 200-201 in our revised manuscript: "The significance was calculated based on 1000 times of permutation test." Lines 212-217, I do not agree with the candidate gene only analysis in this context. It is more important to see how these associated genes rank genome-wide rather than compared to a random subset. I am not convinced on reliability of results as stated in lines 217-219 based on the collectively results. It is easy to subsample data in different ways with biological hypotheses, but the presentation and argument thereof needs to be better conveyed and supported beyond correlative machinery. Otherwise, it is a weakened argument.
Reply: Thank you for the insightful comments and valuable suggestions. It is a great idea to analyze the rank of our genes genome-wide. We revised the manuscript in lines 224-230 as follows: "Next, we ranked the genes based on the p-value of the peak SNP of their locus. In the top 100 genes with most significant SNPs, 19 of them were found in our candidate gene set, while the expectation gene number is 8 (Fisher's exact test, p = 6.4 x 10 -4 ).
The number of candidate gene in top 500 gene list is 72, while the expectation gene number is 41 (Fisher's exact test, p = 3.25 x 10 -6 ), and we observed 126 genes in top 1000 gene list, while the expected genes are 82 (Fisher's exact test, p = 1.35 x 10 -6 ).
These results indicated that our candidate genes might be enriched in the putative drought-tolerant genes." Line 258, 1Mb window really needs to be used for cis-eQTL and is more often the standard than 20kb despite the citation. This is resulting in inflated number of trans-eQTL.
Reply: Different from those selfed or semi-selfed plant species, such as Arabidopsis, rice and cottons etc, maize is an outcross plants and shows more rapid LD decay. The average LD in maize diverse inbred lines was estimated to be 1.5kb (Remington et a.,

2001
). The overall LD decay in the population used in this study is 500bp (r 2 =0.1), reaching single gene resolution . Because of the rapid LD decay, the cutoff for cis-/ trans-eQTL definition in other studies with this maize population has been set as 20kb Liu et al., 2020;Pang et al., 2019).
However, we also tried the 1-Mb window size as cutoff to identified cis-/ trans-eQTLs. Although the exact amounts of eQTLs were different, the general conclusion was not changed based on these eQTLs obtained from different cutoffs, that is, cis eQTL explained larger phenotypic variations of the expression traits as compared to the trans eQTLs (Reply Table 2).