CUBIC: an atlas of genetic architecture promises directed maize improvement

Background Identifying genotype-phenotype links and causative genes from quantitative trait loci (QTL) is challenging for complex agronomically important traits. To accelerate maize gene discovery and breeding, we present the Complete-diallel design plus Unbalanced Breeding-like Inter-Cross (CUBIC) population, consisting of 1404 individuals created by extensively inter-crossing 24 widely used Chinese maize founders. Results Hundreds of QTL for 23 agronomic traits are uncovered with 14 million high-quality SNPs and a high-resolution identity-by-descent map, which account for an average of 75% of the heritability for each trait. We find epistasis contributes to phenotypic variance widely. Integrative cross-population analysis and cross-omics mapping allow effective and rapid discovery of underlying genes, validated here with a case study on leaf width. Conclusions Through the integration of experimental genetics and genomics, our study provides useful resources and gene mining strategies to explore complex quantitative traits.


Supplementary Notes. Supplementary Figures
. Pipeline for variant calling, genotyping and imputation.                   Tables   Table S1. The phenotype statistics for the 23 agronomic traits measured in this study. Table S2. Summary of genetic contributions to trait variance. Table S3. Known metabolites that are significantly correlated with ear leaf width.

Creation of desirable inbred lines by genomic modeling
Through the integration of QTL with relatively large phenotypic effects, this study provides a practical opportunity for inbred line improvement and could be further applied to the precise customization of elite parental lines for hybrid breeding. Moving beyond the potential application of individual QTL, we now demonstrate the value of the ensemble model in pyramid breeding. To simplify the issue, we chose EW, DTA, and PH as representative of yield, flowering time, and plant architecture, respectively, which are main target traits in maize improvement programs.
The QTL identified for EW also show various effects on DTA and PH (Fig. S8a). This is readily understandable because ear weight is affected by dynamic and complex combinations of traits including DTA and PH (Fig. S8b). During offspring selection, we had already produced lines with higher (or comparable) yield, earlier flowering, and reduced plant height as compared with all parents (Fig. S8b). However, not all favorable alleles for EW QTL were present in any offspring line, indicating that there is still room for yield improvement in future generations. By integrating the absent favorable alleles, the whole population displayed another 27.6g (+30.3%) of growth potential, on average, for EW, with little change (-1.3%) in flowering time but a relatively large increase (+11.1%) in plant height (Fig. S8c). Genomic modeling could provide a path for breeders to follow towards breeding optimization of all desirable alleles for all required traits. Considering multiple phenotypes together is essential and more productive than a simpler approach, since a large fraction of the genetic basis of complex traits is shared, even for those non-correlated (p>0.05) traits (Fig. S8de). The results of the present project allow us to estimate the greatest yield potential for any given genotype, with limited alteration (or simultaneous optimization) of other traits. At each stage, the boxes colored in red represent corresponding procedures, followed by descriptions and specific parameters in colorless boxes. For genotype imputation (step three), the second colored box represents the simulated standard adopted, and the box in green presents the final variant statistics.     The observed and predicted space of EW along axes for DTA and PH for the whole population. The best (red points and red-yellow surface) and worst (green points, blue-purple surface) performances were predicted by integrating and removing all favorable allele/haplotypes for EW within each line, respectively, compared to the observed trait value (blue points) at specific genotypes. The three large black points were mean performance of DTA, PH, and EW in the corresponding spaces. (d) Measuring phenotypic correlations using trait values. (e) Measuring cross-phenotypic correlations using genetic sharing. For each trait, all corresponding QTLs (Trait 1 (QTL), represented by peak trait value variants) identified by GWAS (SV-MLM) were tested if they affected other non-correlated (P>0.05) traits (Trait2 (co-effect)) in this study. The circular axis represented the significance (-log10(p-value), increasing moving outwards) of the effects on Trait2 from QTLs from Trait1, using linear regression by controlling population structure, additive effects and the phenotypes of Trait1.

Fig. S11. Enrichment of intergenic QTL on the potentially functional elements.
The plot shows the distribution of observed and expected QTL-MNaseHS (MNase hyposensitive) co-localization (a) and QTL-ACR (accessible chromatin regions) colocalization (b). The expected distributions were obtained by randomly drawing the same number of intergenic QTL regions and comparing them with the MNaseHS and ACR regions identified in maize [28,29]. The dashed black line indicates the mean of random colocalization proportion (fe), and the solid red arrow indicates the observed colocalization proportion (fo) in this study. The one-sample t test was used to test whether the fo is derived from the distribution of fe. The one-tail P-values indicate rejection of the null hypothesis and provides statistical proof that the observed value is significantly higher than the expected in this data set.

Fig. S15. Identification of functional genes by cross-population analysis and crossomics mapping. (a)
Narrowing down candidates by co-localization of eQTL and pQTL, that those candidates with expression simultaneously regulated by pQTL were likely functional. (b) The candidates whose expression significantly associated with trait variance were likely functional. (c) Narrowing down candidates by cross mapping in various populations, that the genes being identified significantly associated to same or proximity phenotypes had greater chance to be causal. (d) The co-localization of pQTL for trait STI (above panel) and eQTL for RAP2 (AP2/EREBP transcription factor, GRMZM2G700665; below panel) at chromosome 8. (e) Taking the QTL region to perform candidate genes association analysis, RAP2 was identified significantly associated with flowering time traits (DTT above and DTS below) in another un-related population. Thus the RAP2 (marked as black dash) can be proposed as a high potential functional gene. (f) LD structure of candidate RAP2. The most significant variant was marked as black dash.

Fig. S16. Cases for co-localization of eQTL and pQTL in identifying functional
genes. For each pair, the above Manhattan plot is for SNP-trait associations, and the below is for the gene who is located within the pQTL and whose expression is simultaneously regulated by pQTL.