Genome-wide haplotype-based association analysis of key traits of plant lodging and architecture of maize identifies major determinants for leaf angle: hapLA4

Traits related to plant lodging and architecture are important determinants of plant productivity in intensive maize cultivation systems. Motivated by the identification of genomic associations with the leaf angle, plant height (PH), ear height (EH) and the EH/PH ratio, we characterized approximately 7,800 haplotypes from a set of high-quality single nucleotide polymorphisms (SNPs), in an association panel consisting of tropical maize inbred lines. The proportion of the phenotypic variations explained by the individual SNPs varied between 7%, for the SNP S1_285330124 (located on chromosome 9 and associated with the EH/PH ratio), and 22%, for the SNP S1_317085830 (located on chromosome 6 and associated with the leaf angle). A total of 40 haplotype blocks were significantly associated with the traits of interest, explaining up to 29% of the phenotypic variation for the leaf angle, corresponding to the haplotype hapLA4.04, which was stable over two growing seasons. Overall, the associations for PH, EH and the EH/PH ratio were environment-specific, which was confirmed by performing a model comparison analysis using the information criteria of Akaike and Schwarz. In addition, five stable haplotypes (83%) and 15 SNPs (75%) were identified for the leaf angle. Finally, approximately 62% of the associated haplotypes (25/40) did not contain SNPs detected in the association study using individual SNP markers. This result confirms the advantage of haplotype-based genome-wide association studies for examining genomic regions that control the determining traits for architecture and lodging in maize plants.


Introduction
Maize (Zea mays L.), along with rice and wheat, is one of the most important agricultural crops worldwide [1] and has been used as a food source [2] and as a raw material for pharmaceutical and agroindustrial products [1,3], because of its nutritional composition (72% starch, 10% protein and 4% fat), versatility and broad adaptability. Usually, maize breeding programs have focused on obtaining gains in grain yield [4,5]. However, the selection of cultivars based a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 seasons in the town of Iguatemi (latitude: 23S 25'; longitude: 51W 57' and altitude: 550 m), located in the experimental fields of the Department of Agronomy of the State University of Maringá (UEM), Brazilian state of Paraná. The lines were planted according to a randomized complete block design, with 2 blocks and 28 repetitions per line. The field management was performed using artificial irrigation and according to the agronomic practices suggested by Cruz et al. [31].
The following traits related to plant lodging were evaluated: total plant height (PH), ear height (EH), and the EH/PH ratio. PH was measured from the ground to the tip of the primary inflorescence. EH was measured from the ground to the node of the first ear. The EH/PH ratio was obtained by dividing EH by PH in each repetition. In turn, the leaf angle (LA), one of the primary determinants of plant architecture, was measured in each repetition 10 days after anthesis. Four leaves above the first ear were examined to visually estimate the LA, which was measured on a categorical scale with 4 levels: (1) an inclined angle, (2) erect, (3) very erect, and (4) extremely erect.

Genomic DNA and SNP discovery
Genomic DNA was isolated from the leaf tissue of inbred lines, according to the protocol established by Coan et al. [21]. Subsequently, the DNA samples were sent to the Genomic Diversity Institute of Cornell University for SNP discovery via genotyping by sequencing (GBS), which is described in Elshire et al. [23] and at https://bitbucket.org/tasseladmin/tassel-5-source/wiki/Tassel5GBSv2Pipeline (Institute of Genomic Diversity, Cornell University). The description of the GBS method used can be summarized as follows: (I) the reduction of genome complexity by digestion with restriction enzymes sensitive to ApeKI methylation; (II) the ligation of adapters (compatible with restriction enzymes) to the genomic DNA fragments; (III) the grouping, purification and amplification of the genomic material by means of Illumina sequencing primers (with a binding site specific to the ligated adapters) and PCR; (IV) the purification of the PCR product and the confirmation of the fragment sizes within the GBS libraries by means of a DNA analyzer (BioAnalyzer, Agilent Technologies, Inc., USA); and (V) the quantification and dilution of the libraries for subsequent sequencing in an Illumina HiSeq 2000 (Illumina Inc., San Diego, CA).
Once the reading sequences were obtained (raw data), TASSEL 5.2 software [32] was used to align the data with the Zea mays version AGPV3 reference genome (B73 RefGen v3) [33]. After the alignment, the heterozygosity and the variants of Minor Allele Frequency (MAF) were calculated using VCFtools (version v0.1.12a). The readings with a quality score below 80 on the Phred scale (genotyping quality) [34] were eliminated. Finally, the database was filtered considering an MAF > 0.01 and a proportion of missing data per site < 90% [35].

Linkage disequilibrium (LD)
The LD analysis was performed in TASSEL 5.2 [32], using SNPs that had less than 25% missing data and an MAF > 0.05. The LD between marker pairs was estimated using the correlation coefficients of the allelic frequencies (r 2 ) considering all the possible combinations of the alleles. The level of significance (p-value) was calculated using 10,000 permutations.
The critical r 2 value was estimated according to Breseghello and Sorrells [36] and Laidò et al. [37], using the transformation of the square root of the r 2 values and taking the 95th percentile of these data as a threshold for which the LD is likely to be caused by a real physical linkage. Finally, the LD patterns were evaluated by means of a non-linear regression between the r 2 values and the physical distance of the SNPs (in bp) [38].

Haplotype blocks
A haplotype-block is defined as arrays of two or more SNP with high linkage disequilibrium. Haplotype blocks were constructed for each chromosome, considering SNPs with a MAF greater than 0.05 and with less than 25% missing data, using HAPLOVIEW [39]. The formation of the blocks was performed using the confidence interval method, described by Gabriel et al. [40] and implemented in HAPLOVIEW. This method considers the 95% confidence intervals of the D' values, classifying the LD as "strong LD", "inconclusive", or "strong recombination". Finally, blocks were built if 95% of the comparisons were of the "strong LD" type. Haplotype blocks were later transformed into multiallelic markers, for the subsequent haplotypetrait association analyses, regarding the allelic combinations within each block to be independent alleles.

Population structure and estimation of the kinship matrix
The kinship matrix was calculated based on identity-by-state (IBS) [41], using TASSEL 5.2 [32]. The population genetic structure was inferred using a Bayesian clustering model, available in STRUCTURE 2.3.4 [42], with a subset of markers without missing data (~5,000 SNP). For the STRUCTURE program, twenty runs were considered for each possible K (number of populations) ranging from 2 to 10, with the admixture model [42], correlated allele frequencies, a burn-in period of 1x10 5 , and 1x10 6 MCMC repetitions. The optimal value of K was estimated from the second-order change rate of the probability function with respect to K (ΔK), as proposed by Evanno et al. [43]. Additionally, a principle component analysis (PCA) was performed in TASSEL and dendrogram was constructed according to the hierarchical neighbor joining method using DARWIN 5.0.158 [44] to visualize and corroborate the results from STRUCTURE.

Statistical analysis of the phenotypic data
The analysis of the phenotypic data was performed using the following mixed linear model, with SAS software (SAS Institute, Inc., Cary, NC): where y represents the phenotypic values (for LA, PH, EH or the EH/PH ratio); X, Z 1 and Z 2 correspond to the known incidence matrices that relate y with the vectors β, l, and ls, respectively; β is the fixed effect vector of the season and the nested block within each season; l corresponds to the vector of random genotypic effects (lines); ls is the random effects vector due to the line-season interaction; and ε is the vector of the random residual effect. The restricted maximum likelihood (REML) method was used for the estimation of variance components. The following information criteria were used to determine the statistical significance of the random effects (model comparisons): where BIC corresponds to the Schwarz (Bayesian) information criterion; AIC is the Akaike information criterion; -2log RL denotes the maximum value of the log likelihood (restricted); d corresponds to the dimension of the model; and n is the number of valid observations for the estimation of RL. The best model was the one that minimized the value of the information criteria [45].
The variance components of all the traits were determined using ASREML 4 [46]. For leaf angle, the statistical analysis was performed using a generalized linear mixed model (GLMM) because it supports ordinal variables. GLMM uses an approximate probability technique called penalized quasi-likelihood (PQL) [47], also known as pseudo-likelihood [48], which is based on an approximation of the first-order Taylor series.
Heritability (H 2 ) was calculated for all traits using the following expression: where s 2 g is the genotypic variance, s 2 gs represents the variance of the interaction between the genotype and the environment, s 2 ε corresponds to the residual variance, n is the number of environments (growing seasons), and r indicates the number of repetitions.

Genome-wide association study (GWAS)
Both association analyses, haplotype-trait and SNP-trait, were performed using a mixed linear model (MLM) in TASSEL 3.0 and TASSEL 5.2, respectively. The statistical model considers the effects of the population structure (Q) and genetic relationships or matrix kinship (K) among inbred lines, which has the following expression: where y is the vector of adjusted phenotypic observations (Adjusted Entry Means) [49], α is the vector of SNP molecular markers or haplotype blocks (fixed), v is the vector of the effect of the population structure (fixed), μ is the vector of polygenic effects or genetic background (random), and ε is the vector of residual effects. S, Q and Z are the incidence matrices of the associated vectors.
The SNP markers with a MAF less than 0.05 and with missing data greater than 25% were excluded from the association analyses to reduce the false positive rate (type I error) [50].

Trait correlations and pleiotropic
Trait correlations were calculated according to Pearson's correlation coefficient in the R-project [51] software. Additionally, the probability that a locus is truly associated with more than one trait was evaluated by the logarithm of Bayes Factor (log10 (BF)) and the Posterior Probability of Association (PPA) [52]. The PPA was calculated as follows: where π is the prior probability that a given SNP is truly associated with the trait of interest (π = 1.0 × 10-3). BF was calculated using a Bayesian multivariate regression analysis in the SNPTEST software [53].

Reference genome
Genes and QTLs recorded in the maize genome AGPV3 (B73 RefGen_v3), available in the MaizeGDB database (http://www.maizegdb.org//) [54], were used as references. For this, a window (or threshold) of twice the distance indicated by the LD (0.9 kb, plus the size of the haplotype) was established, placing the SNP or haplotype in the center of the window.

Results
According to the information criteria, the model that best fits the PH, EH and EH/PH data was the complete model (M3), which includes the effect of line-season (LxS) interaction (Table 1), while for LA, the model without LxS interaction (M2) was the best fit model. Moreover, based on the results of AIC, it was determined that there were no differences between the M2 and M3 models (Δi<2) [55] for LA. The moderate coefficients of variation (CV) found in the four traits (19% in LA, 10% in PH, 16% in EH and 11% in the EH/PH ratio) indicate an adequate level of experimental precision. The average values for PH (1.52 and 1.27, respectively) and EH (0.81 and 0.58, respectively) differed between growing seasons, with higher values observed during 2014-2015, indicating that the environment had an impact on variation for these traits. On the other hand, the LA trait did not show significant changes between seasons (1.287 and 1.291, for 2014-2015 and 2015-2016, respectively). The high values of heritability (H 2 = 0.95, 0.74, 0.94 and 0.83, for PH, LA, EH and the EH/PH ratio, respectively) were similar to those reported in previous studies [1,7,15,56]. The strong genetic control observed in this maize panel increases the detection power of genetic regions associated with the traits [57].
Correlations between the traits related to plant lodging were positive and statistically different from zero (p-value < 0.001), with values of r = 0.89 (EH and PH), r = 0.80 (EH and the EH/PH ratio), and r = 0.42 (PH and the EH/PH ratio). On the other hand, LA showed a low negative correlation (r varying from -0.17 to -0.24) with traits related to plant lodging. The high correlation between EH and PH indicates that as plant height increases, so does the ear height, avoiding an increase in EH/PH ratio. On the other hand, the EH/PH ratio presented a better correlation with EH than with PH, which could indicate that this trait is mainly influenced by a change in EH.

Population structure
According to the method proposed by Evanno et al. [43], linked to the STRUCTURE results, the 183 maize genotypes were divided into two genetically differentiated groups: field corn (98 genotypes) and popcorn (85 genotypes) (Fig 1A). These results were similar to what was visualized with the first two primary components of the PCA method ( Fig 1B) and neighbor joining dendrogram (Fig 1A). The axes PC1 and PC2 explained 11.6% and 3.4% of the variation in the genotypic data, respectively, and were the first axes that separated most of the inbred lines.

Linkage disequilibrium (LD)
The pattern of LD was estimated at the level of the complete genome and for each chromosome, considering a high density of SNP markers (38K). The LD presented a rapid decrease Table 1

. Model selection results based on the Bayesian (BIC) and Akaike (AIC) information criteria for leaf angle (LA), ear height (EH), plant height (PH) and the EH/PH ratio in inbred maize lines.
The model corresponds to a model that does not include the effects of line and the environment-line interaction; the model considers the effects due to line, and the model corresponds to the complete model.

Genome-wide association study with individual SNP markers
The linear model used in the GWAS considered the matrix information of Q (genetic structure) and K (kinship), to reduce the false positive rate and correct the spurious associations in the analysis [35]. According to the analysis of mixed models, a total of 163 SNPs were identified to be associated with the four traits of interest, 15 of which were associated in both growing seasons and 11 of which were shared among multiple traits. Twenty-five of the detected SNPs were associated with LA, 37 presented significant associations with EH, 24 were associated with PH, and 67 SNP were associated with the EH/PH ratio. The information regarding the SNPs detected in this study are summarized in Table 2  The proportion of the phenotypic variation explained by SNPs for LA varied between 10.2 and 22.2%, with SNP S1_317085830, located on chromosome 6, being the trait that presented the greatest effect during the growing season 2014-2015 (Table 2). In general, the r 2 values were moderate to relatively high, explaining between 7% (SNP S1_285330124, associated with the EH/PH ratio during the 2014-2015 season, located on chromosome 9) and 22.2% (SNP S1_317085830, associated with LA during the 2014-2015 season) of the phenotypic variation ( Table 3). The r 2 values in this study were relatively higher than those found in the literature for these traits [28]. Two of the loci with relatively major effects related to plant lodging, the SNPs S1_333289434 (located on chromosome 6) and S1_81113660 (on chromosome 10), were concomitantly associated with EH and PH in the 2015-2016 growing season, which explained 9.5-15.2% and 10.6-15.3% of the phenotypic variations, respectively. Additionally, other marker-trait associations were jointly identified for PH and EH, such as the SNPs S2_47326011 (bin 1.03), S2_230948259 (bin 1.08), S1_1158043959 (bin 3.04), S1_1280061812 (bin 3.09), S1_1037703810 (bin 5.06), S1_821562289 (bin 7.04), S1_494063499 (bin 8.01) and S1_639521797 (bin 8.06). Similarly, SNP S1_16427856 (bin 10.02) was jointly associated with the traits EH and the EH/PH ratio. These results suggest a possible pleiotropic effect for the traits related to lodging [1]. In this study, more than four associations per bin were found in the chromosomal bins 1.  (9) associated with the traits of interest, six of which are associated with PH, two with EH, and one with LA. Notably, this genomic region had associations for three of the traits, suggesting a possible link among them.
All of the associations detected in the PH, EH and the EH/PH ratio were year-specific (they are present in a particular growing season), while for LA, 75% of the associations were present in both years. This result indicates that most of the QTLs for LA were stable in both growing seasons. This result is in accordance with the AIC and BIC information criteria, which point to the significant presence of a genotype-environment interaction (line x growing season) for PH, EH and the EH/PH ratio. However, the model that did not include the genotype-environment interaction (M2) was a better fit for the LA data, according to the AIC and BIC, indicating that the genotype-environment interaction does not provide information to the complete model [55]. Thus, the inbred lines did not show interaction with the growing seasons, displaying stability in LA, and consequently, in the associations identified through GWAS.

Haplotypes associated with complex traits
A total of 7,831 haplotypes (each containing between 2 and 20 SNPs) were identified in the 10 maize chromosomes. Of these haplotypes, 55% contained two SNPs, 23% contained three SNPs, 10% contained four SNPs, and 12% contained five to 20 SNPs. In turn, the complete genome association analysis based on haplotypes identified 40 blocks that were significantly associated with the traits of interest, of which 6 were associated with EH, 11 with LA, 15 with PH and 8 with the EH/PH ratio (Table 4) (Fig 4). Three haplotypes (hapEH5.07, hapEH7.02 and hapEH8.06) were concomitantly associated with EH and PH (in the same year of measurement), suggesting a possible pleiotropic effect. In addition, 83% (5/6) of the haplotypes detected for the LA trait were present in both growing season, suggesting the possible stability of these associations. The haplotypes hapEH10.07 and hapEH4.09 showed differences greater than 40% among their allelic variants for the trait EH. In effect, individuals that presented the "b" allele of the haplotype hapEH10.07 had an ear height 62% and 93% higher than those of individuals with the alleles "a" and "c", respectively. On the other hand, individuals with the "d" allele of haplotype hapEH4.09 presented 43%, 88% and 48% higher EH than those of individuals with the alleles "a", "b" and "c", respectively. These results show the ability of these haplotypes to differentiate individuals with higher ear height. For loci with possible pleiotropic effects, individuals that presented the "a" and "c" alleles of the haplotype hapEH7.02-hapPH7.02 showed significant differences compared with individuals with the "b" allele, for both traits (EH and PH) in the 2015-2016 season. Indeed, plants that presented the "a" and "c" alleles (haplotype hapPH7.02) were 19% and 23% taller, respectively, than those of individuals with the "b" allele, and had 29% and 42% higher EH (hapEH7.02), respectively, than individuals with the "b" allele. In addition, the blocks hapEH5.07-hapPH5.07 and hapEH8.06-hapPH8.06 also presented differences between their allelic variants for both traits (EH and PH) during the 2014-2015 growing season. Plants that presented the allele "b" in the haplotype hapPH5.07 were 15% and 22% taller than individuals with the allelic variants "a" and "c", while for EH, plants that presented the "b" allele (hapEH5.07) had 22% and 33% higher EH than individuals with the "a" and "c" alleles. Similarly, individuals with the "c" and "d" alleles of the haplotype hapPH8.06 were 10% to 14% taller, and had 9% to 21% higher EH than those of individuals with the "a" and "b" alleles.
A total of 38% of the haplotypes significantly associated with a certain trait contained one or more SNPs that were previously detected during the association analysis using individual markers. This result shows that the haplotype analysis allowed the identification of genetic regions that were not detected by GWAS with the use of individual markers. The phenotypic variation explained by the haplotypes varied between 16% and 29% for LA, between 8% and 17% for PH, between 9% and 18% for EH, and between 8% and 16% for the EH/PH ratio ( Table 4). As in the analysis using individual SNPs, the LA trait presented the highest values for r 2 (Table 5), in which two haplotypes can explain more than 23% of the phenotypic variation (haplotypes hapLA4.2 and hapLA1.1, located on chromosome 4; Table 5). These values are slightly higher than those observed in the analysis of individual SNPs. Although the haplotypes encompass a larger genetic region (due to the grouping of SNPs), 45% of these did not increase the phenotypic variation explained in comparison to the SNPs that compose them; for example, the haplotype hapPH9.02 explained 8% of the phenotypic variation, while the sum of its SNPs explained 14%. However, 55% of the haplotypes explain a phenotypic variation greater than the sum of the SNPs that compose them; for example, the haplotype hap4.05B explains 23% of the variation, while the sum of its 3 SNPs only explains https://doi.org/10.1371/journal.pone.0212925.g004 6%. The latter example shows the potential presented by the use of haplotypes over individual SNPs to increase the phenotypic variation explanations for a trait.

Gene annotation based on individual SNPs and haplotypes
Based on the physical position (reference genome B73) of the significantly associated SNPs and haplotypes, according to GWAS, 122 SNPs and 31 haplotypes are close to candidate genes. Twenty of these SNPs, and eight haplotypes were associated with more than one trait or one season at a time. Moreover, some SNPs and haplotypes (20) are close to same candidate genes. Therefore, only 95 unique candidate genes were found in the present analysis. Six of these candidate genes are of particular interest because of their proven participation in the expression of the traits of interest (i.e., GRMZM2G141386, GRMZM2G130675, GRMZM2G104262, GRMZM 5G845755, AC205122.4_FG003 and GRMZM2G042429). For LA, the GRMZM2G104262 gene has an ortholog in Arabidopsis thaliana that encodes cryptochrome 1 (CRY1), which, at the low wavelengths of blue light, establishes the rapid induction of the hyponastic growth of the petiole [58], independent of ethylene [58,59]. Moreover, Wu and Yang [60] demonstrated that the overexpression of CRY1 positively regulates systemic acquired resistance (SAR) and positively regulates the expression of R protein, which mediates resistance to Pseudomonas syringae, limiting the growth of bacteria in Arabidopsis.
For EH, the AC205122.4_FG003 gene encode for an uncharacterized protein in maize; however, its homolog in Arabidopsis encodes the protein C2-domain ABA-related 5 (CAR5), which is related to the sensitivity to abscisic acid (ABA) [61]. In fact, CAR5 mutants showed reduced sensitivity to ABA and rapid growth [61]. In addition, the GRMZM5G845755 gene encodes the subunit protein kinase alpha 4 (CKA4), a component of protein kinase CK2 (a structural tetramer complex), which has properties very similar to those described in other plants [62]. Lee et al. [63] demonstrated that this protein appears to be involved in plant growth. Further, Wang et al. [64] reported that, in Arabidopsis, a knockout mutant of CKA4 exhibits defects both in the development and elongation of the hypocotyl, delaying growth in general.
For marker-PH associations, the GRMZM2G141386 gene encodes a putative RNA-binding protein (ARP1). However, Makabe et al. [65] characterized this protein as a disease resistance activator and a plant growth repressor in Nicotiana tabaccum. The genes GRMZM2G042429 (close marker-PH associations) and GRMZM2G130675 (EH/PH ratio), express small auxin-up RNA 52 (SAUR52) and SAUR1, respectively, two members of the SAUR family, which is composed of genes with early responses to auxin and are key effectors of the hormonal and environmental signals that regulate the growth and development of plants [66]. In soybean and Arabidopsis thaliana, these proteins can promote the elongation of the hypocotyl when stimulated by auxins [67,68].

Genotype-environment interaction (locus-environment)
For LA, 15 SNPs (75%) and 5 associated haplotypes (83%) were identified during both growing seasons. The selection of the model that does not consider the genotype-environment interaction as the best fit for LA, according to the AIC and BIC criteria, supports the stability resulting from LA associations. However, for the traits PH, EH and the EH/PH ratio, the model with the best fit, according to AIC and BIC, includes the effects of the genotype-environment interactions (the complete model), indicating that the response of the inbred lines was dependent on the growing season. This result allows us to explain the absence of stable SNPs or haplotypes during both seasons. Our results indicate that the genotype-environment interactions were important for the study of traits related to plant lodging. In addition, this result could be influenced by differences between the two growing seasons, as both PH and EH showed higher average values in the first season, indicating better growth conditions. Asaro et al. [19] mentioned that local soil differences (from the same test) can favor the differential identification of QTLs between two growing seasons, suggesting that the identified associations may be responding to components such as precipitation, temperature or other climatic factors. The lack of stability between the QTLs detected in PH, EH and the EH/PH ratio can be attributed to the environmental variations that occur during the seasons, demonstrating the importance of performing tests over time and/or in different environments [20]. According to Edmeades [18], in many maize populations, there are random events that can cause genotype-year interactions that are difficult to interpret. For this reason, to obtain associations between genetic stability and the high phenotypic variations explained, the same genetic material should be evaluated during different seasons or in different environments.
The growing seasons in which the inbred maize lines were evaluated did not affect the stability of the associations found for LA, which is in accordance with related studies. Maize can be grown in a wide range of conditions, but abiotic stress variables, such as temperature, soil fertility and climatic factors, affect its production [69]. Therefore, the detection of stable SNPs and haplotypes during different growing seasons can be very useful for breeding programs that seek to identify the best varieties that can adapt to changes in environmental conditions.

Haplotypes and genetic regions associated with complex traits
The rapid decay in LD is typical in the tropical maize germplasm [21,70]. Remington et al. [71] and Yan et al. [72] reported that LD decays within the range of 0.1 to 10 kb, depending on the population and the genetic region studied, and always decays faster in tropical maize [70]. The size of the haplotypes is closely related to the degree of LD present in the population studied [73], in which a rapid decay of LD produces smaller haplotypes because a smaller number of loci will be linked [73,74]. The majority of the haplotype blocks were formed with two or three SNPs (78%), due to the rapid decay of the LD (0.9 kb). Lorenz et al. [78] and Contreras-Soto et al. [22] demonstrated that the use of haplotype information can be beneficial when identifying marker-phenotype associations and can offer advantages for the genetic dissection of loci underlying complex traits. The use of haplotype-based analysis reduces the number of multiple comparisons (or multiple testing), compared to individual SNP-based association analysis, as haplotypes can group SNPs from the LD pattern observed in the data. The dimension reduction, from a biological perspective (haplotypes) becomes more relevant when considering the new genomic data platforms (high-density genomic data), thereby increasing the possibility of finding genomic regions controlling the variation of a trait.
A total of 38% of the significantly associated haplotypes contained one or more SNPs that were previously detected in the GWAS, indicating the advantage that haplotypes have in the detection of multiple DNA variants [22]. The selected SNP-GWAS and haplotype-GWAS thresholds were based on the Bonferroni correction method, these thresholds were equivalent to p-values = 2.6x10 -6 and 1.3x10 -5 , respectively. Although the loci identified did not surpass Bonferroni thresholds, the p-values in both analyses were significant (p-values < 1×10 −3 ), in which 73% of the loci had p-values lower than 8.5x10 -5 (and 41% lower than 8.3x10 -6 ). It is worth noting that 16% of DNA variants were consistently detected in both methods, confirming the high reliability of the detected loci. In this study, the associated haplotypes explain a greater proportion of the phenotypic variability of a trait than the SNPs that compose it. For example, the haplotypes hapLA4.05B and hapEH4.09 explained 23% and 18% of the phenotypic variation for LA and EH, respectively, while the sum of its 3 SNPs only accounted for 6% and 4%, respectively. Moreover, the haplotype hapEH4.09 showed significant differences (up to 88%) among its allelic variants, with individuals containing the "d" allele being superior to those with the other alleles. This result indicates the potential utility of haplotypes to explain the phenotypic variation associated with a trait, an aspect highlighted by Contreras-Soto et al. [22], Chen et al. [30] and Barendse [79].
The EH and PH traits had a high and positive correlation (r = 0.89), which is in accordance with previous studies [1,12]. However, the EH/PH ratio had a greater correlation with EH (r = 0.80) than with PH (r = 0.42), indicating that the variation in the EH/PH ratio is dominated primarily by changes in EH. Therefore, it is possible to observe that, in the same genomic regions, markers can be found that are significantly associated with both PH and EH, and with EH/PH ratio and EH [1,80]. In the present study, three haplotypes and ten SNPs showed significant associations with PH and EH concomitantly, whereas only one SNP was simultaneously associated with the EH/PH ratio and EH. This suggests a common genetic control for these traits, which could result in possible pleiotropic effects for these loci [1].
The pleiotropy among loci associated with the traits of plant lodging was corroborated by a multivariate Bayesian regression [53]. As PPA combines the evidence in the observed association data (BF) with the prior probability (π) that an SNP is associated with a given trait, we regarded that loci with values of APP > 0.9 are truly associated with a given phenotype. This value of PPA corresponds to a log10 (BF) > 3.95, which is higher than previous association studies [75,76,77].
The PPA values of 0.924 and 0.994, for SNPs S2_230948259 and S1_81113660, respectively ( Table 6) provide convincing evidence of the association of these two loci with more than one phenotype (EH and PH). Similarly, the high values of log10 (BF) (> 3.95) can be considered as strong evidence against the null hypothesis of no association. This Bayesian analysis indicates that only two loci presented a significant pleiotropic effect.
In the present study, the SNPs and haplotypes associated with PH, EH and the EH/PH ratio were distributed in all of the chromosomes, confirming their quantitative architecture, while those associated with LA were only found in chromosomes 1, 2, 4, 5, 6, 7 and 8. Several Table 6. Analysis of pleiotropic loci with their respective the p-values for traits related to plant lodging. The Bayes Factor (BF), posterior odds (PO) and the corresponding posterior probability of association (PPA) for two values of the prior probability are shown. previous studies in maize that used biparental populations and recombinant inbred lines for associative mapping have reported similar genomic localizations for marker-trait associations. Weng et al. [7], for example, found 27 QTLs in bin 1.03, 6 associated with PH and 21 with EH. In addition, Li et al. [1] reported three QTLs in bin 2.07 associated with EH, and Pan et al. [81] identified four QTLs, three associated with PH (bins 3.09, 8.06 and 9.03) and one with EH (bin 6.06).
In the present study, several hotspots (containing more than three SNPs or haplotypes) were identified in the chromosomal bins 1.03, 9.03 and 10.03 for PH, bins 4.08, 7.00, 7.02, 8.04 and 8.06 for EH, bins 4.05 and 5.03 for the LA trait, and bins 1.01, 4.09, 7.02 and 8.03 for the EH/PH ratio. In addition, the SNPs S2_47326011 and S1_639521797 that were concomitantly associated with PH and EH were located in bins 1.03 and 8.06, respectively, which are two hotspots related to the lodging of plants. At the haplotype level, hapEH8.06-hapPH8.06, which is also concomitantly associated with both EH and PH traits, is located in bin 8.06. Coincidentally, Weng et al. [7] reported QTLs shared between EH and PH in bin 1.03, suggesting that these regions may contain genetic mechanisms that control both traits.
To date, several QTL mapping studies have been reported for plant lodging and architecture [16,82,83,84,85]. However, only a few have reported QTLs or associations with relatively major effects. In this study, a total of 9 SNPs and 14 haplotypes linked to QTLs were able to explain a significant percentage of the phenotypic variation (>15%) in the lodging and architecture traits of the plant (Table 3 and Table 5). These associations were found in chromosomes 1, 2, 4, 5, 6, 7, 8 and 10, which agrees with previously reported studies. For traits related to the lodging of plants, several studies have identified loci with major effects; for example, Zhang et al. [84] identified a QTL explaining much of the phenotypic variation for PH on chromosome 4, while Zhu et al. [85] found major effect QTLs for EH and PH that were located on chromosomes 1, 8, 9 and 10. In this study, one SNP (S1_81113660) and three haplotypes (hapEH8.06, hapPH8.06 and hapPH10.03) of greater effect coincide with the QTLs described by Zhu et al. [85] on chromosomes 8 and 10. For LA, QTLs have been identified that contribute in large part to the phenotypic variation; for example, Ku et al. [16] identified two QTLs that explain more than 17% of the variation, Wang et al. [82] reported only one major effect marker on chromosome 7, and Zhang et al. [83] found a QTL explaining 36.82% of the phenotypic variation. Among the major effect QTLs identified in this study, the haplotype hapLA1.01 (16.3%) and SNP S1_1296983734 (16.8%) are adjacent to the major effect QTLs found by Ku et al. [16] on chromosomes 1 and 2, respectively, while the haplotype hapLA7.03 (17%) is adjacent to qLAa7-1, described by Wang et al. [82]. Likewise, two SNPs, S1_1648381508 (17.8%) and S1_1550828517 (18.6%), and three haplotypes, hapLA4.05A (18.1%), hapLA4.05B (22.8%) and hapLA4.04 (28.3%), were located adjacent to QTL qLA4-1 (36.82%), detected by Zhang et al. [83], a key locus that explains much of the phenotypic variation for LA. Importantly, the major effect QTLs identified for LA (hapLA4.05B = 23.3% and hapLA4.04 = 28.8%) were located on chromosome 4, which is consistent with the study by Zhang et al. [83]. This result confirms that chromosome 4 plays a key role in the variation of LA and should be considered in breeding programs of maize.
Our results indicate that the traits of interest (plant lodging and architecture) are controlled by multiple genes, with variable contributions to phenotypic expression. In this work, the first major effect haplotypes associated with leaf angle are presented, representing a fundamental aspect for architecture of maize and marker-assisted selection, and which appear to be stable between growing seasons. In general, the haplotypes that are significantly associated with traits related to plant lodging and architecture explain a greater proportion of the phenotypic variance in comparison to the associated SNPs, which supports the hypothesis that using haplotypes in association studies allows the identification of genomic regions responsible for controlling a large part of the variation in the traits of interest.

Annotated genome data
Maize is an ideal biological system for the application of GWAS through panels of inbred lines [21] because there is a high-quality reference genome that allows the generation of enough informative markers [21,33]. In addition, one of the advantages of GWAS is that it allows high-resolution mapping. For the identification of genomic annotations for the traits of interest, we considered a 0.9 kb window (determined by the LD pattern) upstream and downstream of the significant associations. Based on this principle, annotations of 126 unique candidate genes were found, of which only six genes were considered to be biologically important due to their proven participation in the studied traits (i.e., the genes GRMZM2G141386, GRMZM2G 130675, GRMZM2G104262, GRMZM5G845755, AC205122.4_FG003 and GRMZM2G042429) [58,60,61,62,63,64,65,66,67,68].
LA presented a gene orthologous to GRMZM2G104262 in Arabidopsis, which encodes CRY1. According to Millenaar et al. [58], CRY1 has been reported to have two positively correlated functions with light intensity: the rapid induction of the hyponastic growth of the petiole at low levels of low blue light wavelength, and the positive regulation of Pseudomonas syringae resistance [60]. In turn, Joshi and Chand [86] showed a positive correlation between LA and spot blotch resistance, caused by Bipolaris sorokiniana in wheat, in which individuals with straight or semi-straight leaves presented a lower incidence of the disease compared to individuals with droopy leaves. These results would justify the double function observed in CRY1 and suggest that LA can influence the incidence of a disease. It is therefore necessary to perform investigations that elucidate the genetic structure of LA in maize and its degree of association with diseases.
The genes AC205122.4_FG003 and GRMZM5G845755 are found in genomic regions associated with EH and encode polypeptides of unknown function. However, these genes have orthologs in Arabidopsis that express the CAR5 and CKA4 proteins, which have been associated with hypocotyl elongation and plant growth [61,63,64]. Similarly, for PH and the EH/ PH ratio, orthologs for the GRMZM2G042429 and GRMZM2G130675 genes, respectively, encode SAUR52 and SAUR1 proteins that can promote hypocotyl elongation through auxin stimulation [67,68]. Auxin is a key plant growth hormone, which regulates processes such as cell elongation, division, differentiation and morphogenesis during the growth and development of the plant [87]. SAUR proteins participate in the auxin early response, and play key roles in hormonal and environmental signals that regulate the growth and development of plants [66].
In this study, some haplotypes contained SNPs that were not detected in the GWAS using individual SNP markers. This result represents an advantage of using haplotypes over individual SNPs in the detection of multiple DNA variants. Another advantage of haplotypes is that they present multiple allelic variants, which could facilitate the search for loci that affect the gene expression of a certain trait. Finally, the use of the haplotypes and SNPs identified in this study could increase the efficiency of maize breeding programs, based on important determinants of plant lodging and architecture.