Genome-wide association analysis of agronomic traits in wheat under drought-stressed and non-stressed conditions

This study determined the population structure and genome-wide marker-trait association of agronomic traits of wheat for drought-tolerance breeding. Ninety-three diverse bread wheat genotypes were genotyped using the Diversity Arrays Technology sequencing (DArTseq) protocol. The number of days-to-heading (DTH), number of days-to-maturity (DTM), plant height (PHT), spike length (SPL), number of kernels per spike (KPS), thousand kernel weight (TKW) and grain yield (GYLD), assessed under drought-stressed and non-stressed conditions, were considered for the study. Population structure analysis and genome-wide association mapping were undertaken based on 16,383 silico DArTs loci with < 10% missing data. The population evaluated was grouped into nine distinct genetic structures. Inter-chromosomal linkage disequilibrium showed the existence of linkage decay as physical distance increased. A total of 62 significant (P < 0.001) marker-trait associations (MTAs) were detected explaining more than 20% of the phenotypic variation observed under both drought-stressed and non-stressed conditions. Significant (P < 0.001) MTA event(s) were observed for DTH, PHT, SPL, SPS, and KPS; under both stressed and non-stressed conditions, while additional significant (P < 0.05) associations were observed for TKW, DTM and GYLD under non-stressed condition. The MTAs reported in this population could be useful to initiate marker-assisted selection (MAS) and targeted trait introgression of wheat under drought-stressed and non-stressed conditions, and for fine mapping and cloning of the underlying genes and QTL.


Introduction
Wheat is an important commodity crop and a vital component of a healthy global diet providing starch, dietary proteins, fiber, fats, vitamin B, zinc, calcium, and iron [1]. Its production and productivity in most parts of the world, especially in sub-Saharan Africa, is constrained by drought and high temperature stresses [2][3][4]. Breeding for drought tolerance is one of the key components that could enhance sustainable wheat production and productivity. Concerted a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Plant materials and phenotyping
The study used a population of 93 bread wheat genotypes that were purposefully selected for drought tolerance breeding, based on their divergent pedigree and reaction to drought stress. This sample size was in the same range with those successfully used in some previous studies [18,27,28,30]. Eighty-seven lines were included from a set of heat and drought tolerant and susceptible nurseries received from CIMMYT, while the remaining six lines were drought-susceptible local checks widely grown in South Africa. The 93 genotypes were rigorously phenotyped for key agronomic traits across eight testing environments involving two cultivations (greenhouse and field), two seasons (2014/2015 and 2015/2016) and two contrasting water regimes (drought-stressed and non-watered conditions) [29]. The population was phenotyped for the number of days-to-heading (DTH), plant height (PHT), number of days-to-maturity (DTM), spike length (SPL), number of spikelets per spike (SPS), numbers of kennels per spike (KPS), thousand kernel weight (TKW) and grain yield per plot (GYLD). The data were checked for normality, homogeneity of variance and validity for analysis of variance following the Bartlet (1974) test [31], and were then analyzed as described by Mwadzingeni et al. [29]. Broad sense heritability (H 2 ) estimates were calculated from phenotypic variance (σ 2 p ) and the genotypic variance (σ 2 g ) according to Allard [32] using the formula; H 2 ¼ s 2 g = ðs 2 g þ s 2 gwls =wls þ s 2 gls =ls þ s 2 glw =lw þ s 2 gsw =sw þ s 2 gs =s þ s 2 gw =w þ s 2 gl =l þ s 2 e =rlswÞ ¼ s 2 g = ðs 2 g þ s 2 g xe =e þ s 2 e =reÞ ¼ s 2 g =s 2 p : Where w, l, s and r were the water regime, site, season and replications respectively.

DNA extraction and DArT sequencing
Genomic DNA of the 93 genotypes was extracted from fresh leaf tissue of 2 week old seedlings following the plant DNA extraction protocol for DArT [33]. Population structure, linkage disequilibrium and marker-trait association analyses Population structure was determined using the software STRUCTURE v2.3.4 [34]. The parameters of the project were set at 10,000 burn-in periods, with 10,000 Markov chain-Monte Carlo (MCMC) repetitions after burn-in. Ten iterations were ran for K values of 1 to 20 to allow selection of the replication with the highest mean value of ln likelihood. Genotypic data was imputed for missing values using TASSEL v4.3.15 (https://sourceforge.net/p/tassel/ tassel4-standalone/ci/master/tree/). Linkage disequilibrium was estimated using the squared allele frequency correlations R 2 value from which the number of significant allele pairs (P < 0.01) was determined using 1,000 permutations. Marker trait association analysis, probability values and % of the effect of these markers were calculated following the GLM procedure in TASSEL v4.3.15 according to Bradbury et al. [35].

Phenotypic traits evaluation
Analysis of variance indicated significant differences (P < 0.05) due to the genotype, site, water regime and their interaction effects for all the studied traits. These and the Pearson's correlations (r) were reported by Mwadzingeni et al. [29]. High and positive correlations and high heritability estimates were detected for most of the traits considered in the current study. Spike length, number of spikelets per spike, plant height, number of kennels per spike, number of days-to-heading and thousand kernel weight had higher levels of genotypic variance (σ 2 g ), hence high heritability values of > 50% ( Table 1). The number of days-to-maturity and grain yield had moderate heritability estimates (20% H 2 < 50%).

Population structure
Population structure was constructed to reveal the genetic relationships and to aid genotype selection. Nine distinct populations were recognised (Fig 1) after the LnP (D) kept increasing from -766,307 at K = 1 to -627,026 (with a mean value of ln likelihood of -590,791) at K = 9.  Table 2. The expected heterozygosity of genes among individuals varied from 0.07 to 0.29 with fixation index (Fst) varying from 0.31 to 0.89 among clusters.
In the structure, Cluster 1 consisted of six and four genotypes from the heat and drought tolerance nurseries, respectively ( Table 2). Cluster 2 consisted of only four genotypes from the heat tolerance nursery. This was followed by the largest group (Cluster 3) which comprised of 29 genotypes of which 21 were from the heat tolerance nursery while the remaining eight were from the drought tolerance nursery. Cluster 4 had only genotypes from the heat tolerance nurseries, while Clusters 5, 6 and 7 had mixtures of genotypes. All the local checks (LM61, LM64, LM66, LM67 and LM70) were grouped in Cluster 8, together with ten other genotypes including LM12 from the heat tolerance nursery and nine genotypes from the drought tolerance nursery (Table 2). Likewise, the last cluster contained the genotypes LM78 and LM94 from the drought tolerance nursery.

Linkage disequilibrium
Linkage disequilibrium analysis revealed the presence of 597,871 loci pairs within a physical distance extending up to 16,356 bp. About 45,835 (7.67%) of loci pairs were in significant LD (P < 0.05). Further, 5,188 (0.87%) of the pairs were in complete LD (R 2 = 1). Marker pairs in LD were observed over long distances, however, a clear and rapid decline in LD with distance was observed. Pearson's correlation coefficients revealed negative correlation (r = -0.0813 between the linkage disequilibrium (R 2 ) and the physical distance (bp); as well as between the P-value and R 2 (r = -0.59), revealing the existence of linkage decay.  Table 1 for codes of genotypes.

Marker-trait association
A total of 334 significant (P < 0.05) marker-trait associations (MTAs) were observed. S1 Table  provides the list of significant (0.05 > P > 0.001) MTAs that could also have influence on respective traits. Only the MTAs that had P values < 0.001 (Table 3) were considered as significant for all traits except for grain yield, thousand seed weight and number of days-tomaturity where significant (P < 0.05) marker-trait associations were considered because the three traits are highly complex, often with moderate to low heritability [36]. These markers explained > 20% of the total phenotypic variation observed on all respective traits. Of the MTAs that were considered significant, four loci were identified to be highly associated with the number of days-to-heading, explaining 24.96% to 37.77% of the total phenotypic variation. Two of these makers were located on chromosome 5A, while the other two were found on chromosomes 5B and 6B ( Table 3). The number of days-to-heading were recorded immediately before imposing drought stress but the means from the stressed and non-stressed experiments were used separately for GWAS to check for repeatability. Marker-trait-association analyses revealed association between specific phenotypes and genetic variants within a genome, which could lead to the discovery of genes controlling the traits. Two markers located on chromosomes 1A and 2D were associated with plant height under drought-stress. Under non-stressed condition, six markers were associated with plant height of which two were located on chromosome 2B and the rest were on chromosomes 5A, 5B, 6B, and 7B. These markers explained 23.75% to 28.8% of the variation in plant height. Spike length was associated with thirteen markers under drought-stressed condition explaining 22.17% to 31.96% of the total phenotypic variation; and eight markers under non-stressed condition; explaining 21.20% to 30.45% of the variation in spike length. The markers observed for this trait under drought-stress were from chromosomes 1B, 2B, 2D, 3A, 4B, 5B, 6A, 6B and 7A. Eight DArT markers were associated with spike length under non-stressed condition of which seven markers were consistent under both drought-stressed and non-stressed conditions from chromosomes 2B, 2D, 5B, 5B and 7A (Table 3). Under drought-stress, SPS was highly associated with eight markers located on chromosomes 6B, 2D, 2B, 5D, 1B and 4B; while under the same stress level, seven significant MTAs were recorded that were located on chromosomes 1B, 2D, 4B, 5A, 5B and 6B. Six of the markers, except for one located on chromosome 2B, one on 5A and one on 5B were consistent with the ones obtained under droughtstressed condition ( Table 3).
The B genome had most of the significant MTAs observed for this trait. The marker 6B|079.586479380|3949288|3949288 explained the highest proportion of the phenotypic variation (41%) under drought-stressed condition while a marker on chromosome 2B explained the least proportion (28.06%) of the phenotypic variation observed under the non-stressed condition. Under drought-stressed condition, the number of kernels per spike was associated with two markers located on chromosomes 2D and 4A, explaining 30.04% and 30.24% of the observed phenotypic variation, in that order. Eight significant MTAs were detected under non-stressed condition on chromosomes 2D, 6B and 7A explaining 28.06% to 36.46% of the variation of the number of spikelets per spike, respectively. Three MTAs on chromosomes 6B, 7B and 5D were considered significant (0.05 > P > 0.001) for the number of days-to-maturity, thousand seed weight and grain yield accounting for 24.03%, 23.94% and 22.57% of the phenotypic variation, respectively. Table 4 summarises the number of DArTseq markers observed for each of the nine agronomic traits evaluated under drought stressed and non-stressed conditions. Subject to further validation, these markers will be useful for marker-assisted selection for respective traits under target growing conditions. A pleiotropic locus is associated and affects the expression of more than one phenotypic trait. In this study, several pleiotropic loci were identified including the marker 5A| 084.411633690|3534155|3534155 that was associated with DTH, PHT and SPS under nonstressed condition (Table 3). Days-to-heading, PHT and DTM under non-stressed condition; SPL under drought-stressed condition; and SPS under drought-stressed condition were associated with the marker 6B|079.586479380|3949288|3949288 located on chromosome 6B. On chromosome 2D, the locus 2D|128.146584600|4021827|4021827 was associated with PHT under drought-stress condition as well as SPL, SPS, and KPS under both drought-stressed and non-stressed conditions. Plant height and SPL under drought-stressed condition were associated with the marker 1B|063.445873190|3937163|3937163 on chromosome 1B. The marker 7A|065.934336980|1118335|1118335 was associated with SPL under both drought-stressed and non-stressed conditions as well as with KPS under non-stressed condition. Additionally, 5B|000.000000000|3023157|3023157 was associated with SPL under drought-stressed and non-stressed conditions as well as with SPS under droughtstressed condition only. Spike length and SPS under drought-stressed condition were associated with the marker 2B|108.086871100|1132117|1132117, while the marker 4B| 042.180040830|1081624|1081624 was associated with SPL under drought-stressed condition only and SPS under both drought-stressed and non-stressed conditions. Further, the locus 5B|000.649324338|1209883|1209883 was associated with DTH and PHT under non-stressed condition as well as SPS under drought-stressed condition. Finally, 6B|031.043100140| 1237876|1237876 was associated with SPL and KPS under drought-stressed condition. Blast searches of the marker 6B|031.043100140|1237876|1237876 on the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and GrainGenes (http://wheat.pw.usda.gov/GG2/blast.shtml) databases indicated that this marker has a sequence alignment that is 97% identical to the TaMFT gene that regulates seed dormancy on chromosome 3A (Nakamura et al. [37]; http://www.uniprot.org/uniprot/ A0A0K2RW47).
Out of the 65 significant marker-trait associations observed, 25 trait-specific MTAs were differentiated. Chromosome 2B had four trait specific MTAs of which one was associated with spike length under drought-stress, two with plant height under non-stressed condition and one with spike length under non-stressed condition. Traits that were represented by at least one significant trait-specific marker-trait association under either of the two water conditions Table 4

Number of MTAs Chromosomes Number of MTAs Chromosomes
Days-to-heading 4 5A (2) were days-to-heading, plant height, spike length, number of spikelet per spike, number of kernels per spike, days-to-maturity, and grain yield (Table 3).

Discussion
Understanding the genetic bases of complex traits in polyploid crops such as wheat, presents an opportunity for drought tolerance breeding. To complement the growing need for such knowledge, the current study explored the population structure and association of genomic regions with yield and yield related traits in a diverse population of drought tolerant and susceptible wheat genotypes. High heritability estimates as well as significant and positive correlations were observed among the studied traits ( [29], Table 1) confirming the value of the data in the present marker-trait association analyses. This is supported by Laido et al. [38] who reported the relevance of traits that had high heritability estimates for QTL detection.

Population structure and linkage disequilibrium
The population evaluated was grouped into nine distinct genetic structures (Fig 1). This is expected given that the genetic materials possess diverse pedigrees which were systematically developed by CIMMYT. However, the existence of common origin or parents in the pedigrees of some genotypes often results in some levels of relationship among genotypes. The results obtained from the structure analysis will be useful in tracking potential parents that could be useful for drought tolerance breeding. Thus, future studies could use a sub-sample of the genetically divergent lines from this genetic pool exhibiting farmers-preferred and quality attributes.
Most of the unique groupings identified could be explained by existence of at least one common parent in the pedigree of genotypes within each cluster. For instance, in Cluster 1, five of the six genotypes from the heat nursery (LM13, LM15, LM21, LM25 and LM29) shared the common parent PASTOR in their parentage. Also, the genotypes LM50, LM51, LM52 and LM53 found in Cluster 2 could be related due to the sharing of crosses involving HUW234 +LR34/PRINIA Ã 2// in their parentage. Similarly, the ancestral genotype SOKOLL which was common in most pedigrees of genotypes in Cluster 4 could be the source parent of some similarities in that grouping. Interestingly, all the genotypes in Cluster 5 were descendants from the parents MILAN, KAUZ and PRINIA. Likewise, seven of the genotypes in Cluster 6 had pedigrees containing ATTILA and the parent PBW343 was common in all pedigrees of genotypes in Cluster 7, which could have contributed to formation of these respective clusters.
Existence of marker pairs in LD over long distances and closely linked pairs with non-significant LD observed in the current study has been previously reported in various crop species [27,30,39]. This could reflect that LD is not static as it can be influenced by other factors such as genetic admixtures apart from the genetic or physical distance.

Marker-trait association
The present study identified 334 significant (P < 0.05) marker-trait associations (Table 3; S1  Table). This will add to previously identified genomic regions influencing similar or complimentary traits [10,15,16]. Although only those MTAs observed at P < 0.001 were considered significant in this study, the rest of these associations observed at P < 0.05 (S1 Table) may be useful for drought tolerance breeding. These MTAs could be located on regions that influence the respective traits directly or indirectly. Thus, the proportion of the phenotypic variation (R 2 > 0.2) observed for all significant markers suggests their possible influence on respective traits. In this light, the observed MTAs for grain yield, days to maturity and thousand seed weight, were considered significant at 0.05 > P > 0.001, since the traits are highly complex with low heritability [36,40]. Drought tolerance is highly influenced by genotype by environment interaction [41] which is explained by the lower number of significant MTAs observed under drought-stressed than non-stressed condition.
Several research efforts have been directed at locating genes and QTL influencing various agronomic traits to facilitate MAS in wheat improvement in the face of increased droughts along with other key production constraints [42][43][44]. The genes identified in the current population adds to the currently available pool of genetic resources and candidate genes. Some of these loci could be located on regions that were already confirmed to be housekeeping genes and QTLs for the traits under study. For instance, in the present study, significant MTAs have been identified on chromosomes that had previously been reported to house QTL for respective traits. Plant height was reported to be associated with genomic regions on chromosomes 1B [44,45], 2B [42,44,45], 5A [45], 5B [44,45] and 6B [45]. Chromosome 5D was reported to harbour QTL for grain yield [46] in agreement with the present study. Further, Peleg et al. [7] reported loci affecting the number of days-to-heading on chromosome 5A under varied drought-stress levels. In the present study, an MTA for TKW was recorded on Chromosome 7B, which was previously reported to have significant associations with the same trait using the markers Xwmc606, Xgwm537, wPt1715 and wPt2449 in a collection of tetraploid durum wheat genotypes [38]. Blast search on the NCBI database revealed that the DArTseq markers associated with DTH on chromosome 5A in the present study seems to be located on a highly conserved region since it has almost 100% sequence similarities with other regions in other crop species including Sorghum bicolor and Oryza sativa (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
Typically, loci or QTL regions that influence a particular trait under stress also control the trait under non-stressed condition [45]. This could be the case with loci that influenced spike length under non-stressed condition and were consistently observed under drought-stress in the present study. Similar explanation can be presented for the markers affecting the number of spikelet per spike under non-stressed condition that were consistent under drought-stressed condition except for the locus at 5A|084.411633690|3534155|3534155. Ideally, the effects of such loci may not be influenced by the change in external environment. Such genomic regions could be useful in MAS or gene introgression when breeding for broad adaptation. On the other hand, some gene loci may influence particular traits differently under different sets of growing environments, resulting in markers or loci becoming inconsistently associated with particular traits when environmental conditions change. This has been witnessed on markers such as 1B|063.445873190|3937163|3937163 and 4B|042.901192180|1027953|1027953 that were associated with plant height and spike length, respectively, only under drought-stress.
High phenotypic trait correlations could be explained in terms of direct or indirect contribution of one trait to another. Looking into the genome, loci controlling such traits could be similar. This is evidenced by the existence of several multi-trait associations where one gene will have pleotropic effects on highly correlated traits. Dholakia et al. [47] reported that highly correlated traits are often controlled by a common QTL. For instance, the locus 2D| 128.146584600|4021827|4021827 controls several traits such as plant height, spike length, number of spikelets per spike and the number of kernels per spike; which are often highly correlated [48,49]. Such findings support the need to verify if the locus 6B|031.043100140| 1237876|1237876 associated with spike length and the number of kernels per spike, is not also linked to seed dormancy, since it shared similar sequence alignment with the region controlling the latter trait in wheat. Interestingly, chromosome 5B which reportedly harbor a region controlling several agronomic traits [10] is found to carry genomic regions associated with DTH, PHT, SPS and SPL in the present study. Some loci, however, influenced only one trait, for instance, 2B|013.546408570|977308|977308 and 2B|023.182120080|1251215|1251215 affected plant height than other traits evaluated in the present study.

Conclusions
Marker trait association is key to identifying genomic regions that are associated with phenotypic traits of breeding significance. The present study identified a total of 65 highly significant marker trait associations under contrasting water regimes (Table 4). Only one marker per trait was considered significant at P = 0.05 for grain yield, days to maturity and thousand seed weight, which had low heritability values. The markers identified in this study are useful genomic resources to initiate marker-assisted selection and trait introgression of wheat under drought-stressed and non-stressed conditions, and for fine mapping and cloning of the underlying genes and QTL. Further studies are required to validate the significant markers identified in the present study using a larger population, following the multiple loci mixed model (MLMM) as proposed by Segura et al. [50] to increase the power of association detection.