Genome-Wide Association Mapping of Yield and Grain Quality Traits in Winter Wheat Genotypes

The main goal of this study was to investigate the genetic basis of yield and grain quality traits in winter wheat genotypes using association mapping approach, and identify linked molecular markers for marker assisted selection. A total of 120 elite facultative/winter wheat genotypes were evaluated for yield, quality and other agronomic traits under rain-fed and irrigated conditions for two years (2011–2012) at the Tel Hadya station of ICARDA, Syria. The same genotypes were genotyped using 3,051 Diversity Array Technologies (DArT) markers, of which 1,586 were of known chromosome positions. The grain yield performance of the genotypes was highly significant both in rain-fed and irrigated sites. Average yield of the genotypes ranged from 2295 to 4038 kg/ha and 4268 to 7102 kg/ha under rain-fed and irrigated conditions, respectively. Protein content and alveograph strength (W) ranged from 13.6–16.1% and 217.6–375 Jx10-4, respectively. DArT markers wPt731910 (3B), wPt4680 (4A), wPt3509 (5A), wPt8183 (6B), and wPt0298 (2D) were significantly associated with yield under rain-fed conditions. Under irrigated condition, tPt4125 on chromosome 2B was significantly associated with yield explaining about 13% of the variation. Markers wPt2607 and wPt1482 on 5B were highly associated with protein content and alveograph strength explaining 16 and 14% of the variations, respectively. The elite genotypes have been distributed to many countries using ICARDA’s International system for potential direct release and/or use as parents after local adaptation trials by the NARSs of respective countries. The QTLs identified in this study are recommended to be used for marker assisted selection after through validation using bi-parental populations.


Introduction
Wheat is the most important and strategic food crop at global level in general and in the Central and West Asia and North Africa (CWANA) region in particular as the region have the highest average demand of wheat (191 kg/capita/year). The CWANA region grows wheat on 55 million hectares and produces about 112 million tons on annual basis [1]. The average wheat productivity in the region (2.5 t/ha) is very low as compared to the global average (3 t/ ha) mainly due to yellow rust, drought, and heat stresses associated with climate change. The effect of climate change is also evident on the quality of wheat as increased heat results in shriveled wheat grains [2].
In most developing countries, apart from grain yield and disease resistance, grain quality was not a strong criterion of variety selection. However, things have changed as a result of changing food habits, increasing urbanization and trends towards raising middle class society. As a consequence, some developing National Agricultural Research Systems (NARS) are critically looking for better quality varieties suiting for preparation of different end products. Identification and utilization of molecular markers for marker assisted selection would enhance the development of widely adapted and high yielding varieties with resistance/tolerance to abiotic and biotic resistance and acceptable level of end use quality [3][4][5].
Association mapping (AM) using phenotypic and genotypic data of association panels has become an important approach in identifying molecular markers (QTLs) linked to traits of interest for potential use in marker assisted selection for the fact that it enables to use diverse set of germplasm (landraces, cultivars, elite breeding lines, etc), and provides broader genomic region/allelic coverage with high resolution with-out the need to develop bi-parental mapping populations [6]. Association mapping principally uses linkage disequilibrium (LD) approaches. It is important, however, first to separate LD due to physical linkage from LD due to population structure which can be caused by many natural and artificial factors including the selection and improvement schemes in crop breeding programs [7].
Bayesian analysis using unlinked set of markers has been effectively used to determine population structure by assigning individuals to subpopulations (Q matrix) [8,9]. Clustering and scaling of populations can be used as alternative approaches to determine population structure [10]. To-date, AM has been carried out in many crops and QTLs associated to traits of interest have been identified [11]. For example, in wheat, QTLs associated to kernel size and milling quality [12], grain yield [13], high-molecular-weight glutenins [14], resistance to foliar diseases [13,15,16], Fusarium head blight (FHB) resistance [17], resistance to nodorum blotch [18] and major insect pest resistances [19] have been reported using AM approaches.
In this study, we investigated the association of approximately 3,051 polymorphic diversity array technology (DArT) markers with grain yield, yield-related and quality attributes in 120 elite winter facultative wheat genotypes in order to determine the genetic structure within these wheat genotypes and identify closely associated markers with grain yield and quality for possible use in marker-assisted selection (MAS).

Germplasm and phenotyping
A total of 118 elite facultative winter wheat (FWW) genotypes and two check varieties, Solh and Bezostaya, were used for this study (S1 Table). Genotypes were planted each in a plot size of 6 m 2 (5 m length, 6 rows at 0.2 m spacing) in two replications using alpha-lattice design in 2011 and 2012 seasons under rain-fed condition at Tel Hadya research field of the International Center for Agricultural Research in the Dry Areas (ICARDA), Syria. The same genotypes were planted each in non-replicated large strip plot of 24m 2 (20 m length, 6 rows at 0.2 m spacing) at Tel Hadya, Syria for two years (2011 and 2012) under irrigated condition. Trials were managed as per the recommended management practices. Data were recorded for days to heading, day to maturity, plant height (cm), grain yield (kg/ha), 1000 kernel weight, grain color and test weight. All analyses were carried out with GENSTAT [20] software.

Grain quality analysis
Protein content was assessed using near-infrared transmittance spectrophotometer according to the approved methods of the American Association of Cereal Chemists, AACC, Method No. 39-10 [21]. Dough water absorption (FSA), departure time (FDT), stability time (FST) and mixing tolerance index (MTI) were determined using farinorgraph (Brabender, Germany) according to AACC Method No. 54-21 [21]. White flour samples were used to determine the following rheological properties of dough biaxial extension: tenacity (P, maximum overpressure), extensibility (L, length of the curve), strength (W, deformation energy), and the configuration ratio (P/L) with the alveograph (Chopin S.A., Villeneuve la Garenne, France) following the ICC standard method No. 122 [22]. High molecular weight (HMW) glutenin subunits were determined using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) [23].
Principal component analysis (PCA) was performed on the correlation matrix, calculated on the data of the quality traits. The PCA analysis was performed using GENSTAT [20] software.

Genotyping
Genomic DNA was extracted from two weeks old pooled leaf samples collected from five plants per line. The samples were frozen in liquid nitrogen and stored at -80°C before DNA extraction. DNA extraction was carried out according to Ogbonnaya et al. [24], after which 10 μl of a 100 ng μl-1 DNA of each sample was sent to Triticarte Pty. Ltd, Australia (http://www. triticarte.com.au/) as a commercial service provider for whole genome scan using Diversity Arrays Technology (DArT) markers [25]. Three thousand and fifty one DArT markers were used to genotype the 120 wheat genotypes (S2 Table). The markers were integrated into a linkage map by inferring marker order and position from the consensus DArT map [26].

STRUCTURE analysis
The genetic structure of the 120 genotypes was investigated using 250 unlinked DArT markers distributed across the wheat genome with at least two loci on each wheat chromosome [27]. Bayesian clustering method was applied to identify clusters of genetically similar individuals using the software STRUCTURE version 2.3 [28]. A burn-in length of 10 4 cycles (to minimize the effect of starting configuration), a simulation run of 10 6 cycles, and the admixture model option were applied in the Structure program. We chose cluster values (K) ranging from 2 to 24 and six independent runs for each value in order to obtain consistent results. Additionally, the results were further confirmed by the Bayesian Information Criterion for different number of populations obtained using the adegenet package [29] for R statistical software [30].

Linkage disequilibrium
From the complete set of 1743 polymorphic markers, only 1143 markers with known position [26] were selected to perform the linkage disequilibrium analysis using TASSEL V4.3.1 software [31]. Linkage disequilibrium (LD) was estimated as squared allele frequency correlations (R 2 ), and only P-values 0.01 for each pair of loci were considered significant.

Association mapping
Grain yield, days to heading, plant height and grain quality data of the 120 elite FWW genotypes and the corresponding DArT data were used for the association mapping. TASSEL version 4.3.1 was used to perform association mapping analysis using both the General Linear Model (GLM) and Mixed Linear Model (MLM) methods. The two different models of GLM: the model with no control for population structure and relatedness (naïve model), and the model with population structure (the Q model) were used. For MLM also two models were used; the model that considers the familial relatedness between accessions (the K model), and the model that takes into account both the population structure and the familial relatedness i.e. the Q+K model [32]. The general equations for GLM and MLM are: y = Xa + e; and y = Xa + Qb+ Zu + e; respectively; where y is vector for phenotypes; a is the vector of marker fixed effects, b is a vector of fixed effects, u is the vector of random effects (the kinship matrix), and e is the vector of residuals. X denotes the genotypes at the marker; Q is the Q-matrix obtained from the STRUCTURE software and Z is an identity matrix. Both models were applied with and without considering the fixed effect of the population structure. False discovery rate (FDR) values were calculated at 0.05 according to [33]. The marker is considered significant if its P value is lower than the correspondence FDR value. However, marker alleles with P values 0.001 in both MLM and MLM-Q models were declared significantly associated with quality parameters and yield and yield related traits since none of the tested markers passed the FDR test.

Grain yield and quality performance
There was significant difference in grain yield performance among the genotypes both under rain-fed and irrigated environments. Average yield of the genotypes ranged from 2295 to 4038 kg/ha and 4268 to 7102 kg/ha under rain-fed and irrigated conditions, respectively ( Table 1).
The average yield of the check cultivar (Solh) was 3690 and 5823 kg/ha under rain-fed and irrigated conditions, respectively. The most commonly grown cultivar, Bezostaya, yielded 5200 kg/ha under irrigated conditions. Among the top 20 elite genotypes indicated in Table 2, G19, G30, G56, G70, and G104 out-yielded the check cultivar Solh both under rain-fed and irrigated conditions. G25 and G21 with yield levels of 4038 and 7102 kg/ha are the highest yielding genotypes under rain-fed and irrigated conditions, respectively (Table 2, S1 Table) Significant differences were also observed in days to heading, maturity and plant height under both irrigated and rain-fed conditions. Mean days to heading ranged from 138-155 days under rainfed conditions and 145-165 days under irrigated conditions. Similarly, mean plant height ranged from 60-93 cm and 77.5-115 cm under rain-fed and irrigated conditions, respectively.
The analysis of the rheological behavior of the dough through the alveograph showed that the check Bezostaya performed better in dough strength, dough tenacity and protein content as compared to the 118 FWW genotypes. The top 20 high yielding genotypes showed a very close performance in quality traits to the best checks. Additionally, the best 20 lines showed significantly higher (p<0.05) average grain protein content, Farinograph development time and stability time than the other 98 FWW genotypes ( Table 2).
Principal Component Analysis (PCA) of the 12 quality parameters in 120 FWW genotypes indicated that protein content, FDT, FST and W are very important traits in discriminating the genotypes ( Fig 1A). Based on the results obtained, a wide range of quality traits combinations was found among the set of 120 genotypes ( Fig 1B). Bezostaya, G87, G41, G30, and G42 are situated in the positive sense of the protein content, FDT, FST, FAB and W vectors indicating that they performed particularly well at these quality traits (Fig 1B and 1C).

DArT markers statistics
All genotypes were tested with 3,051 DArT markers. A total of 1734 DArT markers were selected for analysis due to their polymorphism. Five hundred and seventy nine markers were distributed on the A genome, while a total of 708 and 316 polymorphic markers were distributed on the B and D genomes, respectively. The position of 131 polymorphic markers was unknown. The average P value, call rate and polymorphism information content (PIC) for all the markers was 78.5, 0.3 and 90.3, respectively.     Linkage disequilibrium was calculated separately for locus pairs within the same chromosomes and between chromosomes. There were 41192 (6.9%) inter-chromosomal pairs of loci showing significant LD (p < 0.01), 1715 (4.2%) of which had R 2 > 0.2. Of the intra-chromosomal locus pairs, 9917 (24.8%) had a significant LD of which 5044 (50.9%) had R 2 > 0.2.
Intra-chromosomal locus pairs have a higher mean R 2 value (0.10) than inter-chromosomal locus pairs (0.02). The scatter plots of LD (R 2 ) as a function of the inter-marker distance (cM) within the same chromosome for all genotypes indicated a clear LD decay with genetic distance (Fig 4). LDs with R 2 > 0.2 extended to distances up to 35 cM suggesting that the mapping resolution using these genotypes would generally be well below 35 cM. Genome wide R 2 estimates declined rapidly from 0.58 for markers with 0 interval distance to 0.13 within 5 cM of genetic distance across all chromosomes.

QTLs associated with yield and yield related traits
DArT markers significantly associated with grain yield and yield related traits under rain-fed and irrigated conditions were identified (Table 3, Fig 5).
The DArT markers wPt0298 (2D), wPt3509 (5A) and wPt8183 (6B) were significantly associated with yield under rain-fed conditions during the 2011 season. During the 2012 season in rain-fed condition, only wPt4680 (4A) was significantly associated with yield explaining about 12% of the variation. Based on the two year's mean grain yield data under rain-fed conditions, wPt731910 (3B) was found to be associated with grain yield with R 2 value of 9%. Under irrigated condition during the 2011 season, only tPt4125 on chromosome 2B was significantly associated with yield explaining about 13% of the variation. A total of 15 significantly markertrait associations were found with days to heading in rain-fed and irrigated conditions during 2011 and 2012 seasons and corresponding averages. During the 2011 season under rain-fed condition, wPt3761 on chromosome 3B and wPt2938 on chromosome 3A were significantly correlated with days to heading covering 17 and 14% of the variation, respectively. Similarly,  markers significantly associated with plant height both under rain-fed and irrigated conditions were identified (Table 3).

QTLs associated with grain quality traits
The implementation of the MLM using the Q+K model showed that out of the 1734 DArT markers, only 20 markers showed significant association (p 0.001) with quality traits (Table 4, Fig 5). Seven of these markers were located on the A genome while 9 and 4 markers were located on the B and D genomes, respectively. Four of the 20 markers identified showed association with two quality traits resulting in 24 marker-quality trait associations. Out of the total number of markers significantly associated with grain quality, rPt7987 and wPt4487 were present on both 4A and 7A chromosomes; two DArT markers do not have known positions, while 16 DArT markers with known position were associated with ten quality parameters on 9 different chromosomes (Table 4). A total of four QTL for dough extensibility (L); three QTLs for each of dough strength (W), configuration ratio (P/L), thousand kernel weight (TKW) and test weight (TW), twoQTLs each for farinograph departure time (FDT), particle size index (PSI) and protein content and one QTL each for mixing tolerance index (MTI) and tenacity (P) were identified (Table 4). Markers wPt2607 (5B) and wPt3327 (3B) were significantly associated with protein content covering 16 and 15% of the variations, respectively. Three markers: wPt733835 (1D), wPt1482 (5B) and wPt3177 (1B) were associated significantly with alveograph strength (W) with R 2 values of 14% each. Marker wPt742908 with unknown position showed significant association with P/L and P while marker wPt3076 on 5B showed association with P/L and L ( Table 4).

Discussion
Combining yield potential with drought tolerance As water is becoming scarce even in the irrigated areas, ICARDA's germplasm development approach aims to combine high yield potential with drought tolerance so that wheat genotypes targeted for irrigated areas can cope with temporary drought periods. Similarly, this approach enables the wheat breeding program to minimize and maximize yield gains during drought and good seasons, respectively, for the rain-fed production system. To this end, the wheat breeding program at ICARDA in collaboration with the International Winter Wheat Improvement Program (IWWIP) have made rigorous efforts and identified elite facultative/winter wheat genotypes which combines high yield potential and drought tolerance with acceptable grain quality and resistance to yellow rust [15]. The outstanding performance of these genotypes both under rain-fed and irrigated environments indicates the wide adaptation of the genotypes. Wide adaptation of wheat genotypes have been reported by many authors [2,[49][50][51][52]. In this breeding approach, focus is on empirical selection of genotypes that grow faster and establish complete canopy in winter under lower evaporative demand and complete the cropping cycle as early as possible [2].
The quantity and quality of the gluten proteins are major factors determining wheat enduse quality. Gluten proteins are the main components of the gluten matrix and play the main role defining its properties [53]. Among the gluten proteins, high molecular weight glutenin subunits (HMW-GS) are of particular interest in bread wheat due to their large influence over the rheological properties of dough [23,54,55]. The present study showed that the elite genotypes tend to carry HMW-GS that promote dough strength and improve end-use quality. It has been established that low proportion of the null alleles encoded in Glu-A1 have a negative effect on W [56,57]. Similarly high proportion of 7+8, 17+18, 13+16 alleles on Glu-B1 [55,56] and the 5+10 HMW-GS on Glu-D1 [58][59][60] have positive effect on dough quality. Unlike most of the elite genotypes, the check cultivars, Solh and Bezostaya, possess HMW-GS that promote dough strength and end-use quality. Bezostaya´s exceptionally high protein content, the second main factor controlling the gluten matrix properties, has probably contributed to its high quality values. The grain quality result in the present study is in line with the negative relationship between grain yield and grain protein content reported in previous studies [34,61,62]. Increases in grain yield are usually accompanied by decreases in grain protein as a result of a dilution of nitrogen compounds when carbohydrate deposition increases during photosynthesis [34].

Marker-Trait Associations
In the present study, several markers associated with grain and dough quality attributes were identified. Sixteen markers were found to be associated with yield or yield related traits. From these, six markers on chromosomes 2D, 2B, 3B, 4A, 5A, and 6B were directly related to grain yield. Consistent QTL's associated to yield on chromosome 2D have previously been reported [13,37]. The photoperiod response-related gene Ppd-D1 and the reduced height gene Rht8 were previously mapped on chromosome 2D and have a main effect on yield and plant adaptation [63]. Li et al. [35] also reported a QTL at 6B associated with grain yield. Major QTLs associated with grain yield on chromosome 4A were identified in cultivar Dharwar-Dry [64]. Lopes et al [43,45] also identified major QTLs on 6A in Krichauff from the Berkut/Krichauff population, and on 7A from RAC875/Excalibur as well as other QTLs on 1B, 3B, 4A and 4B from Seri/Babax population. In our study, five markers associated with plant height were identified in chromosome 2B, 4B, 6B and 7A. The markers in 2B are probably linked to the same QTL since they are at close distance from each other. The five QTL's associated with plant phenology in this study were located on chromosomes 3A, 3B and 5B. The group 3 chromosomes have been reported to be associated with vernalization (3B) and earliness per se (3A) [38]. The QTL found in chromosome 5B is probably linked to the Vrn-B1 gene reported in the same chromosome.
Our results indicate the presence of both common and environment-specific QTL in our elite winter wheat germplasm for yield, days to heading and height. Those QTL will facilitate for better planning for future crosses depending on the purpose of the breeding program and the targeted environments. Moreover, the validation and pyramiding of QTL detected under a range of environmental conditions such as the ones mapped in chromosome 2B, 3A, 3B and 5B will be major aims in our future research.
Out of the 24 marker-quality trait associations identified, two involved grain protein content. These markers, wPt3327 and wPt2607, on chromosomes 3BL and 5BL, showed no significant association (p>0.05) between them and TKW or yield performance at any environment, suggesting that these QTL's may not have a negative impact on these traits. Heo and Sherman [65] also identified two QTL's on 3BL and 5BL related to grain protein content in a Choteau by S-Yellowstone recombinant inbred line (RIL) population involving a cross between a winter wheat, donor of both favourable alleles and different from the varieties involved in the present study, and a spring wheat variety.
Fourteen marker-trait associations were identified regarding mixing and rheological properties. Some of them involved the same marker associated with different traits, probably due to the high correlation between some of the mixing and rheological properties [66]. Six markers were located on the three homeologous chromosome 1, i.e. 1A, 1B and 1D. Two of these 6 markers (wPt3177 and wPt733835) were associated with traits related to dough rheological and mixing properties like dough strength and MTI while the four markers on chromosome 1A showed association with test weight and thousand kernel weight. The large effect of the Glu-1, Glu-3 and Gli-1 loci complexes over dough properties is widely known [58,60] and their physical position in chromosomes 1A, 1B and 1D [41,42,46,66] suggest that they may be behind some of these associations. Also, some QTL have been described in the homeologous group 1 chromosomes associated to rheological [66] and mixing properties [66,67]. The four markers identified on chromosome 1A associated with test weight and thousand kernel weight are probably linked to different QTL since independent QTL for seed morphology in chromosome 1A have already been reported [44,68]. In the present study, markers associated with rheological and mixing properties were found on chromosomes 2B, 4A/7A, 5B, 5D, 6B and 7D. This is in line with previous reports [66,67,69,70]. Further research using bi-parental populations is recommended in order to validate and determine the similarity of the QTL identified from the present and previous studies. The elite genotypes with high yield potential, drought tolerance and acceptable grain quality traits are recommended for potential direct release and/or use as parents after local adaptation trials by the NARSs of respective countries.