Linkage Disequilibrium and Genome-Wide Association Mapping in Tetraploid Wheat (Triticum turgidum L.)

Association mapping is a powerful tool for the identification of quantitative trait loci through the exploitation of the differential decay of linkage disequilibrium (LD) between marker loci and genes of interest in natural and domesticated populations. Using a sample of 230 tetraploid wheat lines (Triticum turgidum ssp), which included naked and hulled accessions, we analysed the pattern of LD considering 26 simple sequence repeats and 970 mostly mapped diversity array technology loci. In addition, to validate the potential for association mapping in durum wheat, we evaluated the same genotypes for plant height, heading date, protein content, and thousand-kernel weight. Molecular and phenotypic data were used to: (i) investigate the genetic and phenotypic diversity; (ii) study the dynamics of LD across the durum wheat genome, by investigating the patterns of LD decay; and (iii) test the potential of our panel to identify marker–trait associations through the analysis of four quantitative traits of major agronomic importance. Moreover, we compared and validated the association mapping results with outlier detection analysis based on population divergence. Overall, in tetraploid wheat, the pattern of LD is extremely population dependent and is related to the domestication and breeding history of durum wheat. Comparing our data with several other studies in wheat, we confirm the position of many major genes and quantitative trait loci for the traits considered. Finally, the analysis of the selection signature represents a very useful complement to validate marker–trait associations.


Introduction
Determination of the genetic basis of agronomic traits has been one of the major scientific challenges in the process of crop improvement.In plants, linkage mapping is currently the most common approach for detection of quantitative trait loci (QTLs) that correspond to agronomic traits.In linkage mapping, linkage disequilibrium (LD) is generated by establishing populations that are derived from bi-parental crosses, and co-segregation of alleles of mapped marker loci and phenotypic traits allows the identification of linked markers.In durum and bread wheat, QTL mapping has been used frequently to dissect out the genetic architecture of agronomic and quality traits using doubled haploids, recombinant inbred lines, and backcross populations.Several main-effect QTLs have been identified for yield [1][2][3][4], grain protein content [5][6][7][8][9][10], thousand-kernel weight [6][7][8], disease resistance [11][12][13][14], and flowering time [15][16][17][18].Due to the restricted number of meiotic events that are captured in a biparental mapping population, the genetic resolution of QTL mapping often remains confined to a range of 10 cM to 30 cM [19][20].Moreover, linkage analysis can only sample a small fraction of all of the possible alleles in a population from which the parents originated.
To overcome the constraints inherent to linkage mapping, a complementary strategy is based on the correlation of genotype with phenotype in domesticated and natural populations, and this is commonly referred to as association mapping (or LD mapping), where the focus turns from families to populations.The principle that underlies this approach is that LD tends to be maintained between linked loci over many generations.High LD is expected between loci in tight linkage, while recombination should have eliminated LD between unlinked loci [21].Linkage disequilibrium mapping exploits all of the historical recombination events that have occurred in the population from the origin of the markertrait association [22][23][24][25].Linkage disequilibrium mapping was first introduced in genetic mapping studies in human [26][27], and was then later considered for plant studies [19,[28][29][30][31][32][33].
A major problem of association mapping is the high probability of false-positive associations between a marker and a trait (type I error) [34].However, for LD mapping to reduce the risk of false positives, more complex statistical tools are required than those used for linkage mapping.The Bayesian model-based cluster method was developed to infer population structures in complex pedigree populations [35], whereby a set of random markers is used to estimate the population structure (Q), which is incorporated into a general linear model (GLM) for testing associations.This has been widened to a mixed linear model (MLM), which also includes the kinship relationships of the samples, and offers improved control of type I error rates, which accounts for the population structure and relatedness in association mapping [36].
In addition, a useful strategy to reduce the risk of false positives is the use, when appropriate, of independent validation of the detected associations (e.g., by linkage mapping).In some cases, to validate the results of mapping studies and to identify loci/genes or genomic regions that are involved in the genetic control of important adaptive traits, a promising approach is to combine the association mapping with an analysis of the signature of selection based on the structure of the molecular diversity, without the use of phenotypic data [37][38].The success of LD mapping depends on the quality of the phenotypic data, the population size, and the degree of LD in a population [28][29].The level of LD that characterises the species and the population used [22,25,30,39] is a critical aspect for the planning of association and/or population genomics studies, because this determines the resolution of the association mapping.When LD is low, a candidate gene approach is preferred, because in this case, too many markers will be needed to perform a whole genome scan to cover the variation in the entire genome.On the other hand, when LD is moderate/high, a whole genome scan can be more appropriate.An ideal situation would be to use different populations distinguished by variable LD levels.
Durum wheat (Triticum turgidum ssp.turgidum var.durum) is of significant commercial importance, because of its end-use products: pasta, cous-cous, bread and bulgur.For this species, the combined effects of domestication and modern plant breeding have contributed to its greatly reduced genetic diversity in comparison to the wild forms [40][41][42][43].In this context, landraces, wild forms (Triticum turgidum ssp.), and other related wild species might have crucial roles in breeding programmes, given their wide variability in terms of their many phenological, morphological, abiotic, biotic and quality traits.Here, the data used are from an association-mapping panel that is composed of 230 tetraploid wheat accessions (elite varieties, wild and domesticated accessions) that have been fingerprinted according to 26 simple sequence repeats (SSRs) and 970 diversity array technology (DArT) markers, and evaluated for plant height (PH), heading date (HD), protein content (PC), and thousand kernel weight (TKW).These data are used to: (i) investigate the genetic and phenotypic diversity; (ii) study the dynamics of LD across the durum wheat genome, by investigating the patterns of LD decay; and (iii) test the potential of our panel through the analysis of four quantitative traits of major agronomic importance, to identify marker-trait associations (MTAs) using both GLMs and MLMs.

Plant material and genotyping
This study evaluated a tetraploid wheat (T.turgidum L., 2n = 4x = 28; AABB genome) collection of 230 inbred lines (Table 1), as reported by Laido `et al. [43].The whole collection consists of 128 durum wheat varieties plus 102 wild and domesticated accessions.These durum wheat varieties are indigenous and exotic landraces, and modern cultivars, most of which are materials that are representative of the Italian durum wheat breeding programmes over the last 100 years.Table 1 gives the number of accessions for each sub-species, and Laido `et al. [43] provided a detailed list of the genotypes (number/name, year of release, country, pedigree).The whole tetraploid wheat collection, which also includes wild and domesticated accessions, was assembled to cover the phenotypic variability for the breeding traits that were evaluated in the present study.The LD analysis was conducted on the whole collection, on the 128 durum wheat varieties (henceforth referred to as the durum sub-sample), and on two subgroups identified through the population structure analysis (Q1, Q2) as reported by Laido `et al. [43].
As described in Laido `et al. [43], for each accession, the leaf tissue of the plants that represented the prevalent biotype of each accession was used for DNA extraction, and the whole collection was genotyped according to the 26 SSR and 970 DArT markers used.Thus, in the present study, we used the dataset developed by Laido `et al. [43], and we exploited the mapping information obtained by Marone et al. [44], which was used to determine the map positions of the SSR and DArT markers.
Alleles that occurred at low frequency (f,0.05) were excluded from all of the present analysis.The genetic diversity, or heterozygosity (H), was investigated within the whole collection and within the durum sub-sample for each chromosome, by calculation of the unbiased estimator of genetic diversity (H E [45]).To measure the relative loss of genetic diversity in the whole collection versus the durum sub-sample, according to Vigoroux et al. [34], we used the parameter DGD = 1-(H E(wc) /H E(durum) ), where H E(wc) and H E(durum) are the genetic diversity for the whole collection and for the durum sub-sample, respectively.

Linkage disequilibrium
The LD between the mapped DArT markers on the durum wheat consensus linkage map constructed by Marone et al. [44] was assessed for each sample only on markers that showed frequencies ranging from 0.05 to 0.95.The LD (allele frequency correlation, r 2 ) estimates [46] between marker pairs were obtained  using the TASSEL 3.0.115software.The significance of the pairwise LD (P values) was computed using 1,000 permutations.The LD statistics were calculated per chromosome, and subsequently aggregated over all of the chromosomes of the A and B genomes.
The LD was calculated separately for loci on the same chromosome (intrachromosomal pairs) and for unlinked loci (interchromosomal pairs).From the unlinked loci, a critical value of r 2 was estimated following Breseghello and Sorrells [47], by root transforming the r 2 values and taking the 95% percentile of this distribution as the threshold beyond which the LD is probably caused by a real physical linkage.The intrachromosomal r 2 values were plotted against the genetic distance, and a smooth line was drawn using second-degree locally weighted polynomial regression (LOESS) [48], obtained with the statistical programme R (http:// www.r-project.org), to determine at which distance the LOESS curve intercepts the critical r 2 , to see how rapidly the LD decay occurs.The LD analysis for the mapped markers on the durum wheat consensus linkage map was performed separately for the whole collection, the durum sub-sample, and the two subgroups identified with the genetic structure analysis (Q1, Q2).The LD was also estimated among the unmapped MTAs and mapped markers, although only for the whole collection.

Phenotypic evaluation and association analysis
To test the potential of our panel to perform association studies, we investigated four quantitative traits of major agronomic importance.In particular, the tetraploid wheat collection of 230 inbred lines was evaluated for PH, HD, PC and TKW in a field trial conducted in Foggia (41u28 N, 15u32 E; altitude, 75 m a.s.l.) over one season (2007/2008).The field trial was sown on 20 December, 2007, in clay-loam soil (typic Chromoxerert), in a randomised block design with four replicates.The plots were 2.0 m61.5 m, with a seed density of 350 germinated seeds/m 2 .According to the standard agronomic practices in the study area, fertiliser applications were made at pre-sowing (36 kg/ha N, 90 kg/ha P 2 O 5 , as ammonium biphosphate) and top dressing (52 kg/ha N, as urea) at Zadoks growth stage 2.2 and 3.1 [49], respectively.The plants were harvested after physiological maturity, on 10 June, 2008, and the grain was ground into whole meal in an experimental mill (Tecator Cyclotec1093), and used for the grain quality analysis.
The HD is reported as the number of days from 1 April, 2008, until the ears of ca.50% of the tillers had emerged from the flagleaf sheaths by approximately half of their length (growth stage 55 of the Zadoks scale; [49]).The PH (cm) was measured during the milk waxy maturation stage, when the maximum height was achieved, from the ground to the ear tip (excluding awns), on five main culms per plot.Following determination of the grain nitrogen content according to the standard Kjeldhal method, the PC was then calculated by multiplying the Kjeldhal N value by 5.7, which was then expressed as a percentage on a dry-weight basis.The TKW (g) was calculated from the mean weight of three independent samples of 1000 grains for each plot.
The TASSEL software, version 3.0.115,was used to calculate the associations between the markers and each trait in turn.Both GLMs and MLMs were used in the association analysis for the whole collection and for the durum sub-sample.For the GLMs, genotypic data, phenotypic data and the Q matrix were integrated as covariates to correct for the effects of population substructure, which were determined for the whole collection (K = 2 with SSR markers) and for the durum sub-sample (K = 3 with SSR markers).In the MLMs, the Kinship matrix (K matrix) was used in addition to the genotypic data, phenotypic data and Q matrix, to correct for both population and family structure.In total, the genotypic score was available for 984 loci in the whole collection (958 DArT markers and 26 SSR loci), and for 871 markers in the durum subsample (845 DArT markers and 26 SSR loci).To generate a marker similarity matrix containing all of the lines (K matrix) using the TASSEL software, the trimmed marker datasets were used.TASSEL calculates the kinship as the proportion of alleles shared between each pair of lines.Once this matrix was calculated, the numbers were rescaled, so that they were between 0 and 2 [38].The critical P values for the assessment of the significance of the MTAs were calculated separately for each trait, based on a false discovery rate (FDR) of 0.05, which is defined as the expected proportions of the true null hypotheses that are rejected [50].
To compare our data with data obtained in other studies, we considered the most recently published information on genes and QTL mapping for the traits (PH, HD, PC, TKW).All of the significant MTAs, identified using the MLM model, located on the durum wheat consensus map within short map intervals (5-10 cM) were grouped into a single putative QTL.
As reported in Laido `et al. [43], the population differentiation was also assessed by analysis of molecular variance (AMOVA) using the ARLEQUIN software, version 3.5 [51].Through the Fst-outlier detection method, and using 100,000 simulations, 211 loci under selection were identified, of which 109 markers were mapped on the durum wheat consensus map and used to validate the MTAs, or the region containing the MTAs, that were identified for each trait by association mapping.

Diversity and linkage disequilibrium
From the 970 polymorphic DArT markers, 592 were genetically mapped on the durum wheat consensus linkage map constructed by Marone et al. [44].Of these, none were excluded, because the minor allele frequencies in the whole collection and in the durum sub-sample were .0.05.Although these only represented half of the markers used in the present study, they covered an estimated 2,247 cM, which represents approximately 73.5% of the durum wheat consensus map [44].These DArTs were distributed over all of the chromosomes, with an average spacing of 4.0 cM (Table 2).The distribution of DArTs varied within and among the chromosomes, with a maximum of 75 markers on chromosome 3B, and a minimum of 12 markers on chromosome 5A (Table 2).
To understand the overall genetic diversity among the whole collection and the durum sub-sample, we evaluated the genetic diversity (H E ) and the loss of diversity for each chromosome (Table 2).The results obtained with 592 DArT markers indicate that the diversity of the durum sub-sample represents 94% of the whole genome, thus with a loss of diversity of 6%, with H E of 0.42 and 0.39 for the whole collection and for the durum sub-sample, respectively.Chromosome 1A shows the highest values of loss of diversity (DH E = 0.23), with H E values for the whole collection and the durum sub-sample, of 0.40 and 0.31, respectively (Table 2).
Linkage disequilibrium analysis was performed for the whole collection, the durum sub-sample, and each of the two subgroups identified with the genetic structure analysis using the SSR markers (Q1, Q2).Pairwise LD was estimated using the squaredallele frequency correlations (r 2 ).As reported in Figure S1, we observed that the LD varies along the chromosomes, with regions of high LD interspersed with regions of low LD.The LD pattern in the whole collection was assessed based on the 174,936 pairwise combinations of 592 DArTs.Interchromosomal LD was investigated for the set of 162,635 pairs of non-syntenic loci (i.e., loci located on different chromosomes), with an average r 2 of 0.02 Table 2. DArT markers coverage, distribution, index of diversity and loss of diversity among the whole collection and durum sub-sample, with the distribution and index of diversity in the two Q groups identified in the whole collection.(Table 3).Based on permutation analysis, the portion of DArT pairs that showed significant LD was 22.7%, at P,0.01.
The plots of the LD estimates (r 2 ) as a function of genetic distance in centiMorgans (cM) indicated a clear decay of the LD with genetic distance, and also suggested that the LD is dependent on the population structure (Figure 1).As expected, pairs of physically linked (syntenic) loci (12,301 pairs in total, with an average r 2 of 0.08) showed LD estimates that were largely influenced by their physical distances (Table 4; Table S1).Due to the presence of significant LD effects that would be determined by population structure, assuming an absence of epistatic selection between the loci on different chromosomes, the ad-hoc critical thresholds for the LD estimates between syntenic loci were obtained from the distribution of the non-syntenic loci.According to Breseghello and Sorrells [47], the 95 th percentile of the LD estimate distribution was used as the critical threshold to better discriminate the LD that was most likely due to physical linkage.In the whole collection, a critical value of r 2 (basal LD) was calculated from the interchromosomal LD analysis (r 2 = 0.120), beyond which the LD is assumed to be caused by physical linkage (Figure 1a).Four classes of marker pairs were defined on the basis of the map positions determined by Marone et al. [44]: class 1 (tight linkage; distance, ,10 cM); class 2 (moderate linkage; distance, 10-20 cM); class 3 (loosely linked; distance, 20-50 cM); and class 4 (independent pairs; distance, .50cM).The percentages of significant loci pairs decreased with the distance between the loci; 77.7% of the 2,717 pairs of the tightly linked markers (within 10 cM) showed significant LD at P,0.01, with an average r 2 of 0.26 (Table 4; Table S1).For the remaining classes, the LD varied from 52.8% (class 2) to 25.8% (class 4), decreasing at increasing distances.Figure 1a shows the scatterplot of the distributions of the r 2 values as a function of the genetic distances between the syntenic loci pairs for the whole collection.The point at which the LOESS curve intercepts the critical r 2 was determined as the average LD decay of the population.Based on these criteria, the intrachromosomal LD decayed between 5 cM and 20 cM for the individual chromosomes (considering only chromosomes with .50markers) (Figure S1) and the average LD decay of the whole genome was 14 cM (Figure 1a).Also, for the portion of r 2 values that exceeded the basal LD level, the LD decreased from 60.7% of significant pairwise comparisons (class 1; mean r 2 = 0.41) to less than 6.0% (class 4; mean r 2 = 0.13).All of the loci pairs that are in complete LD are spaced at a genetic distance ,10 cM, except for two loci pairs at genetic distances from 10 cM to 20 cM.

Patterns of linkage disequilibrium within subgroups and the durum sub-sample
An overview of the interchromosomal LD in the two subgroups and in the durum sub-sample is given in Table 3.The mean of the interchromosomal LD is much lower than that observed for the intrachromosomal LD.The Q1 and Q2 groups that were obtained from the analysis of the population structure show higher r 2 and lower numbers of significant pairs, compared to the whole collection.The Q1 group shows a mean r 2 and number of significant pairs similar to the durum sub-sample (Table 3).At the intrachromosomal level, the LD pattern for the durum sub-sample was assessed based on 12,301 pairwise combinations, with an average r 2 of 0.12 (Table 4; Table S1).Based on permutation analysis, the portion of the DArT pairs that show significant LD is 37.2%.The amount of significant LD and the r 2 in the durum subsample and in these two subgroups are listed in Table 4 and Table S1.In all of the classes, the values for the durum sub-sample and for the Q1 group are very close to the values for the whole collection, in terms of the number of pairs that show significant LD, whereas the mean r 2 are much higher in all of the classes based on the genetic distances.Furthermore the number of pairs in the total LD is much higher in the durum sub-sample and the Q1 group (279 and 280 loci pairs, respectively), compared to 192 in the whole collection and 201 in the Q2 group (Table 4; Table S1).The pairs in total LD have mean distances of 0.80 cM and 0.89 cM in the durum sub-sample and the Q1 group, respectively, whereas in the Q2 group and in the whole collection, this spans over mean distances of 0.61 cM and 0.53 cM, respectively.At 0.151 and 0.180, the critical r 2 for the Q1 (Figure 1c) and Q2 (Figure 1d) groups, respectively, are higher compared to the whole collection (critical r 2 = 0.120) (Figure 1a) and the durum sub-sample (critical r 2 = 0.142) (Figure 1b).The durum sub-sample and the Q1 group follow the same pattern in the decay of the LD, and the mean LD decays at ,18 cM (Figure 1b, c); the Q2 group falls below the 0.180 threshold from a distance of ,5 cM (Figure 1d).

Phenotypic data
The accessions at the experimental farm of CRA-CER Foggia, Italy, were evaluated for the key traits of PH, HD, PC and TKW.For all of these traits investigated, a clear quantitative genetic basis was evident, both for the whole collection and for the durum subsample (see phenotypic distributions in Figures S2, S3).The ANOVA analysis shows that the genotypic variance is significantly .0 for all of the traits (P,0.001).The hereditabilities ranged from 0.67 to 0.91 in the whole collection, and 0.29 to 0.84 in the durum sub-sample, which indicates the robustness of the data and the low error rate.All of the statistic parameters for the traits are given in Table 5, for PH, HD, PC and TKW.

Association mapping
The marker-phenotype association analysis was based on the polymorphisms present in 26 SSRs, and for DArTs, at 958 and 845 loci in the whole collection and in the durum sub-sample, respectively.Each locus was characterised by the presence of at least one non-rare allele, with a frequency between 0.95 to 0.05.
Both GLM and MLM analysis were compared for all of the traits.A cut-off of 0.05 for the false discovery rate [50] was used to identify the MTAs.Each MTA represents a significant markertrait association, and for each marker, a maximum of ''n'' MTAs can be identified, where ''n'' is the number of traits under study.The number of significant MTAs with the GLM is much greater than with the MLM (Table 6).Indeed, for the whole collection, there were 1,296 and 221 MTAs for the GLM and MLM, respectively, while in the durum sub-sample, there were 464 and 191 MTAs identified with the GLM and MLM, respectively.Thus, with the MLM, the MTAs were reduced by 82.9% in the whole collection, and by 58.8% in the durum sub-sample.There are 161 MTAs in the whole collection and 160 in the durum subsample that are shared by the two models (Table 6).The number of unique (significant only for one model) MTAs for the GLM was 87.6% (whole collection) and 65.5% (durum sub-sample), compared to only 27.1% and 16.2%, respectively, for the MLM (Table 6).

Trait mapping
Regarding the significant associations, all of the data for each MTA are shown in Table S2.For all of the traits, only the significant MTAs mapped on the durum wheat consensus map that were derived from the MLM are shown in Figures 2 and 3, and reported in Tables 7, 8, 9, 10, for PH, HD, PC and TKW, respectively.
There were 51 loci (11 SSRs and 40 DArTs) identified as MTAs for the same trait in both the whole collection and the durum subsample, of which 30 map on the durum wheat consensus map.Of these 51 MTAs, 47.1% (24 loci) are specific for a single trait (PH, 5; HD, 6; PC, 5; TKW, 8 loci), while 27 are multi-trait MTAs.Overall, PH is involved in the highest number of multi-trait MTAs (15) identified in the two samples, followed by HD (13), TKW (12) and PC (11).

Mapped MTAs
There were 128 mapped MTAs (Table S2; Figures 2 and 3) in the whole collection (corresponding to 111 loci) and 110 in the durum sub-sample (corresponding to 94 loci).For the whole collection, out of the 111 loci (14 SSRs and 97 DArTs) for which there is at least one MTA, 86.5% (96 loci) are associated with only one trait, while the rest consist of associations of up to three traits (15 loci).In the durum sub-sample, out of the 94 loci (11 SSRs and 83 DArTs), 85.1% (80 loci) are single-trait MTAs.Among these 111 (whole collection) and 94 (durum sub-sample) loci, 16 and 15, respectively, are putatively under selection.
The MTAs are located in all of the chromosomes of both genome A and B. The highest number of MTAs identified in the two samples are on chromosome 3A (32 MTAs for 17 loci), followed by chromosome 4A (30 MTAs for 18 loci) and chromosome 2B (27 MTAs for 17 loci), with the lowest on chromosome 1A (6 MTAs for 5 loci) and chromosome 4B (5 MTAs for 2 loci).Considering the homoeologous groups, group 3 contains the highest number of MTAs (53 for 36 loci), followed by group 2 with 38 MTAs (27 loci); with the lowest numbers for groups 5 (25 MTAs) and 6 (27 MTAs), for 14 and 22 loci, respectively.Altogether, the A-genome has a higher number of MTAs (127), for 83 loci, compared to the B-genome with 122 MTAs, for 84 loci.
The MTAs that are identified for each trait are presented in Tables 7, 8, 9 and 10.Concurrence between our association mapping results in tetraploid wheat and those reported in previous wheat association mapping and linkage mapping studies was observed in several cases.To compare our data with data obtained in other studies, we considered the most recently published information on genes and QTL mapping for these traits (PH, HD, PC, TKW) and for the domestication syndrome, which includes genes for photoperiod insensitivity (Ppd), vernalisation (Vrn), earliness per se (Eps), dwarfism (Rht), grain protein content (Gpc), and domestication syndrome traits (Tg, Br, sog, Q locus).
For PH, 26 (whole collection) and 22 (durum sub-sample) MTAs were identified (Table 7).Overall, these MTAs (which correspond to 38 loci) detected 25 putative QTL regions and are located on 9 different chromosomes: 1A, 4B, 5B (1 marker each), 2B (2 markers), 3A (3 markers), 5A (5 markers), 7B (7 markers), 1B (8 markers) and 4A (10 markers).The phenotypic variation of each of these loci ranged from 2% (13 loci) to 9% (1 locus).Overall, 10 loci Table 4. Overview of LD in the intrachromosomal pairs for the whole collection, the durum sub-sample, and the Q1 and Q2 groups, and for genomes A and B. are confirmed as MTAs for PH in both the whole collection and the durum sub-sample, and 4 loci are putatively under selection [43].As expected, the PH-related QTL on chromosome 4B that was tagged by Xgwm495 (84.0 cM), which has a phenotypic variation of 3% (whole collection) to 5% (durum sub-sample), was identified in both the whole collection and the durum sub-sample in the genomic region harbouring the growth habit gene Rht-B1 [33].At about 5 cM from this MTA, Zhang et al. [52] reported a PHrelated QTL associated with the Xgwm513 (78.8 cM) SSR locus.Three QTLs that show relatively high R 2 values (from 6% to 9%) are located on chromosome arms 1BS, 4AL and 7BL, and are tagged by the markers wPt-0308, Xgwm937 and wPt-5138, respectively (Table 7; Figures 2 and 3).

Dataset
The first putative QTL was identified by the DArT marker wPt-0308, which is also putatively under selection [43], and is about 14 cM from another PH-related QTL, tagged by the Xcfd15 SSR marker, associated in both the whole collection (R 2 , 2%) and the durum sub-sample (R 2 , 4%).The Xcfd15 marker is also associated to multi-trait MTAs for PH and HD.In this chromosome region, Maccaferri et al. [32] identified QTLs for yield components (TKW and kernel number spike [KNS]) using association mapping on a panel of 189 durum wheat varieties.
The second putative QTL is associated to the Xgwm937 SSR marker, and was identified in both the whole collection and the durum sub-sample, with R 2 of 6% and 9%, respectively.This marker is also associated to a multi-trait MTA for PH, HD and TKW.At 7 cM from this MTA, the wPt-4660 locus is associated with PH in the whole collection, and in the same region, Maccaferri et al. [32] reported the presence of QTLs for PH, and also for grain yield and HD.
The third QTL is in the distal region on chromosome 7BL and was identified by the wPt-5138 (244.6 cM) marker, associated in both the whole collection (R 2 , 2%) and the durum sub-sample (R 2 , 6%).In the same region, the wPt-9746 (258.9 cM) marker is also putatively under selection, and is associated to a MTA only in the durum sub-sample (R 2 , 5%).In the same chromosome region, three MTAs (wPt-8312, wPt-0841, wPt-9133) located at 151 cM that span an interval map of 0.2 cM are associated with PH only in the durum sub-sample (R 2 , 4%, for each MTA).In a panel of 154 common wheat accessions, Zhang et al. [52] used association mapping analysis to identify a PH-related QTL at the Xgwm302 SSR locus mapped in the durum wheat consensus map [44], at about 4 cM from the MTAs in the present study.
The HD is associated to 31 (whole collection) and 23 (durum subsample) MTAs.These MTAs (corresponding to 47 loci) detected 30 putative QTL regions (Table 8; Figures 2 and 3) and are located on 11 different chromosomes (not including chromosomes 4B, 5A and 7B).The phenotypic variation of each of these loci ranges from 2% (20 loci) to 8% (1 locus).Overall, 7 loci are confirmed as MTAs for HD in both the whole collection and the durum sub-sample, and 6 markers are putatively under selection.As expected, the two homoeologous copies of Ppd-1 on the chromosomes of group 2 were close to the MTAs identified for HD in our association mapping panel.In particular, in the region on chromosome arm 2AS, the wPt-4533 (21.2 cM) locus was identified as a MTA for HD only in the whole collection.This MTA is close on the interval map, among the Xgwm359 (38.9 cM) and Xgwm122 (25.4 cM) markers, within which Griffiths et al. [53] mapped the photoperiod insensitivity locus (Ppd-A1) using meta-QTL analysis.Then on chromosome arm 2BS, Le Gouis et al. [33] mapped the Ppd-B1 locus close to the wPt-6932 marker, located at 66.5 cM on the durum wheat consensus map.In the present study, at a distance of ,10 cM from the wPt-6932 marker, one HD-related QTL was detected from the wPt-2600 and wPt-3720 markers, only in the durum subsample, with R 2 for both of 4% (Table 8; Figure 3).Considering the vernalisation requirement loci (Vrn1, Vrn2, Vrn3), in the present study, there are HD-related QTLs only in the chromosome regions harbouring the Vrn-B1 and Vrn-A3 loci, which are located on chromosome arms 5BL and 7AL, respectively, as reported by Le Gouis et al. [33].In particular, the HD-related QTL associated with the rPt-7889 (153.5 cM) locus on chromosome arm 5BL, with R 2 of 5%, is close to the Vrn-B1 locus.Then, on chromosome arm 7AL and very near to the Vrn-A3 locus, there are three MTAs (wPt-3883, wPt-7734, wPt-9796) associated to HD only in the whole collection, with a maximum R 2 of 3%.
On chromosome arm 3BS in an interval map of ,16 cM, 7 loci were identified as MTAs for three HD-related QTLs (Table 8; Figure 3), with R 2 from 2% to 5%.The first two of these QTLs are only in the durum sub-sample, while the third QTL was identified from 4 MTAs, of which the wPt-3536 (24.9 cM) locus is associated to HD in both the whole collection (R 2 , 2%) and the durum subsample (R 2 , 5%), and the wPt-0302 locus is putatively under selection.The wPt-0302 marker was associated with components of earliness in an association mapping study conducted by Le Gouis et al. [33], and in the same region Maccaferri et al. [32] identified QTLs for HD, PH and KNS.On chromosome arm 3AL, the Xgwm1042 (68.8 cM) SSR marker identified a MTA for HD in both the whole collection (R 2 of 4%) and the durum subsample (R 2 , 8%), which is a multi-trait MTA associated to the other traits considered in the present study.Close to this MTA, Blanco et al. [10] reported a QTL for PC and TKW, using a Table 5. Statistical estimations of all of the traits for the whole collection and the durum sub-sample, including the heritability (H 2 ) and least significant different (LSD).recombinant inbred line population derived from two durum wheat cultivars (Svevo, Ciccio), and Maccaferri et al. [32] used association mapping analysis to identify putative QTLs for PH, HD and TKW.Overall, all of the 10 HD-related QTLs (from 18 MTAs) on the group 3 chromosomes that were identified in the present study (Table 8; Figures 2 and 3) have also been confirmed in two other studies that were based on association and linkage QTL mapping [32][33].
In the distal region on chromosome arm 4AL in an interval map of 6.6 cM, the present study identified 4 MTAs, each of which explain 4% of the phenotypic variation, where the wPt-1007 DArT locus was detected in both the whole collection and the durum sub-sample (Table 8; Figure 2).In this same chromosome region, the studies of Maccaferri et al. [32] and Le Gouis et al. [33] showed HD-related QTLs.Finally, there are two MTAs (wPt-1375, wPt-5572) mapped on chromosome arm 6AL (159 cM) identified in both the whole collection and the durum sub-sample (R 2 from 2% to 4%).The putative QTL detected in this chromosome region is close to markers identified for significant associations reported by Maccaferri et al. [32].
The PC is involved in 38 (whole collection) and 26 (durum subsample) MTAs.These MTAs (corresponding to 58 loci) detected 39 putative QTL regions (Table 9; Figures 2 and 3) spread across all of the chromosomes.The phenotypic variation associated with these loci ranges from 2% (20 loci) to 10% (1 locus).Overall, there are 6 MTAs for PC identified in both the whole collection and the durum sub-sample, and 13 loci are putatively under selection.As expected, on chromosome arm 6BS, and tagged by the wPt-8721 (89.9 cM) DArT MTA (R 2 , 3%), the PC-related QTL was identified in the genomic region harbouring the Gpc-B1 gene, among the Nor2 (77.3 cM) and Xgwm193 (107.8 cM) markers, as reported by Uauy et al. [54].
On chromosome 2B, four putative QTLs that show high R 2 were identified by the BQ170801, wPt-2600, wPt-5736 and wPt-2724 MTAs (Table 9; Figure 3).The first QTL was detected in both the whole collection and the durum sub-sample (R 2 from 4% to 8%) on chromosome arm 2BS.In the same chromosome region, the second PC-related QTL was identified by two multitrait MTAs (wPt-2600, wPt-7320) and is associated only in the durum sub-sample, with R 2 of 4%.The other two PC-related QTLs were identified on chromosome arm 2BL in an interval map of ,10 cM, and were detected from 6 MTAs (R 2 from 2% to 7%), of which three were identified as putatively under selection (Table 9; Figure 3).The putative QTL detected in this chromosome region is close to markers identified as significant associations for PH, HD, TKW and KNS reported by Maccaferri et al. [32].
On chromosome arm 5AL, the Xgwm865 marker identified in both the whole collection and the durum sub-sample with R 2 from 5% to 10% is a multi-trait MTA associated also with TKW.In the same chromosome region, Blanco et al. [10] reported a QTL for grain protein content and KNS.
In an interval map of ,10 cM on chromosome arm 6AL, two PC-related QTLs were identified from 6 MTAs only in the whole collection (R 2 from 2% to 3%), for which two DArT markers (wPt-2632, wPt-6829) are putatively under selection.In the same chromosome region, and using linkage mapping analysis for kernel characteristics in a segregating population of common wheat, Tsilo et al. [8] reported a QTL for grain protein content.
The TKW is associated with 33 (whole collection) and 39 (durum sub-sample) MTAs.These MTAs (corresponding to 65 loci) detected 42 putative QTL regions (Table 10; Figures 2 and 3) distributed across all of the chromosomes.The phenotypic variation of each of these loci ranges from 2% (15 loci) to 8% (5 loci).Overall, 7 loci were confirmed as MTAs for TKW in both Table 6.Comparison of the two models for the calculation of the associations between the mapped and unmapped markers, and the traits, for the whole collection and the durum sub-sample.the whole collection and the durum sub-sample, and 8 loci are putatively under selection.There are three TKW-related QTLs on chromosome 2B; the first was identified on chromosome arm 2BS from two MTAs (wPt-1064, wPt-0615) and was detected for the whole collection and in the durum sub-sample, with R 2 from 3% to 6% (Table 10; Figure 3).The other two QTLs located on chromosome arm 2BL were identified by two MTAs for each QTL, for which the wPt-8284 marker is putatively under selection and wPt-7506 is associated in the whole collection (R 2 , 2%) and in the durum sub-sample (R 2 , 5%) (Table 10; Figure 3).The putative QTLs identified on chromosome 2B are close to markers that were identified as significant associations in the studies of Maccaferri et al. [32], Blanco et al. [10] and Yu et al. [55], for grain yield and yield components (TKW and KNS).On chromosome arm 3AL, five putative TKW-related QTLs (R 2 from 3% to 8%) were identified, of which four are detected only in the durum sub-sample dataset.In particular, in the distal region of this chromosome arm, there are 3 putative QTLs identified by 9 MTAs, of which three markers (tPt-7492, wPt-1888, wPt-2144) are putatively under selection, and two (wPt-9160, wPt-0398) are multi-trait MTAs associated also to PC and HD (Table 10; Figure 2).The QTL identified by two MTAs for wPt-1888 and wPt-3978 is the only one detected in both the whole collection and the durum sub-sample (R 2 from 2% to 4%).The putative QTLs detected in this chromosome region are close to markers identified as having significant associations in others studies of QTL mapping, as reported by Maccaferri et al. [32] and Peleg et al. [56].

Unmapped MTAs
Three SSRs and 378 DArT markers were not located on the durum wheat consensus map.The three SSR loci (Xgwm408, Xgwm1066, Xgwm1093) were identified as multi-trait MTAs, and the first two were detected in both the whole collection and the durum sub-sample, while the third marker was detected only in the durum sub-sample.Out of the 378 DArT markers, 55 were identified as MTAs only in the whole collection (12 putatively under selection), 38 loci were detected as MTAs only in the durum sub-sample (13 putatively under selection), and 21 are MTAs in both samples (5 putatively under selection).
Linkage disequilibrium analysis was performed between 114 unmapped DArTs identified as MTAs (in the whole collection and/or in the durum sub-sample) and 592 DArT loci mapped on the durum wheat consensus map.For 11 of these unmapped MTAs that are associated with all of the traits except PH, strong or complete LD was detected for the mapped markers, and therefore they are considered to be mapped in the same region.Five and 6 MTAs are located on genomes A and B, respectively, and are traitspecific MTAs, except for the wPT-9577 locus on chromosome arm 3BL, which is associated with PC and TKW.With few exceptions, all of the unmapped markers that are located on the consensus map using LD confirm the positions of MTAs identified using mapped markers.There are new MTAs associated to unmapped markers in only two cases: 1. wPt-8140 MTA associated only in the durum sub-sample is associated to TKW, while the closely linked wPt-6216 locus, on chromosome 3B, is associated with HD (Table 9; Figure 3); 2. wPt-9577 confirms the TKW MTA on chromosome 3BL, tagged by the closely linked wPt-6785 locus (Table 10; Figure 3), which is also a MTA for PC.

Discussion
In the present study, we have reported the results of a mapbased analysis of the genetic diversity, population structure, and LD patterns of a panel of tetraploid wheat accessions (230 inbred lines) that is constituted by a large set of durum wheat varieties and by a representative sample of T. turgidum evolutionary lineages, including wild and domesticated accessions.Using this structured panel, we conducted a parallel analysis of marker-trait association for four key agronomic traits on two samples with different LD levels and structures, with the possibility of cross validation of the association mapping results and their comparison with outlier analysis conducted in a previous study [43].
The analysis of the literature on QTL mapping for the same four traits confirms many of our MTAs and clearly indicates that our approach was successful for the detection of relevant QTLs for the traits investigated.Most likely, these positive results are also associated with the high heritability observed for the target traits.Considering the whole collection, the heritabilities for all of the traits exceeded 0.85 (except for PC at 0.67).Indeed, association mapping is strongly influenced by the heritability and by the quality of the phenotypic data [24,57].
Our study was greatly aided by the availability of the consensus map that was reported in Marone et al. [44], which is a very useful tool to compare the locations of the markers (SSRs and DArTs) from the present study with other markers reported in other studies.Our data rely on a dataset that for both marker number and sample size (,1000, including SSR and DArT markers) is larger than that of similar association mapping studies that have been carried out in both durum and bread wheat [22,25,30,32,33,52,58,59].

Genetic diversity and linkage disequilibrium
We were able to analyse the genetic diversity of our samples using mapped markers that are well distributed among the chromosomes, with a density of 1 marker every 4.0 cM.Our data indicate that the level of diversity among the chromosomes is heterogeneous in the durum sub-sample compared with that observed in the whole collection (where many naked and hulled tetraploid types were included, along with a few wild emmer genotypes), with a relatively strong reduction in diversity in the durum sub-sample for chromosome 1A.The genetic determinants of PC and TKW have been mapped previously for chromosome 1A [60][61].The selection in durum wheat due to domestication, and particularly to obtain the modern varieties, has been targeted to meet very strict agronomic and quality (high PC) requirements [43], so for chromosome 1A, our data can be explained by the occurrence of a higher loss of diversity in the regions harbouring loci involved in PC and TKW.
The durum wheat consensus map developed by Marone et al. [44] was needed for the present study, to determine the loci positions for the combination of the results of different segregation analysis into a single map.Although consensus maps inherently contain errors in marker order and linkage distance, the pattern of LD detected was in good agreement with the consensus marker positions.Thus we can be confident in our analysis of the LD pattern and decay over genetic distances.
The means of r 2 of all of the interchromosomal pairs in the whole collection and in the durum sub-sample are 0.023 and 0.029, respectively, which are slightly higher than the 0.022 described by Breseghello and Sorrells [47] and 0.019 reported by Neumann et al. [25] in populations of smaller sizes to that investigated here, and with a similar number of marker pairs.This indicates that the LD due to the population structure is very similar between the durum wheat, bread wheat and barley accessions studied here and in the cited literature.
For the intrachromosomal LD, in each dataset the percentage of locus pairs showing significant LD is highest for adjacent locus pairs (,10 cM), lower for locus pairs on the same chromosome (.10 cM), and lowest for locus pairs on independent chromosomes.In the durum sub-sample, the LD decay (the point in which the LOESS curve intercepts the critical r 2 ) was at ca.18 cM, while in the whole collection the decay was more rapid, at 14 cM.The level of LD in our durum sub-sample was larger than that observed Table 7. Representative markers significantly associated with the plant height trait, for the whole collection (normal), the durum sub-sample (underlined), and both (bold).in durum wheat by Maccaferri et al. [32,39], and in common wheat by Breseghello and Sorrells [47], Chao et al. [58] and Somers et al. [22].The durum sub-sample and the Q1 group follow the same pattern in the decay of the LD, whereas in the Q2 group, the LD falls at a distance of 5 cM.Overall, the level of LD was greater in the durum sub-sample compared to the Q2 group, which includes wild and domesticated accessions.
Overall, this picture indicates that in tetraploid wheat the pattern of LD is extremely population dependent and related to the process of domestication.This has also been reported in other autogamous species, such as barley, where the genome-wide LD extended from 10 cM to 15 cM [62][63][64][65].The higher level of LD found in the durum sub-sample compared to the wild and domesticated accessions (the Q2 group) can be explained by the different levels of historical recombination in the two samples, and because of the effects of the bottlenecks of domestication and breeding, as increases in the level of LD.Our data can be explained under just a neutral model, where the effective recombination rate due to the effective population size along with the physical and recombination distance between the loci is the major determinants of the pattern of LD.For this reason, using the LD pattern [65], we were also able to map the loci that were unmapped in the consensus map [44], and in particular those that were MTAs.
However, we cannot exclude the role of selection in favouring the conservation of haplotype blocks, along with the possible role of inversions and other chromosomal mutations.On the other hand, we do not observe much evidence of such phenomena, and only in a few cases did we observe evidence of long distance or interchromosomal LD, as has been reported for other species where epistatic selection might have been the cause of the observed patterns [30,66].Only in one case did we observe Table 8.Representative markers significantly associated with the heading date trait, for the whole collection (normal), the durum sub-sample (underlined), and both (bold).two co-segregating DArT markers (wPt-1976 and wPt-3403) were detected as putatively under selection, the first of which was identified as a MTA for PC.Three novel TKW-related QTLs were detected on chromosomes 3A, 5B and 6B.On chromosome arm 3AL, a TKW-related QTL was identified in both the whole collection and the durum sub-sample by two MTAs, of which the wPt-1888 marker is putatively under selection.In this centromeric region of 13 cM on chromosome arm 3AL, six DArT markers were mapped, with all of these identified as MTAs for TKW, with R 2 from 2% to 6%.The second QTL was mapped on linkage group 5B-3, from two MTAs, of which wPt-7605 was putatively under selection.The last QTL on chromosome arm 6BS was detected by three MTAs, of which wPt-4742 was identified in both the whole collection and the durum sub-sample.Between these MTAs, we detected two DArT markers (wPt-5333, wPt-3309) that are putatively under selection, so this QTL was identified by two different approaches.

Conclusions
Our data have important implications in relation to the development of association mapping studies in durum wheat, which suggests that whole-genome scan approaches are feasible.Moreover, with careful assessment of the population structure, it will be possible to choose appropriate populations to obtain different resolutions in association studies in tetraploid wheat.Additional LD studies that focus on a few genomic segments will be needed to obtain a more complete picture of the level and structure of the LD, and to conduct LD mapping based on a candidate gene approach using specific tetraploid wheat populations (e.g., the wild form).
By the genotyping of collections of accessions and cultivars with a large number of markers, association studies provide the means to improve the genetic characterisation of the germplasm.Markers associated with genomic regions that control traits of agronomic interest improve our understanding of the genetic value of the individual genotypes within germplasm collections.Moreover, as similar studies are completed, the genetic characterisation of many different germplasm sets will provide researchers with a comprehensive global perspective of the tetraploid wheat gene pool.The current study is amongst the first to assemble and analyse a population that represents all of the tetraploid wheat sub-species, among which 108 durum wheat accessions (96 mainly as elite cultivars) are representative of the Italian durum wheat breeding programmes over the last 100 years.The information that we have generated is also valuable for the choice of parental lines to use in crosses (in the form of bi-parental or MAGIC populations) to complement genome-wide association mapping data and to analyse some of the novel associations in more detail.Table S1 Overview of the LD in the intrachromosomal pairs for the whole collection, the durum sub-sample, and the two groups (Q1 and Q2).

(DOCX)
Table S2 List of all of the MTAs identified for the four traits (PH, HD, PC, TKW) using the GLM and MLM.(XLSX)

Figure 1 .
Figure 1.Overview of the LD parameter r 2 of the intrachromosomal pairs in the whole collection (Pop) (a), in the durum sub-sample (b), and in the two Q groups, Q1 (c) and Q2 (d).The scatterplots show the distributions of the LD parameter r 2 according to the genetic distance.The horizontal line indicates the 95% percentile of the distribution of the unlinked r 2 , which gives the critical value of r 2 .doi:10.1371/journal.pone.0095211.g001 frequency correlations (r 2 ) for all pairs, number (Nu) of pairs and percentage (%) significant in LD (P,0.01),Nu and % of physically linked pairs (r 2 .criticalr 2 ), Nu of pairs in complete LD (r 2 = 1).doi:10.1371/journal.pone.0095211.t004 linear model (Q matrix), MLM = mixed linear model (Q matrix + kinship matrix), sign.= significant with P,0.05.doi:10.1371/journal.pone.0095211.t006

Figure 2 .
Figure 2. Genome A: Durum wheat consensus linkage map (reported in Marone et al. [44]) of the chromosome positions of significant markers associated with plant height (red), heading date (yellow), protein content (blue) and thousand kernel weight (green).Right side: squares, MTAs in the whole collection; stars, MTAs in the durum sub-sample; circles, MTAs in both.DArT markers putatively under selection are indicate in bold, while the unmapped MTA are given in brackets.doi:10.1371/journal.pone.0095211.g002

Figure 3 .
Figure 3. Genome B: Durum wheat consensus linkage map (reported in Marone et al. [44]) of the chromosome positions of significant markers associated with plant height (red), heading date (yellow), protein content (blue) and thousand kernel weight (green).Right side: squares, MTAs in the whole collection; stars, MTAs in the durum sub-sample; circles, MTAs in both.DArT markers putatively under selection are indicate in bold, while the unmapped MTA are given in brackets.doi:10.1371/journal.pone.0095211.g003

Figure
Figure S1 Overview of the LD parameter r 2 of the intrachromosomal pairs in the single chromosome with a large number of DArT markers.The scatterplots show the distribution of the LD parameter r 2 according to the genetic distance.(PDF) Figure S2 Phenotypic distribution for plant height (PH), heading date (HD), protein content (PC), and thousand kernel weight (TKW) in the whole collection.(PDF) Figure S3 Phenotypic distribution for plant height (PH), heading date (HD), protein content (PC), and thousand kernel weight (TKW) in the durum sub-sample.(PDF)

Table 1 .
Number of accessions for each subspecies included in the whole wheat collection and in the Q1 and Q2 subgroups.

Table 3 .
Interchromosomal LD for the whole collection, the durum sub-sample, and the Q1 and Q2 groups, and for genomes A and B.

Table 9 .
Representative markers significantly associated with the protein content trait, for the whole collection (normal), the durum sub-sample (underlined), and both (bold).
complete LD at long-range distance, on chromosome 3A, in which the wPt-4545 marker is in complete LD with wPt-3978 and wPt-1888 at 11.3 cM in all of the datasets used in the present study.In other cases, on chromosome 2A there is the wPt-8925 marker in high LD (r 2 .0.75) with three co-mapping DArT markers (wPt-3611, wPt-0277, wPt-3744) at 18.3 cM in all of the datasets, with the exception of the Q2 group.In the durum sub-sample, we observe other cases of long-range LD between loosely linked markers on chromosomes 6A and 3B (in particular, for the linkage group 3B-2) at a distance of ,14 cM.However, in some cases, the pattern of LD might be indicative of areas of the genome where selection and chromosomal

Table 10 .
Representative markers significantly associated with the thousand kernel weight trait, for the whole collection (normal), the durum sub-sample (underlined), and both (bold).