Single Marker and Haplotype-Based Association Analysis of Semolina and Pasta Colour in Elite Durum Wheat Breeding Lines Using a High-Density Consensus Map

Association mapping is usually performed by testing the correlation between a single marker and phenotypes. However, because patterns of variation within genomes are inherited as blocks, clustering markers into haplotypes for genome-wide scans could be a worthwhile approach to improve statistical power to detect associations. The availability of high-density molecular data allows the possibility to assess the potential of both approaches to identify marker-trait associations in durum wheat. In the present study, we used single marker- and haplotype-based approaches to identify loci associated with semolina and pasta colour in durum wheat, the main objective being to evaluate the potential benefits of haplotype-based analysis for identifying quantitative trait loci. One hundred sixty-nine durum lines were genotyped using the Illumina 90K Infinium iSelect assay, and 12,234 polymorphic single nucleotide polymorphism (SNP) markers were generated and used to assess the population structure and the linkage disequilibrium (LD) patterns. A total of 8,581 SNPs previously localized to a high-density consensus map were clustered into 406 haplotype blocks based on the average LD distance of 5.3 cM. Combining multiple SNPs into haplotype blocks increased the average polymorphism information content (PIC) from 0.27 per SNP to 0.50 per haplotype. The haplotype-based analysis identified 12 loci associated with grain pigment colour traits, including the five loci identified by the single marker-based analysis. Furthermore, the haplotype-based analysis resulted in an increase of the phenotypic variance explained (50.4% on average) and the allelic effect (33.7% on average) when compared to single marker analysis. The presence of multiple allelic combinations within each haplotype locus offers potential for screening the most favorable haplotype series and may facilitate marker-assisted selection of grain pigment colour in durum wheat. 
These results suggest a benefit of haplotype-based analysis over single marker analysis to detect loci associated with colour traits in durum wheat.

Introduction
Marker-assisted selection (MAS) is increasingly used in plant breeding as a means to enrich selections from segregating populations for desirable alleles influencing economically important traits. In durum wheat (Triticum turgidum L. var durum), most MAS has focused on selection of traits controlled by single genes or large effect quantitative trait loci (QTL) [1]. Identification of robust markers is becoming easier because of the availability of high-density genetic maps (e.g., [2][3][4]). Although several QTL have been reported in the literature, relatively few are practically used in breeding programs [5]. Their lack of practical use is mostly due to difficulties with context dependencies caused by genotype-environment interactions and/or epistasis, to the limitations of sampling bi-parental populations with multi-genic traits, and to a lack of follow-through research to validate identified QTL [6][7][8]. Identification of marker-trait associations using association mapping techniques could avoid some of these context dependencies.
Association mapping (AM) is a complementary strategy to QTL mapping to identify associations between genotype and phenotype [9] and is based on linkage disequilibrium (LD) in a collection of unrelated individuals. In contrast to bi-parental mapping, AM allows a broader population from which to sample multiple alleles and to map with higher resolution [9,10]. Most AM studies test correlations between a single marker and phenotypes. However, because patterns of variation within genomes are inherited as linkage blocks [11][12][13], clustering markers into haplotypes is gaining acceptance in genome-wide association studies.
Advances in high-throughput genotyping technologies have made SNPs markers of choice for genome-wide association studies. SNPs are the most abundant class of sequence variability in the genome and thus have the potential to provide the highest map resolution (Jones et al. 2007). However, SNPs are usually bi-allelic so each provides less polymorphism information content (PIC) than markers such as SSRs (multi-allelic), thus marker density must be increased. This limitation can be overcome by merging SNPs into haplotypes (Lu et al. 2012). Haplotype-based analyses have been successfully carried out mostly in human genetics due to the availability of data from the HapMap project [14,15]. Similar efforts are gaining ground in various crops such as maize [16][17][18], rice [19][20][21][22] and soybean [23][24][25][26]. In wheat, haplotype analyses were performed for QTL or marker-trait association studies [27][28][29], pattern of genetic variations [30-32] and gene diversity [33][34][35]. However, only a relatively low number of SNPs and/or SSR markers were used for marker-trait association studies.
Various arguments advocating for haplotype-based analysis rather than single marker analysis have been proposed. In particular, haplotype-based analysis could capture epistatic interactions between SNPs at a locus [36,37]; provide more information to estimate whether two alleles are identical by descent [38]; elucidate the exact biological role played by neighbouring amino-acids on a protein structure [39]; reduce the number of tests and hence the type I error rate [40]; capture information from evolutionary history [41]; and provide more power than single markers when an allelic series exists at a locus [42][43][44][45]. The fundamental question that arises from all these rationales is whether the power and accuracy of association mapping can be improved by grouping SNPs into haplotype blocks (see [46] for a review). Intuitively, one could expect haplotypes to be more powerful due to the simultaneous use of information from multiple markers [47][48][49]. Simulation studies have shown that clustering of markers into haplotypes can provide greater QTL detection power and mapping accuracy than single markers [43,50-52], and this was supported in empirical studies [17,18,53-60]. The haplotype-based approach also improves prediction accuracy compared with the individual SNP approach [61][62][63]. In contrast, a few studies found no apparent advantage of haplotype-based analysis over individual SNP analysis [64-66] for detecting QTL. The outcome of the haplotype-based analysis could change under different models relating genotype to phenotype or under different demographic scenarios [67]. Indeed, statistical adjustment for population structure and inclusion of kinship relationships are critical to reduce type I error rates of association mapping studies regardless of whether a haplotype or single marker approach is used [68][69][70].
There are various criteria for defining haplotype blocks [12,46,57,71,72]. In particular, haplotype blocks can be defined using a sliding window [28,57,[73][74][75] or combining SNPs within a specific window size [22,58,76]. Studies in barley provided good support for the use of simple overlapping sliding windows of three SNPs [57]. Other studies proposed different numbers of SNPs for sliding windows, ranging from 2 to 10 SNPs [28,[77][78][79][80]. Although this approach is easy to implement, it could potentially lead to large degrees of freedom in the test statistic due to the large number of haplotypes.
A key factor in the success of whole-genome association mapping remains adequate marker coverage across the genome because sparse coverage reduces the power for marker identification [81]. However, the extent of genotyping required increases with rapid LD decay. Linkage disequilibrium is higher in autogamous species due to lower effective recombination [82]. In durum wheat, LD is limited to distances of 2 to 5 cM but is not uniform along chromosomes [83]. Advances in sequencing and genotyping technology allow generation of large amounts of SNP data, and the Illumina 90,000 iSelect SNP chip [4] has allowed development of several robust high-density genetic maps of tetraploid wheat (see [84] for review). We published the first high-density SNP consensus map which anchored over 35,000 SNP markers to all 14 durum wheat chromosomes [3]. The average marker density was 0.079 cM/marker for the B genome and 0.101 cM/marker for the A genome, which provides a framework for association mapping. Because the majority of mapped SNPs are gene-derived markers, this map provides valuable anchor points for post-mapping genetic analysis of the loci and QTL [3].
Improvement of yellow pigment (YP) concentration in durum grain is targeted globally by breeding programs due to increased market demand for bright yellow colour of semolina and pasta products (see [85] for review). The genetics of YP is complex [86], and the colour is due to carotenoid pigment content in the endosperm. Quantitative trait loci were detected on all chromosomes of the durum genome, and genomic regions housing known YP QTL were confirmed on groups 1, 2 and 3 chromosomes. Reimer et al. [87] utilized a genetically diverse collection of cultivars and breeding lines collected from global breeding programs, and performed association mapping for grain YP concentration. Although AM was successful at identifying QTL, we have not applied these to MAS because validation experiments showed most QTL did not explain sufficient proportions of phenotypic variation in our locally adapted breeding materials. In addition, several of the QTL that we discovered were specific to lines from the diverse collection but most were identical by state in our breeding material, despite large phenotypic differences in trait expression [100,101]. One strategy to overcome such limitations is to perform association mapping in locally adapted breeding material [5]. Phenotypic data collected during the course of testing of inbred lines within a breeding program, often with replication over environments, is a valuable resource for discovery of marker associations because these lines are expected to carry a high proportion of relevant, desirable alleles. However, such phenotypic data sets are usually unbalanced because breeders tend to cull materials throughout the breeding cycle, making exploitation of such data complicated [5]. Utilization of a common set of check cultivars over successive breeding cycles in combination with mixed models which incorporate correlations among environments can be used to estimate best linear unbiased estimates (BLUEs) for individual lines and these could then be used to evaluate marker-trait associations. The utility of this approach has been demonstrated for durum wheat [101,102], bread wheat [103], barley [104][105][106][107], potato [108] and sugarcane [109].
Taken together, the recent advances in SNP marker detection in durum wheat and robust phenotypic data collected from our breeding programs [100,101] provided the opportunity to further assess association mapping strategies of practical use in a breeding program. Also, the availability of a high-density SNP consensus map allows the opportunity to assess haplotype based approaches for AM in durum wheat. The main objective of this study was to compare the two mapping approaches to explore the potential of haplotype-based analysis in durum wheat and to identify genomic regions associated with pigment colour in semolina and pasta.

Plant material
One hundred and sixty-nine durum lines were selected for the study (S1 Table) from the official Canadian durum cultivar registration trial grown in Canada between 1999 and 2013. Phenotypic data and trials were described in previous reports [101,110]. Candidate lines were tested for one to three years but only lines with at least two years of data were included in the present study. Each trial included check cultivars: AC Avonlea [111], AC Morse, AC Navigator [112] and Strongfield [113] since 1999, with Commander [114] added in 2001. The checks AC Morse and Commander were dropped in 2013 and the new check Brigade [115] was brought in. Trials were arranged in lattice designs with four replications, except in 2013 when most locations comprised three replications.

Quality analyses
End-use quality traits were measured on composite grain samples of locations within years. The composites included locations with acceptable physical condition (commercial grade Canada Western Amber Durum #3 or better), and were blended to give a target grain protein concentration of about 13%. Yellow pigment (parts per million) of semolina was measured using the AACC method 14-50 (AACC 2000). Colour of semolina and of pasta dried at 70°C was measured with a Minolta CR-200 Chroma Meter (Minolta, Japan) equipped with a 50 mm measuring head to assess CIELAB a* and b* colour space units. Semolina a* measurement was discontinued after 2008. The colour loss during pasta manufacture was estimated by regressing pasta b* on semolina b* [101]. The residuals for each genotype, actual minus predicted values, were used as a measure of colour loss in the analysis. Positive residuals indicate less pigment loss than the population average, while negative residuals indicate greater than average loss. The data were analysed with SAS version 9.3 [116] Proc Mixed using lines (fixed) with years (random) as replication to generate lsmeans. The analyses included all genotypes tested in the registration trial (approximately 300), not just those genotyped, so as to provide a better estimate of random variances and covariances. Pearson's correlations were performed among the lsmeans of the traits.
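The colour-loss measure described above can be sketched in a few lines. This is a minimal illustration, assuming an ordinary least-squares fit of pasta b* on semolina b* across lines; the function name `colour_loss_residuals` and the numeric values are hypothetical and are not taken from the study.

```python
def colour_loss_residuals(semolina_b, pasta_b):
    """Regress pasta b* on semolina b* by ordinary least squares and
    return the actual-minus-predicted residual for each line."""
    n = len(semolina_b)
    mean_x = sum(semolina_b) / n
    mean_y = sum(pasta_b) / n
    sxx = sum((x - mean_x) ** 2 for x in semolina_b)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(semolina_b, pasta_b))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: pasta is yellower than predicted (less pigment loss).
    return [y - (intercept + slope * x)
            for x, y in zip(semolina_b, pasta_b)]


# Hypothetical lsmeans for four lines (illustrative values only).
semolina = [30.0, 32.0, 34.0, 36.0]
pasta = [60.0, 66.0, 64.0, 70.0]
residuals = colour_loss_residuals(semolina, pasta)
```

In this toy example the second line has a positive residual, so it would be read as losing less pigment during processing than the population average.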

SNP genotyping and genetic diversity analysis
Genomic DNA was extracted from fresh young leaf tissue using a modified CTAB method [117]. DNA was quantified using PicoGreen (Invitrogen) fluorescence assay, and diluted to 50 ng/μl. Genotyping was performed according to the method published previously [4]. The 90K iSelect assay chips were run on an Illumina HiScan for imaging and the resulting data were loaded into GenomeStudio v2011.1 software (Illumina) for SNP calling. After filtering those SNPs with ambiguous calls, having more than 25% missing values, or having MAF < 0.05, a total of 12,234 polymorphic SNP markers were used for analyses. PowerMarker V3.25 software [118] was used to calculate the summary statistics including allele number, allele frequency and PIC.
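The filtering thresholds and the PIC statistic mentioned above can be illustrated as follows. This is a sketch only: it assumes each inbred line is coded with a single allele per SNP, with ambiguous or missing calls stored as `None`, and it uses the standard multi-allelic PIC definition (1 minus the sum of squared allele frequencies, minus the sum over allele pairs of 2·p_i²·p_j²), which we assume matches what PowerMarker reports. All function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(calls):
    """Allele frequencies at one locus among non-missing calls."""
    observed = [c for c in calls if c is not None]
    counts = Counter(observed)
    return {allele: n / len(observed) for allele, n in counts.items()}


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs)
    homozygosity = sum(x * x for x in p)
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1 - homozygosity - cross


def passes_filters(calls, max_missing=0.25, min_maf=0.05):
    """Keep a SNP unless it has more than 25% missing calls or MAF < 0.05."""
    missing = sum(c is None for c in calls) / len(calls)
    if missing > max_missing:
        return False
    freqs = allele_frequencies(calls)
    if len(freqs) < 2:  # monomorphic in this panel
        return False
    return min(freqs.values()) >= min_maf
```

Under this definition a bi-allelic SNP cannot exceed a PIC of 0.375 (reached at equal allele frequencies), whereas a multi-allelic locus such as a haplotype block can go higher.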

Genotyping with Rht-B1b and Lpx-B1 genes
Because the Lpx-B1 deletion has been associated with reduced colour loss during processing [119,120], the registration lines were genotyped with a Lpx-B1 marker. The registration lines were also genotyped with Rht-B1b, an allele known to confer semi-dwarf growth habit in wheat [121], because the relatively few semi-dwarf lines in the panel were selected for very high pigment, presenting the possibility of spurious associations. In order to relate the association signals to Lpx-B1 and Rht-B1b, pairwise LD (r²) was calculated between all 4B association signals and these genes using MIDAS software [122].

Population structure and linkage disequilibrium analysis
Population structure is one of several important factors that strongly influence LD. The presence of population stratification and an unequal distribution of alleles within groups can result in spurious associations [82]. Population structure was estimated using discriminant analysis of principal components (DAPC) as implemented in the Adegenet R package version 1.4 [123]. To avoid unstable results, the maximum number of principal components (PCs) should be N/3, N being the number of lines [123]. Therefore, 56 PCs were included in the model.
Single nucleotide polymorphism markers having MAF < 0.05 were filtered out prior to estimating the LD because the estimation of LD using r² is dependent on allele frequency and rare alleles can inflate the r² [124]. The LD was estimated as a correlation coefficient (r²) between all pairwise comparisons of loci both genome-wide and at the chromosome level, using the Genetics R package available at http://cran.r-project.org/. The r² distribution of loci belonging to different chromosomes was used to calculate a threshold of r² for LD which was taken from the parametric 95th percentile of that distribution [125]. The genetic distance corresponding to that r² threshold was determined with nonlinear regression by plotting the genetic distance over which LD decayed, using R code written by F. Marroni that is available at http://fabiomarroni.wordpress.com/.
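For readers unfamiliar with the r² statistic, the pairwise calculation can be sketched as below. This is not the Genetics R package implementation; it is a minimal version that assumes homozygous inbred lines with each bi-allelic SNP coded 0/1 (`None` for missing), so that haplotype frequencies can be counted directly. The function name `ld_r2` is hypothetical.

```python
def ld_r2(locus_a, locus_b):
    """Squared allelic correlation r2 = D^2 / (pA(1-pA) pB(1-pB)),
    with D = pAB - pA * pB, for two 0/1-coded loci."""
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no lines with calls at both loci")
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(a * b for a, b in pairs) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom
```

Two SNPs with identical genotype vectors give r² = 1, and SNPs whose alleles are distributed independently give r² = 0. Values computed between loci on different chromosomes form the background distribution from which the LD threshold is derived.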

Marker imputation and haplotype construction
Prior to haplotype construction, missing calls were imputed using the RF regression procedure [126] as implemented in the R package "randomForest" [127,128]. The RF procedure has been described in detail for imputing missing genotypes for genomic selection [129] and has been successfully used for genetic diversity analysis [130] and genome-wide association studies [131][132][133].
For haplotype construction, redundant information known to introduce bias [134] was first filtered out using an in-house Ruby script. When two or more SNPs had the same genotype across all breeding lines along the same chromosome, they were represented by a single genotype. Thus, a total of 8,581 SNPs were used for the analysis. The SNPs were sorted by position along each chromosome based on the durum high-density SNP-based consensus map [3]. Those SNPs spanned all 14 chromosomes of durum wheat with an average density of one marker per 0.3 cM (S2 Table). Then, SNPs within a window size of 5.3 cM (estimate of average LD decay) on the same chromosome were combined to form a haplotype block and assigned to the same locus. Loci for each chromosome were named as combination of the prefix 'hap', the chromosome and an index that is the incrementing number (1 to N, N being the total number of haplotypes) of the haplotype along the chromosome (e.g., hap_1A_1 and hap_1B_2 designate the first haplotype on chromosome 1A and the second haplotype on chromosome 1B, respectively). Only 17 haplotypes appeared to be rare (MAF < 0.05) and were excluded from further analyses.
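The block-building steps in this section can be sketched as follows. This is an illustrative Python version, not the in-house Ruby script. It assumes that a block starts at the first unassigned SNP on a chromosome and takes in every following SNP lying within 5.3 cM of that starting position; the text does not spell out this detail, so treat it as one possible reading. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def build_haplotype_blocks(snps, window_cm=5.3):
    """Group SNPs into haplotype blocks.

    snps: iterable of (name, chromosome, position_cM, genotypes), where
    genotypes is a tuple holding one call per breeding line.
    Returns {block_name: [snp_name, ...]}.
    """
    by_chrom = defaultdict(list)
    for snp in snps:
        by_chrom[snp[1]].append(snp)

    blocks = {}
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda s: s[2])
        # Redundancy filter: SNPs with identical genotypes across all
        # lines on the same chromosome are represented once.
        seen, unique = set(), []
        for snp in ordered:
            if snp[3] not in seen:
                seen.add(snp[3])
                unique.append(snp)
        index, start = 0, None
        for name, _, pos, _ in unique:
            if start is None or pos - start > window_cm:
                index += 1          # open the next block on this chromosome
                start = pos
                blocks[f"hap_{chrom}_{index}"] = []
            blocks[f"hap_{chrom}_{index}"].append(name)
    return blocks


# Hypothetical toy input: five SNPs scored on four lines.
toy_snps = [
    ("s1", "1A", 0.0, (0, 1, 1, 0)),
    ("s2", "1A", 1.2, (0, 1, 0, 0)),
    ("s3", "1A", 1.2, (0, 1, 1, 0)),  # same genotypes as s1, so dropped
    ("s4", "1A", 6.0, (1, 1, 0, 0)),
    ("s5", "1B", 0.5, (1, 0, 0, 1)),
]
toy_blocks = build_haplotype_blocks(toy_snps)
```

The haplotype allele of a line at a block is then the combination of its calls at the SNPs in that block. The sketch keeps single-SNP groups, whereas the blocks reported in this study contain at least two SNPs.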

Association analysis
Marker-trait associations were carried out using the general linear model (GLM) and the mixed linear model (MLM) as implemented in TASSEL software version 3 [135]. In order to control spurious associations, population structure and/or relatedness between individuals were taken into account in both GLM and MLM procedures. The Q matrix was based on the four groups from the discriminant analysis of principal components and the kinship (K) matrix was calculated using TASSEL. To control for experiment-wise error, nominal P-values were adjusted according to Storey-Taylor-Siegmund's adaptive step-up procedure [136] as implemented in the Mutoss R package [137]. A false discovery rate (FDR) of 5% was used for computation and only SNPs and haplotypes having an adjusted P-value less than 0.05 were declared significant. The allelic effect of haplotypes and SNPs was estimated as the difference between the mean value of the lines carrying these haplotypes and SNPs, and the mean value of the entire population for each trait. Thus, only SNPs and haplotypes having relatively strong allelic effect were reported.
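The allelic effect defined above (mean of the lines carrying an allele minus the mean of the entire population) is simple enough to show directly. The sketch below uses a hypothetical function name and made-up values. It covers only the effect calculation, not the TASSEL models or the adaptive step-up adjustment of P-values.

```python
from collections import defaultdict


def allelic_effects(alleles, phenotypes):
    """Effect of each allele (SNP allele or haplotype combination):
    mean phenotype of its carriers minus the population mean."""
    population_mean = sum(phenotypes) / len(phenotypes)
    carriers = defaultdict(list)
    for allele, value in zip(alleles, phenotypes):
        carriers[allele].append(value)
    return {allele: sum(values) / len(values) - population_mean
            for allele, values in carriers.items()}


# Hypothetical haplotype calls and trait values for five lines.
calls = ["AAG", "AAG", "GCT", "GCT", "ACT"]
trait = [4.0, 5.0, 2.0, 3.0, 1.0]
effects = allelic_effects(calls, trait)
```

Because a haplotype locus can carry more than two allelic combinations, this calculation returns one effect per combination, which is what allows the favorable combinations within a locus to be ranked.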

Analysis of phenotypic data
Large phenotypic variation was observed among the breeding lines for all of the traits (Table 1). In particular, pasta a* and semolina pigment values ranged from 1.66 to 5.79 and 6.0 to 12.05, respectively. Significant differences were observed between subpopulations (Table 1). The correlation among colour traits is presented in Table 2. Pasta a* was significantly (P < 0.001) correlated with all of the traits, with correlations ranging from r = 0.40 (pigment loss) to 0.69 (semolina pigment). Semolina a* was correlated with only pasta a*. However, semolina pigment exhibited strong correlation with semolina b*, pasta a* and pasta b*. The highest correlation (r = 0.96) was observed between semolina pigment and semolina b*.

Population structure and LD decay
Four subpopulations among the breeding lines were inferred using discriminant analysis of principal components (Fig 1). The accessions list with their subpopulations is shown in S1 Table. The total amount of genetic variation explained by the first 56 eigenvectors was 80%. Breeding lines were differentiated according to pedigree, source breeding program, and testing year. Subpopulation 1 is largely AC Avonlea [111] and/or Strongfield [113] heritage and comprised on average the most recent lines in the trial. Subpopulation 2 is based on Kyle [138] heritage, with the majority of the lines from the Agriculture and AgriFood Canada, Swift Current program and representing an earlier era of testing than subpopulation 1. Subpopulation 3 contained lines with diverse ancestry from CIMMYT, University of North Dakota, Agriculture and AgriFood Canada, Winnipeg and Swift Current, and University of Saskatchewan. Subpopulation 4 was similar to subpopulation 3 but without the Swift Current component and represented the oldest era of testing of the four groups.
A total of 12,234 polymorphic SNPs were used to estimate the LD across all chromosomes. The critical r² value from which the genome-wide LD decayed was estimated at 0.2 (Fig 2). The average genetic distance at which LD across all chromosomes decayed (r² < 0.2) was 5.3 cM. Nonetheless, that distance varied among chromosomes, from 3.0 (chromosome 4A) to 9.4 cM (chromosome 5B). The LD pattern of all chromosomes is presented in S1

Allele diversity as revealed by SNPs and haplotypes
After imputation, a total of 8,581 SNPs having a minor allele frequency greater than 5% and located on the high-density consensus map were used for analyses. Only 14.2% (1,222/8,581) of the SNPs showed almost equal allele frequencies between their two alternative alleles. The average PIC for these 8,581 SNPs was 0.27, ranging from 0.10 to 0.38 (Fig 3).
A total of 406 haplotype blocks containing 2 to 60 SNPs were generated. Of these haplotype blocks, 4.9% contained two SNPs, 47.5% contained three to nine SNPs and 47.6% had more than 10 SNPs. Haplotype blocks showed a higher level of allele diversity; the average PIC was 0.50, ranging from 0.10 to 0.93 (Fig 3). The number of allele combinations varied from 2 to 161 among haplotype blocks.

Loci associated with pigment colour
As shown by the quantile-quantile plots (S2 Fig), the MLM (K) and MLM (Q+K) models were significantly better than the GLM naïve and GLM (Q) models in reducing spurious associations. Only the MLM (Q+K) model was kept for the analyses because in general it performed a little better than the MLM (K) model. Single marker-based analysis identified five loci associated with colour components (Table 3, Fig 4). The number of loci varied depending on the trait. Most of the loci revealed by the single marker-based analysis were associated with at least two traits: Tdurum_contig51688_681 on 4B with pasta a*, pasta b* and pigment loss; Tdurum_contig54634-815 on 2A with pasta b* and pigment loss; BobWhite_c41527_201 on 2A and Tdurum_contig54832_139 on 7A with semolina b* and semolina pigment. Three loci associated with pigment loss were detected on chromosomes 2A and 4B, explaining 11.9 to 26.2% of the phenotypic variation. A total of three loci, located on 2A and 4B, were associated with pasta b*, explaining 9.5 to 26.2% of the variation.
Haplotype-based analysis identified a total of 12 loci associated with pigment colour components (Table 3, Fig 4). Detailed information (number and list of SNPs) on these haplotype loci is presented in S3 Table. Most (8/12) of the loci were associated with at least two colour components. In particular, hap_4B_6, hap_4B_7, hap_4B_12 and hap_5B_25 were associated with pasta b* and pigment loss while hap_2A_18 and hap_7A_32 were associated with semolina b* and semolina pigment. For pasta a*, a total of four loci were detected, located on chromosomes 2A, 3B and 4B. For pigment loss, six loci were detected on chromosomes 2A, 3B, 4B and 5B. Six loci were detected for pasta b*, located on 2A, 4B, 5B and 7B. Three haplotypes (hap_2A_18, hap_7A_32 and hap_7B_36) were associated with both semolina b* and semolina pigment. Over all pigment traits, the percentage of variance explained ranged from 8.5 to 40.2%.

Comparison of loci identified by single marker-and haplotype-based analysis
The haplotype-based analysis identified a total of 12 loci associated with grain pigment colour traits, including all of the five loci identified by the single marker-based analysis. In particular, the haplotype-based analysis detected at least one additional locus for each trait. The loci not detected by the single marker approach explained in general a relatively small amount of the phenotypic variation.
Haplotype-based analysis improved the amount of the phenotypic variance explained and the allelic effect (Table 3). Overall, there was a substantial increase in the phenotypic variance explained (50.4% on average) and allelic effect (33.7% on average). For instance, the locus hap_4B_6 showed an increase of 87.9% for the phenotypic variation of pasta a*, and the allelic effect of the locus hap_7A_32 was 64.3% greater than that of the associated SNP for semolina pigment. The associated haplotype loci consisted of 2 to 10 SNPs although the number of SNPs ranged from 2 to 60 among the 406 haplotype blocks.

Population structure and LD decay
In this study, the discriminant analysis of principal components [139] clustered the 169 breeding lines into four subpopulations. This population structure is in agreement with known differences in pedigree, breeding program source and era of testing in the trials.
The discriminant analysis of principal components successfully unraveled the population structure in germplasm such as cultivated sweet potato [140], rice [141], acacia [142] and sweet cherry [143]. The presence of genetic structure within a population can lead to spurious association signals [134,144-148]. Understanding the actual population structure of the durum breeding panel was intended to limit the false discovery rate in the association analysis. The average genetic distance results suggest that the LD mapping using our breeding panel can achieve a resolution of < 5 cM. Few (4%) markers showed very high LD (r² > 0.8). Our results are congruent with those reported in bread wheat [149] and a geographically diverse durum wheat panel where the LD decayed within 5 cM on average [83]. However, a relatively higher (10 cM) LD decay distance was reported in a durum elite collection [150].

Association mapping based on single marker and haplotypes
We used the 3-SNP sliding windows method and came up with a total of 8,537 haplotype blocks (data not shown) that is markedly greater than the 406 LD-based haplotype blocks we generated and used for analyses. A large number of haplotypes increases the degree of freedom for a test statistic [151]. Intuitively, the type I error rate would be higher for haplotypes derived from the 3-SNP sliding windows compared to the LD-based haplotypes. In addition, the sliding windows approach raises the question of the optimum number of markers to be included in the haplotype. A large window may include too many non-informative markers while a small window may ignore informative markers, both of which will lead to a reduction in testing power [152]. Alternatively, variable-sized sliding windows approaches have been proposed [73, [153][154][155][156][157]. However, most of the variable-sized methods require some computationally intensive phasing program to account for uncertain haplotype phases [158]. Because the optimal window size is always influenced by the underlying LD pattern [154,159], we constructed haplotypes based on the average LD extent in our material. It is well known that LD patterns are variable across a large genomic region or the whole genome; therefore we also built haplotypes using chromosome-based LD. However, we found no substantial difference in size or number of haplotypes, using the chromosome-based LD distance rather than the average distance of LD decay (5.3 cM), suggesting that taking the average distance is reasonable for analysis. Similarly, the average LD distance has been used to build haplotypes in many studies when LD extent varied among chromosomes (e.g. [58,75]). An advantage of using the LD-based method is that it avoids taking an arbitrary or suggestive number of markers to be included in the haplotype. This method is relatively easy to implement although it requires a pre-computation of the LD extent in the material under investigation. 
Haplotype blocks defined according to the LD usually reflect the variation patterns of the genome better than haplotype blocks artificially outlined by a fixed number of SNP [61].
The haplotype-based analysis was superior to the individual SNP analysis because it identified seven more loci associated with colour components. The same loci (hap_2A_18, hap_7A_32 and hap_7B_36) detected for semolina pigment and semolina b* were not surprising because these traits showed the highest correlation (r = 0.96) amongst traits. Furthermore, the haplotype-based analysis resulted in a substantial increase (50.4% on average) in the phenotypic variance explained. The improvement ranged from an 87.9% increase of phenotypic variance explained for pasta a* by haplotype hap_4B_6 to 27.8% for pasta b* by hap_2A_18 compared to the associated single markers. Increases in the amount of phenotypic variance explained attributed to haplotype-based analysis were also reported in other crop species such as barley [57] and maize [75]. Similarly, haplotypes explained up to 80% more of the phenotypic variance for genes in cattle [53]. The increased allelic effect (e.g., a 64.3% increase for semolina pigment attributed to hap_7A_32) from combining SNPs into haplotypes demonstrated an increase in power over the single marker method. However, no single allelic combination within any haplotype locus was able to select all of the lines having the desirable phenotype. Moreover, in general each haplotype carried more than one favorable allelic series. For example, for pasta a*, in addition to the most favorable allelic series (effect = 1.66) of hap_4B_6, two other allelic combinations showed good allelic effects on the trait, 1.41 and 1.37. Combinations of several allelic series within each haplotype, as well as the aggregation of the best haplotypes, improved the ability to select lines having the desirable phenotypes. These results confirm the complex genetic architecture of colour traits in durum.
Haplotype-based analysis was reported to increase the power of detecting QTL compared to single-marker analysis, based on simulated data [43]. Including more marker alleles in haplotypes leads to a higher proportion of the QTL variance being explained [52,160] and provides additional power to the analysis [45,161]. However, the haplotype loci detected in this study were not those having the highest number of SNPs. Thus, the power of haplotypes in increasing the variance explained could not be attributed mainly to the number of markers. The informativeness of markers within the haplotypes is more likely to be of greater importance. As functional nucleotide polymorphism (sequence variations responsible for alterations in gene function) databases are becoming available, including the most informative markers in haplotypes could enhance the potential utility of haplotype-based studies [21,162]. In contrast, Zhao et al. [74] found no apparent advantage of haplotype-based analysis over individual SNP analysis in their simulation study that was designed to resemble the demography and population history of livestock. Lorenz et al. [163] reached a similar conclusion but noted that it may not be valid under different models relating genotype to phenotype or under different demographic scenarios. Despite these contradictory results, haplotype-based analysis could play a critical role in association mapping studies in crop plants as recently discussed by Gupta et al. [164].
[166]. Therefore, we evaluated how these loci relate to the Lpx-B1.1 and Rht-B1b genes. Two of the three loci we identified on 4B associated with pigment loss, explaining 28.9 to 33.6% of the phenotypic variation, were associated with Rht-B1b and Lpx-B1. The locus hap_4B_6 showed strong LD (r² = 0.86) with the Lpx-B1.1 gene, while the locus hap_4B_7 was strongly associated (r² = 0.92) with the semi-dwarf height locus Rht-B1b. Both Lpx-B1.1 and Rht-B1b are known to reside on chromosome 4B [101,167]. Because Lpx-B1 and Rht-B1b are both on 4BS, there could be undesirable linkage. However, these two loci showed relatively weak (r² = 0.31) LD, suggesting independent segregation in our material. Pozniak et al. [101] reached a similar conclusion based on DArT marker assessment of this breeding panel.

Comparison with QTL for pigment from previous reports
Carotenoid degradation (pigment loss) during pasta processing is controlled by lipoxygenases, polyphenol oxidases and peroxidases. The wheat gene isoforms Lpx-1 and Lpx-3 are located on chromosome 4, whereas the Lpx-2 gene is located on chromosome 5 [97,119,120,168-171]. In developing durum kernels, different transcript levels have been reported, with Lpx-1 transcripts being the most abundant in mature grain [169]. This suggests that the Lpx-1 gene might have a major role in oxidation of carotenoid pigments during pasta processing. In support of this hypothesis, a major QTL for total lipoxygenase activity, with three copies of the Lpx-1 gene (Lpx-B1.1, Lpx-B1.2 and Lpx-B1.3), has been mapped on chromosome 4BS [97,99,120,168,172]. Selecting for and fixing this allele in all breeding lines could contribute to significantly reduced pigment loss during pasta processing and, consequently, improve the aesthetic and nutritional qualities of the pasta products.
For pasta a*, the four loci detected on chromosomes 2A, 3B and 4B suggest complex genetic control of pasta redness in durum wheat. To our knowledge, this is the first study of association mapping for pasta a*. Half of the total number of loci associated with pasta a* were located on chromosome 4B. In particular, locus hap_4B_6 explained 35.7% of the phenotypic variance. This locus also showed strong association with pasta b* and pigment loss. Because pasta a* (redness) and pasta b* (yellowness) are correlated (r = 0.65), much effort should be put into breaking the LD between them to facilitate selecting against red colour in pasta.

Conclusion
Our results clearly showed that genome-wide association studies could benefit from haplotype-based analysis. The haplotype approach substantially increased the polymorphism information content and detected more loci associated with semolina and pasta pigment. The amount of phenotypic variance explained and the allelic effect were also improved over single marker analysis. In particular, the locus hap_4B_6 on chromosome 4B was associated with pasta a*, pasta b* and pigment loss, and explained up to 40% of the phenotypic variation. This locus could be a good candidate for tagging the Lpx-B1 gene. In addition, combinations of several allelic series within each haplotype locus, as well as the aggregation of the best haplotypes, improved the ability to select lines having the desirable phenotypes. The use of haplotype-based analysis in comparison with single marker analysis will provide more insight into the potential of combining SNPs into haplotypes in genome-wide association studies.