Genetic Diversity and Population Structure Analysis of European Hexaploid Bread Wheat (Triticum aestivum L.) Varieties

Progress in plant breeding is facilitated by accurate information about genetic structure and diversity. Here, Diversity Array Technology (DArT) was used to characterize a population of 94 bread wheat (Triticum aestivum L.) varieties of mainly European origin. In total, 1,849 of 7,000 tested markers were polymorphic and could be used for population structure analysis. Two major subgroups of wheat varieties, GrI and GrII, were identified using the program STRUCTURE, and confirmed by principal component analysis (PCA). These subgroups were largely separated according to origin; GrI comprised varieties from Southern and Eastern Europe, whereas GrII contained mostly modern varieties from Western and Northern Europe. A large proportion of the markers contributing most to the genetic separation of the subgroups were located on chromosome 2D near the Reduced height 8 (Rht8) locus, and PCR-based genotyping suggested that breeding for the Rht8 allele had a major impact on subgroup separation. Consistently, analysis of linkage disequilibrium (LD) suggested that different selective pressures had acted on chromosome 2D in the two subgroups. Our data provides an overview of the allele composition of bread wheat varieties anchored to DArT markers, which will facilitate targeted combination of alleles following DArT-based QTL studies. In addition, the genetic diversity and distance data combined with specific Rht8 genotypes can now be used by breeders to guide selection of crossing parents.


Introduction
Hexaploid bread wheat (Triticum aestivum L.) is one of the most important cereal crops in the world, covering an area of 217 mill ha in 2010 [1]. It developed through two natural hybridizations of diploid wheat grass species. First, Triticum urartu (containing the A genome) and most probably Aegilops speltoides (containing the B genome) intercrossed [2,3]. Second, about 6000 BC, the domesticated subspecies Triticum dicoccum, cultivated emmer containing the AB genomes, intercrossed with Aegilops tauschii (goat grass, containing the D genome), resulting in hexaploid wheat (containing all three genomes, ABD) [3,4]. During the last 20 years, the increase in wheat yield in Europe has faced stagnation. Simultaneously, the demand for food is increasing due to the growing world population and the dietary changes in countries with rapidly growing economies [5,6]. In order to meet these challenges, genetic improvement based on exploitation of genetic resources is required. Nevertheless, investigation of the wheat genome has faced difficulties due to the large genome size of bread wheat (,17,000 Mb) and the high proportion (,80%) of repetitive sequences [7,8]. Therefore, adequate tools for the investigation of the bread wheat genome are essential.
Several types of molecular markers have been used for wheat genetic studies. Application of Restriction Fragment Length Polymorphism (RFLP), Randomly Amplified Polymorphism DNA (RADP), Simple Sequence Repeat (SSR) and Single Nucleotide Polymorphism (SNP) markers have provided effective genotyping [9]. One example is a study with 512 whole-genome microsatellite loci, representing a mean marker density of 5.1 cM [10]. Efficient population structure analysis requires markers to be well-distributed across the whole genome, and Diversity Array Technology (DArT) provides whole-genome fingerprints, generally with a high marker density [11][12][13]. DArT markers are biallelic dominant markers [14], hence the homozygous and heterozygous states cannot be distinguished. Knowledge about the germplasm of European bread wheat is essential to enrich genetic diversity in order to increase yield and improve other relevant traits such as disease resistance. Strong bottlenecks during domestication and intensive breeding of bread wheat have resulted in a genetically narrow germplasm [3,15,16]. Roussel et al. [16] demonstrated an increase in the genetic similarity of European varieties and a qualitative variation of allelic composition of European wheat lines over time. Furthermore, differences in allelic  composition were found between different geographical regions in Europe. These differences could be caused by different breeding practices and requirements, one being that intensive selection pressure in wheat breeding began earlier in Northern and Western Europe [16]. Additionally, differences along chromosomes can be caused by the introduction of certain germplasm in specific geographical regions. One example is the 1B/1R translocation that provided new resistance sources for wheat, and has been widely used across Europe [17,18]. Likewise, altered dwarfing genes differentiate the European wheat varieties, and dwarfing genes were crucial for the green revolution [19,20]. For example, Worland et al. [19] found the dwarfing gene Rht8 to be widespread in Southern Europe. A number of studies highlight the importance of investigating the genetic structure of a population for exploitation of genetic diversity; thereby broadening the genetic base of modern cultivars via a purposeful selection of parents [21,22]. Knowledge about genetic diversity and population structure is key to further improvements, and evaluation of diversity in germplasm is essential for the effective use of genetic resources in breeding programs. Reasons for the presence of subgroups within larger germplasm populations can include differences in geographical origin, human or environmentally driven selection or genetic drift [23,24]. To investigate the detailed genetic makeup of population subgroups, linkage disequilibrium (LD) analysis is an important tool. LD reflects the degree of linkage between loci referring to the nonrandom association of alleles at different loci. LD is affected by the number of effective recombinations between loci (when measured as D' or r 2 ) and by the number of the mutations at those loci (only when measured as r 2 ) in the history of the respective genotypes [23]. Recombinations between loci of similar allelic state have no effect on LD.
Hao et al. [10] characterized allelic diversity, LD, and population structure in a collection of 250 Chinese wheat genotypes. Their genetic structure analysis revealed two groups; one consisting of modern varieties, and one of landraces. They found that LD decayed less rapidly in modern varieties than in landraces and that the modern varieties showed a lower degree of allelic diversity, presumably because of the selection imposed by breeding. As the level of LD differs between and along chromosome arms [14,22,25] characterization of LD within subgroups could help to identify differences in selective pressures acting on specific genomic regions.
In this study, DArT markers were used for population structure analysis of a population mainly consisting of European commercial bread wheat varieties. We identified two main subgroups within our population, and their genetic diversity and LD patterns were analyzed on a genome-wide scale. Our study revealed marked variation in allele diversity as well as LD levels both within and between the two subgroups. Breeders can now take advantage of this information when choosing crossing parents for breeding, either by trying to maximize genetic diversity across whole genomes or by focusing on specific agronomic traits linked to DArT markers.

Plant Materials
A collection of seeds from 94 hexaploid wheat (Triticum aestivum L.) genotypes from 16 different countries was obtained ( Year refers to the first year in official trials. Empty cells: year unknown. ID refers to ID in PCA-plot ( Figure 4). NG refers to no genotype data. *Breeders varieties and lines have not been assigned accession number. Seeds are available from breeders on request. Some varieties were obtained from IPK gene bank (Gene bank Information System of the IPK Gatersleben) and accession-ID is given in brackets.
A Spring variety. doi:10.1371/journal.pone.0094000.t001 Table 1). Since wheat is a self-pollinated crop, the varieties were assumed to be homozygous. The genotypes were selected to allow association studies of wheat tissue culture response. Therefore, the population includes varieties with tissue culture responses ranging from poor to excellent. Since tissue culture responses were only known for a few genotypes, a number of randomly selected genotypes were also included, prioritizing European winter wheat varieties (Table 1).

DNA Extraction and Genotyping by Diversity Array Technology
Genomic DNA from the 94 different genotypes was extracted using the CTAB procedure (Cetyl Trimethyl Ammonium Bromide; [26]). Plant material was harvested at the seedling stage and freeze dried. After precipitation of the DNA with isopropanol, the DNA pellet was transferred into a new 1.5 ml reaction tube, and the DNA was washed two times with cold 75% ethanol and air dried. The final DNA was diluted with TE buffer (pH 8.0) to a concentration of 100 ng DNA per ml. The DNA was sent to Triticarte Pty Ltd (Canberra, Australia; http://www.triticarte. com.au/). Diversity array technology, version 3 wheat DArT array profiling, was performed with an array including 7,000 markers from wheat, durum and various Triticum sp (Results in Dataset S1). The marker quality was evaluated according to the individual marker related statistics as indicated by Triticarte Pty Ltd. The mean marker P-value, reproducibility and call-rate was 81, 100 and 97, respectively, thus indicating high overall marker quality. Genotyping of one accession, 'Pavone', was unsuccessful and thus the accession was excluded from the analysis. A total of 1,849 polymorphic markers were obtained from the version 3 wheat DArT array profiling of the 93 wheat genotypes. Only 1,435 of these polymorphic markers were mapped (data obtained from Triticarte Pty Ltd) and included in the dataset.

Population Structure Analysis by Bayesian Clustering
The program STRUCTURE (version 2.1) was used to estimate the number of hypothetical subpopulations (K) and to estimate the membership probability of each genotype to the subpopulations [27]. A model-based (Bayesian clustering) clustering approach was performed, where the hypothesis of one to ten subpopulations was set and a Markov chain Monte Carlo (MCMC) of 9,999 burn-in phases followed by 9,999 iterations was run independently 10 times using an admixture model. The log-likelihood of the observed data Pr (X|K) for each value of K was obtained from the structure output (described in Pritchard et al. [28]). An ad hoc quantity analysis, based on the second order rate of change of the likelihood function, was performed (method described in Evanno et al. [29]) ( Figure 1A). In the Bayesian clustering approach, Delta K peaked at a K-value (population number) of two. Thus, the population could be separated into two subgroups. Since the algorithms in STRUCTURE assume independent loci measured on randomly sampled unrelated individuals, another Bayesian clustering was done with a reduced marker set, where tightly linked loci were reduced to single haplotypes. The marker set was reduced by excluding markers with significant LD to markers already present in the set. The reduced marker set consisted of 695 markers and was used for structure analysis, AMOVA and gene diversity statistics. Also in the model-based (Bayesian clustering) clustering approach with the reduced marker set, two groups were indicated ( Figure 1B). A possible separation into seven groups was also indicated; with the two major groups were divided into three and four groups, respectively. Since only 2% more of the total molecular variance was explained (using analysis of molecular variance (AMOVA) by seven compared to two groups, further analysis was based on two groups. The AMOVA was calculated between and within groups using GenAlEx v. 6.5 [30]. Next, the kinship-matrix of Jaccard's distances [31] for the 93 genotypes was calculated based on the DArT markers using a Microsoft (Redmond, USA) Excel package programmed at the University of Copenhagen. Jaccard's dissimilarity index was calculated as follows: J'~M 01 +M 10 M 01 zM 10 zM 11 M 01 represents the number of markers where accession i has no band (0) and accession j has a band (1); M 10 represents the total number of markers where accession i has a band (1) and accession j has no band (0). M 11 represents the total number of markers where both i and j have a band (double presence of the same allele). In Jaccard's distance the case where both i and j are (0) is ignored, because this allelic state cannot be differentiated from missing data due to the dominant nature of the DArT markers. Using the same Microsoft Excel package, a Principal Coordinate Analysis (PCoA) based on the kinship distance matrix was carried out to visually check for genetic structure and genetic outliers. The variety, 'Paroli', was identified as an outlier from the PCA-plot (results not shown). Since 'Paroli' is a variety developed in Germany, it was not expected to be an outlier. Therefore, it has been assumed that the DArT analysis of this line failed, and the genotype has been discarded from further analysis. Principal component analysis (PCA) was used to visualize the subpopulations found in STRUCTURE. All graphics were done using the program R version 2.15.2 (http://cran.r-project.org). To detect markers with high influence on the grouping in the PCA, 5% of the markers (35 markers) having the highest PCA loadings were used. The 35 detected markers were confirmed by performing a General Linear Model (GLM) association between markers and the subgroup divisions. The separation into two groups found in the population structure analysis was confirmed by these two independent analyses. Both analyses revealed the same markers to influence the separation, except for two markers which were found in the GLM and not in the PCA loading analysis.

PCR Conditions and Primers
The 93 genotypes were evaluated with three different primer sets ( Table 2). Two of the PCR products were resolved using standard agarose gel electrophoresis. The SSR marker with M13 tailing (6-FAM) was separated using capillary electrophoresis on an ABI3130 Genetic Analyzer (Applied Biosystems, Foster, CA,USA).

Linkage Disequilibrium Analysis
LD between the DArT markers was estimated using the TASSEL 2.1 software. The analysis comprised the pairwise estimated squared allele-frequency correlations (r 2 ) and the significance of each pair of loci. The r 2 -values were calculated with comparison-wise significance computed using 1000 permutations. Breseghello and Sorrells [25] suggested comparison of all marker-pairs in the linkage analysis, including both intra-(pairs on the same chromosomes) and inter-(pairs between chromosomes) chromosomal pairs. In our LD analysis, only intra-chromosomal comparisons were included. To estimate the LD decay, significant r 2 -values (with p-values ,0.05) were plotted against genetic distance (cM) between the loci-pairs, and a second-degree smoothed loess curve was fitted using the program R (www.Rproject.org). The interception of the loess curve and background LD was considered as an estimate of LD decay [25]. The critical r 2 -limit in our study was not based on the method described by Breseghello and Sorrells [25], but on the assumption that all marker-pairs with a distance above 50 cM were un-linked [32]. Since only the intra-chromosomal pairs were computed, the background LD was determined by the 95th percentile of unlinked r 2 -values, referring to a LD between marker-pairs greater than 50 cM apart (described in Zhou et al. [32]).

The Density of Polymorphic DArT Markers Differs between the A, B and D Genomes
The DArT marker array included 7,000 markers of which 4,570 were placed on the genetic map ( Figure 2). Among the 4,570 informative markers, 48% were located on the B genome, 33% on the A genome and 19% on the D genome (Table S1). For all three genomes, more than 97% of the markers were mapped with a distance shorter than 10 cM, and neither the A nor B genome had gaps larger than 50 cM. However, the D genome had one large gap of 67 cM on the 4D chromosome. The B genome had one gap larger than 10 cM, while the A genome had 13 gaps larger than 10 cM. In summary, the largest and the highest number of gaps between DArT markers were found on the D genome. Of the 4,570 markers anchored on the genetic map, 1,435 were polymorphic ( Figure 3). Most of these markers were located on the B genome, followed by the A and D genomes (Table S2). Polymorphism information content (PIC) values were computed by Triticate Pty Ltd and ranged between 0.16 and 0.38 with a mean value of 0.30. Nine markers were found to be unique (private) for single genotypes (Table S3). A total number of 150 alleles occurred in 5 or less of the lines, and most of these rare

Separation into Subgroups by Structure Analysis was Correlated with Geographical Origin
To analyze the genetic diversity within the population, the relatedness of the genotypes was investigated using population structure analysis. Based on the hypothesis of two subgroups, a Qmatrix was calculated using the output of the Bayesian clustering in Structure 2.1 [28]. The Q-values indicate the level of relatedness of each genotype to the two defined subgroups ( Figure  S1). Most of the genotypes from Hungary belonged to the smaller GrI subgroup, while varieties from Western and Northern Europe were associated with the second and largest subgroup (GrII). Varieties from Italy were part of GrI, while most varieties from  France were part of GrII. Principal Component analysis (PCA) was used as an alternative way of visualizing the genotype data ( Figure 4A). The first and second principal components explained 14% and 5% of the variation, respectively. Overall, GrI and GrII were clearly separated by the PCA. AMOVA showed a significant separation of the GrI and GrII (p-value 0.001), and 14% of the total variation was accounted for by differences between groups (Table S4). Additionally, the Fixation index (F ST ) was calculated to measure the differentiation of the population in relation to genetic structure, and the results are shown in Table 3. Furthermore, the genetic diversity (H p ) levels are indicated (Table 3) and were 0.384 for the entire population, while the fixation index was 0.127.

Linkage Disequilibrium Decayed Over 23 cM
LD was calculated for the entire population and for the two subgroups. The number of intra-chromosomal marker-pairs and the number of significant marker-pairs can be seen in Table 4. A total number of 43,549 intra-chromosomal pairs were detected in the population. Mean r 2 -values for the entire population and for the two subgroups were calculated for all 21 chromosomes ( Table 5). The mean r 2 -value for the total population was found to be 0.080 for all marker-pairs and 0.271 for significant markerpairs. The LD decay is illustrated in Figure 5 (right), including the second-degree smoothed loess curve. In order to determine the LD decay, the LD threshold was assessed by estimating a background LD (or critical r 2 -limit) for each subgroup and for the total population. The intercept of the loess curve was considered as the LD decay. The LD decay for the total population was found to be 23 cM. In GrI the LD decay did not intercept the background LD; however the loess curve extended to approximately 35 cM. GrII had a LD decay of 19 cM. Furthermore, the LD analysis revealed differences between the genomes, and the D genome had the highest r 2 -value, while the B genome showed the lowest ( Table 5).
The LD analysis showed a difference in the r 2 -values of the chromosome groups between the two groups ( Table 6). Both structure groups had a higher average r 2 -value than the entire population (Table 5). GrI had the highest average r 2 -value and higher r 2 -values for the A and B genomes than GrII, while GrII had a higher r 2 -value for the D genome. To identify differences in LD between the total population and the two structure groups along the chromosome arms, the r 2 -values of all adjacent Table 4. Number of intra-chromosomal marker-pairs in the total population and in the two groups, respectively.  Table 5. Mean r 2 -value for the significant and non-significant intra-chromosomal marker-pairs in the total population plus the two groups (GrI and GrII) found in the population structure analysis. marker-pairs of all chromosomes were plotted on the genetic map ( Figures S2-S4).

Breeding for Rht8 Appears to have Contributed to Subgroup Separation
To investigate possible influences of breeding for agronomic traits on population structure, we examined the 35 markers that contributed the most to both subgroup and PCA separation. A number of these were located near markers for known traits. Specifically, 12 of the 35 markers were found on chromosome 2D near the Rht8 locus (Table S5), suggesting that breeding for Rht8 could have had a major impact on the genetic separation of the two subgroups. To verify this, the population was genotyped using a Rht8 marker located on chromosome 2D (Xgwm261, [19]). The marker produced 162, 174 and 192 bp bands. Eleven of the thirteen varieties displaying the 192 bp band belonged to GrI, and 40 of the 47 varieties showing the 174 bp band belonged to GrII ( Figure 4B and Table S6), supporting the idea that breeding for specific Rht8 alleles has contributed significantly to the genetic structure observed within our population.

The Distribution of DArT Markers Indicated that the D Genome is the Least Polymorphic
Marker-based population structure analysis requires well distributed and informative markers, which reflect the overall diversity of the genome. DArT markers have previously been found to cluster in particular regions [13], and in our study, gaps and marker clusters were also revealed. As an example, chromosome 1A, between 0 and 25 cM, showed a high density of markers (Figure 3). Nevertheless, the 4,570 mapped markers were distributed along all three wheat genomes. The number of markers was highest on the B genome and lowest on the D genome; thus, the largest marker-gaps were found on the D genome. The same distribution tendency was observed among the polymorphic markers. These observations are consistent with previously published results on the distribution of DArT markers between genomes [12,13,35]. Fewer polymorphic markers on the D genome indicate a lower frequency of effective recombinations due to a lower diversity of this genome [4,15]. This was expected, since hexaploid wheat gathered a larger proportion of genetic diversity from its tetraploid ancestors than from A. tauschii (containing the D genome) during domestication, resulting in a higher number of effective recombinations in the A and B genomes relative to the D genome [15,36]. Consequently, the highest average marker distances would also be expected on the D genome, which our study confirms (S2). The proportion of polymorphic markers is relatively low in wheat, (approximately 34% in [37]), and in our study only 26% of the DArT markers were polymorphic. The low fraction of polymorphic markers suggests a relatively narrow wheat gene pool in Europe.
Breeding for traits of agronomic importance such as yield, quality, and disease resistance has influenced allele-richness across the wheat genome [38,39]. Four parameters can indicate specific allele-richness of chromosome parts: marker density, clustering of markers, and distribution of rare or polymorphic alleles. High marker densities were previously revealed on the 2A, 3B, 3D, 4A and 7A chromosomes [38,40], and likewise clusters on the same chromosomes were seen in our study (Figure 2-3). Most rare alleles were found on the chromosomes 1A, 2B, 3B and 6A. Furthermore, the most polymorphic markers were found on 1A, 6A and 3B (data not shown). Near the location of rare alleles in our population, several disease resistance genes have been localized [41,42]. As an example, a quantitative trait locus (QTL) in bread wheat for resistance towards Septoria tritici blotch (STB) was detected on the 3B chromosome [43]. Moreover, the leaf rust (Puccinia triticina) resistance gene Lr10 has been mapped to chromosome 1A in bread wheat ( [41]). On the 2B chromosome the genes Yr32 and Yr17 for yellow rust (Puccinia sriiformis) resistance have been detected [42].

Long-range LD Likely Reflects Strong Breeding-driven Selection of European Varieties
For our population, LD decayed over 23 cM, whereas both Chen et al. [40] and Zhang et al. [22] found faster decays of 2.2 and 10 cM, respectively. The difference may be due to differences between populations and marker type and distribution. Chen et al. [40] investigated a Chinese wheat population including both modern varieties and landraces, while Zhang et al. [22] included both hard and soft winter wheat from a wide range of all regions of the United States. In contrast, our population represents a relatively limited geographical area. Furthermore, both studies used SSR markers [40,44], while DArT markers were used in our study. In both GrI and GrII, the average r 2 -value was higher than for the total population ( Table 5), indicating that more alleles are in LD in smaller population with limited geographical origin. Likewise, an extended LD decay was found in GrI compared to the total population ( Figure 5). The higher LD of GrI may be attributed to the fact that GrI holds most of the Hungarian genotypes and within these blocks of alleles with high LD may be frequent. The LD decay of GrII (containing 69% of all lines) was similar to the LD decay of the total population. This corresponds to the findings of Neumann et al. [14], where LD decay of the largest Table 6. Mean r 2 -value for the significant and non-significant intra-chromosomal marker-pairs of each chromosome group.

Variations in LD Patterns Suggest that Different Selective Pressures have Acted on the Two Population Subgroups
Previous results showed that LD differs between the chromosomes [10,22,25]. In line with the findings of Chen et al. [40] and Hao et al. [10], we found the highest levels of LD for markers located on the D genome (Table 5). It has been suggested that high LD levels can be a result of selection, which can increase the correlation among alleles at specific loci [45]. Hence, characterization of differences in LD levels between chromosomes can help to identify genomic regions, which have been subject to genetic selection. For example, relatively low r 2 -values between adjacent marker pairs on some chromosomal areas of 3A, 3B, and 2D ( Figure 6 and Figures S3-S4) were found in GrII. In contrast, a greater number of high r 2 -value marker pairs were found in GrI on these chromosomes, indicating that varieties in the two subgroups could have been subject to different selective pressures in these genomic regions.

Markers on Chromosome 2D Near the Rht8 Locus had a Major Impact on Population Structure
These candidate regions for differential selection could represent specific chromosomal areas that were targets in intensive breeding [16,46,47], and could also have an impact on the separation of population subgroups. In our study, the separation was controlled by a number of chromosome regions (Table S5). To identify agronomic traits that are linked to these chromosome areas, our results were compared to the study of Crossa et al. [46]. This comparison indicated the potential influence of selection based for a number of important agronomic traits (Table S5) including the 1B/1R locus, wheat dwarfing genes and day-length insensitivity and flowering time genes (Ppd1 and Vrn). Strikingly, one third of the markers with largest contributions to the genetic separation of subgroups were located on chromosome 2D near the Rht8 dwarfing locus (Table S5), and it is well known that dwarfing genes have been used to improve yield as one of the main strategies in modern European bread wheat breeding [48]. The strong impact on population structure of breeding for Rht8 was validated using PCR-based markers, confirming that different Rht8 alleles were found in GrI and GrII. The two PCR markers for Lr34 and the wheat-rye translocation also showed some correlation with the subgroup separation, but did not distinguish as clearly between the groups as the Rht8 marker. Lr34 is linked to Yr18 and these have provided durable resistance to leaf rust (caused by Puccinia triticina) and stripe rust (Puccinia striiformis), respectively. Moreover, the same chromosomal area has been associated with powdery mildew resistance against Blumeria graminis (DC) EO Speer f. sp. Tritici [49]. Lr34 is rare in western European wheat, and it has been used extensively in spring wheat grown in the United States [49]. This corresponds well to our results, since we detected Lr34 in a spring wheat variety from the United States (no. 93, Table 1), one from Ukraine (no. 80) and in two Hungarian lines (no 69 and 72). Several rye chromatins have been introduced into wheat lines to increase resistance to pests and pathogens [18], and are now present in some modern wheat varieties. Our results indicated a wheat-rye translocation in the variety Sleipner (No. 81, Table 1), where it has also previously been detected [50].
The microsatellite marker Xgwm261 has been widely used to detect the dwarfing gene Rht8 [19,[51][52][53], and three major alleles generating products of 165, 174 or 192 bp have been described [53,54]. These band sizes correspond well with band sizes found in our study (Table S6), except for the 162 bp band, which might be equivalent to the 165 bp band described in other studies [54]. Compared to the 174 bp variant, 162 bp alleles increase and 192 bp alleles reduce plant height [54]. The 192 bp band is associated with Rht8 dwarfing alleles, which are widespread in varieties from South-East Europe. This corresponds well with our results where we found that the majority of lines with a band size of 192 originated from Hungary and belonged to GrI ( Figure 4B). Most of the varieties with 174 bp alleles originated from Western Europe and belonged in GrII, which is in line with previous results describing 174 bp alleles in varieties from Western and Central Europe [54]. Hence, Rht8 might not be responsible for reduced plant height in most Western and Northern European bread wheat varieties. Rht8 is mapped to chromosome 2D, which was identified in our LD analysis as a candidate region for differential selection between the two population subgroups, as mentioned above. This is reflected by higher LD between adjacent marker pairs on chromosome 2D for GrI, where most varieties contain the Rht8 allele causing reduced plant height, than for GrII ( Figure 6).

Population Structure and Genotype Data can Facilitate Selection of Crossing Parents
Knowledge about population structure and underlying selection for specific traits can assist the selection of crossing parents in order to combine diverse germplasm in a breeding program. The data presented here can be exploited in several ways. Crossing parents can be selected based on their genetic distance, simply to maximize overall genetic diversity and potential for genetic gain in the progeny. It also provides an overview of the allele composition of bread wheat varieties anchored to DArT markers, which will facilitate targeted combination of alleles following DArT-based QTL studies. Finally, the identification of the Rht8 locus as a major contributing factor to the separation between the two genetic subgroups suggests the possibility of introducing Rht8 dwarfing alleles into Northern and Western European bread wheat varieties. For this purpose, the varieties Xi-19 (Table 1 no. 52, UK) and Kosack (Table 1 no. 82, Sweden) may be of particular interest. They are the only GrII varieties with the 192 bp Rht8 allele, and our PCA analysis revealed that there is a relatively large genetic distance between them and the bulk of the GrII varieties (Figure 4). Using these varieties as crossing partners for Rht8 introduction into other high yielding Northwestern European varieties could thus be a way to add genetic diversity and novel dwarfing alleles without resorting to varieties bred for very different climates. Figure S1 Genetic diversity structure of the 92 hexaploid wheat genotypes. Based on the output of a Bayesian algorithm implemented in the program STRUCTURE using the reduced set of 695 markers. Population memberships for each genotype is shown based on K being two. Bars indicate relation to GrI. Vertical line represents separation between the two groups. (TIFF) Figure S2 Linkage disequilibrium (r 2 -values) versus chromosome position for adjacent marker-pairs for the total population. The LD between pairs of adjacent loci is plotted at the locus nearest to zero. (TIFF) Figure S3 Linkage disequilibrium (r 2 -values) versus chromosome position for adjacent marker-pairs for GrI. The LD between pairs of adjacent loci is plotted at the locus nearest to zero. Shown for GrI found in population structure analysis. (TIFF) Figure S4 Linkage disequilibrium (r 2 -values) versus chromosome position for adjacent marker-pairs for GrII. The LD between pairs of adjacent loci is plotted at the locus nearest to zero. Shown for GrII found in population structure analysis. (TIFF)