Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

From landraces to improved cultivars: Assessment of genetic diversity and population structure of Mediterranean wheat using SNP markers

  • Rubén Rufo,

    Roles Data curation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing

    Affiliation Sustainable Field Crops Programme, Institute for Food and Agricultural Research and Technology (IRTA), Lleida, Spain

  • Fanny Alvaro,

    Roles Conceptualization, Resources, Writing – review & editing

    Affiliation Sustainable Field Crops Programme, Institute for Food and Agricultural Research and Technology (IRTA), Lleida, Spain

  • Conxita Royo,

    Roles Funding acquisition, Writing – review & editing

    Affiliation Sustainable Field Crops Programme, Institute for Food and Agricultural Research and Technology (IRTA), Lleida, Spain

  • Jose Miguel Soriano

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Sustainable Field Crops Programme, Institute for Food and Agricultural Research and Technology (IRTA), Lleida, Spain

From landraces to improved cultivars: Assessment of genetic diversity and population structure of Mediterranean wheat using SNP markers

  • Rubén Rufo, 
  • Fanny Alvaro, 
  • Conxita Royo, 
  • Jose Miguel Soriano


Assessment of genetic diversity and population structure in crops is essential for breeding and germplasm conservation. A collection of 354 bread wheat genotypes, including Mediterranean landraces and modern cultivars representative of the ones most widely grown in the Mediterranean Basin, were characterized with 11196 single nucleotide polymorphism (SNP) markers. Total genetic diversity (HT) and polymorphic information content (PIC) were 0.36 and 0.30 respectively for both landraces and modern cultivars. Linkage disequilibrium for the modern cultivars was higher than for the landraces (0.18 and 0.12, respectively). Analysis of the genetic structure showed a clear geographical pattern for the landraces, which were clustered into three subpopulations (SPs) representing the western, northern and eastern Mediterranean, whereas the modern cultivars were structured according to the breeding programmes that developed them: CIMMYT/ICARDA, France/Italy, and Balkan/eastern European countries. The modern cultivars showed higher genetic differentiation (GST) and lower gene flow (0.1673 and 2.49, respectively) than the landraces (0.1198 and 3.67, respectively), indicating a better distinction between subpopulations. The maximum gene flow was observed between landraces from the northern Mediterranean SPs and the modern cultivars released mainly by French and Italian breeding programmes.


Wheat is one of the most widely cultivated crops in the world, covering an area of 219 million ha with a production of nearly 772 million t in 2017 ( Of the daily intake of humans, wheat provides 19% of the calories and 21% of the protein ( It is generally accepted that to match the global population demand wheat production will need to increase by 1.7% per year by 2050 [1]. Additionally, the increasing unpredictability of the weather conditions imposed by climate change will require the release of cultivars with very high yield potential that are able to maintain acceptable yield levels and stability under a broad range of environmental conditions. In the Mediterranean Basin, the climatic regions vary greatly, including both favourable lands and drylands that are subject to frequent drought episodes and high temperature stress, particularly during grain filling [2].

Wheat is estimated to have originated around 10000 years BP in the Fertile Crescent. From there it spread to the west of the Mediterranean Basin and reached the Iberian Peninsula around 7000 years BP [3]. During this migration, domestication and selection by humans resulted in the development of landraces that were very well adapted to local environments [4]. From the middle of the 20th century, the cultivation of local landraces was progressively abandoned, as a consequence of the Green Revolution, and they were replaced by the improved and more productive semi-dwarf cultivars. Mediterranean landraces are an important group of genetic resources because of their documented resilience to abiotic stresses, their resistance to pests and diseases, and their genetic diversity [4, 5].

Knowledge of genetic diversity provides valuable information for understanding the relationships between cultivars and facilitates their characterization and classification, determination of population structure, etc., thus enriching breeding strategies for crop improvement, for example helping breeders to develop new cultivars reducing pre-breeding activities. In the last few decades, several markers have been used for genetic studies [6]. However, the high-density genome coverage provided at low cost in recent years by new high-throughput genotyping technologies such as single nucleotide polymorphism (SNP) arrays or genotyping-by-sequencing (GBS) have made them the procedures of choice for wheat genetic analysis [711].

The aim of this study was to explore the existence of genetic and/or geographic structures and genetic diversity in collections of wheat landraces from the Mediterranean Basin and modern cultivars representative of the ones most widely cultivated in the region.

Material and methods

Plant material

With the aim of representing the past and current cultivated variability in the Mediterranean Basin, the plant material consisted of a germplasm collection of 354 bread wheat (Triticum aestivum L.) genotypes, the MED6WHEAT IRTA-panel, of which 170 correspond to landraces from 24 Mediterranean countries and 184 to modern varieties cultivated in 19 countries in the region (S1 File). The landraces were selected from a larger collection comprising 730 accessions of different origins on the basis of phenology, spike characteristics and passport data. Care was taken to include enough genotypes to represent the different climatic regions existing within the Mediterranean Basin [12]. Landrace populations were provided by public gene banks from Germany (IPK, Gatersleben), Italy (ISC, S. Angelo Lodigiano), Romania (Suceava GenBank, Suceava), Russia (VIR, St. Petersburg), Spain (CRF-INIA, Madrid), the Netherlands (CGN-WUR, Wageningen) and the USA (NSGC-USDA, Aberdeen, ID). Modern cultivars were provided by public institutions (CIMMYT, ICARDA, INRA and the University of Belgrade), breeding companies and the germplasm collection of the IRTA breeding programme. Accessions were bulk-purified during two cropping cycles to select the dominant type and seed was increased in plots in the same field to ensure a common origin for all lines.

Molecular characterization

DNA isolation was performed from lyophilized leaf samples at Trait Genetics GmbH (Gatersleben, Germany). Accessions were genotyped with 13177 SNPs from the Illumina Infinium 15K Wheat SNP Chip at Trait Genetics GmbH (Gatersleben, Germany). Markers were ordered according to the SNP map developed by Wang et al. [13].

Data analysis

Polymorphic information content (PIC) values were calculated following the formula described by Botstein et al. [14] using the Cervus software v3.0.7 [15] (available at Genetic diversity was estimated as total diversity (HT) [16] using POPGENE v1.32 [17]. The coefficient of genetic differentiation, i.e. the proportion of total variation that is distributed between populations (GST), was calculated as GST = DST/HT, where DST is the genetic diversity between populations. DST was calculated as DST = HT—HS, where HS is the mean genetic diversity within populations. Gene flow was estimated as Nm = 0.5(1- GST)/GST according to McDonald and McDermott [18].

Linkage disequilibrium (LD) was estimated as the square of marker correlations (r2) for markers with known map positions using TASSEL 5.0 [19] at a significance level of P<0.001 with a sliding window of 50 cM. The intra-chromosomal r2 values were plotted against the genetic distance and a LOESS curve was fitted to determine the distance at which the curve intercepts the line of a critical value of r2 in order to estimate how fast the LD decay occurs for each chromosome. The critical value of r2 was determined as the mean r2 for each genome.

The genetic structure of the collection was estimated using the Bayesian clustering algorithm implemented in the STRUCTURE software v2.3.4 [20] using an admixture model with burn-in and Monte Carlo Markov chain for 10000 and 100000 cycles, respectively. A continuous series of K were tested from 1 to 10 in seven independent runs. The Evanno method [21] was used to calculate the most likely number of subpopulations with STRUCTURE HARVESTER software [22]. Principal coordinates analysis (PCoA) based on genetic distance was performed using GenAlEx 6.5 [23].

Diversity analysis between accessions was determined by simple matching coefficient [24] implemented in DARwin software v.6 [25]. The un-rooted tree was calculated using the neighbour-joining clustering method [26]. The tree is divided in:

  • Clusters: Main divisions of the tree
  • Branches: Division within clusters
  • Groups: Genotypes from the same subpopulation within a branch


SNP polymorphism and diversity

A total of 11196 polymorphic markers were located in the map developed by Wang et al. [13]. In order to reduce the risk of errors in further analyses, markers and accessions were analysed for the presence of duplicated patterns and missing values. For the landrace collection, 8 markers with more than 25% of missing values as well as 730 markers with minor allele frequency (MAF) lower than 5% were excluded from the analysis, leaving a total of 10458 SNPs (Table 1). For the modern collection, 3 markers with more than 25% of missing values and 487 markers with an MAF lower than 5% were excluded, leaving 10706 polymorphic markers. The total number of polymorphic markers was 11074, of which 10090 (91%) were polymorphic in both collections, 368 only in landraces and 616 only in modern cultivars.

Table 1. Number of SNP markers (N), gene diversity (HT), polymorphic information content (PIC) and linkage disequilibrium (LD) (r2, % of markers in LD at P<0.001, and LD decay in cM) for all of the chromosomes and genomes in both types of germplasm.

The D genome had the lowest number of markers, whereas the B genome showed the best coverage (Table 1). Genetic diversity (HT) and PIC were estimated for each chromosome. For the landraces, HT ranged from 0.39 (1A, 4B) to 0.25 (4D) (mean 0.36) and PIC ranged from 0.33 (1A) to 0.21 (4D) (mean 0.30). For the modern collection, HT ranged from 0.40 (2D, 6D) to 0.29 (7D) (mean 0.36) and PIC ranged from 0.33 (1D, 2D) to 0.25 (7D) (mean 0.30) (Table 1). A summary statistics is reported in S2 File.

Linkage disequilibrium

LD was determined (r2) for single chromosomes. In the landraces, LD ranged from 0.07 in chromosome 7D to 0.34 in chromosome 1D, with a mean among chromosomes of 0.14. The percentage of locus pairs showing a significant LD at P<0.001 ranged from 16% to 59% for chromosomes 7D and 2D, respectively, with a mean of 33% (Table 1). In modern cultivars the mean value of r2 was 0.18, with 45% of the locus pairs in LD. Chromosome 7D, as reported for landraces, showed the lowest LD (0.08), with 21% of the locus pairs in LD at P<0.001 (Table 1). The maximum LD was found in chromosome 1D (0.41), showing 60% of the locus pairs with a significant LD at P<0.001. However, as reported for the landraces, the chromosome with the most locus pairs with a significant LD was 2D (68%, r2 = 0.4).

The extent of LD was also investigated for the three genomes. The highest r2 was found in the D genome for both landraces and modern cultivars (0.22 and 0.20, respectively). The D genome had the most markers with significant LD at P<0.001 (39%) in the landraces, whereas in the modern cultivars the B genome had the most, with 51% of the locus pairs showing a significant LD at P<0.001 (Table 1). The decay of LD varied for each chromosome. For the A and B genomes, LD decay ranged from 1 to 9 cM, whereas for the D genome it ranged from 1 to 10 cM (Table 1, S3 File).

Population structure

The first analysis of the population structure of the 354 accessions was carried out using 557 common SNPs markers between landraces and modern cultivars evenly distributed across the genome. The markers were selected according to the distance of the LD decay for each chromosome in order to avoid the use of markers with a significant LD. The Bayesian clustering method using the Evanno test [21] to infer the most likely number of structured subpopulations (ΔK) revealed the presence of two distinct subpopulations, one including landraces and the other including modern cultivars (data not shown). On the basis of this result, a subsequent analysis of population structure was performed independently for the landraces and modern cultivars.

The highest value of ΔK for the landraces was observed for K = 2 (2861), followed by K = 3 (1031) (Fig 1A). Population structure for both K = 2 and K = 3 showed a geographical pattern. For K = 2 the landraces were separated following an east-west pattern within the Mediterranean Basin, whereas for K = 3 landraces mainly from the Balkan Peninsula were separated either from eastern Mediterranean ones or from French and Italian ones included in the western Mediterranean. The subpopulations (SPs) were classified as western (SP1), northern (SP2) and eastern Mediterranean (SP3) (Fig 1B) according to the geographical region of the countries most represented in the SP. The inferred population structure for K = 3 showed that 45% of the accessions (77 out of 170) showed a strong membership coefficient (q-value) to one of the SPs (q>0.7), whereas using a moderate q-value (>0.5) the number of accessions within an SP increased to 144 (85%), leaving 26 as admixed (S1 File). The western Mediterranean SP (SP1) included 43 accessions, of which 28 corresponded to western countries (Spain, Portugal, France, Morocco, Algeria and Tunisia), with 23 of them having a mean q-value greater than 0.8 (Table 2). SP1 also included 10 landraces (23%) from eastern countries and 6 (14%) from northern countries. The northern Mediterranean SP (SP2) included 59 accessions, with 68%, 24% and 8% corresponding to northern (Albania, Bosnia and Herzegovina, Croatia, France, Greece, Italy, Macedonia, Romania and Serbia), western and eastern Mediterranean countries, respectively (Table 2 and Fig 1B and 1C). Finally, the eastern Mediterranean SP (SP3) included 42 accessions, of which 86% corresponded to eastern countries (Cyprus, Egypt, Jordan, Iraq, Lebanon, Libya, Syria and Turkey) and 12% and 2% to western and northern countries, respectively (Table 2). As shown in Table 2 and Fig 1, the northern Mediterranean SP included accessions from Balkan countries, but also French, Italian and Spanish landraces. On the other hand, three accessions from Israel belonged to the western Mediterranean SP, but their q-values (average q = 0.605) were lower than those of the accessions from western Mediterranean countries. Also, 40% of the accessions from Tunisia belonged to SP2. Accessions from Portugal and Spain were distributed in SP1 and SP2. As shown in Table 2, the inconsistencies observed for these countries could be related to the low q-values for the most represented SP. The three Portuguese accessions included in SP2 had a mean q-value of 0.592, and the Spanish and Greek accessions showed q-values slightly higher than 0.5 for the two SPs (Table 2), denoting a high level of admixture in all of them.

Fig 1. Genetic structure of Mediterranean landraces.

(A) Estimation of the number of subpopulations (SPs) according to the Evanno test. (B) Inferred structure of the landrace collection based on 170 genotypes. Each individual is represented by a coloured bar with length proportional to the estimated membership to each of the three SPs. (C) Geographic distribution of the wheat subpopulations within the Mediterranean Basin. Circles indicate the proportion of each SP in the country. Red, SP1 (western Mediterranean); green, SP2 (northern Mediterranean); blue, SP3 (eastern Mediterranean); grey, admixture. Map source:

Table 2. Mean membership coefficients across the landraces from each country included in each SP.

Only structured genotypes (q>0.5) are shown in the table. q-values >0.7 are indicated in bold.

Following the Evanno test, as previously reported for landraces, the most likely number of structured SPs (ΔK) for modern cultivars was K = 2 (728) (Fig 2A). The first cluster grouped mainly cultivars developed by French and Italian breeding programmes, and the second one grouped mainly cultivars from North Africa, the Middle East and Spain, with evident CIMMYT and ICARDA genetic background (S1 File). A set of cultivars mainly from Serbia remained admixed. If the second highest value for the structured subpopulations was chosen (K = 3) (105), a third SP clustered all Serbian accessions, plus 2 from Macedonia and 1 from Hungary (Fig 2B and 2C, S1 File). According to the classification into three structured subpopulations, 79% of the accessions (145 out of 184) showed a q >0.7. The first SP (SP4) included 73 cultivars from France, 10 from Italy and 1 from Serbia. The second SP (SP5) included cultivars from eastern Europe, mainly Serbia (21 out of 24). The third SP (SP6) included 33 cultivars from Spain, 15 cultivars from North African countries, 12 from the Middle East and Central Asia (Afghanistan), and finally 4 from northern Mediterranean countries (France and Italy). Finally, 12 cultivars remained as admixed.

Fig 2. Genetic structure of the modern cultivars.

(A) Estimation of the number of subpopulations (SPs) according to the Evanno test. (B) Inferred structure of the collection based on 184 genotypes. Each individual is represented by a coloured bar with length proportional to the estimated membership to each of the three subpopulations. (C) Proportion of cultivars from the different countries/regions within each SP. Yellow, SP4; blue, SP5; violet, SP6.

The relationships between landraces and modern cultivars were also analysed by PCoA as a complementary way to visualize their clusters. In agreement with the results shown by STRUCTURE, the first two coordinates of the PCoA clearly separated the landraces from the modern cultivars, and within each group accessions were clustered matching the results of STRUCTURE (Fig 3). Landraces from the Balkan Peninsula ‘TRI 1667’ and ‘TRI 1671’ (Albania), ‘Moriborska’ (Bosnia and Herzegovina), ‘41-II/4-B’ and ‘Legan Bezosja’ (Serbia), and ‘Solonetu Nou’ (Romania) were positioned on the positive side of the PCoA1 close to the origin of the axes, together with the modern Serbian cultivars. Additionally, two landraces from Italy (‘TRI 16900’ and ‘TRI 16516’) and three from France (‘Mounton a Epi Rouge’, ‘TRI 14046’ and ‘TRI 17938’) were located within the modern cultivars from SP4 that grouped modern French and Italian cultivars.

Fig 3. Principal coordinates analyses based on genetic distance.

A) Landraces: red, SP1; green, SP2; blue, SP3; dark grey, admixed; light grey, modern cultivars. B) Modern cultivars: yellow, SP4; dark blue, SP5; violet, SP6; dark grey, admixed; light grey, landraces.

The six SPs showed a total genetic diversity (HT) ranging from 0.2651 for modern cultivars from the Balkans and eastern Europe (SP5) to 0.3690 for northern Mediterranean landraces (SP2) (Table 3). The genetic diversity among SPs (DST) was low (0.0706), resulting in a genetic differentiation (GST) among SPs of 0.1838. This means that only about 18% of the variability observed was due to differences between SPs. The estimation of GST also allowed us to estimate the gene flow (Nm) among SPs. The value of this estimate (2.22) indicates a high level of gene exchange, which denotes a low genetic differentiation among the SPs. When analysed by type of accessions, the landraces showed a GST = 0.1198 and an Nm = 3.67, whereas the modern cultivars showed a GST = 0.1673 and an Nm = 2.49, indicating higher gene exchange between the landrace SPs (Table 3). When comparisons were made between two SPs, gene flow ranged from 2.53 between western Mediterranean landraces (SP1) and modern Balkan cultivars (SP5) to 9.39 between northern Mediterranean landraces (SP2) and cultivars developed mainly by French and Italian breeding programmes (SP4) (Table 3).

Table 3. Genetic diversity and variation between the MED6WHEAT subpopulations (SPs).

Cluster analysis

To better detail the kinship among accessions, a neighbour-joining tree was built using the common SNPs markers between the landraces and modern cultivars. The dendrogram showed two main clusters with a robust separation between them (Fig 4). Within each of these clusters, accessions were mainly grouped in agreement with the groups obtained previously by STRUCTURE and PCoA analysis (Fig 4).

Fig 4. Un-rooted neighbour-joining dendrogram.

Colours of branches correspond to the SPs obtained by STRUCTURE analysis. Landraces (LR): red, SP1; green, SP2; blue, SP3. Modern cultivars (M): yellow, SP4; dark blue, SP5; violet, SP6.

The first cluster of modern cultivars (M_Q1) included cultivars from France and Italy (SP4), in addition to three landraces: ‘Bahatane’ and ‘TRI17938’ from France and ‘TRI16516’ from Italy. The second cluster of modern cultivars (M_Q2) represented mainly the accessions carrying CIMMYT and ICARDA genetic background (SP6), including cultivars developed in Turkey, Spain, Egypt, Syria, Morocco, Algeria, France, Italy, Afghanistan, Sudan and Tunisia. This cluster also included the French cultivar ‘Boticelli’ classified in SP4 but including 37% of the genetic background from SP6. Finally, the third cluster of modern cultivars (M_Q3) contained the group of all elite cultivars from the Balkan Peninsula and eastern Europe (SP5): Serbia (23), Macedonia (2) and Hungary (1). This cluster also included the Syrian landrace ‘Salamuni-A’, classified as admixed by STRUCTURE.

The clustering of landraces suggested a more complex distribution among the SPs according to the higher frequency of admixture revealed by STRUCTURE. The first cluster (LR_Q1) showed three branches, the first one with landraces mainly from the western Mediterranean (SP1), with Morocco as the best-represented country of the branch (45% of the accessions). The second country represented in this branch was Algeria (14%), and the remaining accessions from Cyprus, Egypt, Greece, Iraq, Portugal, Spain, Syria, Tunisia and Turkey were represented in a lower frequency (3%-6%). The second and third branches included mainly eastern Mediterranean landraces, with 81% of the accessions coming from Egypt, Iraq, Jordan, Syria and Turkey. The remaining accessions were from western (Algeria, 11% and Tunisia, 4%) and northern (France, 4%) Mediterranean countries.

The second cluster (LR_Q2) was represented by landraces from the three SPs in two main branches. The first branch included a division between genotypes from the eastern (SP3) and western (SP1) Mediterranean SPs. The eastern group was represented mainly by Turkish landraces (46%). An additional 27% was represented by landraces from Cyprus, Iraq, Jordan and Lebanon, and finally the remaining cultivars were previously grouped by structure analysis in the northern Mediterranean SP (SP2). The second group of the branch was composed mainly of landraces from the eastern Mediterranean countries Cyprus (10%), Israel (50%), Jordan (20%) and Syria (10%), although the Bayesian clustering determined that their structure was more similar to that of landraces from the western Mediterranean SP. Only one landrace from Morocco was included in this group. The second cluster included landraces from the western and northern Mediterranean SPs. The first branch included most of the accessions from western Mediterranean countries (Morocco, Portugal, Spain and Tunisia) and one accession from Libya. In this branch, 50% of the accessions showed high levels of admixture. The second branch of the cluster was divided into two groups, the first including mostly landraces classified as western Mediterranean by STRUCTURE. The group included cultivars from Algeria, Greece, Italy, Morocco and Turkey. The second group was mainly composed of northern Mediterranean landraces (71%) from Albania, France, Italy, Macedonia and Serbia. Although included by structure analysis in this SP, Portugal, Spain and Tunisia from the western Mediterranean and Lebanon from the eastern Mediterranean were also included in this group.

Finally, the remaining landraces were included in four branches (LR_Q3) of the main clusters examined above, with 76% corresponding to northern Mediterranean countries (Albania, Bulgaria, Bosnia and Herzegovina, France, Greece, Croatia, Italy, Romania and Serbia). Within these branches, five modern cultivars were also included: the Spanish cultivars ‘Montcada’ and ‘Montserrat’, the Turkish cultivars ‘Ata 81’ and ‘Cumhuriyet 75’, and the Serbian cultivar ‘KG 100’.


Genetic diversity

Genetic diversity is essential for plant breeding because it provides new knowledge for improving cultivars. In wheat, the genetic diversity was narrowed down during the second half of the 20th century as a consequence of the introduction of high-yielding improved semi-dwarf cultivars. Several studies, reviewed in Lopes et al. [27], have considered landraces to be a source of lost variability that can provide favourable genes to improve modern cultivars. However, in a recent study of the genetic structure of durum wheat Mediterranean landraces, Soriano et al. [6] reported that the great genetic distance estimated between modern cultivars and landrace populations denotes a low use of durum landraces by durum wheat breeding programmes. The knowledge of the genetic diversity within landrace populations will be of special interest for designing new crosses with commercial varieties in order to widening the variability of the new genotypes. As it is reported in Lopes et al. [27] the monitoring of the genetic diversity of landraces is an approach to increase the frequency of rare alleles in breeding programs to find new allelic variation of genes of interest.

Among the 11196 polymorphic markers in the MED6WHEAT panel with a known genetic position according to the map of Wang et al. [13], the B genome had the highest number of SNPs and the D genome the lowest, as reported by Alipour et al. [7] and Eltaher et al. [11]. The collection of Mediterranean landraces showed the lowest PIC and gene diversity values for the D genome, according to the findings of Lopes et al. [28]. Alipour et al. [7] and Eltaher et al. [11] related the low polymorphism in the D genome to the recent evolutionary history of this genome in comparison with the A and B genomes [29]. However, when the collection of modern cultivars was analysed, we found no differences between PIC values for the three genomes, while the D genome showed a slightly high gene diversity. Trethowan and Mujeeb-Kazi [30] and Jia et al. [31] concluded that higher diversity in the D genome may provide new elite and desirable alleles controlling important traits for dealing with climate change.

The average PIC value for both the landraces and the modern cultivars was 0.30. This value is in agreement with previous studies using bi-allelic markers such as SNP or DArT in either common or durum wheat. In common wheat, Lopes et al. [28] found a PIC value of 0.24 for the WAMI population genotyped with the 9K SNP array. Novoselović et al. [32] characterized a Croatian collection with 1229 DArT markers with an average PIC value among the populations of 0.30. Alipour et al. [7] genotyped a diversity panel of 369 Iranian landraces using 16506 GBS-based SNPs, reporting an average PIC of 0.172. El-Esawi et al. [33], using Austrian and Belgian wheats, found PIC values of 0.33 and 0.29, respectively, with 1052 DArT markers. Finally, Eltaher et al. [11], in an F3:6 Nebraska winter wheat population genotyped with 25566 SNPs generated by GBS, found a PIC value of 0.25. In durum wheat, similar PIC values have been reported (Baloch et al. [8], 0.26 and 0.30 using DArTseq and SNPs, respectively; Kabbaj et al. [10], 0.32). The SNP markers in our panel were moderately informative according to the classification of average PIC values into the three categories proposed by Botstein et al. [14]: highly informative (PIC>0.5), moderately informative (0.25<PIC<0.5) and slightly informative (PIC<0.25). Previous studies using SSR markers revealed higher levels of polymorphism. In a previous study of our group, Soriano et al. [6], using a panel of 192 durum wheat genotypes (mainly Mediterranean landraces) genotyped with 44 SSR markers, found an expected heterozygosity of 0.71. Similar results with SSRs have been reported in durum wheat [34, 35] and bread wheat [3638]. The lower PIC value obtained with SNPs or DArTs than with SSR markers may be explained by their bi-allelic nature, which means that the maximum attainable PIC is 0.5 when the two alleles have the same frequency [11, 39].

Linkage disequilibrium and population structure

Linkage disequilibrium is defined as the non-random association of alleles at different loci and decays rapidly with genetic distance. Thus, determining the LD decay over physical and genetic distance within a panel of genotypes is an important step for determining the resolution and marker density required for association studies. Moreover, LD is influenced by population structure due to stratification and unequal distribution of alleles within groups of genotypes, which can result in false associations [20].

In the current study, the mean r2 calculated for intra-chromosomal loci was 0.12 and 0.18 with 32% and 46% of the locus pairs in LD for landraces and modern cultivars, respectively. It is well known that crops gradually lose their genetic variability through domestication and breeding, resulting in more uniform cultivars, reducing the recombination rate and affecting LD [40, 41]. The D genome showed the highest r2 for both landraces and modern cultivars. Similar results were reported by Chao et al. [42] and Lopes et al. [28], who explained that higher LD in the D genome was linked to recent introgressions and population bottlenecks in the origin of hexaploid wheat.

The first attempt to dissect the genetic structure of the MED6WHEAT panel showed a clear separation based on historical breeding periods, i.e. landraces vs modern cultivars. Thus, for subsequent analysis of population structure, independent analyses were carried out for the two groups of germplasm. According to a Bayesian-based analysis, without a priori assignment of accessions to populations, the landraces showed a geographic structure according to the eastern and northern zones of the Mediterranean Basin, whereas accessions classified as western Mediterranean showed a high level of admixture. This classification denoted a migration from the centre of wheat domestication in the Fertile Crescent to the west of the Mediterranean Basin, as reported by Moragues et al. [43] and Soriano et al. [6] in durum wheat. The higher admixture found in western Mediterranean landraces may be due to the incorporation and fixation of favourable alleles from eastern and northern genetic pools during the migration process. By contrast, for modern cultivars the separation was mainly based on the pedigree of the accessions: CIMMYT/ICARDA, cultivars obtained mainly by French breeding programmes, and accessions from the Balkan Peninsula. These groups may have originated through the sharing of germplasm from different breeding programmes with similar growing conditions, particularly from the shuttle breeding carried out by international centres. For the landrace collection, only 45% of the accessions showed a strong q-value (>0.7), suggesting high levels of admixture among SPs, whereas for the modern cultivars 79% of the accessions showed a strong q-value. Oliveira et al. [44] suggested that admixture is the result of the incorporation of alleles from more than one gene pool because of the spread of wheat from different ancestral populations. Moragues et al. [43] proposed as a possible cause of admixture the exchange of germplasm between different Mediterranean regions during the expansion of the Arabian Empire. The low level of admixture between modern cultivars could be due to the development by breeding programmes of cultivars with specific adaptation to the local environments and the use of different genetic resources.

Based on the defined SPs, the results of PCoA and neighbour-joining were in agreement with those reported by STRUCTURE, showing first a robust separation between the landraces and modern cultivars, and within these main clusters a separation into three genetic SPs.

The origin of the axes in the PCoA showed a mixture between landraces and modern cultivars, as also reported by Oliveira et al. [45]. Landraces from the Balkan Peninsula co-localized with modern cultivars from the same region, and two landraces from Italy and three from France were located close to modern cultivars from those countries. A possible cause of this mixture could be the presence of these landraces in the pedigree of the modern cultivars, as reported for the French landrace ‘Mounton a Epi Rouge’, which according to Bonjean [46] played an important role in the pedigrees of improved French cultivars between 1965 and 1975.

According to the admixture revealed by STRUCTURE analysis, the modern cultivars showed well-defined clusters with differentiation among SPs in the neighbour-joining tree, whereas for the landraces the branches included accessions from different SPs. Only four genotypes from a given SP were misclassified within the clusters of modern cultivars. Within M_Q1, formed mainly by French and Italian cultivars, two French and one Italian landraces were also included. Cluster M_Q2, which grouped most cultivars carrying CIMMYT and ICARDA genetic background, also included the French cultivar ‘Boticelli’, which, although belonging to SP4, has an important genetic background (37%) from SP6. Cluster M_Q3 included a landrace from Syria, classified as admixed, with 30% of its genetic background from the northern Europe SP. The presence of landraces within modern cultivars was reported in a global durum wheat panel by Kabbaj et al. [10], who concluded that the simplest explanation was that they were not true landraces, but old tall cultivars wrongly labelled during the collecting mission by the gene banks. This seems a plausible explanation, considering that the first breeding attempts made by pioneer breeders or entrepreneurial Mediterranean agriculturalists consisted in identifying and isolating the best lines already existing within original wheat landraces [47]. Alternatively, the grouping within elite cultivars was probably due to the fact that they were used in breeding programmes to enlarge the genetic diversity, as reported for grain legumes by Sharma et al. [48].

When landraces were analysed by hierarchical clustering, a higher level of admixed genotypes was found on the basis of the STRUCTURE classification. Although the main groups within the clusters were formed mostly by members of specific SPs, a discrepancy between the classifying methods was observed among groups of the same cluster. As reported for clusters LR_Q1 and LR_Q2, some of the landraces misclassified by STRUCTURE according to their country of origin or with a high level of admixture were grouped by neighbour-joining into clusters containing accessions from the same geographical region. In LR_Q3, five modern cultivars were also grouped with landraces, probably due to the presence in their pedigree of genetic background from landraces or closely related accessions. However, the closed pedigree of most commercial cultivars did not allow us to clarify this.

These results highlight the importance of using different approaches to determine the genetic structure of a germplasm collection. Although the different methods are coincident for the genotypes with strong genetic membership to a given group, they are useful to complement the information provided when accessions show large admixture or gene flow among different geographical regions, as in the case of Mediterranean landraces.

Gene flow

Genetic differentiation and gene flow provide information about population differentiation. Gene flow homogenizes populations by genetically decreasing variance among populations and increasing variance within populations. In our study, the analysis of genetic differentiation and gene flow indicated that the majority of the genetic variation was explained by differences among cultivars within genetic SPs. Differentiation of modern cultivar SPs was higher than that of landrace SPs, indicating a lower level of gene exchange among cultivars from different origins. These results are in agreement with population structure and neighbour-joining clustering, in which a higher level of admixture was found for landraces from different geographical regions. Accordingly, it has also been suggested that the low genetic differentiation among SPs is due to seed exchange by farmers, mainly influenced by geographic distances [49, 50]. When two SPs were compared, in general gene flow between landrace SPs was also higher than between modern cultivars. Within landraces, the highest gene flow was found between the northern Mediterranean SP and the western and eastern Mediterranean SPs. The value was lower for the exchange between the western and eastern Mediterranean SPs, supporting the hypothesis of geographic distance. When the modern cultivars were analysed, gene flow showed a higher value between SP4 (France and Italy accessions) and SP5 (Balkan accessions). Cultivars with a CIMMYT/ICARDA origin (SP6) had lower values of gene exchange with the other SPs, probably because of the delivery of improved inbred lines to be released by local programmes through the nurseries that these international centres distribute globally. In the case of modern cultivars, the SPs reflected similarities between the genetic pools managed by the breeding programmes conducted in each specific country. The highest gene flow value was reported between SPs from different periods, i.e. between landraces from the northern Mediterranean (SP2) and modern cultivars released by French and Italian breeding programmes (SP4), suggesting the presence of the genetic background of landraces or old cultivars in the improved modern varieties.

Concluding remarks

The current study aimed to explore the presence of genetic and geographic structures in a collection of bread wheat landraces and modern cultivars representing the variability existing for the species in the Mediterranean Basin. The results demonstrated the usefulness of the methodologies employed for achieving this goal. The structure for landraces showed a geographical pattern with different levels of admixture, mainly justified by physical distances between the territories where they were collected, whereas the structure for modern cultivars reflected differences and similarities between the genetic pools managed by the breeding programmes operating in the region.

The results reported in the current study may be of special interest for driving the development of new cultivars with desirable traits for the climatic conditions of the Mediterranean Basin, and for identifying useful molecular markers through genome-wide association studies to assist breeding programmes.

Supporting information

S2 File. Summary statistics for HT, PIC and LD for each one of the chromosomes.



This study was funded by project AGL2015-65351-R of the Spanish Ministry of Economy and Competitiveness. RR is a recipient of a PhD grant from the Spanish Ministry of Economy and Competitiveness. JMS was hired by the INIA-CCAA programme funded by INIA and the Generalitat de Catalunya. The authors acknowledge the contribution of the CERCA Program (Generalitat de Catalunya). Thanks are given to Marta Lopes (previously at CIMMYT, Ankara, Turkey), Miguel Sánchez (ICARDA, Rabat, Morocco) and Dejan Dodig (Maize Research Institute, Zemun Polje, Belgrade, Serbia) for providing part of the modern germplasm used in the study, and the different gene banks for providing landrace populations.


  1. 1. Leegood RC, Evans JR, Furbank RT. Food security requires genetic advances to increase farm yields. Nature. 2010; 464: 831. pmid:20376125
  2. 2. Royo C, Maccaferri M, Álvaro F, Moragues M, Sanguineti MC, Tuberosa R, et al. Understanding the relationships between genetic and phenotypic structures of a collection of elite durum wheat accessions. Field Crops Res. 2010; 119: 91–105.
  3. 3. Feldman M. Origin of cultivated wheat. In: Bonjean AP, Angus WJ (eds) The world wheat book: a history of wheat breeding. Lavoisier Publishing, Paris. 2001; pp 3–56.
  4. 4. Nazco R, Villegas D, Ammar K, Peña RJ, Moragues M, Royo C. Can Mediterranean durum wheat landraces contribute to improved grain quality attributes in modern cultivars? Euphytica. 2012; 185: 1–17.
  5. 5. Moragues M, Zarco-Hernández J, Moralejo MA, Royo C. Genetic diversity of glutenin protein subunits composition in durum wheat landraces [Triticum turgidum ssp. turgidum convar. durum (Desf.) MacKey] from the Mediterranean Basin. Gen. Res. and Crop Evol. 2006; 53: 993–1002.
  6. 6. Soriano JM, Villegas D, Aranzana MJ, García del Moral LF, Royo C. Genetic structure of modern durum wheat cultivars and Mediterranean landraces matches with their agronomic performance. PLoS ONE. 2016; 11(8): e0160983. pmid:27513751
  7. 7. Alipour H, Bihamta MR, Mohammadi V, Peyghambari SA, Ba G, Zhang G. Genotyping-by-Sequencing (GBS) revealed molecular genetic diversity of Iranian wheat landraces and cultivars. Front. Plant Sci. 2017; 8: 1293. pmid:28912785
  8. 8. Baloch FS, Alsaleh A, Shahid MQ, Çiftçi V, Sáenz de Miera LE, Aasim M, et al. A whole genome DArTseq and SNP analysis for genetic diversity assessment in durum wheat from Central Fertile Crescent. PLoS ONE. 2017; 12: e0167821. pmid:28099442
  9. 9. Joukhadar R, Daetwyler HD, Bansal UK, Gendall AR, Hayden MJ. Genetic diversity, population structure and ancestral origin of Australian wheat. Front. Plant Sci. 2017; 8: 2115. pmid:29312381
  10. 10. Kabbaj H, Sall AT, Al-Abdallat A, Geleta M, Amri A, Filali-Maltouf A, et al. Genetic diversity within a global panel of durum wheat (Triticum durum) landraces and modern germplasm reveals the history of alleles exchange. Front. Plant Sci. 2017; 8: 1277. pmid:28769970
  11. 11. Eltaher S, Sallam A, Belamkar V, Emara HA, Nower AA, Salem KFM, et al. Genetic diversity and population structure of F3:6 Nebraska winter wheat genotypes using genotyping-by-sequencing. Front. Genet. 2018; 9: 76. pmid:29593779
  12. 12. Royo C, Nazco R, Villegas D. The climate of the zone of origin of Mediterranean durum wheat (Triticum durum Desf.) landraces affects their agronomic performance. Genetic Resources and Crop Evolution. 2014; 61: 1345–1358.
  13. 13. Wang S, Wong D, Forrest K, Allen A, Chao S, Huang BE, et al. Characterization of polyploid wheat genomic diversity using a high-density 90000 single nucleotide polymorphism array. Plant Biotechnology Journal. 2014; 12: 787–796. pmid:24646323
  14. 14. Botstein D, White RL, Sholnick M, David RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 1980; 32: 314–331. pmid:6247908
  15. 15. Marshall TC, Slate J, Kruuk LEB, Pemberton JM. Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology. 1998; 7: 639–655. pmid:9633105
  16. 16. Nei M. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA. 1973; 70: 3321–3323. pmid:4519626
  17. 17. Yeh FC, Boyle TJB. Population genetic analysis of co-dominant and dominant markers and quantitative traits. Belgian Journal of Botany. 1997; 129: 157.
  18. 18. McDonald BA, McDermott JM. Population genetics of plant pathogenic fungi. Bioscience. 1993; 43: 311–319.
  19. 19. Bradbury P J, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007; 23: 2633–2635. pmid:17586829
  20. 20. Pritchard J, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000; 155: 945–959. pmid:10835412
  21. 21. Evanno G, Regnaut S, Goude J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005; 14: 2611–2620. pmid:15969739
  22. 22. Earl DA, von Holdt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genet. Resour. 2012; 4: 359.
  23. 23. Peakall R, Smouse PE. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update. Bioinformatics. 2012; 28: 2537–2539. pmid:22820204
  24. 24. Sokal R, Michener CA. statistical method for evaluating systematic relationships. Science bulletin. 1958; 38: 22.
  25. 25. Perrier X, Flori A, Bonnot F. Data analysis methods. In: Hamon P, Seguin M, Perrier X, Glaszmann JC (Eds) Genetic diversity of cultivated tropical plants. Enfield Science Publishers, Montpellier. 2003: pp 43–76.
  26. 26. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987; 4: 406–425. pmid:3447015
  27. 27. Lopes MS, El-Basyoni I, Baenziger S, Singh S, Royo C, Ozbek K, et al. Exploiting genetic diversity from landraces in wheat breeding for adaptation to climate change. Journal of Experimental Botany. 2015a; 66: 3477–3486. pmid:25821073
  28. 28. Lopes MS, Dreisigacker S, Peña RJ, Sukumaran S, Reynolds MP. Genetic characterization of the wheat association mapping initiative (WAMI) panel for dissection of complex traits in spring wheat. Theor. Appl. Genet. 2015b; 128: 453–464. pmid:25540818
  29. 29. Berkman PJ, Visendi P, Lee HC, Stiller J, Manoli S, Lorenc MT, et al. Dispersion and domestication shaped the genome of bread wheat. Plant Biotechnol. J. 2013; 11: 564–71. pmid:23346876
  30. 30. Trethowan R, Mujeeb-Kazi A. Novel germplasm resources for improving environmental stress tolerance of hexaploid wheat. Crop Sci. 2008; 48: 1255–1265.
  31. 31. Jia J, Zhao S, Kong X, Li Y, Zhao G, He W, et al. Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature. 2013; 496: 91–95. pmid:23535592
  32. 32. Novoselović D, Bentley AR, Šimek R, Dvojković K, Sorrells ME, Gosman N, et al. Characterizing Croatian wheat germplasm diversity and structure in a European context by DArT markers. Front. Plant Sci. 2016. 7; 184. pmid:26941756
  33. 33. El-Esawi MA, Witczak J, Abomohra AE, Ali HM, Elshikh MS, Ahmad M. Analysis of the genetic diversity and population structure of Austrian and Belgian wheat germplasm within a regional context based on DArT markers. Genes. 2018; 9: pii: E47. pmid:29361778
  34. 34. Achtar S, Moualla MY, Kalhout A, Roder MS, MirAli N. Assessment of genetic diversity among Syrian durum (Triticum ssp. durum) and bread wheat (Triticum aestivum L.) using SSR markers. Russ. J. Genet. 2010; 46: 1320–1326.
  35. 35. Le Couviour F, Faure S, Poupard D, Flodrops Y, Dubreuil P, Praud S. Analysis of genetic structure in a panel of elite wheat varieties and relevance for association mapping. Theor. Appl. Genet. 2011; 123: 715–727. pmid:21667038
  36. 36. Zhang DD, Bai GH, Zhu CS, Yu JM, Carver BF. Genetic diversity, population structure and linkage disequilibrium in U.S. elite winter wheat. The Plant Genome. 2010; 3: 117–127.
  37. 37. Hao CY, Wang LF, Ge HM, Dong YC, Zhang XY. Genetic diversity and linkage disequilibrium in Chinese bread wheat (Triticum aestivum L.) revealed by SSR markers. PLoS ONE. 2011; 6: 1–13. pmid:21365016
  38. 38. Chen X, Min D, Yasir TA, Hu YG. Genetic diversity, population structure and linkage disequilibrium in elite Chinese winter wheat investigated with SSR markers. PLoS ONE. 2012; 7: e44510. pmid:22957076
  39. 39. Chesnokov YV, Artemyeva AM. Evaluation of the measure of polymorphism information of genetic diversity. Agricultural Biology. 2015; 5: 571–578.
  40. 40. Flint-García SA, Thornsberry JM, Buckler ES. Structure of linkage disequilibrium in plants. Annual Review of Plant Biology. 2003; 54: 357–374. pmid:14502995
  41. 41. Laidò G, Marone D, Russo MA, Colecchia SA, Mastrangelo AM, De Vita P, et al. Linkage disequilibrium and genome-wide association mapping in tetraploid wheat (Triticum turgidum L.). PLoS ONE. 2014; 9(4): e95211. pmid:24759998
  42. 42. Chao S, Dubcovsky J, Dvorak J, Luo MC, Baezinge SP, Matnyazov R et al. Population- and genome-specific patterns of linkage disequilibrium and SNP variation in spring and winter wheat (Triticum aestivum L.). BMC Genomics. 2010; 11: 727. pmid:21190581
  43. 43. Moragues M, Moralejo M, Sorrells ME, Royo C. Dispersal of durum wheat landraces across the Mediterranean basin assessed by AFLPs and microsatellites. Genetic Resources and Crop Evolution. 2007; 54: 1133–1144.
  44. 44. Oliveira HR, Campana MG, Jones H, Hunt HV, Leigh F, Redhouse D, et al. Tetraploid wheat landraces in the Mediterranean basin: taxonomy, evolution and genetic diversity. PLoS ONE. 2012; 7: e37063. pmid:22615891
  45. 45. Oliveira HR, Hagenblad J, Leino MW, Leigh FJ, Lister DL, Peña-Chocarro L et al. Wheat in the Mediterranean revisited–tetraploid wheat landraces assessed with elite bread wheat Single Nucleotide Polymorphism markers. BMC Genetics. 2014; 15: 54. pmid:24885044
  46. 46. Bonjean AP. French wheat pool. In: Bonjean AP, Angus WJ (eds) The world wheat book: a history of wheat breeding. Lavoisier Publishing, Paris. 2001; pp 127–165.
  47. 47. Royo C, Soriano JM, Álvaro F. Wheat: a crop in the bottom of the Mediterranean diet pyramid. In: Fuerst-Bjelis B (Eds) Mediterranean Identities—Environment, Society, Culture. Intechopen, London. 2017: pp 381–399.
  48. 48. Sharma S, Upadhyaya HD, Varshney RK, Gowda CII. Pre-breeding for diversification of primary gene pool and genetic enhancement of grain legumes. Front. Plant Sci. 2013; 4: 309. pmid:23970889
  49. 49. Parzies HK, Spoor W, Ennos RA. Inferring seed exchange between farmers from population genetic structure of barley landrace Arabi Aswad from Northern Syria. Genet. Resour. Crop Evol. 2004; 51: 471–478.
  50. 50. Ben-Romdhane M, Riah L, Selmi A, Jardak R, Bouajila A, Ghorbel A, et al. Low genetic differentiation and evidence of gene flow among barley landrace populations in Tunisia. Crop Sci. 2017; 57: 1585–1593.