Chromosomal Passports Provide New Insights into Diffusion of Emmer Wheat

Emmer wheat, Triticum dicoccon Schrank (syn. T. dicoccum (Schrank) Schübl.), is one of the earliest domesticated crops, harboring a wide range of genetic diversity and agronomically valuable traits. The crop, however, is currently largely neglected. We provide a wealth of karyotypic information from a comprehensive collection of emmer wheat and related taxa. In addition to C-banding polymorphisms, we identified 43 variants of chromosomal rearrangements in T. dicoccon; among them 26 (60.4%) were novel. The T7A:5B translocation was most abundant in Western Europe and the Mediterranean. The plant genetic resources investigated here might become important in the future for wheat improvement. Based on cluster analysis, four major karyotypic groups were discriminated within the T. dicoccon genepool, each harboring characteristic C-banding patterns and translocation spectra: the BALKAN, ASIAN, EUROPEAN and ETHIOPIAN groups. We postulate four major diffusion routes of the crop and discuss their migration out of the Fertile Crescent considering the latest archaeobotanical findings.

Emmer wheat was a staple crop of Neolithic agriculture and it was widely cultivated for over 7,000 years, until the second half of the third millennium BP, when it began to be replaced by higher-yielding and free-threshing bread wheat (T. aestivum L.) and durum wheat (T. durum Desf.) [1,2,6]. Several studies suggested that the initial domestication of emmer occurred independently at several sites in the Fertile Crescent [3,7,8]. Favorable alleles, which enabled adaptation to human-made habitats and husbandry as well as to new climate conditions, were selected under cultivation [1,9,10]. Hybridization events between domesticated emmer and its wild progenitor, as well as selection and enrichment of naturally occurring mutations, provided a relatively wide genetic basis as the crop evolved [2]. Tetraploid domestic emmer wheat (genomic formula BBAA), which already harbored a certain genome plasticity [11], was able to tolerate a wide range of biotic and abiotic stresses and to thrive under cultivation, well outside the Fertile Crescent.
Molecular studies of T. dicoccoides clearly distinguished two wild emmer races or genepools, which are further subdivided: a wild western/southern race comprising materials sampled in Israel, Jordan, Lebanon and Syria; and a wild eastern/northern race consisting of samples collected in Turkey, Iraq and Iran [3]. The authors of this publication summarized the current knowledge on the geographic distribution, population structure and domestication history of wild emmer wheat. According to their data, domesticated emmer originated from the eastern wild emmer race, possibly because of traits such as (a) late ripening or (b) easier threshing owing to thinner glumes and shorter, weaker awns, which conferred adaptive advantages for cultivation.
Geographical expansion of domestic emmer was intimately associated with human migrations [28]. It was a long and complex process in which wheat genotypes became adapted to new habitats and climates. The genetic structure of local wild and domesticated emmer populations was affected, among other factors, by exchange of seed stock during migration and by gene flow between wild and domesticated wheats or between different locally adapted domestic emmer populations [10,19,29-31].
Besides molecular markers, genetic diversity of wheat species or populations can be assessed using cytogenetic approaches. The C-banding technique proved to be powerful in phylogenetic studies of cereal crops [32], because C-banding patterns are chromosome- and cultivar-specific and independent of environmental cues [33]. C-banding patterns are highly polymorphic, although the significance of this phenomenon is not known. The high diversity of C-banding patterns makes the karyotype of each form unique. Despite this, C-banding patterns were never considered in population analysis because chromosomal images cannot be used directly for statistical testing. This problem can be circumvented by using the recently developed approach of chromosomal passportization: a description of karyotypes in a digital form by comparing all chromosomes of each line with the generalized idiogram [34].
Earlier, using the C-banding technique, we observed some peculiarities of European emmer, in particular the abundance of the 5B:7A translocation [35,36]. Our previous studies however did not cover the entire geographic range of emmer cultivation. None of the accessions studied by us earlier were included in experiments by other laboratories using molecular markers [3,16,30,37,38], thus precluding a direct comparison.
The aim of this study was to perform a comprehensive cytogenetic analysis using chromosomal passports based on the C-banding patterns of karyotypes in a broad collection of T. dicoccon lines. We also aimed to include wild emmer wheats and other tetraploid wheat taxa in order to unravel phylogenetic relationships among domestic emmer populations and to obtain new insights into emmer domestication history by suggesting possible diffusion routes of the crop.

Plant material
In this study, we followed the wheat classification system of Dorofeev et al. [13]. For the spelling "dicoccon", see [39,40].
In total, karyotypes of 486 accessions of four tetraploid hulled wheat taxa and one free-threshing taxon were analyzed using the C-banding technique (S1 Table). Of them, 431 accessions were cytogenetically stable (thus consisting of one genotype only), whereas 36 accessions segregated into two, eleven into three, six into four, and two into five distinct genotypes that differed from one another in the C-banding patterns and/or in the presence of chromosomal rearrangements. All genetically distinct genotypes were treated as separate entities and designated as "lines".
Each position was numbered by an odd numeral from 1 (centromere) to 3-23 depending on the chromosome. Each band can be described by a simple formula that includes chromosome designations (1A-7B), chromosome arm (short, S or long, L) and band position on the arm. For example, the perinucleolar C-band on the chromosome 1B was defined as 1BS7 (Fig 1).
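The band formula can be made concrete with a small parser. The following Python sketch is only an illustration: the names `Band` and `parse_band` are hypothetical, and the accepted range of positions (odd numbers from 1 to 23) is taken from the numbering rule stated above, not from the authors' own scoring tools.

```python
import re
from typing import NamedTuple


class Band(NamedTuple):
    chromosome: str  # "1A" ... "7B"
    arm: str         # "S" (short) or "L" (long)
    position: int    # odd number, 1 = centromere


# Chromosome (1-7, genome A or B), arm (S or L), one- or two-digit position.
_BAND_RE = re.compile(r"([1-7][AB])([SL])(\d{1,2})")


def parse_band(label: str) -> Band:
    """Split a band label such as '1BS7' into chromosome, arm and position."""
    match = _BAND_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid band label: {label!r}")
    position = int(match.group(3))
    # Assumption from the text: positions are odd numerals between 1 and 23.
    if position % 2 == 0 or not 1 <= position <= 23:
        raise ValueError(f"band position must be odd and within 1-23: {label!r}")
    return Band(match.group(1), match.group(2), position)
```

With this sketch, `parse_band("1BS7")` describes the perinucleolar band as chromosome 1B, short arm, position 7, while a label with an even position such as `"1BS8"` is rejected.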
Among 147 positions of C-bands revealed for all lines we investigated, several bands were excluded from the analysis for either of the following reasons: (I) they were monomorphic, like C-bands constituting pericentromeric heterochromatin complexes of the B-genome chromosomes; or (II) bands were too small or inconsistent to score.
The size of all 112 informative bands (Fig 1, red numbers) was estimated as follows: "0", absent; "1", small; "2", medium; and "3", large or very large (for example, band 1BS7 on CHR 1B of the BAL (= 1), TRC (= 2), and IRN (= 3) emmer, respectively (Figs 1 and 2)). Owing to the frequent occurrence of pericentric inversions on chromosome 4B, this parameter was considered in the chromosomal passports as an additional character. It was estimated as the position of the centromere within the four blocks of the centromeric heterochromatin complex: "0", all four blocks are located on the long arm (inv4B); "1", one block is located on the short and three on the long arm; "2", two blocks are located on the long and two on the short arm; "3", three or all four blocks are located on the short arm.
Fig 1. Generalized idiogram of wheat A and B genome chromosomes. A and B, wheat genomes; 1-7, homoeologous groups; S and L, short and long arm of a chromosome, respectively. Large permanent C-bands are shown as solid blocks; polymorphic C-bands are shown as shaded blocks; small inconsistent C-bands are indicated by dashed lines. Positions of C-bands are numbered at the right-hand side of each chromosome. The 112 C-bands considered in "chromosomal passports" are shown in red; 38 C-bands most essential for discrimination of karyotypic groups are indicated with *.
All these numbers were entered into an Excel spreadsheet containing a matrix of 113 columns corresponding to the number of informative characters plus one column with the accession code, and 545 rows, each row corresponding to one line (S2 Table). This dataset was then used to infer the genetic clustering.
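As an illustration of how one row of such a matrix could be assembled, the following Python sketch encodes a line as a tuple of band-size scores followed by the chromosome 4B centromere-position character. It is a minimal sketch under stated assumptions: the function name `build_passport`, the three-band subset and all score values are hypothetical, and the real matrix is the one given in S2 Table.

```python
from typing import Dict, Sequence, Tuple

ALLOWED = (0, 1, 2, 3)  # scoring scale described in the text


def build_passport(band_order: Sequence[str],
                   band_scores: Dict[str, int],
                   inv4b: int) -> Tuple[int, ...]:
    """Return one passport row: band scores in a fixed order, then the 4B character."""
    row = []
    for band in band_order:
        score = band_scores[band]  # KeyError if a band was left unscored
        if score not in ALLOWED:
            raise ValueError(f"score for {band} must be 0-3, got {score}")
        row.append(score)
    if inv4b not in ALLOWED:
        raise ValueError(f"4B centromere character must be 0-3, got {inv4b}")
    row.append(inv4b)
    return tuple(row)


# Hypothetical toy example with three bands instead of the 112 informative ones.
toy_bands = ["1BS7", "4AL7", "7BS3"]
toy_row = build_passport(toy_bands, {"1BS7": 2, "4AL7": 1, "7BS3": 3}, inv4b=0)
```

Keeping the band order fixed for every line is what makes the rows comparable position by position in the later distance calculations.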

Network Reconstruction and Genetic Clustering
First, we used the chromosomal passport data of T. dicoccon lines to estimate the number of main clusters in the collection. Based on these data, we applied k-medoids clustering with the Hamming distance and computed the gap statistic [44,45]. To determine the number of main clusters, we used the criterion defined by Tibshirani et al. [44], but required twice the standard error to be stricter. The Shepard plot and the R² values indicated that the use of NMDS was reasonable.
The NMDS plot and the neighbor-joining tree were computed from the Hamming distances between all 545 lines using the R project [45]. Accessions were colored according to the main clusters of T. dicoccon lines determined by k-medoids, with one additional color (black) for other taxa. In the neighbor-joining tree, an edge was given the color shared by all leaves of its subtree, and grey otherwise (S12 Fig).
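To make the distance-based clustering step more tangible, the sketch below computes Hamming distances between toy passports and runs a simple alternating k-medoids procedure. It is an assumption-laden illustration, not the authors' pipeline: the analysis was done in R, the exact k-medoids variant is not stated in the text, and the gap statistic, NMDS and neighbor-joining steps are omitted. The names `hamming`, `assign`, `k_medoids` and `toy_rows` are hypothetical, as are all data values.

```python
from typing import List, Sequence, Tuple

Passport = Sequence[int]


def hamming(a: Passport, b: Passport) -> int:
    """Number of characters at which two passports differ."""
    if len(a) != len(b):
        raise ValueError("passports must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign(dist: List[List[int]], medoids: List[int]) -> List[int]:
    """Label every row with the index of its nearest medoid (ties: first medoid)."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(rows: List[Passport], init_medoids: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(rows)
    dist = [[hamming(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster became empty
                updated.append(old)
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Hypothetical toy passports: two groups of three similar lines each.
toy_rows = [
    (0, 1, 2, 3, 0, 1), (0, 1, 2, 3, 0, 2), (0, 1, 2, 2, 0, 1),
    (3, 0, 0, 1, 2, 2), (3, 0, 0, 1, 2, 3), (3, 0, 1, 1, 2, 2),
]
toy_medoids, toy_labels = k_medoids(toy_rows, init_medoids=[0, 1])
```

On this toy input the procedure separates the first three rows from the last three. The Hamming distance treats the scores 0 to 3 as unordered categories, so a difference between "1" and "2" counts the same as one between "0" and "3".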

Chromosomal polymorphisms of domesticated emmer
Karyotypes of domesticated emmer showed high diversity of C-banding patterns. B-genome chromosomes were more polymorphic than A-genome chromosomes. The lowest diversity of C-banding patterns was found for chromosome 3A (3 polymorphic variants), while chromosomes 2A and 4A proved to be the most variable among the A-genome chromosomes (37 and 38 variants, respectively). On the B genome, the lowest polymorphism was observed for chromosome 4B (25 variants) and the highest for chromosomes 3B and 7B (78 and 53 variants, respectively) (S1-S6 Figs).
The comparison of emmer lines from different geographical regions enabled the detection of "region-specific" C-banding patterns (and thus karyotypes) (Fig 2). Visual karyotype comparison revealed that this regional specificity was mainly associated with variation in 38 of the 112 C-bands scored (Fig 1, indicated with *). Polymorphism of the remaining bands determined the uniqueness of the karyotype of individual lines.
Emmer wheats from the Balkans provide a first good example for genotypes with "region-specific" banding patterns (Fig 2 and S1A-S1D Fig). Unique banding patterns were observed on CHR 4A, 1B, and 7B, which either contained additional C-bands (e.g. 1BL13), or C-bands with reduced (4AL7) or increased (7BS3, 7BL3, 7BL15) sizes (Figs 1 and 2). Based on high chromosome similarity among emmer from the Balkans, as well as based on "region-specific" polymorphic bands, these lines were assigned to the BALKAN karyotypic group and designated as BAL chromosomal type. Similar peculiarities of C-banding patterns on CHR 4A, 2B, 3B, and 7B were observed in many emmer lines from the Volga region (Fig 2 and S2A, S2F, S2J, S2K, S2M, S2N and S2Q Fig). Therefore, these lines can be attributed to the BALKAN group. However, none of them carried the third, additional C-band 1BL13 and also the C-banding patterns of CHR 2A, 6A, 7A, 5B were different (Fig 2 and S2 Fig). Based on these observations, we designated these lines as chromosomal VOL type.
Three major chromosomal types were identified among the EUROPEAN karyotypic group: the Spanish (WEM-SP) type prevailing in Spain (S3A, S3B and S3D Fig), and two more widely distributed types: WEM-1 (predominantly, but not obligatorily carrying the 7A:5B translocation; S3E, S3K-S3N, S3P, S3Q and S3S Fig) and WEM-2 (S1K, S1M, S1S and S1T Fig; S3H-S3J, S3O, S3T and S3V Fig; S5M Fig; S6D and S6K Fig). The WEM-1, WEM-SP and WEM-2 types can be distinguished by C-banding patterns on CHR 3B, CHR 5B and CHR 7B (Fig 2 and S3 Fig).
Lines from Transcaucasia and Iran showed region-specific banding patterns that defined the ASIAN group. Although karyotypically similar to the BALKAN group, this group can be discriminated based on C-banding patterns of CHR 4A, 3B, 5B, and 7B (Fig 2 and S4 Fig). The ASIAN group can be divided into two chromosomal types: i) the Transcaucasian (TRC), and ii) the Iranian (IRN) type. The IRN type differed from the TRC type in the larger size of some interstitial and pericentromeric C-bands, especially on CHR 2B.
Highly specific C-banding patterns, especially on CHR 2A, 1B, and 2B, defined the ETHIOPIAN group (S5O, S5P, S5R-S5S, S5V and S5X Fig). In particular, CHR 2A was characterized by a prominent C-band 2AS5, the marker C-band 1BS5 on CHR 1B was missing, while CHR 2B contained an additional C-band 2BL3 between the pericentromeric and proximal blocks (Figs 1 and 2).
Domestic emmer from North Africa was highly heterogeneous and belonged to at least three chromosomal groups. Twelve lines were assigned to a separate group, which was defined as MOR chromosomal type. It differed from other emmer lines found in North Africa in the C-banding patterns of chromosomes 2A, 2B, 4B, 5B and 7B.

Chromosomal rearrangements in domesticated emmer
According to our C-banding analysis, 334 of the 446 lines of T. dicoccon had normal karyotypes. Forty-three chromosomal rearrangements were identified in the remaining 112 lines (25.1%) (S3 and S4 Tables), of which only 15 were listed in previous catalogues [46,47], and two other translocations were described in [36]. All hitherto known translocations as well as all 26 novel variants identified in our study are shown in the accompanying figure. Most translocations (27 variants) were found in single lines, and only eight occurred with higher frequencies. The translocation T7A:5B, detected in 48 lines from 20 countries, was the most frequent (S3 Table). T7A:5B was especially common in Europe but, interestingly, was also detected in a few domestic emmer lines from Turkey, Iran, Algeria and Russia (S2I Fig; S4B and ...).
[Figure: translocation variants in T. dicoccon. Variants described earlier [35,46] are shown in red and novel translocation variants found in this study are indicated in blue. Red arrows define translocation lineages, i.e. a series of related translocations occurring one after another. A detailed description of all translocation variants is given in S3 and S4 Tables.]

Chromosomal polymorphisms identified in Isfahan emmer and Colchic emmer
In addition to T. dicoccon, Dorofeev et al. [13] recognized two other tetraploid domesticated hulled wheat species: I) the Isfahan emmer and II) the Colchic emmer. In our study T. karamyschevii showed low chromosomal polymorphisms and all accessions had very similar banding patterns (S7D-S7F Fig).
Free-threshing durum wheat was included in the analysis as an out-group. Our examination of 10 representative durum cultivars also revealed a low level of C-banding polymorphisms. We detected taxon-specific banding patterns on CHR 3B, 4B, 5B and 7B. Chromosomal rearrangements were not detected (S7G-S7L Fig).

Chromosomal polymorphisms of wild emmer, T. dicoccoides
In order to assess C-banding polymorphisms in wild emmer, 105 lines from all known areas of the current species distribution range were analyzed. Wild emmer wheat showed an extremely high diversity compared to domesticated emmer based on C-banding patterns and translocation polymorphisms (S8 and S9 Figs, S5 Table). We identified population-specific and region-specific polymorphisms. Populations sampled in the eastern part of the species' distribution range were less diverse than those collected in the Levant, which corresponded to the results obtained using molecular and biochemical markers [3,30,48-50].
According to C-banding patterns, the eastern group can be subdivided into two major chromosomal races: a race comprising populations from the Turkish provinces Kahramanmaraş The former race was characterized by highly specific C-banding patterns on CHR 2B, 5B and 6B that were not detected in T. dicoccon. On the contrary, the same chromosomes in accessions from eastern Turkey, Iran and Iraq resembled the respective chromosomes of domesticated emmer. Several types of chromosomal rearrangements were identified (S5 Table); interestingly, one of them was the translocation T7A:5B found in one Iraqi line (S8R

Inferring population structure of domesticated emmer
We utilized "chromosomal passport" data to infer the population structure of T. dicoccon. However, the following peculiarities of chromosomal datasets must be considered. First, the unit of segregation and inheritance is the entire chromosome, not a single band. Polymorphic variants of a chromosome could arise through amplification or elimination of nucleotide sequences in particular position(s), recombination between heteromorphic chromosomes in hybrids, and minor chromosomal rearrangements. Thus, these polymorphic variants can be treated as "chromosomal haplotypes". Secondly, C-banding analysis is not a suitable method for detecting intra-chromosomal recombination. The third constraint lies in the nature of C-banding polymorphism: variation of C-bands is not discrete but gradual, which complicates the assessment of C-band sizes. Algorithms designed for such datasets are not available. Therefore, for the analysis of our results, we adopted statistical approaches that were developed to estimate population structure based on polymorphic markers.
The population structure of domesticated emmer was first estimated on a collection of 421 T. dicoccon lines using k-medoids and the gap statistic (S10 Fig). The Shepard plot and the R² values indicated that the use of NMDS was reasonable (S11 Fig). Four clusters can be distinguished among T. dicoccon lines (Fig 4).
The first cluster (Fig 4, orange color (tan1)) included 78 lines from Ethiopia (41), Yemen (9), Oman (6), India (5), Turkey (3) and 14 lines from 11 other countries. Most lines (>75%) were also assigned to ETHIOPIAN karyotypic group based on visual karyotype analysis. Eight lines were considered hybrids between ETHIOPIAN and other karyotypic groups (S1 Table), and six lines, predominantly from the Middle East, were visually classified as the WEM-2 type.
The second cluster (green1 color) consisted of 127 lines; of them 102 (>80%) belonged to ASIAN karyotypic group (85 to TRC-type and 17 to IRN type). Five lines were visually classified as VOL type, whereas 11 lines were considered hybrids between different karyotypic groups.
Cluster #3 (blue1 color) included 129 lines predominantly of European origin. Based on visual karyotype comparison, 53 lines were assigned to WEM-1 type, 24 to WEM-2 and 17 to WEM-SP types (altogether >72%). Besides EUROPEAN emmer, this cluster included 5 lines from ASIAN or BALKAN karyotypic groups and 19 lines with hybrid or uncertain origin. Notably, nine MOR lines grouped together with EUROPEAN lines, despite their obvious karyotypic (Fig 2) and morphological [13,26] differences.
Fig 4 (caption fragment). Each point represents a single accession. The four colors blue, red, green and orange are based on k-medoids and represent the EUROPEAN, BALKAN, ASIAN, and ETHIOPIAN groups, respectively. Additionally, black is used for DICOCCOIDES and other taxa.
Cluster #4 (Fig 4, red1 color) consisted of 87 lines, all from the BALKAN karyotypic group. Sixty-one lines were visually assigned to the BAL chromosomal type, 24 to the VOL type, and two lines were hybrids between these types.
To gain insight into the phylogenetic relationships between the five tetraploid wheat taxa, a comprehensive collection of 545 lines was analyzed using the neighbor-joining (NJ) method. On the resulting tree, the T. dicoccon gene pool divided into five clusters, and nine "chromosomal types" were defined visually (S12 Fig). The MOR emmer, which was assigned by k-medoid analysis to the EUROPEAN group, was positioned on the NJ 545 tree within the DICOCCOIDES cluster, as a separate branch. Importantly, there was good agreement (87.7%) in the composition of clusters determined by NJ and NMDS analyses (67 lines of the ETHIOPIAN group, 85 lines of the BALKAN group, 106 lines of the ASIAN group and 114 lines of the EUROPEAN group were assigned to the same clusters by both methods). Most of the 52 T. dicoccon lines with inconsistent clustering probably represented hybrids between karyotypic groups or between wild and domestic taxa.
All T. ispahanicum lines grouped together within the ASIAN clusters, while T. karamyschevii clustered together with EUROPEAN emmer. All T. durum lines fell in the DICOCCOIDES cluster and grouped together with MOR forms.
Wild emmer clustered separately from domestic forms. Lines from Southern Levant, Northern Levant and subsp. judaicum tended to group distinctly within DICOCCOIDES. Only three domestic lines, all from Jordan were positioned within DICOCCOIDES cluster on the NJ 545 tree (S12 Fig, red arrows). All of them are probably hybrids between wild and domestic forms. In turn, three wild emmer lines fell in different clusters of domestic wheat (S12 Fig, green arrows). Importantly, one of these lines, PI 355455 from Lebanon was also placed into domestic gene pool by Luo et al. [30], who used molecular markers.

Discussion
Chromosomal passports discriminate four major karyotypic groups within the T. dicoccon gene pool
Domestication and diffusion of emmer were complex and dynamic processes tightly associated with human history and resulted in the formation of locally adapted populations.
Four major karyotypic groups were discriminated by k-medoid analysis, in accordance with taxonomical [13,26] and molecular studies [30]. Importantly, our data coincided with the result of molecular analysis [30] in the assessment of a set of 49 shared accessions. Thus, chromosome analysis can be used for elucidating phylogenetic relationships among closely related taxa and for the reconstruction of diffusion routes of crops. Interestingly, despite unique karyotypic features, the MOR emmer was assigned to the EUROPEAN group. This result may be caused by i) the relatively small sample size of MOR emmer, which constituted only 2% of the whole population, or ii) the possible hybrid nature of MOR forms, which can be deduced from their karyotypic similarity with durum wheat.
Although characteristic for certain geographic regions, populations of domesticated emmer usually included representatives of more than one karyotypic group at different frequencies. This mixture of karyotypic groups probably came about due to multiple crop introductions by successive waves of colonizing civilizations, which swept across Europe, the Mediterranean and Asia during the second half of the Holocene. However, doubtful origins due to genebank handling errors may also contribute (see S1 File).
Lines belonging to the ASIAN group were collected, for instance, in north-eastern Turkey, Iran and Transcaucasia, where emmer continued to be cultivated until recent times. Genotypes of the BALKAN group were collected from northern Anatolia (Kastamonu province), the Balkans and the Volga region. Our data suggested that these two groups represented the closest domestic emmer forms to the eastern race of wild emmer. In other words, based on C-banding patterns of 2A, 7B, and especially 5B chromosomes, the BALKAN and ASIAN groups probably originated from the eastern wild emmer race. Closest wild relatives were found in the Turkish provinces Diyarbakır, Mardin, and Şanlıurfa (S8 Fig), thus largely supporting the results of Özkan et al. [3].
The EUROPEAN and ETHIOPIAN groups can be discriminated from the ASIAN and BALKAN groups by the C-banding pattern of CHR 2A, which lacks the C-band in the 2AS3 position.
Chromosomal rearrangements played an important role in the intraspecific divergence of T. dicoccon, especially in North Africa. We found 43 different translocation variants, which were mainly restricted to geographic regions. Most translocation variants were found in single individuals and probably originated randomly in particular genotypes, but did not spread out from the area of their emergence. However, translocations such as T7A:5B or T3B:6B may confer selective or adaptive advantages to the respective genotypes. These translocations, especially T7A:5B, are found with high frequency in a broad geographical range. Alternatively, if favorable genes pre-existed in a line before the emergence of a translocation, they will be fixed in translocated genotypes due to reproductive isolation and will drag the translocation along. T7A:5B was found in 10% of all emmer lines studied and was especially frequent in WEM.
Although breeders usually avoid translocations because of decreased hybrid fertility [52] and reduced chromosome recombination [53,54], these genetic resources might be utilized in the future for wheat improvement.
The current geographical distribution of chromosomal groups allows us to suggest four possible major diffusion routes for domesticated emmer (Fig 5).
The first route (the Balkan route): migration of the BALKAN group from southeastern Anatolia to the Balkans and subsequently to Eastern Europe (red route in Fig 5)
Karyotypically, the Turkish population from Kastamonu province was very similar to emmer from the Balkans. Following emmer introduction into the Balkans by early farmers [55], the crop probably gradually spread to neighboring territories (Fig 5). A similar migration route has been proposed for common wheat [56,57] and agrees with the archaeological evidence. Subsequently, the spread of the BALKAN karyotypic group in the Volga and Ural regions of Russia may have been accompanied by hybridization with local wheat genotypes from other chromosomal groups, in particular with the ASIAN group. Most lines from the Volga region (26) fell in the BALKAN cluster (S1 Table, cluster #4); 25 lines were grouped together with ASIAN emmer (S1 Table, cluster #2), and two lines carrying reciprocal translocations were grouped with EUROPEAN emmer. On the NJ 545 tree the VOL type clustered within the BALKAN group as a separate branch (S12 Fig).
The second route: migration of the ASIAN group from southeastern Anatolia via Transcaucasia to the Volga region, via the Bosporus to Europe and via Iran to South Asia and India (green route)
Based on our chromosomal passports, we suggested that a second major route (ASIAN group) may have also started in southeastern Anatolia.
(I) Via Transcaucasia to the Volga. The divergence of Transcaucasian forms from the putative original population of Turkish domesticates was associated with only minor karyotype changes (Fig 2 and S4 Fig). On the other hand, ASIAN emmer was karyotypically distinct from other wheat taxa growing in the Caucasus region [58] and exhibited low intra-population diversity [16,37]. These facts suggested that the ASIAN group probably derived directly from a domesticated Turkish emmer population and that its migration was not associated with hybridization events. According to archeological data, farming was introduced to the southern Caucasus from Eastern Anatolia or northern Iran approximately 7,500 BP; however, hexaploid naked wheat became the dominant crop after a few millennia and T. dicoccon appeared to have survived as a minor crop in this area [59]. Indications of farming appeared in archeological sites of the North Caucasus and in the north-western plains of the Caspian shore dating to the second half of the 6th millennium BP. These ancient farmers had trading and cultural links with South Caucasian and Middle Eastern communities [6,60]. Relationships between T. dicoccon lines from southern and northern Transcaucasia are supported by karyotype similarity of the Kabardino-Balkaria and Dagestan forms (S4H Fig) with lines from Armenia and Georgia (S4F, S4G and S4I Fig and S4C Fig).
Fig 5 (caption fragment; see also S1 Table). Square dots designate lines carrying the T7A:5B translocation and round dots designate lines lacking it. Solid lines designate migration routes supported by our data, while dashed lines designate hypothetical migration routes, which were not confirmed in our study. Thick lines correspond to major migration routes, thin lines to secondary migration routes; 1a, possible migration route of BALKAN emmer from Anatolia to the Balkans; 1b, possible migration route of VOL emmer from the Balkans to the Volga region; 4a, possible migration route of ETHIOPIAN emmer from Zagros to Ethiopia via the Arabian trading route and to India by maritime route; 4b, possible migration route of ETHIOPIAN emmer from Zagros through India and Oman to Ethiopia.
The presence and the distribution pattern of ASIAN emmer in the Volga and Ural regions suggested that it was probably introduced from Transcaucasia northwards, along the Volga River. Emmer was an important crop in the Middle Volga region and the Ural, and was cultivated in Bashkiria, Tatarstan, Chuvashia, and Perm regions of Russia until the beginning of the 20th century [15,18]. Twenty-five of the 51 T. dicoccon lines from the Volga and Ural regions belonged to the ASIAN group. They co-occurred with the BALKAN lines (26 lines) in the region from Samara to Nizhny Novgorod (Fig 5). Upstream of the Volga, the ratio of ASIAN / BALKAN lines decreased gradually. The BALKAN lines were probably the descendants of emmer that was brought into the middle Volga from the Balkans, whereas the ASIAN lines probably originated from Transcaucasian emmer. According to visual karyotype comparison, ASIAN and BALKAN lines often produce hybrids with various combinations of parental chromosomes (S2H, S2L, S2R and S2V Fig), which further complicates the analysis of the ancestry of Volga emmer.
(II) Via the Bosporus to Europe. The ASIAN type also occurred in Bulgaria, Greece and Albania (Fig 5, S1E-S1H, S1J, S1Q and S1V Fig and S3W and S3X Fig). We suggested that their ancestral forms probably reached south-eastern Europe directly from Anatolia, probably admixed with other karyotypes, especially the BALKAN type.
(III) Via Iran to India. The ASIAN karyotypic group was broadly distributed in Iran and also found in Afghanistan and India (S4P, S4R and S4W Fig). Several lines from north-western Iran that we visually assigned to the IRN type were distinct from the remaining ASIAN lines by a prominent centromeric and relatively small 2BS5 block (Fig 2). On the NJ 545 tree they tended to group together within a common ASIAN cluster (S12 Fig). The IRN type might have originated as a result of gene flow from Iranian wild emmer to the ASIAN group of domesticated emmer in the eastern part of the Fertile Crescent.
Unfortunately, collection site information was missing for most domestic Iranian lines and only few emmer accessions were available from countries further east. This impeded tracing emmer wheat dispersion east of Iran. However, our data suggested that this migration was associated with cytogenetic changes probably due to hybridization between different seed stocks along ancient trading routes.
Third route: from the southern Levant to Europe and via the Iberian Peninsula or from Near East to Africa (EUROPEAN group, blue route)
Our data suggested that the domestic EUROPEAN emmer group was not a direct descendant of wild emmer wheat from Anatolia. Instead, EUROPEAN emmer could have been domesticated independently from ASIAN and BALKAN groups, and probably originated directly from western wild emmer from within the Levant (S6A and S6B Fig and S9Q Fig).
We here proposed that the dispersion of EUROPEAN emmer to the Mediterranean and Western Europe started in the southern Levant (Fig 5). Archaeologists have identified two diffusion routes of agriculture into Europe from the Fertile Crescent, and in both cases the spread of agriculture involved several waves of colonists but also local populations, who adopted farming with introduced crops [61]. The first route went through Turkey, the Balkans, Central Europe and into Western Europe following major river systems with emmer and einkorn [62,63]. One may therefore expect Neolithic European emmer to be of the BALKAN type. Indeed, the BALKAN and ASIAN-type lines rarely occurred in West European countries (Fig 5), and these forms could be the remnants of the earliest migration wave. However, the majority of WEM lines belong to the EUROPEAN karyotypic group. They were probably introduced later, as a result of migration along a maritime route into southern Europe, Italy, France, Spain and Portugal, where emmer wheat was found mixed with free-threshing wheat. An independent migration route of common wheat to the Mediterranean and Western Europe has been suggested [56,57] based on analysis of microsatellite allelic diversity. The geography of these countries and the specific climatic conditions may have played a major role in the differentiation of germplasm [57]. Most probably, emmer genotypes belonging to the EUROPEAN group conferred some selective or adaptive advantages and therefore replaced the BALKAN emmer in these territories.
The maritime route is supported by the similarity of T. dicoccon from the Near East with the EUROPEAN group (S5 and S6C-S6K Fig) and by the presence of the 7A:5B translocation, which occurred at high frequency in EUROPEAN emmer and was found in a few wild emmer accessions from the Levant (S9M and S9V Fig). Wild emmer from the Southern Levant was characterized by higher karyotype variation compared to Anatolian T. dicoccoides (S8 and S9 Figs). This might have contributed to the broader genetic background of the EUROPEAN group compared to all other karyotypic groups of domestic emmer. Karyotype diversity of the EUROPEAN group could have been additionally broadened by hybridization and recombination with lines from other karyotypic groups that grew sympatrically.
The vast majority of emmer lines from WEM countries (63 of 77) were attributed to group 3 based on the K-medoid analysis (Fig 4, S1 Table). Visual comparison of chromosomes allowed us to divide them into three chromosomal types, WEM-1, WEM-2, and WEM-SP, which tended to group together within the EUROPEAN cluster on the NJ 545 tree (S12 Fig). Lines carrying the 7A:5B translocation were positioned within the WEM-1 group. The high karyotypic similarity of these lines was probably caused by reproductive isolation, because translocations can lead to hybrid depression [64] and thus create genetic barriers between genotypes growing in one population. In addition, translocations prohibit chromosome recombination [53], so translocated chromosomes will not be modified compared to the original type (S1G, S1L, S1N, S1R and S1U Fig). T7A:5B probably originated in the western race of wild emmer in the Levant. Alternatively, this translocation could have arisen in domestic emmer in the Southern Levant and then been introgressed into local wild populations through gene flow.
According to our study, domestic emmer in Spain belonged to the EUROPEAN chromosomal group, and among these lines we visually discriminated the WEM-SP type (S3A-S3E Fig). This type was indigenous to Spain and was found in 18 of the 29 lines investigated, mostly from Asturias and Navarra. Three lines carried a T4B:6B-1 translocation (S3 Table).

The North African genepool. Archaeological finds suggested that agriculture was initially introduced into Morocco from the Iberian Peninsula [65], but there may have been many other later introductions. Most lines from Algeria and Morocco showed unique banding patterns defining the MOR emmer, which was attributed to the EUROPEAN karyotypic group by K-medoid analysis, but occupied a distinct position on the NJ 545 tree. Translocations played an especially important role in the evolution of MOR emmer, as a series of four consecutive translocations has been identified. Two lines carried T7A:5B and two lines T4A:1B. Genotypes with significantly rearranged karyotypes prevailed in Morocco (6 out of 10 lines), and all of them were derivatives of T4A:1B. Based on our findings, we suggest that Northwestern African emmer was probably formed by introduction from southern Spain followed by hybridization with local durum wheat and subsequent chromosomal rearrangements. This might have happened much later, when durum wheat was introduced into Africa. A few lines belonging to the ASIAN group could have been brought in by the Phoenicians, as Carthage was a center of commerce in North Africa.
Fourth route: from the Fertile Crescent via Oman to Ethiopia and to India (ETHIOPIAN group, orange route)

ETHIOPIAN emmer was karyotypically well separated from other karyotypic groups and possessed several diagnostic chromosomes. It was similar to the EUROPEAN group in the C-banding patterns of CHR 2A and 7B; however, these groups differed in diagnostic features on all other chromosomes. The combination of two polymorphic variants of CHR 1B lacking the marker C-bands 1BS5 but harboring the additional proximal band 2BS3 was found to be unique for the ETHIOPIAN group. These chromosomal variants were never found combined in wild emmer lines. However, either one or the other chromosomal variant occasionally occurred in wild lines from the southern Levant (S9T, S9U and S9W Fig), but also from Iraq and Iran (S8R, S8S, S8U and S8V Fig). Thus, our chromosomal passport data, as well as other data available from the literature [30], do not provide direct evidence of when and where the ETHIOPIAN group originated. However, based on i) the geographical distribution of rare polymorphic chromosomal variants (2A, 1B and 2B chromosome heterochromatin patterns) and ii) the geographic distribution of the ETHIOPIAN group, we propose that this group could have originated in Northern or Northeastern Mesopotamia. Additionally, gene flow from western wild emmer has probably contributed to the formation of this chromosomal group, thus supporting the reticulate domestication model by Civan et al. [29].
Arguments in favor of wild emmer having been independently domesticated in the Zagros region of western Iran are quite strong. Wild emmer is known to have been exploited in this area at the site of Chogha Golan about 11,400 years ago, and by 9,800 years ago it was domesticated at the same site [66]. Emmer finds from southern Pakistan at the site of Mehrgarh possibly date back to 9,000 years ago [67]. By 8,200 years ago emmer is found at the site of Jeitun in Turkmenistan [68]. Finds of wheat from India are much later and date to about 4,500 BP. These are often naked-type wheats, although Fuller reports some Neolithic wheat finds, including those from southern India [69].
We have identified the ETHIOPIAN group in a few domesticates collected in the Near East, and in all lines from Saudi Arabia, Oman, and most lines from Yemen (S4O, S4S and S4U Fig and S6O-S6S and S6U Fig). Our data thus provide further evidence that domestic emmer could have reached Ethiopia through Yemen via ancient trade routes (Fig 5, orange routes 4a, b). Botanical characters also supported the close relationships between T. dicoccon lines from Ethiopia and Yemen. These races were described as subsp. abyssinicum VAVILOV [13,26].
Further support comes from domesticated barley, where the same migration route into Ethiopia was postulated by Pomortsev et al. [70]. These authors also showed that only a few "founder" genotypes gave rise to the current limited diversity of barley in Ethiopia. We also observed very limited C-banding polymorphisms among Ethiopian T. dicoccon materials compared to samples from other geographic regions. Nucleotide diversity based on partial DNA sequences of single-copy nuclear genes also pointed to a limited genetic background of ETHIOPIAN emmer [71]. As one karyotypic variant significantly dominated among domestic emmer lines, we suggest that Ethiopian emmer first underwent a severe genetic bottleneck, followed by local adaptation. A few Ethiopian lines belonging to the ASIAN or the EUROPEAN types could have been introduced recently, or they were probably misidentified during ex situ maintenance.
Earlier studies suggested that ETHIOPIAN emmer was probably introduced to Ethiopia through Egypt and Sudan some 5,000 years ago [72]. From there domestic emmer probably reached the Arabian Peninsula [6]. The possibility of emmer introduction through Egypt cannot be ruled out, because we had access to just one emmer accession from this country. According to Stoletova [15], emmer that was still cultivated by very few Egyptian farmers at the beginning of the 20th century was distinct from the remnants of emmer plants excavated from Egyptian pyramids, being more similar to modern Palestinian or Iranian forms. Therefore, direct descent of the emmer forms grown in Egypt in the recent past from Neolithic emmer is improbable.
The close morphological and genetic relationship between Ethiopian and Indian domestic emmer is well documented [13,26,30,31]. As hypothesized earlier, emmer was introduced into India during the 4th or 3rd millennium BC [6]. It was suggested that emmer could have reached Kashmir: I) from the Middle East through Iran [73], and II) from Ethiopia via maritime trade [30]. We here provide evidence that both routes are possible: we found two lines in Afghanistan belonging to the ETHIOPIAN group (S4O Fig), although they were absent from Iran. On the basis of phenological and morphological examinations, Hammer et al. [39] assigned emmer sampled in Oman to the Asian eco-geographic group. However, these lines also shared some morphological similarities with Abyssinian forms (K. Hammer, personal communication), thus providing support for a close relationship between lines collected in Ethiopia, Oman and India.
The origin of T. ispahanicum and T. karamyschevii

In contrast to T. dicoccon, Colchic emmer and Isfahanian emmer had a very limited geographical distribution and are currently only maintained ex situ in genebanks, with 71 and 53 accessions in 25 and 16 genebanks, respectively [25]. Both taxa differ from T. dicoccon in several morphological traits.
Triticum ispahanicum was found in a few villages from the Vazak Canton, Faridan district of the Isfahan province, Iran, at an altitude of 2,000-2,500 meters. Isfahanian emmer was collected by Kuckuck in 1952-1954, a French expedition in 1957 [75], and a Japanese expedition [76]. Afterwards, despite intensive searches, T. ispahanicum was never found again in cultivation [41]. Close similarities in chromosomal characteristics suggested that T. ispahanicum cannot be treated as an independent taxonomical unit; it should be considered a variety of T. dicoccon that was probably derived from ASIAN domesticated emmer as a result of a gene mutation affecting the glume length (P2 gene [77]).
Triticum karamyschevii was cultivated in Western Georgia, often in mixed stands with T. macha. The origin of T. karamyschevii is less clear. This taxon has a dense spike, which is prone to branching, and plants are morphologically similar to T. macha [13]. Some authors suggested that T. karamyschevii was domesticated locally from wild emmer populations [13,78]. However, this is unlikely, because T. dicoccoides does not occur naturally in Georgia [3]. Alternatively, the taxon could have originated from T. dicoccon and then been selected for Western Georgia (Dekaprelevich 1941, cit. in [13]). However, the following facts contradict this assumption. Although the remnants of domesticated emmer have been found in many archeological sites from Transcaucasia [78], this taxon was not recorded in Western Georgia, where T. karamyschevii is distributed [79]. Our data showed that these two wheats differed significantly in their C-banding patterns, and T. karamyschevii, like Georgian T. dicoccon, displayed low karyotypic polymorphism. Karyotypic differences and low polymorphism due to geographic isolation suggested that T. karamyschevii is not a direct descendant of Georgian emmer. Some karyotypic similarities of T. karamyschevii with the EUROPEAN group of domesticated emmer, especially in the C-banding pattern of CHR 2A, indicated that Colchic emmer could have originated from the EUROPEAN group, which was introduced into Georgia in the past.
New insights into the origin of T. dicoccoides subsp. judaicum

In our analysis, 102 of 105 wild emmer lines formed a common cluster on the NJ 545 tree (S12 Fig, DICOCCOIDES), which was further subdivided corresponding to the wild emmer races defined by Özkan et al. [3], which are in good agreement with Luo et al. [30].