Genetic variability and population structure of Ethiopian chickpea (Cicer arietinum L.) germplasm

Evaluating genetic diversity and understanding the genetic structure and relationships of chickpea genotypes are valuable for designing efficient germplasm conservation strategies and crop breeding programs. For Ethiopian chickpea germplasm, such information is limited. Therefore, the present study was carried out to estimate the genetic diversity, population structure, and relationships of 152 chickpea genotypes using simple sequence repeat (SSR) markers. Twenty-three polymorphic SSR markers produced a total of 133 alleles, with a mean of 5.8 alleles per locus. The genetic statistics used included pairwise Nei's genetic distance between populations, heterozygosity, Shannon's information index, polymorphic information content, and percent polymorphism. These analyses revealed high genetic variation within and among chickpea genotypes. Based on Nei's genetic distances, the 152 genotypes were divided into two major clusters. The exotic genotypes formed a cluster of their own, showing that they are distinct from the Ethiopian genotypes, whereas the Ethiopian genotypes did not cluster consistently by geographic region, which is attributed to seed exchange across regions. Model-based population structure analysis identified two discrete populations. These findings provide useful insight for chickpea collection, ex-situ conservation, and national breeding programs aimed at widening the genetic base of chickpea.


Introduction
Chickpea (Cicer arietinum L.) belongs to the family Fabaceae (formerly Leguminosae) and subfamily Faboideae. It is a diploid, self-pollinated species with a chromosome number of 2n = 2x = 16 [1,2] and a comparatively small genome of 740 Mbp [3]. There are two main types of chickpea, desi and kabuli, although pea-shaped types also occur rarely. Kabuli seed types (Macrosperma) are large, round or ram-head shaped, and cream- [...] study genetic diversity and relationships to identify genetically diverse germplasm with beneficial traits for use in chickpea genome analysis, germplasm characterization, phylogenetic analysis and genetic diagnostics [14,22,25,26]. SSR markers have already been used to characterize Ethiopian chickpea; however, the numbers of genotypes characterized so far [25,28] are small compared with the total number of genotypes conserved in the Ethiopian Biodiversity Institute (EBI) gene bank. The aim of this study was, therefore, to assess the genetic structure and the level of genetic diversity and relationships within and between Ethiopian chickpea genotypes, improved chickpea varieties, and breeding lines using SSR markers.

Plant materials
One hundred fifty-two chickpea genotypes were used in this study (Table 1): 138 Ethiopian genotypes (landraces), eight nationally released varieties from Ethiopian agricultural research centers, and six breeding lines obtained from the International Center for Agricultural Research in the Dry Areas (ICARDA). The geographical origin of the Ethiopian chickpea germplasm is shown in Fig 1, which was produced with DIVA-GIS [29] from the GPS coordinates of the collection sites (S1 Text). The genotypes were grown at the Bakelo Research Station of Debre Berhan Agricultural Research Center in the 2018/2019 cropping season. Two weeks after planting, approximately equal amounts of leaf tissue were collected from five plants of each genotype and bulked, as suggested by Gilbert et al. [30], and the leaves were stored in zip-lock plastic bags containing silica gel.

DNA extraction and quantification
Genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [31] with slight modifications. The leaf samples were ground into a fine powder with a pestle and mortar in 250 μl of DNA extraction buffer, prepared by mixing equal volumes of two solutions (0.35 M sorbitol, 0.1 M Tris-HCl (pH 7.6), 0.005 M EDTA; and 0.2 M Tris-HCl, 0.05 M EDTA, 2 M NaCl, 2% CTAB). About 100 mg of ground leaf sample was transferred to a 2 ml microcentrifuge tube and 750 μl of extraction buffer was added. Tubes were incubated at 65 °C for one hour, followed by extraction with chloroform-isoamyl alcohol (24:1). The DNA pellet was air-dried and dissolved in 100 μl of 1× TE buffer. The quality and quantity of all DNA samples were checked with a NanoDrop spectrophotometer (ND-2000). In addition, DNA quality was checked on a 0.8% agarose gel for 30 genotypes selected systematically based on the NanoDrop results. Working DNA samples were diluted to a final concentration of 30-50 ng μl⁻¹.
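The dilution to the working concentration is a C1V1 = C2V2 calculation from the NanoDrop reading. A minimal sketch is shown below; the helper name `dilution_volumes`, the 40 ng/µl target (chosen from within the stated 30-50 ng µl⁻¹ range) and the 100 µl final volume are illustrative assumptions, not values taken from the protocol.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float = 40.0,
                     final_volume_ul: float = 100.0) -> tuple[float, float]:
    """Return (stock DNA volume, TE buffer volume) in µl using C1*V1 = C2*V2.

    Hypothetical helper: the default target (40 ng/µl) and final volume
    (100 µl) are assumptions for illustration only.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # V1 = C2*V2/C1
    return stock_ul, final_volume_ul - stock_ul                      # rest is TE


# Example: a NanoDrop reading of 400 ng/µl
stock, te = dilution_volumes(400.0)
print(f"{stock:.1f} µl DNA + {te:.1f} µl TE")  # 10.0 µl DNA + 90.0 µl TE
```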

Polymerase chain reaction (PCR) and gel electrophoresis
Twenty-five SSR markers were selected for amplification based on the polymorphic information content (PIC), allelic richness and heterozygosity reported by various authors [14,24-26,32,33]. These SSR markers were developed from sequence information obtained by various authors [4,24,33-40]. The primers are described in Table 2. PCR was performed in a Hybaid PCR Express thermal cycler (Hybaid, UK), after optimizing the amplification conditions for each primer pair, in a total volume of 10 μl containing 50 ng of DNA, 1.5 mM MgCl2, 0.2 mM dNTPs, 0.4 mM each of the forward and reverse primers and 0.05 U/μl Taq polymerase. The PCR program consisted of an initial denaturation of 3 min at 94 °C, followed by 35 cycles of 20 s denaturation at 94 °C, 50 s annealing at 55 to 60 °C (depending on the primer) and 50 s extension at 72 °C, and a final extension at 72 °C for 7 min. Before polyacrylamide gel electrophoresis, the reproducibility of the PCR products was checked on a 2% agarose gel stained with ethidium bromide in TBE buffer and visualized on a UVITEC gel documentation system (UVITEC, UK). The PCR products were resolved on a 6% polyacrylamide gel in 0.5x TBE buffer with 6x DNA loading dye, using a vertical electrophoresis setup and a standard 100 bp DNA ladder (Solis BioDyne, Estonia). Electrophoresis was run at 100 V for 2 h 30 min, and the gels were silver-stained following Huang et al. [41]. Gel images were taken with a digital camera, and band sizes were determined using UVITEC software (UVITEC, Cambridge, UK). Reactions that gave unclear or absent bands were repeated. Non-polymorphic markers and missing, faint or distorted gels were disregarded at scoring, and only the 23 primers with clear polymorphic bands were considered for statistical analysis.
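The thermal profile can be written down as data, which makes it easy to check the programmed run length. The sketch below encodes the steps described above; the 57 °C annealing value is only a placeholder for the primer-specific 55-60 °C, and ramp times between steps (which depend on the instrument) are deliberately ignored, so the result is the programmed hold time, not the wall-clock run time.

```python
# Thermal profile described above: (step, temperature in °C, hold time in s)
INITIAL = [("initial denaturation", 94, 180)]
CYCLE = [("denaturation", 94, 20),
         ("annealing", 57, 50),   # placeholder: 55-60 °C depending on the primer
         ("extension", 72, 50)]
FINAL = [("final extension", 72, 420)]
N_CYCLES = 35


def total_hold_seconds(initial, cycle, final, n_cycles):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return (sum(t for _, _, t in initial)
            + n_cycles * sum(t for _, _, t in cycle)
            + sum(t for _, _, t in final))


secs = total_hold_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
print(secs / 60)  # 80.0 (minutes of hold time)
```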

Scoring SSR data and statistical analysis
Allelic data were recorded for each microsatellite marker and each genotype using UVITEC software as well as visually. From the allelic scores, locus-based diversity indices, including the number of alleles (Na), effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Shannon's information index (I), number of private alleles (NPA), fixation index, percent polymorphism and unique alleles, were computed using GenAlEx v.6.502 [43]. Genetic differentiation was estimated by analysis of molecular variance (AMOVA), partitioning the total genetic variation into within- and among-population components, using GenAlEx 6.502 [43]. PowerMarker 3.25 [44] was used to estimate major allele frequency (MAF), gene diversity (GD), and polymorphic information content (PIC). Principal coordinate analysis (PCoA) of the allelic data was performed using GenAlEx v.6.502 [43]. A dendrogram was constructed based on Nei's genetic distance using
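The locus-based indices listed above have standard definitions in terms of allele frequencies p_i: Ne = 1/Σp_i², He = 1 - Σp_i², I = -Σp_i ln p_i, F = 1 - Ho/He, and Botstein's PIC = He - ΣΣ_{i<j} 2p_i²p_j². The sketch below implements these textbook formulas for one locus as an illustration; it is not the GenAlEx or PowerMarker code, and those programs may differ in details such as small-sample corrections. The allele sizes in the example are invented.

```python
import math
from collections import Counter
from itertools import combinations


def locus_indices(genotypes):
    """Diversity indices for one SSR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    alleles coded as fragment sizes in bp (missing data already removed).
    """
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())                 # 2N allele copies
    p = [c / total for c in counts.values()]     # allele frequencies
    sum_p2 = sum(x * x for x in p)
    he = 1.0 - sum_p2                            # expected heterozygosity
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    pic = he - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(p, 2))
    return {
        "Na": len(counts),
        "Ne": 1.0 / sum_p2,
        "I": -sum(x * math.log(x) for x in p),
        "Ho": ho,
        "He": he,
        "uHe": total / (total - 1) * he,         # 2N/(2N-1) correction
        "PIC": pic,
        "MAF": max(p),
        "F": 1.0 - ho / he if he > 0 else float("nan"),
    }


# Hypothetical locus scored in four genotypes
locus = [(150, 150), (150, 154), (154, 154), (150, 158)]
for name, value in locus_indices(locus).items():
    print(name, round(value, 3))
```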

Microsatellite repeats locus diversity
The polyacrylamide gel electrophoresis images and the estimated genetic diversity parameters for the SSR loci are shown in Fig 2 and Table 3, respectively. Visual inspection of the amplification products revealed low (Fig 2C and 2D) to high (Fig 2A and 2B) levels of polymorphism in the Ethiopian and exotic genotypes, depending on the primer. Across the 152 chickpea genotypes, a total of 133 alleles were recorded, with an average of 5.8 alleles per SSR locus. The number of alleles (Na) per locus varied widely among markers, from two (CESSRDB 45, SSR 22, and SSR 5) to 16 (TR 1). The number of effective alleles (Ne) ranged from 1.3 (CESSRDB 45) to 7.6 (TR 29), with an overall mean of 3.2. Shannon's information index (I) ranged from 0.4 (CESSRDB 45) to 2.1 (TR 1 and TR 29), with a mean of 1.2. The average observed heterozygosity (0.4) was lower than the expected heterozygosity (0.6) and the unbiased expected heterozygosity (0.6). The inbreeding coefficient (Fis) and fixation index (Fit) values

Genetic diversity in chickpea genotypes and population
The genetic diversity indices of the chickpea genotypes grouped by geographic origin are summarized in Table 4. The observed number of alleles (Na) ranged from 3.7 (exotic genotypes) to 5.3 (East Gojjam2). The number of effective alleles (Ne) ranged from 2.7 (exotic genotypes) to 3.6 (East Gojjam2 and North Shewa). Shannon's information index (I) ranged
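Population-level diversity also includes the number of private alleles, i.e. alleles found in only one population at a locus. The sketch below counts them from per-population allele sets; it assumes missing data have already been removed, and the population names, locus names and allele sizes in the example are hypothetical.

```python
from collections import defaultdict


def private_alleles(pop_alleles):
    """Count private alleles per population.

    pop_alleles: {population: {locus: set of alleles observed}}.
    An allele is private if exactly one population carries it at that locus.
    """
    seen_in = defaultdict(set)  # (locus, allele) -> populations carrying it
    for pop, loci in pop_alleles.items():
        for locus, alleles in loci.items():
            for allele in alleles:
                seen_in[(locus, allele)].add(pop)
    counts = {pop: 0 for pop in pop_alleles}
    for pops in seen_in.values():
        if len(pops) == 1:
            counts[next(iter(pops))] += 1
    return counts


# Hypothetical allele sets for three populations at two loci
data = {
    "A": {"TR1": {150, 154}, "GA11": {200}},
    "B": {"TR1": {150}, "GA11": {200, 204}},
    "C": {"TR1": {150, 158, 162}},
}
print(private_alleles(data))  # {'A': 1, 'B': 1, 'C': 2}
```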

Analysis of molecular variance (AMOVA) and partitioning genetic diversity
The AMOVA showed that 88% of the allelic variation was attributable to differences among individual genotypes within populations, whereas only 12% was distributed among populations (Table 5). The individual local populations contributed from 7% (West Shewa) to 14% (East Gojjam2) of the total variation, while the exotic genotypes contributed 7.6%. The value of pairwise comparisons of population
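The within/among partition comes from an analysis of molecular variance on a matrix of squared genetic distances between individuals. The sketch below shows the standard one-level calculation: sums of squares from pairwise distances, mean squares, the n0 coefficient for unequal population sizes, and the variance components. It is not GenAlEx's implementation; GenAlEx's codominant genotypic distance and its permutation tests are not reproduced, and setting a negative among-population component to zero is a convention assumed here. In the toy example the distances are squared differences of one-dimensional values, for which AMOVA reduces to an ordinary ANOVA.

```python
from itertools import combinations


def amova(d2, labels):
    """One-level AMOVA from a matrix of squared distances.

    d2: N x N symmetric list of lists of squared distances between individuals.
    labels: population label of each individual (length N).
    """
    n = len(labels)
    pops = sorted(set(labels))
    members = {p: [i for i, lab in enumerate(labels) if lab == p] for p in pops}

    def ss(idx):
        # Sum of squares of a group = (sum of pairwise squared distances) / group size
        idx = list(idx)
        return sum(d2[i][j] for i, j in combinations(idx, 2)) / len(idx)

    ss_total = ss(range(n))
    ss_within = sum(ss(members[p]) for p in pops)
    ss_among = ss_total - ss_within
    df_among, df_within = len(pops) - 1, n - len(pops)
    ms_among, ms_within = ss_among / df_among, ss_within / df_within
    n0 = (n - sum(len(m) ** 2 for m in members.values()) / n) / df_among
    v_among = max((ms_among - ms_within) / n0, 0.0)  # negative estimate set to 0
    v_within = ms_within
    total = v_among + v_within
    return {"V_among": v_among, "V_within": v_within,
            "pct_among": 100 * v_among / total, "PhiPT": v_among / total}


# Toy data: two populations of two individuals each
values = [0.0, 2.0, 10.0, 12.0]
labels = ["A", "A", "B", "B"]
d2 = [[(x - y) ** 2 for y in values] for x in values]
res = amova(d2, labels)
print(round(res["pct_among"], 1))  # 96.1
```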

Principal coordinates analysis (PCoA)
The multivariate principal coordinate analysis (PCoA) of the molecular data showed that the first three coordinates were important and together accounted for 26.6% of the variation: coordinate 1 (14.0%), coordinate 2 (6.9%), and coordinate 3 (5.7%). In the GenAlEx plot of coordinate 1 versus coordinate 2, the exotic genotypes all fell in quadrant I, whereas the Ethiopian genotypes were widely dispersed across the four quadrants (Fig 3), irrespective of their geographic origin. Genotypes collected from East Gojjam1 clustered in quadrants III (eight genotypes) and IV (13 genotypes), forming small sub-clusters in both quadrants; a single genotype from this zone fell in quadrant I. East Gojjam2 collections clustered in quadrants I (8 genotypes), III (3 genotypes), and IV (12 genotypes), showing a tendency to form sub-clusters in each quadrant. Genotypes of North Gondar clustered in quadrants I (5 genotypes) and IV (9 genotypes); the remaining two genotypes fell in quadrants II and III. Most genotypes collected from Central Gondar clustered in quadrant III (8 genotypes); of the remainder, five fell in quadrant II and one in quadrant I. Genotypes of the North Shewa collection clustered in quadrants I (3 genotypes), II (7 genotypes), III (10 genotypes), and IV (1 genotype). Genotypes from North Wollo formed two sub-clusters in quadrant II (12 genotypes); the remaining one and four genotypes fell in quadrants I and III, respectively. Genotypes from West Shewa clustered in quadrants I (4 genotypes) and II (8 genotypes). Genotypes of Arsi Bale were widely distributed across all quadrants: I (4 genotypes), II (3 genotypes), III (2 genotypes), and IV (4 genotypes).
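The percentages per coordinate and the quadrant positions come from an eigen-decomposition of the double-centred distance matrix. The sketch below shows classical (metric) PCoA with NumPy as an illustration of the technique; GenAlEx offers its own standardized variants, so its percentages need not match this exact calculation, and dividing by the sum of positive eigenvalues is an assumed convention. The sign of each axis is arbitrary, so which quadrant a group lands in depends on the orientation the software chooses. The four "genotypes" in the example are invented points.

```python
import math

import numpy as np


def pcoa(d):
    """Classical principal coordinate analysis of a distance matrix.

    Returns (coordinates, percent of variation per retained axis).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = j @ (-0.5 * d ** 2) @ j              # Gower's double-centred matrix
    vals, vecs = np.linalg.eigh(b)           # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]           # sort axes by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10                      # drop zero/negative eigenvalues
    coords = vecs[:, keep] * np.sqrt(vals[keep])
    pct = 100 * vals[keep] / vals[keep].sum()
    return coords, pct


def quadrant(x, y):
    """Quadrant (I-IV) of a point on the coordinate 1 vs coordinate 2 plot."""
    if x >= 0:
        return "I" if y >= 0 else "IV"
    return "II" if y >= 0 else "III"


# Toy example: four genotypes at the corners of a 4 x 2 rectangle
pts = [(0, 0), (4, 0), (0, 2), (4, 2)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
coords, pct = pcoa(dist)
print(np.round(pct, 1))                            # axis 1: 80%, axis 2: 20%
print([quadrant(x, y) for x, y in coords[:, :2]])  # depends on axis signs
```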

Genetic distance
The pairwise Nei's unbiased genetic distances (above the diagonal) and unbiased genetic identities (below the diagonal) among all chickpea populations representing the growing regions are shown in Table 7. The smallest pairwise distances (0.09) were between the North Shewa and Central Gondar populations, North Wollo and North Shewa, and West Shewa and Arsi-Bale. The largest genetic distance (0.37) was between the Arsi-Bale population and the exotic genotypes. In general, genetic distances between Ethiopian populations and the exotic genotypes were greater than those between any pair of populations within Ethiopia. The highest genetic identity (0.92) was between the North Shewa and Central Gondar populations, and the lowest (0.68) was between Arsi-Bale and the exotic genotypes. Genetic identities among populations of Ethiopian origin were thus higher than those between the exotic genotypes and the populations of Ethiopian origin.
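Nei's unbiased identity corrects each population's homozygosity for sample size before comparing populations, and the distance is D = -ln(I), which is why the two halves of the matrix move in opposite directions. The sketch below implements the standard Nei (1978) estimator averaged over loci; for simplicity it assumes the same number of sampled individuals at every locus, and the allele frequencies in the example are invented.

```python
import math


def nei_unbiased(freq_x, freq_y, n_x, n_y):
    """Nei's (1978) unbiased genetic identity and distance between two populations.

    freq_x, freq_y: lists over loci of {allele: frequency} dicts.
    n_x, n_y: number of diploid individuals sampled per population
              (assumed constant across loci in this sketch).
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freq_x, freq_y):
        sx = sum(v * v for v in px.values())
        sy = sum(v * v for v in py.values())
        jx += (2 * n_x * sx - 1) / (2 * n_x - 1)   # unbiased homozygosity, pop x
        jy += (2 * n_y * sy - 1) / (2 * n_y - 1)   # unbiased homozygosity, pop y
        jxy += sum(px[a] * py.get(a, 0.0) for a in px)
    loci = len(freq_x)
    identity = (jxy / loci) / math.sqrt((jx / loci) * (jy / loci))
    return identity, -math.log(identity)           # (I, D = -ln I)


# One hypothetical locus, ten individuals sampled per population
pop_x = [{150: 0.5, 154: 0.5}]
pop_y = [{150: 0.5, 158: 0.5}]
i_xy, d_xy = nei_unbiased(pop_x, pop_y, 10, 10)
print(round(i_xy, 3), round(d_xy, 3))
```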

Cluster analysis
A dendrogram based on Nei's genetic distances was constructed using PowerMarker V3.25. The UPGMA dendrogram grouped the nine chickpea populations from different geographic origins into two major clusters (Fig 4). Cluster I contained the exotic genotype population, while cluster II consisted of the Ethiopian populations. Cluster II was divided into three sub-clusters, with a tendency for neighboring regions to group together. At the genotype level, the 152 genotypes were also divided into two major clusters (Fig 5). Cluster I had 14 genotypes, all of them exotic. Cluster II was further subdivided into six distinct sub-clusters with variable numbers of genotypes. Sub-cluster 1 consisted of 27 genotypes: 22 (81.5%) from East Gojjam 1, two (7.4%) from East Gojjam 2, and three (11.1%) from Arsi-Bale. Sub-cluster 2 comprised 25 genotypes, of which 19 (76%) were from East Gojjam 2 and six (24%) from North Gondar. Sub-cluster 3 comprised 14 genotypes, of which nine (64.3%) were from North Gondar, four (28.6%) from North Shewa and one (7.1%) from Arsi-Bale. Sub-cluster 4 contained 25 genotypes: two (8%) from East Gojjam 2, one (4%) from North Gondar, 12 (48%) from Central Gondar, and 10 (40%) from North Shewa. Sub-cluster 5 included 14 genotypes: nine (64.3%) from North Wollo, three (21.4%) from West Shewa, and two (14.3%) from Arsi-Bale. Sub-cluster 6 was a heterogeneous group of 33 genotypes, of which two (6.1%) were from Central Gondar, 8 (24.2%) genotypes
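UPGMA repeatedly merges the two closest clusters and replaces their distances to every other cluster with the size-weighted average. The sketch below is a plain-Python illustration of that algorithm, not PowerMarker's code; the tie-breaking rule (first minimum found) and the node height convention (half the merge distance, giving an ultrametric tree) are choices made here, and the four-population distance matrix is invented.

```python
def upgma(d, names):
    """UPGMA (average-linkage) clustering.

    d: symmetric list-of-lists distance matrix; names: leaf labels.
    Returns the root as nested tuples (left, right, height).
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    dist = {(i, j): d[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), h = min(dist.items(), key=lambda kv: kv[1])   # closest pair
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        # size-weighted average distance from the merged cluster to the others
        for k in list(clusters):
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        del dist[(i, j)]
        clusters[next_id] = ((ti, tj, h / 2), si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances among four populations
names = ["A", "B", "C", "D"]
d = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 4],
     [10, 10, 4, 0]]
print(upgma(d, names))  # (('A', 'B', 1.0), ('C', 'D', 2.0), 4.0)
```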

Population structure
The population structure of the 152 chickpea genotypes was analyzed, and the highest peak was observed at K = 2, indicating the presence of two major clusters (Table 8). The STRUCTURE results further confirmed the UPGMA clustering. Based on the membership probability of each genotype in each of the two groups, 85 genotypes (55.9%) were assigned to one of the two populations: 43 genotypes (28.3% of all genotypes) to population 1 and 42 (27.6%) to population 2. The remaining 67 genotypes (44.1%) were placed in the admixed group.
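The section does not say which criterion produced the peak at K = 2. One widely used option is Evanno's ΔK, the absolute second-order change in the mean log-likelihood across K divided by the standard deviation of the replicate log-likelihoods at K; it is shown below only as an assumption about how such a peak is typically located. The sketch uses the mean-based form, and the replicate LnP(D) values are invented.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Evanno-style Delta K from replicate STRUCTURE runs.

    lnp: {K: [LnP(D) of each replicate run]} for consecutive K values,
         with at least two replicates per K.
    Delta K is defined only for K values that have both neighbours.
    """
    ks = sorted(lnp)
    m = {k: mean(v) for k, v in lnp.items()}
    out = {}
    for k in ks[1:-1]:
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)| on means
        out[k] = second_diff / stdev(lnp[k])
    return out


# Hypothetical log-likelihoods from three replicate runs per K
runs = {1: [-6000, -6002, -5998],
        2: [-5500, -5510, -5490],
        3: [-5450, -5470, -5430],
        4: [-5420, -5450, -5390]}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # 2
```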

Discussion
Efficient germplasm conservation and sustainable utilization require a clear understanding of the genetic structure, diversity, and relationships among chickpea genotypes. This information also helps breeders identify new sources of germplasm harboring valuable alleles for improving yield and grain quality and for enhancing the resistance of cultivated varieties to various biotic and abiotic stresses [14,50]. Molecular diversity and population structure studies of Ethiopian chickpea using SSR markers are limited. Therefore, this work was initiated with the main objective of analyzing the genetic structure, diversity, and relatedness of Ethiopian chickpea genotypes, improved varieties, and exotic chickpea genotypes received from ICARDA using 23 SSR markers.
The SSR analysis revealed considerable allelic richness per locus, moderate to high PIC, Ho and He values, and the presence of private alleles. This high level of genetic diversity indicates the existence of molecular variation among the analyzed chickpea genotypes. High PIC values were also reported by Sefera et al. [26], Getahun et al. [28] and Ghaffari et al. [51], in agreement with the present study; however, the number of effective alleles per locus recorded here was lower than that reported by Sefera et al. [26] and Getahun et al. [28]. This difference may be due to the different numbers of accessions and loci examined and the nature of the markers used in each study. Comparable results were, however, reported by Keneni et al. [25]. The high PIC values indicate that the markers are efficient for diversity studies in chickpea, because a locus with an estimated PIC value greater than 0.50 is considered highly diverse [52]. Nineteen markers had PIC values of 0.5 or above, indicating that they are highly informative SSR markers that could be employed in genetic diversity studies of chickpea. The ability of SSRs to detect intraspecific as well as interspecific variation in chickpea has been demonstrated by many authors [14,28,39,53]. In this study, loci CaSTMS 11, CESSR 42, CESSRDB 54, GA 11, GA 24, SSR 22, and SSR 5 exhibited low observed heterozygosity relative to the expected heterozygosity. Together with their high fixation indices, this implies high levels of inbreeding among the assessed chickpea genotypes, which is expected because chickpea is self-pollinated; outcrossing rates of only 0 to 1.58% have been reported [51]. Conversely, loci CESSR 62, CESSR 71, NCPGR 45, NCPGR 53, NCPGR 94, SSR 1, SSR 4, TA 18, TR 1, TR 2, and TR 29 had high observed heterozygosity and low fixation indices.
This indicates that these loci could be associated with higher mutation rates or inbreeding depression [14]. The low heterozygosity observed for the majority of the SSR markers is in accordance with other studies [14,25], although higher heterozygosity has also been reported for some SSR markers [24,28,54,55]. According to Ghaffari et al. [51], alleles with a frequency < 0.03 are considered rare, those with a frequency of 0.03-0.20 common, and those with a frequency > 0.20 most frequent. Based on this delineation, rare alleles comprised 7.5% (10 alleles) of all detected alleles, common alleles accounted for 63.9% (85 alleles), and the most frequent alleles for the remaining 28.6% (38 alleles) (data not shown).
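The frequency classes above amount to two cut-offs applied to each allele frequency. A minimal sketch follows; treating 0.20 itself as "common" is our reading of the 0.03-0.20 band. The example uses synthetic frequencies (10 rare, 85 common, 38 frequent alleles, not the real data) only to show that the reported counts reproduce the reported percentages of the 133 alleles.

```python
from collections import Counter


def frequency_class(p, rare=0.03, frequent=0.20):
    """Classify an allele frequency using the cut-offs attributed to Ghaffari et al. [51]."""
    if p < rare:
        return "rare"
    if p <= frequent:          # assumption: 0.20 itself counts as common
        return "common"
    return "most frequent"


def class_percentages(freqs):
    """Percentage of alleles falling in each frequency class."""
    counts = Counter(frequency_class(p) for p in freqs)
    return {cls: 100 * n / len(freqs) for cls, n in counts.items()}


# Synthetic frequencies matching the reported class counts (133 alleles)
synthetic = [0.01] * 10 + [0.10] * 85 + [0.50] * 38
print({k: round(v, 1) for k, v in class_percentages(synthetic).items()})
# {'rare': 7.5, 'common': 63.9, 'most frequent': 28.6}
```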
All nine chickpea populations had a high percentage of polymorphic loci, ranging from 95.7% to 100% with an average of 99.5%. Shannon's information index values were comparable across populations. Relatively high numbers of alleles and effective alleles and a high Shannon's information index were recorded in East Gojjam 2, implying that chickpea genotypes from East Gojjam 2 are more diverse than the collections from other geographic regions. Low numbers of private alleles were recorded in East Gojjam 2, North Shewa, North Wollo and Arsi Bale. Matus and Hayes [56] suggested that the occurrence of unique alleles could indicate a relatively high rate of mutation and diversity at SSR loci. Unique or rare alleles can serve as a source of novel alleles for plant breeding and also provide an opportunity to generate a comprehensive fingerprint database for establishing genotype identity [57]. The percentages of polymorphism among Ethiopian chickpea populations reported by Keneni et al. [25] and Getahun et al. [28] were lower than in the present study. Differences in estimated genetic diversity parameters between studies may be explained by the different types and numbers of genotypes, the different numbers and types of loci examined, and perhaps the nature of the markers used.
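Percent polymorphism is the share of loci that are polymorphic within a population. The sketch below assumes the simplest definition (more than one allele observed at the locus); GenAlEx's exact criterion is not reproduced here. The arithmetic check uses one configuration consistent with the reported summary, one population polymorphic at 22 of the 23 loci and eight at all 23, which is an assumption for illustration rather than the actual per-population values.

```python
def percent_polymorphic(pop_loci):
    """Percentage of loci with more than one allele in a population.

    pop_loci: {locus: set of alleles observed in the population}.
    """
    poly = sum(len(alleles) > 1 for alleles in pop_loci.values())
    return 100 * poly / len(pop_loci)


# Assumed configuration: 23 loci, nine populations
per_pop = [100 * 22 / 23] + [100.0] * 8
print(round(min(per_pop), 1), round(sum(per_pop) / len(per_pop), 1))  # 95.7 99.5
```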
The AMOVA results indicate that most of the variation lay within populations rather than among populations, suggesting that variation among individuals is more important for chickpea breeding programs. The low molecular variation among populations indicates a high number of alleles shared among populations collected from different origins [58]. The exotic genotypes contributed 7.6% of the total molecular variation, which provides an opportunity to expand the chickpea gene pool of Ethiopian origin, provided that local germplasm is not completely replaced by improved varieties. Low molecular variation among chickpea populations was also reported by Keneni et al. [25] and Getahun et al. [28] for Ethiopian genotypes and by Valadez-Moctezuma et al. [50] for Mexican chickpea. According to Wright's [59] guidelines for interpreting Fst, values of 0.00 to 0.05 indicate low, 0.05-0.15 moderate, 0.15-0.25 high, and > 0.25 very high differentiation. Based on this delineation, the Fst values in the present study indicate low to moderate differentiation among populations, with an increased level of admixture, which is a possible reason for the low molecular variation among populations. A similar observation was made in cowpea [60]. The low variation among populations might be attributed to germplasm exchange among regions, which is further supported by the pairwise gene flow (Nm) values among populations, ranging from 1.12 to 4.87. An Nm value greater than 1 is considered to indicate adequate gene flow among populations [61].
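Wright's bands and the gene-flow threshold are linked through the island-model approximation Nm = (1 - Fst)/(4Fst), under which Nm = 1 corresponds to Fst = 0.2. The sketch below encodes both; how values exactly on a band boundary are classified is our choice, and the text does not state that the reported Nm values were derived with this particular formula.

```python
def wright_fst_class(fst):
    """Differentiation level from the Fst bands attributed to Wright [59].

    Boundary values (0.05, 0.15, 0.25) are assigned to the higher band here.
    """
    if fst < 0.05:
        return "low"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "high"
    return "very high"


def island_model_nm(fst):
    """Migrants per generation, Nm = (1 - Fst) / (4 * Fst), Wright's island model."""
    return (1 - fst) / (4 * fst)


for fst in (0.03, 0.10, 0.20, 0.30):
    print(fst, wright_fst_class(fst), round(island_model_nm(fst), 2))
```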
The genetic distance between each of the eight Ethiopian populations and the exotic population was greater than that between any pair of Ethiopian populations. This indicates that the genetic similarity between the exotic genotypes and the Ethiopian populations is low, i.e., the Ethiopian populations are distantly related to the exotic genotypes. In contrast, Ethiopian populations collected from different regions were genetically close, indicating high genetic similarity among Ethiopian chickpea genotypes. These results agree with the findings of Keneni et al. [25] and Getahun et al. [28]. In addition, the UPGMA dendrogram of the nine chickpea populations showed a tendency for populations from neighboring collection areas to group together, indicating that genetic relationships follow the proximity of collection areas.
The PCoA results indicate that the Ethiopian genotypes were dispersed across all four quadrants regardless of their geographic origin, while the exotic genotypes were grouped in quadrant I, forming sub-clusters distinct from the local genotypes. Genotypes from East Gojjam 2, North Shewa, and Arsi-Bale were highly diverse, being spread over at least three quadrants. However, some Ethiopian genotypes and the exotic genotypes appeared to group according to the geographic origin from which they were obtained. This result is supported by earlier studies using SSR markers [14,25,42]. The distinct identity of the exotic genotypes could be a consequence of deliberate selection by breeders in the development of these varieties [14].
The UPGMA dendrogram clearly delineated the genotypes into two major clusters, Cluster I and Cluster II. Cluster II was subdivided into six sub-clusters, each consisting of a variable number of genotypes. The exotic genotypes were grouped in a single cluster. The dendrogram results agreed with the PCoA results. The clustering of genotypes by geographic region was not consistent: only some genotypes were grouped together according to their geographical proximity. This implies that genetic distance does not strictly follow geographical distance. Similar trends were reported by earlier work on chickpea [24,28,50]. The most probable reason is seed exchange and/or trade between farmers, leading to gene flow across boundaries within those areas. The dendrogram did not indicate any clear division between desi and kabuli types among the exotic genotypes. This may be because the markers used in this experiment are not directly related to the characteristics that differentiate kabuli from desi chickpea [50]. However, various authors have reported that the clustering of chickpea genotypes appears to follow the geographic distribution from which the germplasm lines were obtained [41,51,53,55], and Sefera et al. [26] and Getahun et al. [28] showed that SSR markers can discriminate kabuli from desi genotypes.
Model-based clustering with the STRUCTURE software is helpful to demonstrate the presence of population structure, identify distinct genetic populations, assign individuals to populations, and identify admixed individuals [47]. In the present study, the chickpea genotypes formed a structured population divided into two groups, consistent with the UPGMA clustering. The chickpea genotypes used in this study appear to derive from two population types, with varying degrees of introgression between the two types in individual genotypes. Structure is considered uniform when more than 80% of the accessions in a group have more than 80% membership in that group [14,47]. No genotype showed 100% membership in its cluster, indicating that gene flow or introgression was apparent. Keneni et al. [25] grouped Ethiopian chickpea germplasm from different collections into five clusters of distinct genetic populations. They proposed that the genotypes resulted from independent evolutionary mechanisms (genetic drift, mutation, migration, selection, and influx/outflux of genes through germplasm exchange) that split them into discrete gene pools. Gene introgression is critical for variety development programs because it provides essential trait combinations such as improved agronomic features, high resilience to environmental challenges, diseases, and insects, as well as other benefits such as improved nutritional quality [14]. It can also be used to broaden the genetic base of chickpea through crossing programs.
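The assignment of genotypes to populations or to the admixed group, and the 80%/80% uniformity criterion quoted above, can both be expressed on the STRUCTURE membership (Q) matrix. The membership cut-off used for assignment is not stated in the results, so the sketch assumes 0.8; the helper names and the five-genotype Q matrix are hypothetical.

```python
def best_cluster(row):
    """Index (0-based) of the cluster with the largest membership coefficient."""
    return max(range(len(row)), key=row.__getitem__)


def assign_membership(q, threshold=0.8):
    """Assign genotypes to clusters (1-based) or label them 'admixed'.

    q: {genotype: [Q_1, ..., Q_K]} membership coefficients from STRUCTURE.
    threshold: assumed cut-off for calling a genotype a member of one cluster.
    """
    out = {}
    for g, row in q.items():
        k = best_cluster(row)
        out[g] = k + 1 if row[k] >= threshold else "admixed"
    return out


def is_uniform(q, cluster, threshold=0.8, share=0.8):
    """Uniformity criterion: more than `share` of the genotypes whose largest
    membership is in `cluster` have membership above `threshold` in it."""
    members = [row for row in q.values() if best_cluster(row) == cluster - 1]
    if not members:
        return False
    strong = sum(row[cluster - 1] > threshold for row in members)
    return strong / len(members) > share


# Hypothetical Q matrix for K = 2
q = {"g1": [0.95, 0.05], "g2": [0.85, 0.15], "g3": [0.60, 0.40],
     "g4": [0.10, 0.90], "g5": [0.30, 0.70]}
print(assign_membership(q))  # {'g1': 1, 'g2': 1, 'g3': 'admixed', 'g4': 2, 'g5': 'admixed'}
print(is_uniform(q, 1), is_uniform(q, 2))  # False False
```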

Conclusions
The magnitude and pattern of genetic variation were estimated, indicating that considerable genetic diversity exists in Ethiopian chickpea genotypes. The results also confirmed the efficiency and effectiveness of SSR markers for studying genetic diversity in chickpea. These results are directly applicable to the efficient and systematic conservation and sustainable utilization of germplasm. They can assist chickpea breeders in selecting diverse parental materials for crossing to take advantage of heterosis. They are also helpful for genebank managers: large numbers of genotypes collected from the same locality clustered in one group, suggesting that some of these genotypes are duplicates, which is a major problem in germplasm conservation. To reduce redundancy in germplasm collections, techniques such as deliberate bulking and the establishment of core collections should be implemented. Although this work provides preliminary information on the existence of genetic diversity, studies of marker-trait associations are required. Therefore, a comprehensive study mapping the associations of the markers with agronomic traits of economic importance is needed.
Supporting information S1 Text. Passport data for 152 chickpea genotypes used for the study. (XLSX)