Population structure and genetic diversity analyses of common bean germplasm collections of East and Southern Africa using morphological traits and high-density SNP markers

Knowledge of genetic diversity in plant germplasm and the relationship between genetic factors and phenotypic expression is vital for crop improvement. The objective of this study was to determine the extent of genetic diversity and population structure in 60 common bean genotypes from East and Southern Africa. The common bean genotypes exhibited significant (p<0.05) levels of variability for traits such as days to flowering (DTF), days to maturity (DTM), number of pods per plant (NPP), number of seeds per pod (NSP), and grain yield per hectare in kilograms (GYD). About 47.82 per cent of the variation among the genotypes was explained by seven principal components (PC) associated with the following agronomic traits: NPP, NFF (nodes to first flower), DTF, GH (growth habit) and GYD. The SNP markers revealed mean gene diversity and polymorphic information content values of 0.38 and 0.25, respectively, which suggested the presence of considerable genetic variation among the assessed genotypes. Analysis of molecular variance showed that 51% of the genetic variation was between the gene pools, while 49% of the variation was within the gene pools. The genotypes were delineated into two distinct groups through the population structure, cluster and phylogenetic analyses. Genetically divergent genotypes such as DRK57, MW3915, NUA59, and VTTT924/4-4 with high yield and agronomic potential were identified, which may be useful for common bean improvement.


Introduction
Common bean (Phaseolus vulgaris L., 2n = 2x = 22) is one of the principal grain legumes in the world. In Africa, it is the most important source of dietary protein [1] and the third most important source of calories after maize (Zea mays L.) and cassava (Manihot esculenta Crantz), serving millions of low-income households [2]. The global production of common bean is nearly 12 million tons per annum. The East and Southern Africa regions produce about 2.5 million tons per annum [3]. Approximately 40 per cent of Africa's production is marketed, for about 450 million US dollars [4], and smallholder farmers account for the bulk of the cultivated crop.
The average yield for common bean in Southern Africa is very low (<200 kg ha⁻¹) compared to the global average of 2,000 kg ha⁻¹ [5,6]. The low productivity of common bean is attributable to an array of biotic and abiotic constraints. Therefore, there is a need to develop high-yielding and stress-tolerant cultivars to improve productivity. The successful development and deployment of improved cultivars depend upon available genetic diversity and appropriate breeding strategies.
Genetic variation in the common bean is derived from two major gene pools, which are primarily differentiated by their centres of diversity. These gene pools are from the Mesoamerican centre of diversity, which extends from Colombia to Northern Mexico, and the Andean centre, which covers the area from North-Western Argentina to Southern Peru [7]. The differences between these two gene pools have been reported through several genetic and morphological studies in selected agro-ecologies [8,9]. The accessions of Andean origin are described as large-seeded, while the Mesoamerican accessions are small-seeded types [8].
Since the introduction of the common bean to Africa in the 16th and 17th centuries [10], the crop has undergone natural and human selection pressure, resulting in genetic divergence from the original Andean and Mesoamerican gene pools. Gene flow among different gene pools and/or within races through natural cross-pollination has resulted in the diversification of landraces in Southern Africa. Following years of selection and adaptation, the landraces found in Southern Africa have evolved as distinct types with distinguishable morphological features. The East and Southern African regions are now recognized as secondary centres of genetic diversity for common bean [11]. Thus, germplasm from the East and Southern African regions complements the original gene pools and provides essential genetic variation for breeding. Assessing the genetic diversity among genotypes collected from different geographical locations is important to understand genetic composition and gene loci differentiation in common bean for cultivar development [12].
Knowledge of genetic diversity in plant germplasm and the interrelationship between genetic markers and phenotypic expression is vital for crop improvement. This knowledge enhances efficiency during germplasm management, selection, and cultivar development [13,14]. Diversity studies in common bean utilize both morphological and molecular markers [15-17]. However, morphological markers are highly affected by environmental variance, which reduces selection efficiency during cultivar development. The use of molecular markers has gained prominence for genetic diversity assessment because they are not affected by environmental conditions. Their determination is mostly automated, which reduces human experimental errors. Molecular markers, including random amplified polymorphic DNA (RAPD), simple sequence repeats (SSR), amplified fragment length polymorphism (AFLP), and single nucleotide polymorphism (SNP), have been used widely in genetic studies on common bean [8,18-20]. Recently, SNP markers have gained prominence in genetic diversity studies in common bean because they are abundant across the genome, highly reproducible, and easily used in automated systems [17,21]. The advent of next-generation sequencing platforms has enabled the discovery of more than a million SNP markers in common bean. These SNP markers have been used for linkage map development, mapping of quantitative trait loci (QTL), map-based gene cloning, marker-assisted selection, and exploration of genetic diversity [22-25]. Common bean breeding programs in East and Southern Africa can benefit from assessing genetic diversity in different germplasm using SNP markers. This will enable effective genetic management and accelerated genetic advancement for cultivar development.
National and international germplasm exchange and informal trade have resulted in considerable gene flow among germplasm collections between East and Southern African countries over the last 30 years [26]. Although both locally available germplasm and introductions have been used as cultivars in East and Southern Africa [21], the functional genetic diversity among these genetic resources is yet to be fully explored for efficient breeding. In Africa, the characterization of crop genetic resources has mostly focused on phenotypic evaluation, with limited use of genomic tools. Few studies have assessed genetic diversity and deduced population structure based on sources of collection, races, and gene pools [12,27]. Some studies sought to evaluate gene flow among different populations using molecular markers such as simple sequence repeats (SSR) [11,28]. Assessing gene flow among germplasm collections from diverse geographical locations has been a proxy for estimating potential genetic diversity among common bean germplasm for breeding. The evolution of landraces, cultivars, and lines from different locations under different selection pressures results in genetic divergence from the original Andean and Mesoamerican gene pools. Nevertheless, a large proportion of common bean genetic resources remains uncharacterized and under-utilized [29]. For instance, genetic variation for bean fly resistance has not been widely assessed, and genetic studies on bean fly resistance within East and Southern African common bean germplasm collections are scarce despite their importance as sources of genetic diversity. In addition, the genetic basis for adaptive traits against stresses such as bean fly infestation is still to be elucidated [30]. This is partly attributable to phenotyping difficulties for bean fly resistance and a lack of systematic and efficient screening procedures [29]. Thus, there is also a need to improve phenotyping procedures to generate complementary phenotypic data for genetic diversity studies. Therefore, the objective of this study was to determine the extent of genetic diversity and population structure in 60 common bean germplasm collections from East and Southern Africa.

Germplasm
The germplasm used in this study consisted of 60 common bean genotypes collected from the Malawi Department of Agricultural Research Services (DARS/Malawi) and the International Centre for Tropical Agriculture (CIAT) (Table 1). Forty-five genotypes were obtained from DARS, Malawi, which included conserved landraces and released cultivars.

Phenotyping trials
The 60 genotypes were evaluated in the field for agronomic performance. The genotypes were established at the Lilongwe University of Agriculture and Natural Resources (LUANAR), Bunda Horticulture Research farm (33.46°E and 13.10°S) in two years (2017/2018 and 2018/2019) during the main production seasons (November to April). The average rainfall per annum is 950 mm. The summer rainy season starts in November and ends in May. The site's mean minimum and maximum annual temperatures were 16.5°C and 22.4°C, respectively. The site has dark loamy clay soils with a soil pH of 5.8. The genotypes were planted in a 6 × 10 alpha lattice design with three replications. Each genotype was planted on a 3.00 m² plot consisting of two 4 m long rows. The spacing between rows was 0.75 m, and between plants 0.10 m. Standard common bean cultivation practices were followed.

Phenotypic data collection
Phenotypic data on qualitative and quantitative traits (Table 2) were collected following the International Board of Plant Genetic Resources [30]. The assessed qualitative traits were leaf shape (LS), flower colour (FC), growth habit (GH), leaf hairiness (LH), pod colour (PD), seed pattern (SP), seed colour (SC) and seed size (SS). Eleven quantitative traits were recorded. The number of nodes on the main stem from the base to the first flower (NFF) was recorded as the mean of five randomly selected plants per plot. The internode length from the first to the fifth node (FIL) was recorded as the mean length between the first and fifth nodes on the main stems of the five plants per plot. The width (WTL) and length (LTL) of the fifth trifoliate leaf were recorded as averages of the trifoliate leaves measured on the five sampled plants. The days to 50 per cent flowering (DTF) were recorded as the number of days from the date of planting to the date when 50 per cent of the plants in a plot had visible flowers, while the days to 90 per cent maturity (DTM) were counted from the date of planting to the date when 90 per cent of the plants in a plot had reached physiological maturity. The number of pods per plant (NPP) was recorded as the average number of pods counted on five randomly selected plants at harvest. The number of seeds per pod (NSP) was recorded as the total number of seeds divided by the number of pods from five randomly selected plants at harvest. The seed length (SL) was recorded as the average length of five randomly selected seeds. Grain yield (GYD) was the weight of shelled grain harvested from all plants in a plot, converted to kilograms per hectare according to plot size after adjusting to 12 per cent moisture content, following [33].
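A commonly used form of this moisture and plot-size adjustment is sketched below. It is given as an assumption about the general shape of the calculation, not as the authors' exact expression. The symbols W (shelled grain weight per plot, in kg) and A (harvested plot area, in m²) are introduced here for illustration only.

```latex
\[
\mathrm{GYD}\ (\mathrm{kg\,ha^{-1}})
  = W \times \frac{100 - \mathrm{MC}}{100 - 12} \times \frac{10\,000}{A}
\]
```

Under this form, a plot yielding W = 0.45 kg at MC = 14 per cent on A = 3 m² would give 0.45 × (86/88) × (10,000/3), or roughly 1,466 kg ha⁻¹.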
In this calculation, GYD is the grain yield in kilograms per hectare, and MC is the percentage moisture content of the grain at harvest. Hundred seed weight (HSW) was recorded as the weight of 100 randomly selected seeds after adjusting to 12 per cent moisture content.

Statistical analysis of phenotypic data
The frequency and significance tests of qualitative traits recorded among the test genotypes were computed using the cross-tabulation procedure of SPSS version 26 [31]. Data on quantitative phenotypic traits were subjected to analysis of variance in GenStat 18th edition [32]. Genotype means for quantitative traits were separated using Fisher's unprotected least significant difference at the 5 per cent significance level. Further, multivariate trait relationships among genotypes were deduced using categorical principal component analysis (CATPCA), retaining principal components (PC) with eigenvalues above 1.00, in R software [33]. A communality value for each trait was calculated as the sum of squares of the PC loadings following [34] to identify traits that were well represented across the PCs.
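The communality calculation can be illustrated with a minimal Python sketch. The loadings, eigenvalues and the function name `communalities` are hypothetical and chosen only for illustration; the study's own analysis was run in R.

```python
def communalities(loadings, eigenvalues, threshold=1.0):
    """Sum of squared loadings per trait over PCs whose eigenvalue exceeds the threshold."""
    # Indices of the retained components (eigenvalue > threshold)
    keep = [j for j, ev in enumerate(eigenvalues) if ev > threshold]
    return {trait: sum(row[j] ** 2 for j in keep) for trait, row in loadings.items()}


# Hypothetical loadings of three traits on three PCs (illustrative numbers only)
loadings = {
    "DTF": [0.80, 0.30, 0.10],
    "NPP": [0.20, 0.70, 0.40],
    "GYD": [0.50, 0.50, 0.60],
}
eigenvalues = [2.4, 1.3, 0.7]  # the third PC is dropped because its eigenvalue is below 1

result = communalities(loadings, eigenvalues)
for trait, value in result.items():
    print(f"{trait}: {value:.2f}")
```

A trait with a communality close to 1 is almost fully described by the retained components, whereas a low value means most of its variation lies in the discarded components.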

Genotyping
DNA extraction and genotyping. The 60 genotypes were profiled using SNP markers. The genotypes were planted in seedling trays in a greenhouse and raised to the three-leaf stage. Genomic DNA was extracted from fresh leaves of the seedlings following the plant DNA extraction protocol of Diversity Arrays Technology (DArT) [35]. After extraction, the DNA was checked for nucleic acid concentration and purity using a NanoDrop 2000 spectrophotometer (ND-2000 V3.5, NanoDrop Technologies Inc). The genomic DNA was shipped to the Biosciences Eastern and Central Africa hub of the International Livestock Research Institute (BecA-ILRI) in Nairobi, Kenya, for genotyping by sequencing. The DArTseq protocol was used to genotype the samples using 17,190 silicoDArT markers assigned to the 11 chromosomes of the common bean. The quality of the SNP markers was determined by reproducibility and call rate [36]. The SNP markers used were of high quality, with reproducibility values of 1.00, polymorphic information content (PIC) values ranging from 0.020 to 0.50, and a mean call rate of 0.93 (range 0.84 to 1.00). After eliminating the SNP markers with unknown chromosome positions and filtering out markers with more than 10 per cent missing data, a total of 16,565 silicoDArT markers were recovered and used in the analysis.

Genetic parameters and population structure analysis. Genomic data were imputed using the optimal imputation algorithm on the KDCompute server (https://kdcompute.igssafrica.org/kdcompute/). The polymorphic information content (PIC), minor allele frequency (MAF), observed heterozygosity (Ho), genetic distance (GD), inbreeding coefficient (Fis), and fixation index (Fst) were estimated using the R package "adegenet" [37]. The population structure was determined using STRUCTURE 2.3.4 software [38]. The burn-in period and the number of Markov Chain Monte Carlo (MCMC) replications were each set at 10,000 iterations, and the model was run by varying the number of clusters (K) from 1 to 10 with 10 replicate runs for each K. The appropriate K value was estimated by implementing the Evanno method using the STRUCTURE Harvester program [39].
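The ΔK statistic behind the Evanno method can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the STRUCTURE Harvester code: the log-likelihood values are invented, only three runs per K are shown, and the second-order difference is taken on the run means.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of ln P(D) values from replicate runs.
    delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), with L(K) the mean over runs.
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            delta[k] = second_diff / sd
    return delta


# Hypothetical ln P(D) values for K = 1..4 (illustrative numbers only)
runs = {
    1: [-5000, -5002, -4998],
    2: [-4000, -4001, -3999],
    3: [-3950, -3960, -3940],
    4: [-3930, -3950, -3910],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is why K = 1 can never be selected by this criterion.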
Analysis of molecular variance (AMOVA) and genetic diversity analysis were performed using PowerMarker V3.25 [40] after grouping the accessions by gene pool and by biological category (landrace, breeding line, or released variety).
A joint analysis of phenotypic and genotypic data was conducted. A phenotypic distance matrix was generated based on Gower's distance, while the genotypic distance matrix was generated using Jaccard's coefficient. A combined matrix was developed from the summation of the genotypic and phenotypic matrices. The phenotypic, genotypic and combined matrices were used to generate hierarchical clusters using the package "cluster" in R software [43]. A comparison of the hierarchical clusters was conducted using the tanglegram function in the "dendextend" package in R software [41].
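The construction of the three distance matrices can be sketched in Python as follows. The genotype names, trait values and marker scores are hypothetical, the markers are assumed to be coded as 0/1 presence/absence scores, and the clustering step is omitted; the study itself used R packages for this analysis.

```python
from itertools import combinations


def gower_distance(x, y, ranges):
    """Mean per-trait dissimilarity: range-scaled for numeric traits, 0/1 mismatch for categorical."""
    parts = []
    for xi, yi, r in zip(x, y, ranges):
        if r is None:  # categorical trait
            parts.append(0.0 if xi == yi else 1.0)
        else:  # numeric trait, scaled by its observed range
            parts.append(abs(xi - yi) / r if r > 0 else 0.0)
    return sum(parts) / len(parts)


def jaccard_distance(a, b):
    """1 minus (markers present in both) / (markers present in at least one)."""
    both = sum(1 for p, q in zip(a, b) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(a, b) if p == 1 or q == 1)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical data: (DTF in days, GYD in kg/ha, flower colour) and 0/1 marker scores
pheno = {"G1": (40, 1200, "white"), "G2": (42, 1500, "purple"), "G3": (50, 1800, "white")}
markers = {"G1": [1, 1, 0, 1, 0], "G2": [1, 0, 0, 1, 1], "G3": [0, 1, 1, 0, 0]}

# Observed range of each numeric trait; None flags a categorical trait
columns = list(zip(*pheno.values()))
ranges = [max(c) - min(c) if isinstance(c[0], (int, float)) else None for c in columns]

pheno_dist, geno_dist, combined = {}, {}, {}
for g1, g2 in combinations(sorted(pheno), 2):
    pheno_dist[(g1, g2)] = gower_distance(pheno[g1], pheno[g2], ranges)
    geno_dist[(g1, g2)] = jaccard_distance(markers[g1], markers[g2])
    # Combined matrix as the element-wise sum of the two distances
    combined[(g1, g2)] = pheno_dist[(g1, g2)] + geno_dist[(g1, g2)]

print(combined)
```

Both distances lie between 0 and 1, so a simple sum gives the phenotypic and genotypic information equal nominal weight in the combined matrix.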

Phenotypic diversity and population structure analyses
Variation based on qualitative phenotypic traits. The frequencies of eight qualitative traits and significance tests among the 60 test genotypes are presented in Table 3. Highly significant differences (p<0.001) were detected among the test genotypes for all assessed qualitative traits. The largest share of the accessions (38 per cent) had ovate leaves, while 32 per cent possessed cordate leaves, 19 per cent hastate leaves and 10 per cent rhombohedral leaves. Additionally, 55 per cent of the test accessions had smooth-surfaced leaves, while 33 per cent had partially smooth leaves. Only 13 per cent of the accessions had hairy leaves. The frequency of accessions with a determinate growth habit was 35 per cent. The remainder of the accessions were indeterminate types that were further classified into three sub-groups, types II, III, and IV, with relatively similar frequencies (Fig 1A and 1B). There were two main types of flower colour (Fig 1C and 1D). Fifty-nine per cent of the test genotypes had white flowers, and 41 per cent had purple flowers. The test genotypes exhibited four distinct pod colours: green, red striped, black striped, and brown striped, with respective frequencies of 77, 13, 6, and 5 per cent. The tested accessions showed prominent variation in seed colour, size, and shape (Fig 2A-2F). There were a total of 11 seed colour types, while the seed size classes consisted of small, medium, and large.

Genetic diversity and population structure based on SNP markers
Population allelic diversity. The mean MAF was similar among the Andean and Mesoamerican gene pools and among breeding lines, landraces, and released varieties (Table 7). In addition, the SNP markers were moderately informative with a mean PIC value of 0.22, while the tested accessions were moderately heterozygous with a mean heterozygosity value of 0.45. The genotypes from the Mesoamerican gene pool exhibited higher heterozygosity (0.52) than the Andean genotypes (0.44). The varieties and landraces exhibited slightly higher heterozygosity than the breeding lines. The breeding lines exhibited the most negative inbreeding coefficient (-0.68), compared to -0.60 for the released varieties.
Population structure. The population structure analysis delineated the 60 common bean genotypes into two groups based on the highest ΔK at K = 2 following the Evanno method (Fig 4A). [...] backcross of CAL96/CAL96//G14519 for high iron and zinc content [42]. Additionally, the NARS lines Nantupa (E15) and Nyambitira (E3) were clustered in the same sub-cluster, as expected. These lines were half-sib families bred for bruchid resistance by DARS, Malawi. Nyambitira was derived from a cross of KK03 x KK25, and Nantupa from a cross of Nagaga x KK25. Within sub-cluster I-b, the breeding lines SMC41 (E6), SMC104 (E86) and SMC166 (E31) were clustered in one sub-group. These lines were advanced backcross selections of SMC47/SN40//SCR1/SMC21. Conversely, NUA45 (E27), bred for high iron and zinc, was found in sub-cluster II-b with its parental line CAL96 (E51).
Genetic differentiation among populations. The accessions were grouped by gene pool (Mesoamerican or Andean) and by biological category (breeding line, landrace or released variety), and these groupings were subjected to analysis of molecular variance. The variation within and among gene pools was significant (P<0.001) (Table 9). Variation between the gene pools accounted for 51 per cent of the total variation, while variation within the gene pools accounted for 49 per cent. When the variance was partitioned among breeding lines, landraces, and released varieties, there was no significant variation among the biological categories; variation within the categories accounted for all of the variation exhibited. The extent of genetic differentiation (Fst) among the biological categories ranged from -0.600 to -0.635 (Table 10). The highest Fst value was observed between landraces, while the lowest Fst value was between released varieties and breeding lines.

Combined analysis of phenotypic and genotypic data. The hierarchical clusters based on phenotypic and genotypic data revealed that the genotypes could be grouped into heterogeneous clusters. In the phenotypic dendrogram, the first cluster was dominated by red-seeded Andean landraces obtained from Malawi (Fig 5). The second cluster comprised a mixture of landraces, varieties and breeding lines with red or cream coloured seeds, and also included genotypes of mixed colours. The genotypic dendrogram grouped the genotypes into six heterogeneous clusters (Fig 6), which were independent of source of origin and seed coat colour. The joint matrix revealed three clusters of different sizes among the genotypes (Fig 7). The largest cluster comprised Andean genotypes with red seeds, while the smallest cluster was made up of Andean genotypes with red mottled seeds. The tanglegram revealed that a considerable number of genotypes (about 40 per cent) maintained their positions in both the phenotypic and genotypic hierarchical clusters (Fig 8). Only two genotypes, E9 and E10 (MW3969), maintained their clusters and positions.

Discussion
Significant genotypic variation was observed among the tested common bean genotypes across two testing seasons for quantitative traits such as DTF, DTM, NPP, HSWT and GYD (Table 4). This suggested that the test genotypes harbour sufficient genetic diversity to select complementary lines for breeding purposes. Variation in phenotypic traits among genotypes reflects the underlying differences in their genetic constitution [43]. The panel consisted of genotypes from the Mesoamerican and Andean gene pools, which evolved under different selection pressures and environmental adaptation, resulting in morphological and physiological differentiation. The landraces exhibited intrinsic genetic variation for key quality traits compared with accessions introduced from CIAT. The variation suggests that differential selection pressures impacted their evolution, resulting in the genetic diversity observed among the landraces. The differential selection pressure is attributable to variability in climatic conditions, agronomic practices, natural selection and artificial selection by farmers over a long agricultural history. For instance, genotypes MW3915 (entry number E70), MW3966 (E40), MW3241 (E44) and MW3955 (E11), sourced from smallholder farmers in Malawi, attained higher yields than CIAT genotypes such as SER124 (E74), A344 (E89), A286 (E28), SUGAR134 (E39) and NUA45 (E27). This may be attributed to differences in genetic constitution, adaptation to the climatic conditions, and local production practices in Malawi.
Qualitative traits such as growth habit, seed size and seed colour are important to farmers and consumers and are critical determinants of cultivar adoption [44]. For instance, a high frequency of accessions with smooth leaf types compared to non-smooth types suggests a long history of selection by farmers [45]. Farmers and consumers are also known to have preferences related to seed size, colour and shape. In Malawi, varieties with large seeds are preferred over varieties with medium and small-sized seeds. The most preferred seed coat colours in the country include red, red mottled and red speckled [46]. Traits such as seed coat colour, shape and size are usually controlled by a few major genes and present few challenges during selection [47]. In contrast, traits such as grain yield and maturity are polygenic and more difficult to improve by direct selection [48]. Therefore, to enhance varietal adoption among farmers, varietal development must incorporate both high grain yield potential and farmer-preferred quality traits through recurrent selection for qualitative and quantitative traits [16,49].
The differences in agronomic traits provide opportunities to select accessions that are suitable for diverse environments. Genetic variation among genotypes in a breeding population or in germplasm collections maintained at gene banks is a fundamental requirement for any crop improvement program [15]. For instance, farmers and breeders can select accessions with early maturity for environments with short rainy seasons as a mechanism to escape terminal drought stress. The significant variation in DTF and DTM observed among the accessions (Table 4) is important, especially for developing cultivars for drought-prone environments where early flowering and maturity contribute to drought escape. Earliness to flowering and maturity are desirable traits, especially in Southern Africa, where rainy seasons are progressively becoming shorter due to climate change [16]. Late-maturing genotypes such as MW227 (E62), MW3945 (E91) and MW4012 (E22) from the DARS, Malawi gene bank and breeding lines such as DRK57 (E52) and NUA35 (E63) from CIAT are useful genetic resources for environments with long rainy seasons.
The first two PCs (Table 6) explained a low proportion of the morphological variation (34 per cent) among the evaluated genotypes, which suggests that a higher number of components was needed to discriminate the genotypes adequately. The inclusion of qualitative traits with discrete categories reduced the effectiveness of the PCs in explaining the variation. In addition, the inclusion of breeding lines and commercial cultivars with a narrow range of genetic diversity also reduces the effectiveness of PCs [50]. Similarly, other studies have reported low variation for the first two principal components [14,51]. For example, the first two PCs explained only 33 per cent of the total phenotypic variation in Brazilian common bean germplasm [51]. All the traits exhibited high communality values across all the important PCs, showing that the traits exhibited wide variation that is important in discriminating the genotypes. However, the study identified NFF, PD, SC and DTF as the most important descriptors based on their communality values, and these will be useful for germplasm characterization and breeding. Genetic variation in GYD implies that superior genotypes with high GYD could be identified from the test population for developing breeding populations for common bean improvement.
The highest ΔK value occurred at K = 2, which indicated that the 60 genotypes could be delineated into two sub-populations (Fig 4A). Similarly, the dendrogram clustered the accessions into two main clusters with two sub-clusters each (Table 8). In general, the population structure analysis grouped the accessions into Andean and Mesoamerican gene pools. The results were consistent with previous reports on common bean, which reported these two major groups [36,52]. The population structure also revealed admixtures among the common bean genotypes, which could be attributed to the inclusion of landraces in the study.
The Eastern and Southern Africa regions are recognized as centres of genetic diversity for common bean [11], and the germplasm adapted to these regions may no longer conform to the large Andean or Mesoamerican gene pools. In Malawi and most Eastern and Southern African countries, varietal mixtures in the common bean are common due to cropping practices, limited knowledge of the pedigree of bean types, and a lack of preference for varietal purity among consumers [21]. Varietal mixtures promote gene introgression through natural crossing, thereby narrowing the genetic base [53]. The consequences of a narrow genetic base include low genetic gains and crop vulnerability to biotic and abiotic constraints [54]. The existence of admixtures requires fingerprinting to establish gene introgression and eliminate duplicate accessions, which would reduce the cost of germplasm management and facilitate the broadening of the genetic base in common bean.
Polymorphic information content values reveal the usefulness of particular markers in diversity studies [55]. In the present study, the mean PIC value was 0.22 (Table 7), which indicated that the SNP markers used were slightly to moderately informative. This could be due to the bi-allelic nature of SNP markers, which restricts PIC values to ≤ 0.5 [13,56], and the low mutation rate of SNP markers [57]. Generally, SNP markers provide higher resolution in genetic studies, although they exhibit lower PIC values than other markers such as simple sequence repeats [58].
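The ceiling on marker informativeness for bi-allelic loci can be checked numerically. The sketch below assumes the standard Botstein-type PIC formula and Nei's gene diversity; whether the software used in the study applies exactly these definitions is not stated, and the function names are illustrative.

```python
def gene_diversity(p):
    """Expected heterozygosity for a bi-allelic locus with allele frequencies p and 1 - p."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def pic_biallelic(p):
    """Botstein-type PIC for a bi-allelic locus: 1 - sum(p_i^2) - 2 * p^2 * q^2."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


# Scan minor allele frequencies from 0.01 to 0.50 and record the maxima
freqs = [i / 100 for i in range(1, 51)]
max_he = max(gene_diversity(p) for p in freqs)
max_pic = max(pic_biallelic(p) for p in freqs)
print(max_he, max_pic)
```

Both quantities peak when the two alleles are equally frequent (p = 0.5), at 0.5 for gene diversity and 0.375 for PIC under this definition, so bi-allelic markers cannot exceed the 0.5 bound noted above however they are scored.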
The mean observed heterozygosity (Table 7) in this study was 0.45, which was moderate and suggested that both recessive and dominant alleles were present in the germplasm. The similar heterozygosity values among the different types of genotypes showed that the genotypes contained both alternate alleles. The moderate heterozygosity also indicated that some of the accessions were possibly derived from uncontrolled outcrossing or were segregating at a number of loci. Common bean is naturally self-pollinating and would be expected to have lower heterozygosity estimates, as most loci would be homozygous [59]. It is important to have both recessive and dominant alleles expressed in a population to select adapted genotypes, although high expression of recessive alleles may slow selection efforts [60]. Variation in the magnitude of observed heterozygosity in common bean has been reported in several studies [26,28,61]. The differences could be attributed to the different germplasm used during evaluation. Previous studies on African common bean germplasm only considered landraces, while in the current study the test germplasm included breeding lines, landraces and varieties adapted to different ecologies.
Allele frequency information is useful in establishing the level of genetic differentiation in populations [62]. The low mean MAF of 0.24 found in this study for the whole population and the low MAF values for breeding lines, landraces and varieties (Table 7) suggested a limited number of rare variants among the accessions, which indicates that the majority of genotypes shared common alleles. This implies that the successful use of the test population in a breeding program will depend on devising suitable selection strategies that can increase the expression of rare variants in the progeny and exploit their breeding value. Similarly, a mean MAF of 0.23 based on SNP markers was reported in a Brazilian common bean core collection [38].
The low fixation index among the sub-populations in this study (Table 7) indicated low genetic variation among the populations and that the sub-populations were genetically related. Fixation indices below 0.05 indicate low genetic differentiation, values between 0.05 and 0.15 indicate moderate differentiation, and values above 0.15 indicate high divergence among genotypes [63]. In common bean, Fst values as low as -0.02 have been reported previously [64]. The main contributor to the high similarity among these populations is high gene introgression through artificial and natural outcrossing of common bean in improvement programs and farmers' fields, respectively [8,11]. The lowest fixation index, recorded between breeding lines and landraces, is consistent with their shared ancestry. Breeding programs in Malawi often use the CIAT lines as breeding parents, and CIAT released most landraces cultivated in Malawi in partnership with DARS, Malawi. This is revealed by the low Fst between released varieties and the landraces.
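The interpretation thresholds quoted above can be applied mechanically, as in the following Python sketch. The estimator shown is a simple heterozygosity-based Fst for one bi-allelic locus and two equally weighted sub-populations; it is an illustrative assumption and not the estimator used by the software in this study, and the allele frequencies are invented.

```python
def fst_two_pops(p1, p2):
    """Fst = (HT - HS) / HT for one bi-allelic locus and two equally weighted sub-populations."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht


def classify_fst(fst):
    """Apply the thresholds quoted in the text: <0.05 low, 0.05-0.15 moderate, >0.15 high."""
    if fst < 0.05:
        return "low"
    if fst <= 0.15:
        return "moderate"
    return "high"


# Hypothetical allele frequencies in two sub-populations
examples = {"similar": (0.40, 0.60), "intermediate": (0.35, 0.65), "divergent": (0.20, 0.80)}
labels = {name: classify_fst(fst_two_pops(p1, p2)) for name, (p1, p2) in examples.items()}
print(labels)
```

This simple ratio cannot fall below zero. Negative estimates such as those reported in Table 10 arise from estimators that include sampling corrections, and are usually read as indicating no detectable differentiation between the groups.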
The tanglegram comparing phenotypic and genotypic clustering showed that the phenotypic and genotypic clusters were largely independent. The inconsistency between phenotypic and genotypic clusters is attributable to environmental variance. Genotype × environment interaction confounds phenotypic performance, which reduces the correlation between genotype and phenotypic expression [65]. The panel used in this study consisted of diverse genotypes with different adaptation, which may have led to deviation from their genetic potential. Inconsistencies between genotypic and phenotypic expression have been reported previously in common beans (Phaseolus spp.) [66]. A combined dendrogram based on genotypic and phenotypic data improves precision in genetic analyses of germplasm [67,68]. The differential clustering of genotypes in the combined dendrogram showed that it was independent of the phenotypic and genotypic matrices and can be used for more informative analysis [68].

Conclusion
The present results showed that the test genotypes exhibited phenotypic variation underpinned by genetic diversity, which will facilitate selection and development of breeding populations for common bean improvement. The accessions exhibited wide variation in traits such as FC, NFF, DTF, NPP, GH, DTM and GYD. Genetic analysis revealed that the accessions were divergent, although they could only be delineated into two population clusters based on their origin. The variation between the clusters accounted for 51% of the total variation, while within-cluster variation accounted for 49%. The significant variation between the clusters was attributed to the differences in the evolution of the Mesoamerican and Andean gene pools. Improvement of common bean using this population would be achieved by developing breeding populations from crosses involving genetically divergent and superior parental lines of Mesoamerican origin, such as SER124, A344 and UBR(92)25, crossed with Andean genotypes including DRK95, NUA59 and VTTT924/4-4. The narrow population structure and low genetic differentiation estimates showed that the genetic diversity in the present common bean germplasm should be harnessed through targeted crosses and new introductions to facilitate efficient selection and improvement. The discrepancy between genotypic and phenotypic analyses in identifying divergent genotypes highlighted that environmental variance was significant and that measures to minimize its impact on selection should be employed.

DF = degrees of freedom, Rep = replication, PC = principal component, LTL = length of the fifth trifoliate leaf (cm), WTL = width of the fifth trifoliate leaf (cm), FIL = length between the first and fifth nodes of the main stem (cm), NFF = number of nodes at first flower, DTF = days to 50 per cent flowering, DTM = days to physiological maturity, NPP = number of pods per plant, NSP = number of seeds per pod, SL = seed length (cm), HSWT = hundred seed weight (g/100 seed), GYD = grain yield (kg/ha), GH = growth habit, SS = seed size, SP = seed pattern, SC = seed colour, PD = pod colour, LH = leaf hairiness, FC = flower colour. https://doi.org/10.1371/journal.pone.0243238.t006

... (Fig 4A). The two identified groups were relatively similar in number (Fig 4B). Group I consisted of 52 per cent of the test genotypes, which were mainly large-seeded. Group II had 48 per cent of the test genotypes and comprised the small-seeded bean types belonging to the Mesoamerican gene pool. Genotypes NUA45 (E27), NUA59 (E85), CAL143 (E51) and CAL96 (E78), belonging to the Andean gene pool, were clustered in Group I. The Mesoamerican types such as genotypes A222 (E76), A55 (E13) and A429 (E73) were grouped along with the small-seeded bean genotypes in Group II.

Cluster analysis. Cluster analysis based on SNP markers grouped the 60 common bean genotypes into two main genetic groups (Table 8). Cluster I contained 50 genotypes and was further divided into two sub-clusters (I-a and I-b). Sub-cluster I-a contained the genotype A429 (E73) only, and I-b comprised the rest of the genotypes. Similarly, Cluster II was divided into two sub-clusters: II-a and II-b. Sub-cluster II-a contained the genotype MW3960 (E36), and II-b comprised the rest of the genotypes. Sub-clusters I-b and II-b were further divided into distinct sub-clusters based on origin, pedigree, morphology and agronomic performance. Genotypes NUA35 (E63), NUA59 (E59) and CAL96 (E78) and the breeding lines were clustered in the same sub-cluster II-b. NUA35 and NUA59 were derived from the

Table 8. Clustering of 60 common bean genotypes based on 16,565 SNP markers.
Columns: Cluster, Entry code of genotypes (a), FST, He.
a See Table 1 for genotype codes. https://doi.org/10.1371/journal.pone.0243238.t008
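
As an illustration of the quantities reported in the FST and He columns of Table 8, the sketch below computes expected heterozygosity (He) and a Nei-type fixation index, FST = (HT - HS)/HT, for biallelic SNPs. It is a minimal example and not the software or estimator used in this study: the 0/1/2 genotype coding, the equal weighting of the two clusters, the function names and the toy genotypes are all assumptions made for the example.

```python
def allele_freq(genotypes):
    """Alternate-allele frequency from genotypes coded 0, 1 or 2 (None = missing)."""
    calls = [g for g in genotypes if g is not None]
    return sum(calls) / (2 * len(calls))


def expected_het(p):
    """Expected heterozygosity at a biallelic locus: 1 - p^2 - q^2 = 2pq."""
    return 2 * p * (1 - p)


def mean_he(cluster):
    """Mean expected heterozygosity over all loci of one cluster."""
    return sum(expected_het(allele_freq(locus)) for locus in cluster) / len(cluster)


def fst_two_clusters(cluster_a, cluster_b):
    """Nei-type FST = (HT - HS) / HT, computed as a ratio of sums over loci.

    Each cluster is a list of loci; each locus is a list of genotype codes.
    The two clusters are weighted equally (a simplifying assumption).
    """
    hs_sum = ht_sum = 0.0
    for locus_a, locus_b in zip(cluster_a, cluster_b):
        pa, pb = allele_freq(locus_a), allele_freq(locus_b)
        hs_sum += (expected_het(pa) + expected_het(pb)) / 2  # within clusters
        ht_sum += expected_het((pa + pb) / 2)                # pooled
    return (ht_sum - hs_sum) / ht_sum


# Toy data: two loci scored on four hypothetical genotypes per cluster.
cluster_1 = [[0, 0, 0, 1], [2, 2, 1, 2]]
cluster_2 = [[2, 2, 1, 2], [0, 1, 0, 0]]

he_1 = mean_he(cluster_1)
fst = fst_two_clusters(cluster_1, cluster_2)
print(f"He (cluster 1) = {he_1:.5f}, FST = {fst:.4f}")
```

Loci with no called genotypes in a cluster would need to be filtered out before calling these functions, because the allele frequency is undefined in that case.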