Genetic diversity, SNP-trait associations and genomic selection accuracy in a west African collection of Kersting’s groundnut [Macrotyloma geocarpum (Harms) Maréchal & Baudet]

Understanding the mechanisms governing variation in complex traits is a requirement for efficient crop improvement. In this study, we carried out molecular characterization, assessed marker-trait associations and explored the possibility of genomic selection in a collection of 281 Kersting’s groundnut accessions. The diversity panel was phenotyped using an alpha lattice design with two replicates in two contrasting environments. Accessions were genotyped using genotyping-by-sequencing technology. Genome-wide association analyses were performed between single nucleotide polymorphism (SNP) markers and yield-related traits across tested environments. SNP markers were used to calculate the observed (Ho) and expected heterozygosity (He), and the total gene diversity (Ht). Genetic differentiation among accessions across ecological regions of origin was analysed. Our results revealed 493 quality SNPs of which 113 had a minor allele frequency >0.05, a total gene diversity of 0.43 and average Ho and He values of 0.04 and 0.22, respectively. Four clusters, highly differentiated by seed coat colour (Fst = 0.79), were identified. The population structure analysis showed two subpopulations with high differentiation across ecological regions (Fst = 0.37). The GWAS revealed 10 significant marker-trait associations, of which six SNPs were consistent across environments. Genomic selection through cross-validation showed moderate to high prediction accuracies for leaflet length, seed dimension traits, 100 seed weight, days to 50% flowering and days to maturity. This demonstrates the existence of genetic variability within Kersting’s groundnut and shows the potential for the improvement of the species. The findings also provide a first insight into the phenotype-to-genotype relationships in Kersting’s groundnut, using SNP markers.

Introduction
Single nucleotide polymorphism (SNP) markers are abundant, highly polymorphic and informative enough to reveal accurately the existing diversity within crop species at the nucleotide level [5,27,28]. Moreover, the exploitation of the existing genetic diversity for cultivar development requires a clear understanding of the relationships between the genome and agronomic traits. The genome-wide association study (GWAS) is one of the popular genomic approaches used to decipher the genetic mechanisms controlling the variation of phenotypic traits. Among other advantages, GWAS is a powerful tool offering a first insight into the genetic architecture of phenotypic trait variation [29][30][31]. Furthermore, the rapid and efficient selection of superior genotypes in Kersting's groundnut breeding requires the development and application of robust genomic selection (GS) and genomic-enabled prediction (GP) models. Unlike GWAS, where markers are associated with traits of interest, GS is an integrated strategy exploiting molecular markers to advance breeding populations based on genomic estimated breeding values (GEBVs), which is particularly effective for complex traits like yield and flavour [32,33]. Genomic selection accelerates the flow of candidate genes from genebank accessions to elite breeding lines, resulting in increased gains from selection [34].
Hence, the objectives of this study were to: (i) characterize the genetic diversity of Kersting's groundnut using SNP markers, (ii) identify SNPs associated with morphological traits of interest in Kersting's groundnut, and (iii) explore the possibility of genomic selection in Kersting's groundnut for accelerated cultivar development. We hypothesized that: (i) SNP markers reveal more genetic diversity in Kersting's groundnut germplasm than the biochemical markers used by Pasquet et al. [20], who reported an absence of genetic diversity within the species, (ii) polymorphic SNP markers are associated with traits of interest such as grain yield, flowering time, maturity time, number of seeds per plant, 100 seed weight and number of pods per plant in Kersting's groundnut, and (iii) cross-validation reveals high genomic selection accuracies for key traits of interest in Kersting's groundnut.

Plant material
The material included 281 accessions of Kersting's groundnut collected across Benin and Togo and held in the genebank of the Laboratory of Genetics, Horticulture and Seed Science (GBioS) of the University of Abomey-Calavi (UAC) in Benin. The diversity panel was collected from a wide range of agro-ecological regions, namely the Guinean, Sudano-Guinean and Sudanian regions of Benin and Togo [15]. Accessions belonged to four landraces based on seed coat colour: white (217), red (18), black (40) and white with black eye (6) (Table 1).

Field trials and experimental design
The 281 accessions were phenotyped during the growing season of August 2017 to January 2018 at Sékou and Savè, two contrasting environments in Benin. Sékou is located in the Guinean phytogeographical zone, characterized by an average rainfall of 1300 mm/year. Total rainfall during the growing season was estimated at 361 mm with an average temperature of 27.2°C. Savè belongs to the Sudano-Guinean zone, characterized by an average rainfall of 1100 mm/year. In contrast to Sékou, the total rainfall recorded at Savè during the growing season (September to December 2017) was estimated at 161 mm. The average temperature was estimated at 27.2°C. The experimental design was an alpha lattice design with two replications in each environment. This resulted in 562 experimental units for each trial. Each experimental unit was a ridge 3.0 m long, containing 10 plants with 0.30 m inter-plant spacing [17,35]. The field plan for the alpha lattice design was generated using R version 3.4.3 [36]. Kersting's groundnut seeds were sown on 21-22 August 2017 and harvested from 3 to 6 January 2018. Weeding was done systematically every two weeks at each location. Compound fertilizer NPK 15:15:15 was applied four weeks after sowing at a rate of 100 kg/ha [37]. The fungicide Conti-Zeb 5_80% WP (mancozeb) was applied every two weeks at 500 g/ha to control fungal infestations.

Field data collection
In total, 15 morphological traits were recorded during the field characterization (Table 2). Important traits evaluated were: diameter of the plant (DIP), plant height (PLH), leaflet length (LEL), leaflet width (LEW), petiole length (PEL), days to 50% flowering (DFF), and days to maturity (DTM). On a plant basis, the following were determined: grain yield per plant (GRY in g/plant), the number of seeds per plant (NSP), the number of pods per plant (NPP) and the number of seeds per pod (NSPod). Seed traits, namely seed length (SIL in mm), seed width (SWi in mm), seed thickness (STh in mm) and 100 seed weight (100SW in g), were also collected (Table 2).

Phenotypic data analysis
Field data for each trait were screened for possible outliers using the R package "outliers" [38]. For each trait, a mixed linear model was fitted per environment and across environments to estimate the best linear unbiased estimators (BLUEs) of accession means using the META-R programme [39,40]. The variation of morphological traits across environments was assessed using boxplots. We performed the analysis of variance (ANOVA) across environments using BLUE values and the function "stat_compare_means()" of the R package "ggpubr" [41]. The ANOVA model was:
$$Y_{ijk} = \mu + G_i + E_j + R_k(E_j) + GE_{ij} + \varepsilon_{ijk}$$
where the phenotypic response ($Y_{ijk}$) is a function of the overall mean ($\mu$), the fixed effect of the i-th accession ($G_i$), the effect of the j-th environment ($E_j$), the k-th replication within the j-th environment ($R_k(E_j)$), the genotype by environment interaction ($GE_{ij}$) and the residual error ($\varepsilon_{ijk}$).
To assess field heterogeneity, the coefficient of variation (CV) was calculated for each trait, using the formula:
$$CV = \frac{\sigma}{\mu} \times 100$$
where CV = coefficient of variation, $\mu$ = trait mean and $\sigma$ = standard deviation. Furthermore, heritability estimates were obtained for each trait across environments using the META-R programme [40] to assess the feasibility of the GWAS. The formula for the broad sense heritability estimates was:
$$H^2 = \frac{\sigma^2_{Acc}}{\sigma^2_{Acc} + \dfrac{\sigma^2_{Acc:Env}}{n_{Env}} + \dfrac{\sigma^2_{Res}}{n_{Env} \times n_{Rep}}}$$
where $\sigma^2_{Acc}$ = variance of the accessions (Acc), $\sigma^2_{Acc:Env}$ = variance of the accession x environment (Env) interaction, $\sigma^2_{Res}$ = variance of the residual error, $n_{Env}$ = number of environments and $n_{Rep}$ = number of replications. Pearson's correlation matrix was also calculated between grain yield and other morphological traits using R version 3.4.3 [36] in order to select yield-related traits to include in the GWAS.
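The two summary statistics described above can be illustrated with a short Python sketch. It assumes the usual percentage form of the CV and the entry-mean form of broad sense heritability across environments; the function names and all numeric values are hypothetical and are not taken from the study, and the original estimates were obtained with META-R rather than with this code.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


def broad_sense_heritability(var_acc, var_acc_env, var_res, n_env, n_rep):
    """Entry-mean heritability across environments (assumed form).

    var_acc     : variance among accessions
    var_acc_env : accession x environment interaction variance
    var_res     : residual variance
    """
    return var_acc / (var_acc + var_acc_env / n_env + var_res / (n_env * n_rep))


# Hypothetical trait values and variance components, for illustration only.
trait_values = [4.0, 5.0, 6.0]
cv = coefficient_of_variation(trait_values)
h2 = broad_sense_heritability(var_acc=2.0, var_acc_env=1.0, var_res=4.0,
                              n_env=2, n_rep=2)
print(f"CV = {cv:.1f}%, H2 = {h2:.2f}")
```

With two environments and two replications, as in this trial, the interaction variance is divided by 2 and the residual variance by 4 before entering the denominator.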

DNA extraction and genotyping by sequencing
Kersting's groundnut plants were grown at the University of Abomey-Calavi (Benin) under field conditions. Three-week-old leaves were collected into 96-deep-well sample collection plates and sent to the Integrated Genotyping Service and Support (IGSS) platform (https://ordering.igssafrica.org/cgibin/order/login.pl) located at the Biosciences Eastern and Central Africa (BecA-ILRI) Hub in Nairobi for genotyping. DNA was extracted using the Nucleomag Plant Genomic DNA extraction kit. The concentration of the extracted genomic DNA was in the range of 50-100 ng/µL. DNA quality and quantity were checked on 0.8% agarose gel. Libraries were constructed according to the Diversity Arrays Technology sequencing (DArTseq) complexity reduction method of Kilian et al. [42], through digestion of genomic DNA and ligation of barcoded adapters followed by polymerase chain reaction (PCR) amplification of adapter-ligated fragments. Libraries were sequenced using single-read sequencing runs for 77 bases. Next-generation sequencing was carried out on a HiSeq 2500.
DArTseq markers were scored using DArT Proprietary Limited's (DArT PL) proprietary SNP and SilicoDArT calling algorithms (DArTsoft14). SNP markers were scored in a binary fashion for presence/absence (1 and 0, respectively) of the restriction fragment with the marker sequence in the genomic representation of the sample. SNP markers were aligned to the reference genomes of mung bean [Vigna radiata (L.) R.Wilczek] and adzuki bean [Vigna angularis (Willd.) Ohwi & Ohashi] [43,44], two species related to Kersting's groundnut, in order to identify chromosome positions.

Molecular analysis
We estimated minor allele frequency, observed (Ho) and expected heterozygosity (He), and total gene diversity (Ht) using the R package "adegenet" [45]. The total gene diversity (Ht), measured as the total expected heterozygosity, was calculated as follows [46]:
$$H_t = H_s + D_{st}$$
where $H_t$ = total gene diversity of the total population as estimated from the pooled allele frequencies, $H_s$ = within-landrace diversity and $D_{st}$ = between-landrace diversity. $H_s$ was estimated as follows:
$$H_s = 1 - \sum_{i} p_i^2$$
where $p_i$ = frequency of the i-th allele at the k-th locus in each landrace, and the value is averaged over all landraces. Likewise, $D_{st}$ was calculated as:
$$D_{st} = \frac{1}{s^2} \sum_{i} \sum_{j} D_{ij}$$
where $s$ = number of landraces and $D_{ij}$ = gene diversity between the i-th and j-th landraces. $D_{ij}$ was estimated as:
$$D_{ij} = \frac{1}{2} \sum_{k} (x_{ik} - x_{jk})^2$$
where $x_{ik}$ = the frequency of the k-th allele in the i-th landrace, and $x_{jk}$ = the frequency of the k-th allele in the j-th landrace.
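As an illustration of the partition of total gene diversity into within- and between-landrace components, the following Python sketch computes Ht, Hs and Dst for a single locus. It assumes equal weighting of landraces and Nei's pairwise form for the between-landrace term; the function name and the allele frequencies are hypothetical, and the study itself relied on the R package "adegenet".

```python
def gene_diversity(freqs):
    """Return (Ht, Hs, Dst) for one locus.

    freqs: one list of allele frequencies per landrace; every list uses
    the same allele order and sums to 1. Landraces are weighted equally
    (an assumption of this sketch).
    """
    s = len(freqs)
    n_alleles = len(freqs[0])
    # Within-landrace diversity, averaged over landraces.
    hs = sum(1.0 - sum(p * p for p in pop) for pop in freqs) / s
    # Total diversity from the pooled (mean) allele frequencies.
    pooled = [sum(pop[k] for pop in freqs) / s for k in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in pooled)
    # Between-landrace diversity as the mean of all pairwise D_ij values.
    dst = sum(
        sum((xi - xj) ** 2 for xi, xj in zip(pop_i, pop_j)) / 2.0
        for pop_i in freqs
        for pop_j in freqs
    ) / (s * s)
    return ht, hs, dst


# Two hypothetical landraces fixed for nearly opposite alleles.
ht, hs, dst = gene_diversity([[0.9, 0.1], [0.1, 0.9]])
print(f"Ht = {ht:.2f}, Hs = {hs:.2f}, Dst = {dst:.2f}")
```

In this toy case most of the total diversity lies between landraces, which is the pattern expected when landraces are internally uniform but strongly differentiated.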
Missing marker data were imputed on the KDCompute server (https://kdcompute.igss-africa.org/kdcompute/login) using the missForest algorithm, a random forest imputation method based on multivariate unsupervised and supervised splitting techniques [47]. SNP markers with minor allele frequency (MAF) <0.05 were removed for the GWAS analysis.
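The MAF filter applied before the GWAS can be sketched as follows. This is a minimal illustration that assumes genotypes coded as counts of one allele (0, 1 or 2) with missing calls stored as None; this coding, the function names and the marker names are assumptions made for the example and do not reproduce the DArTseq pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF for one marker; genotypes are allele counts (0/1/2) or None."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2.0 * len(called))
    return min(p, 1.0 - p)


def filter_by_maf(markers, threshold=0.05):
    """Keep markers whose MAF is at least the threshold."""
    return {
        name: calls
        for name, calls in markers.items()
        if minor_allele_frequency(calls) >= threshold
    }


# Hypothetical genotype calls for ten accessions at two markers.
markers = {
    "snp_a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],      # monomorphic, dropped
    "snp_b": [0, 2, 2, 0, 1, None, 0, 2, 0, 0],   # polymorphic, kept
}
kept = filter_by_maf(markers)
print(sorted(kept))
```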

Clustering and population structure analysis
To assess the genetic diversity of Kersting's groundnut accessions, the 493 SNP markers were used to calculate genetic dissimilarities among the 281 accessions including the four categories of landraces [48]. The genetic dissimilarity matrix was generated from marker data by calculating the presence/absence dissimilarity index with the "Dice" formula as follows:
$$d_{ij} = \frac{b + c}{2a + (b + c)}$$
with $d_{ij}$ = dissimilarity between accessions i and j; a = number of markers with $x_i$ = presence and $x_j$ = presence; b = number of markers with $x_i$ = presence and $x_j$ = absence; c = number of markers with $x_i$ = absence and $x_j$ = presence; $x_i$ = SNP allele in the i-th accession and $x_j$ = SNP allele in the j-th accession.
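The Dice dissimilarity between two accessions can be computed directly from the counts a, b and c defined above. The short Python sketch below assumes presence/absence profiles coded as 1 and 0; the function name and the two example profiles are hypothetical.

```python
def dice_dissimilarity(xi, xj):
    """Dice dissimilarity between two presence/absence (1/0) profiles."""
    a = sum(1 for u, v in zip(xi, xj) if u == 1 and v == 1)  # both present
    b = sum(1 for u, v in zip(xi, xj) if u == 1 and v == 0)  # only in i
    c = sum(1 for u, v in zip(xi, xj) if u == 0 and v == 1)  # only in j
    denominator = 2 * a + b + c
    # Two profiles with no presence at all are treated as identical.
    return (b + c) / denominator if denominator else 0.0


# Hypothetical marker profiles for two accessions.
acc_i = [1, 1, 0, 1, 0]
acc_j = [1, 0, 0, 1, 1]
d = dice_dissimilarity(acc_i, acc_j)
print(round(d, 3))
```

Markers absent in both accessions do not enter the index, so shared absences do not make two accessions look more similar.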
The genetic dissimilarity matrix was used to generate an unrooted tree using the weighted Neighbour-Joining (NJ) algorithm. Branch distances were used as the criterion to weight the NJ tree, taking into account that errors in distance estimates are larger for longer distances [49]. Both the genetic dissimilarity matrix and the NJ tree were computed in the DARwin software version 6.0.4 [50]. To assess the genetic differentiation between pairs of clusters of Kersting's groundnut accessions, a pairwise Fst analysis was performed using the R package "adegenet" [45]. Furthermore, the expected heterozygosity (He) was calculated using the function "poppr()" of the R package "poppr" to assess the level of genetic diversity within clusters of Kersting's groundnut accessions [51]. Moreover, an analysis of variance (ANOVA) was conducted on all morphological traits to assess the phenotypic diversity among clusters, using the following model:
$$Y_i = \mu + C_i + \varepsilon_i$$
where the i-th phenotypic response ($Y_i$) is a function of the overall mean ($\mu$), the fixed effect of the i-th cluster ($C_i$) and the residual error ($\varepsilon_i$).
The population structure was also investigated using the Bayesian clustering method in STRUCTURE version 2.3.4 [52]. The three agro-ecological regions (i.e. the Guinean, Sudano-Guinean and Sudanian regions) were included in the analysis as putative geographic origins of accessions. The lengths of the burn-in period and of the Markov Chain Monte Carlo (MCMC) run were each set at 10,000 iterations [53]. To obtain an accurate estimation of the number of populations, 20 runs were performed for each K-value (assumed number of subpopulations), ranging from 1 to 10. Further, Delta K-values were calculated and an appropriate K-value was estimated according to the method of Evanno et al. [53] using the STRUCTURE Harvester program [54]. At the appropriate K-value, Delta K shows a salient break in the slope of the distribution of likelihood values of K. Given a K-value, the divergence rate of each subpopulation from a hypothetical ancestral population is estimated by the population Fst values generated by STRUCTURE. The divergence rates show the extent of differentiation between subpopulations and the ancestral population for an accurate estimation of the clustering patterns. To complement the results of the population structure analysis, a pairwise Fst analysis was conducted among agro-ecological regions using R version 3.4.3 [36] to check whether genetic differentiation among accessions was explained by their geographical origins. In addition, a two-sided Student's t-test was performed on all morphological traits to compare means between the two subpopulations.
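The Delta K criterion can be sketched in a few lines of Python. This is a simplified variant that takes the second-order difference of the mean log-likelihood across runs and divides it by the standard deviation of the log-likelihood at K; it is an assumption that this matches the exact computation performed by STRUCTURE Harvester, and the log-likelihood values below are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to the list of ln P(X|K) values from
    replicate runs (at least two runs per K).
    """
    avg = {k: mean(v) for k, v in lnp_by_k.items()}
    sd = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in avg and k + 1 in avg and sd[k] > 0:
            second_diff = abs(avg[k + 1] - 2.0 * avg[k] + avg[k - 1])
            result[k] = second_diff / sd[k]
    return result


# Hypothetical log-likelihoods from two runs at each K.
runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -794.0],
    4: [-785.0, -789.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k)
```

Delta K is undefined for the smallest and largest K tested, which is why the tested range has to extend beyond the number of subpopulations expected.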

SNP-traits association analysis
The marker-trait association analysis was conducted per environment and across environments for traits with heritability ≥0.50. Traits included grain yield per plant, days to 50% flowering, days to maturity, number of seeds per plant and number of pods per plant. The unified Mixed Linear Model (MLM) accounting for genetic relatedness (K-matrix) was used on the BLUE values estimated for each trait in order to control type I errors. The analysis was conducted with (MLM-Q) and without (MLM) the first three principal components, using the GAPIT package in R [55,56]. The combination of different models is a good approach for the appropriate control of false positives and negatives in GWAS [57]. Therefore, only markers that revealed significant associations with both MLM and MLM-Q were retained as true phenotype-to-genotype associations [39]. The significance threshold was estimated using the Bonferroni correction as follows: p-value = 0.05/Me, with Me = the number of markers included in the analysis [39].
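The Bonferroni threshold and the rule of retaining only markers significant under both models can be illustrated as follows. The marker count of 113 is the number of SNPs reported for the GWAS in this study; the function names, marker names and p-values are hypothetical and serve only to show the logic.

```python
def bonferroni_threshold(alpha, n_markers):
    """Per-marker significance threshold under Bonferroni correction."""
    return alpha / n_markers


def consensus_hits(p_mlm, p_mlm_q, threshold):
    """Markers significant under both the MLM and the MLM-Q analyses."""
    return sorted(
        marker
        for marker, p in p_mlm.items()
        if p < threshold and p_mlm_q.get(marker, 1.0) < threshold
    )


threshold = bonferroni_threshold(0.05, 113)

# Hypothetical p-values from the two models.
p_mlm = {"marker_x": 1.0e-5, "marker_y": 2.0e-4, "marker_z": 3.0e-3}
p_mlm_q = {"marker_x": 5.0e-5, "marker_y": 9.0e-4, "marker_z": 1.0e-3}
hits = consensus_hits(p_mlm, p_mlm_q, threshold)
print(f"threshold = {threshold:.2e}, hits = {hits}")
```

Here marker_y passes the threshold under one model only and is therefore discarded, which is the conservative behaviour the two-model rule is meant to enforce.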

Genomic prediction accuracy in Kersting's groundnut
Genomic selection models were built for each morphological trait using the 493 SNP markers and ridge regression in the R package "rrBLUP" [33,58]. The training and validation populations were defined through the stratified (all clusters) and "within cluster" sampling techniques [59,60]. The stratified sampling technique refers to a random selection of accessions from each cluster such that the training and validation populations capture the genetic diversity revealed by the cluster analysis within the crop [59]. In this study, about 75% of accessions were randomly selected from each cluster and included in the training population (211 accessions), while the remainder (70) formed the validation population. In contrast to the stratified sampling technique, the "within cluster" sampling technique consists of a random selection of accessions from one cluster to form both training and validation populations [59]. This sampling technique considers only the genetic diversity within one cluster of accessions for genomic prediction. Therefore, 162 accessions were randomly selected in cluster I (essentially composed of white-seeded accessions) to form the training population while the rest of the accessions (55) of this cluster were used as the validation population. Correlation coefficients between observed and predicted values of all traits were calculated, using the cross-validation approach, to assess the accuracy of the genomic selection models.
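The stratified sampling scheme and the accuracy measure can be sketched in Python as below. The sketch draws roughly 75% of each cluster into the training set and measures accuracy as the Pearson correlation between observed and predicted values. Cluster sizes are derived from the landrace counts reported for this panel; the accession labels, the random seed, the rounding rule and the toy observed and predicted values are assumptions, so the resulting set sizes need not match the 211 and 70 accessions used in the study, and the ridge regression step itself (done with "rrBLUP") is not reproduced.

```python
import random
from statistics import mean


def stratified_split(clusters, train_fraction=0.75, seed=1):
    """Randomly assign about train_fraction of every cluster to training."""
    rng = random.Random(seed)
    train, valid = [], []
    for members in clusters.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        n_train = int(train_fraction * len(shuffled))  # rounds down
        train.extend(shuffled[:n_train])
        valid.extend(shuffled[n_train:])
    return train, valid


def pearson_r(x, y):
    """Pearson correlation, used here as the prediction accuracy."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Cluster sizes follow the landrace counts; accession labels are made up.
sizes = {"I": 217, "II": 18, "III": 40, "IV": 6}
clusters = {c: [f"{c}_{i}" for i in range(n)] for c, n in sizes.items()}
train, valid = stratified_split(clusters)

# Toy observed and predicted values for three validation accessions.
accuracy = pearson_r([10.0, 12.0, 15.0], [9.5, 12.5, 14.0])
print(len(train), len(valid), round(accuracy, 2))
```

Restricting the `clusters` dictionary to a single entry gives the "within cluster" scheme with the same code.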

Morphological traits variation and association patterns in Kersting's groundnut
Highly significant (p<0.001) genetic variation was observed among accessions for all morphological traits, except seed thickness (Table 3). The genotype x environment (GxE) interaction was also highly significant for most traits except leaflet width, seed thickness, days to maturity and number of seeds per pod (Table 3). Average performances were lower at Savè than at Sékou for all morphological traits (Fig 1). The coefficients of variation (CVs) and broad sense heritability estimates across environments for the 15 morphological traits are shown in Table 4. The coefficients of variation were <20% for most traits including the diameter of the plant, plant height, leaflet length, leaflet width, petiole length, 100 seed weight, seed length, seed width, seed thickness, and the number of seeds per pod. In contrast, higher coefficients of variation were obtained for grain yield per plant (42.2%), the number of seeds per plant (36.3%) and the number of pods per plant (34.9%), revealing a high variability for those traits across environments (Table 4). Moreover, the broad sense heritability estimates were high for 100 seed weight (0.61), days to 50% flowering (0.86), days to maturity (0.87), grain yield per plant (0.53), number of seeds per plant (0.55) and number of seeds per pod (0.52) (Table 4). Furthermore, the Pearson correlation analysis revealed highly significant (p<0.001) positive correlations of grain yield per plant with the yield components 100 seed weight, number of seeds per plant, number of seeds per pod, and number of pods per plant at Sékou, Savè and across environments (Table 5). In addition, grain yield per plant was significantly and negatively correlated with days to 50% flowering and days to maturity in all environments. Moreover, a significant positive correlation was detected between grain yield per plant and seed thickness at Savè (Table 5). Grain yield per plant was poorly correlated with leaf morphological traits.

PLOS ONE

Single nucleotide polymorphisms in Kersting's groundnut
In total, the high-density genotyping by sequencing (GBS) of the 281 accessions yielded 493 single nucleotide polymorphisms (SNPs) with 0.3-30.9% of missing data. The call rate ranged from 0.63 to 1.00 with an average of 0.96±0.05. The reproducibility of markers ranged from 0.91 to 1.00 with an average of 0.99±0.02. Only 10.9% (54) of SNPs were aligned to the reference genomes of both adzuki bean and mung bean. The average minor allele frequency (MAF) was 0.04±0.07. About 22.9% (113) of markers had a minor allele frequency greater than 0.05 (S1 Table). Moreover, mean observed and expected heterozygosity were 0.04 and 0.22, respectively. Considering the low proportion of markers aligned to the reference genomes of related species, both aligned and non-aligned SNP markers were considered for association analysis.

Genetic diversity of Kersting's groundnut germplasm
The clustering grouped the 281 accessions into four clusters based on shared attributes (Fig 2). Cluster I (77.2% of accessions) was mainly composed of white-seeded accessions, which were highly related to each other and clearly separated from other accessions. Cluster II (6.4% of accessions) was composed of red-seeded accessions, which were highly related to each other. Cluster III (14.2% of accessions) was essentially composed of black-seeded accessions. Moreover, cluster IV (2.1% of accessions) was exclusively composed of white with black eye accessions, revealing a high genetic relatedness among those accessions (Fig 2). The clustering was supported by the results of the pairwise Fst analysis between pairs of clusters (Table 6). The overall Fst-value was 0.62, showing a high genetic differentiation among clusters of accessions. In addition, the pairwise Fst-values ranged from 0.30 to 0.92. The lowest Fst-value was obtained between Clusters II and IV while the highest Fst-value was found between Clusters I and IV (Table 6). However, the within-cluster expected heterozygosity ranged from 0.01 to 0.09, revealing a low genetic diversity within clusters (cultivated landraces) of Kersting's groundnut. The highest expected heterozygosity was obtained for Cluster II (He = 0.09) while Cluster I exhibited the lowest expected heterozygosity (He = 0.01). Clusters III and IV showed an expected heterozygosity of 0.05 and 0.03, respectively.

Moreover, the analysis of phenotypic variance among clusters revealed highly significant phenotypic differences between clusters for most morphological traits, including plant height, petiole length, leaflet length, 100 seed weight, seed length, seed width, seed thickness, days to 50% flowering, days to maturity, grain yield per plant and number of seeds per pod (Table 4).

Model-based population structure and phenotypic variation between subpopulations
The admixture model-based clustering, using the 281 accessions, showed two distinct populations of Kersting's groundnut accessions (Fig 3). Population I (Pop I) was composed of 64 accessions (22.78%) while population II (Pop II) consisted of 217 accessions (77.22%). Divergence rates of populations I and II from the hypothetical ancestral population built by the Bayesian clustering method were estimated by mean Fst-values of 0.57 and 0.69, respectively. Therefore, populations I and II were highly differentiated from the hypothetical ancestral population. Moreover, the two populations were highly discriminated by agro-ecological origins of accessions and seed coat colours. About 87.5% of accessions of population I were collected in the Sudanian region while only 12.5% of them originated from the Guinean region. In contrast, all accessions of population II originated from the Sudano-Guinean (71.4%) and the Guinean regions (28.6%). In addition, population I included only white-seeded accessions while population II was composed of coloured accessions, namely red-seeded, black-seeded and white-seeded with black eye accessions. This reveals a high allelic differentiation between white-seeded and coloured accessions. Results of Fst statistics depicting the degree of differentiation among accessions from different agro-ecological regions are shown in Table 8. High genetic differentiation was observed among regions, with an overall Weir and Cockerham's Fst-value of 0.37. Pairwise Fst-values varied from 0.07 to 0.59 (Table 8). The lowest Fst-value (0.07) was observed between the Guinean and the Sudano-Guinean regions. A relatively high Fst-value (0.25) was observed between the Guinean and the Sudanian regions. Moreover, the highest Fst-value (0.59) was detected between the Sudanian and the Sudano-Guinean agro-ecological regions.
The two-sided Student's t-test revealed highly significant differences between the two populations for the diameter of the plant, leaflet length, 100 seed weight, seed length, seed width, days to 50% flowering, days to maturity, grain yield per plant and number of seeds per pod (Table 9). In contrast to population II, accessions of population I were early flowering (41.6±2.52 days), early maturing (104.1±1.67 days) and showed the highest 100 seed weight (12.98±1.69 g), seed length (8.24±0.42 mm), seed width (5.71±0.29 mm), grain yield per plant (5.15±1.82 g/plant) and number of seeds per pod (1.29±0.07) (Table 9).

Marker-traits associations in Kersting's groundnut
Based on the 113 SNP markers included in the GWAS analysis, the Bonferroni-corrected threshold for significant marker-trait associations was p-value = $4.42 \times 10^{-4}$. Significant SNP-trait associations are presented in Table 10. Both the MLM and MLM-Q analyses revealed 10 SNP markers significantly associated with grain yield per plant and related traits across environments. Six of the marker-trait associations were repeated in at least two sets of environments while the four other associations were environment-specific (Table 10). The analysis of Quantile-Quantile (QQ) plots showed good agreement between the expected and observed p-values for all studied traits (S1 Fig). Marker M1 was significantly associated with 100 seed weight at Sékou, Savè and for the overall environment and accounted for over 24% of the phenotypic variation. Markers M2 and M4 were respectively associated with days to 50% flowering and grain yield per plant at Sékou and for the overall environment (Table 10). Similarly, markers M2 and M4 were respectively associated with days to maturity and the number of seeds per plant in all environments. Markers M5 and M6 were associated respectively with 100 seed weight and days to 50% flowering at Savè and for the overall environment. Marker M7 was significantly associated with the number of pods per plant at Savè and in the overall environment. Moreover, marker M3 was significantly associated with days to maturity at Sékou. In addition, marker M7 was significantly associated with days to maturity and the number of seeds per plant at Savè. Other significant associations in the overall environment included markers M8, M9 and M10. Marker M8 was associated with days to 50% flowering and days to maturity while both markers M9 and M10 were associated with days to 50% flowering (Table 10). Markers M2, M6, M8, M9 and M10 were associated with days to 50% flowering with R² values ranging from 10.6 to 25.4%. M2, M3 and M7 were associated with days to maturity. Grain yield per plant was correlated with the yield components 100 seed weight, number of seeds per plant, number of seeds per pod, and number of pods per plant. Marker M4 was associated with grain yield per plant and number of seeds per pod. M1 and M5 were associated with 100 seed weight but not grain yield per plant, and M7 was associated with number of seeds per plant but not grain yield per plant.

Genomic selection models and accuracy in Kersting's groundnut
The ridge regression analysis, including the 493 SNP markers, revealed moderate (0.42-0.44) to high (0.62-0.79) prediction accuracy for leaflet length, 100 seed weight, seed length, seed width, days to 50% flowering and days to maturity, using the stratified (involving accessions from all clusters) cross-validation sampling technique (Table 11). Moderate correlations were detected between observed and predicted values of traits including leaflet length (0.44) and seed length (0.43) (Table 11). The cross-validation approach including only accessions from cluster I (within cluster sampling) revealed low model accuracy (0.02 to 0.30) for all morphological traits (Table 11).

Single nucleotide polymorphism and genetic diversity among Kersting's groundnut landraces
The discovery of good quality molecular markers is important to enhance the application of enabling biotechnologies for the improvement of orphan crops [6]. This study reports for the first time 493 SNP markers in Kersting's groundnut, which were further quality assessed to obtain 113 highly polymorphic and informative markers with MAF ≥0.05 and a high reproducibility (0.99). Given the relatively small number of SNP markers, Kersting's groundnut is not as polymorphic as other self-pollinated species [61][62][63]. The average expected heterozygosity (He = 0.22) and total gene diversity (Ht = 0.43) across markers revealed a high genetic diversity within Kersting's groundnut and a strong population structure. This finding reveals higher gene diversity than the values reported by Pasquet et al. [20] on Kersting's groundnut using biochemical markers, and by Wang et al. [64] and Ren et al. [65] on peanut (Arachis hypogaea L.) based on simple sequence repeat (SSR) markers. Moreover, our results revealed a low alignment (10.9%) of SNP markers to the reference genomes of closely related species such as adzuki bean and mung bean, in contrast to the findings of Ho et al. [66] on bambara groundnut [Vigna subterranea (L.) Verdc.]. Consequently, whole genome sequencing is crucial in Kersting's groundnut to make a reference genome available and thereby increase the accuracy of SNP calling and improve breeding prospects.

The results also showed the importance of SNP markers in revealing high genetic differentiation among Kersting's groundnut accessions (Fst = 0.79). Very high genetic differentiation was observed among the four types of landraces included in this study, that is, white, red, black and white with black eye seeded accessions. Similar results were reported by Mohammed et al. [67], who observed genetic variation among five different Ghanaian accessions using 12 simple sequence repeat (SSR) markers. These findings imply that cultivated landraces of Kersting's groundnut encompass a high genetic differentiation, in contrast to the findings of Pasquet et al. [20], who used 19 enzymes (biochemical markers) on 20 accessions of Kersting's groundnut. SNP markers are codominant, highly polymorphic and more appropriate to unveil the existing genetic diversity within a species than biochemical markers, which are not abundant and reduce the resolution of the genetic diversity [25,26]. On the other hand, the population structure analysis, including geographic origins of accessions, identified two subpopulations that were found to be highly structured, revealing the influence of geographic origins on the genetic diversity within Kersting's groundnut. Large genetic differentiation was observed among accessions based on agro-ecological regions since the overall Fst = 0.37, which is greater than 0.25. Low genetic differentiation was detected between the Guinean and Sudano-Guinean regions. This might be because of the proximity of these regions and seed exchange among farmers. Kersting's groundnut farmers in the Guinean and the Sudano-Guinean regions buy seeds on the markets [15]. However, great genetic differentiation was observed between the Sudanian and the two other agro-ecological regions. According to Akohoué et al. [15], farmers in the Sudanian region reuse seeds from the previous harvest.
The white with black eye seeded landrace was also reported to be specific to the Sudanian region. Further investigation in that region may reveal more diversity. The clear separation between early and late accessions as shown by both clustering and structure analyses could be explained by the high correlation between time to flowering and seed coat colour as reported by [68].
Moreover, the difference between the number of clusters revealed by the neighbour joining analysis and the results of population structure could be attributed to limitations of the STRUCTURE software to adequately describe the structure of the population. Among other limitations, STRUCTURE results are sensitive to sample size, number of populations, number of loci scored and the type of markers [53]. Despite these limitations, it was informative to present both perspectives so that readers appreciate the possible incongruence of results when using different computation approaches. Similar incongruence was reported by Al-Abdallat et al. [39] when the neighbour joining analysis revealed several subgroups of barley (Hordeum vulgare L.) accessions while STRUCTURE identified two distinct subpopulations.

Broadening the genetic base within Kersting's groundnut landraces
The improvement of Kersting's groundnut requires the development of improved varieties for the most cultivated landraces, e.g. the white-seeded landrace. However, this study revealed a low genetic diversity within landraces, particularly the white-seeded landrace (He = 0.01), which is the most cultivated landrace due to the high economic value of its grains in most west African countries [15]. The low genetic diversity within landraces is likely due to the self-pollination mode of the species and the active geocarpic and chasmogamous nature of the flowers [20,69]. Among other disadvantages, the geocarpy in Kersting's groundnut limits seed or fruit dispersal, influences gene transfer and population genetic structure, and increases reproductive costs in the species [69]. Given the high phenotypic differences among clusters for grain yield and yield-related traits, the successful breeding of Kersting's groundnut requires intensive cross pollinations among all landraces (e.g. white-seeded, red-seeded, black-seeded and white-seeded with black eye landraces) for a broader genetic diversity and improved gains from selection. Considering the influence of geographic origins on the distribution of landraces, the enhancement of the genetic diversity within Kersting's groundnut also requires the introduction of new germplasm and crossing among genotypes from different production countries and regions. In addition, the available germplasm of Kersting's groundnut could be enhanced through mutation breeding techniques, using chemical mutagenesis combined with Targeting Induced Local Lesions in Genomes (TILLING). Mutation breeding has been successfully used to create genetic diversity and identify favourable mutants in many self-pollinated crops, including tomato (Solanum lycopersicum L.) [70] and soybean [Glycine max (L.) Merr.] [71].

Marker-trait associations and genomic selection accuracy in Kersting's groundnut
Phenotypic evaluation studies in Kersting's groundnut showed great phenotypic variability among accessions [17,72,73]. From the results of this study, the broad sense heritability of most morphological traits was greater than 0.50, showing the presence of genetic variability among accessions across environments. Therefore, GWAS was performed to associate the phenotypic variation of yield and related traits with the observed molecular genetic diversity. A similar approach has been used on major legume crops including cowpea [74] and peanut [75] to decipher the genetic basis of morphological traits in a set of environments. The GWAS detected 10 markers significantly associated with grain yield and related traits. Six of the markers (M1, M2, M4, M5, M6 and M7) were consistent across environments. The other markers identified in this study were not clearly consistent across the two environments.
The inconsistency of GWAS results could be explained by the highly significant genotype by environment (GxE) interaction observed for most morphological traits included in this study, and by different genetic mechanisms under drought conditions, as reported by Al-Abdallat et al. [39] and Varshney et al. [76] in barley (Hordeum vulgare L.). In this study, average rainfall recorded during field trials was lower than the water requirement of 500-900 mm/year of Kersting's groundnut [14,19]. In addition, dissecting the genetic basis governing complex traits using GWAS on a natural population in dry environments could be less informative compared with bi-parental and specialised mapping populations [76]. Conventional genome-wide association studies also perform poorly for rare variants that might be prominent, particularly for self-pollinated species [77]. The high R² values (>26%) observed for some marker-trait associations suggest the presence of confounding phenotypic variation, revealing that including the first three principal components in the GWAS did not adequately adjust for accession clustering and population structure. This confounding effect between some of the markers and phenotypic variation arises from the highly significant differences in the phenotypic variance among Kersting's groundnut clusters and subpopulations [78].
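The role of principal components as covariates, and the meaning of a per-marker R², can be illustrated with a small sketch. It assumes a plain fixed-effect linear model (trait regressed on an intercept, the first three principal components of the genotype matrix, and one marker), which is a simplification and not necessarily the GWAS model used in this study. All function names and the simulated data are hypothetical.

```python
import numpy as np

def top_pcs(G, k=3):
    """Scores of the first k principal components of a genotype matrix G
    (accessions x markers, coded 0/1/2)."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def marker_r2(y, marker, pcs):
    """Share of phenotypic variance explained by the marker after the
    intercept and the principal components have been fitted."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), pcs])   # reduced model: structure only
    X1 = np.column_stack([X0, marker])        # full model: structure + marker
    tss = float(((y - y.mean()) ** 2).sum())
    return (rss(X0, y) - rss(X1, y)) / tss

# Simulated illustration (random data, not the study's genotypes)
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(60, 20)).astype(float)
y = 0.8 * G[:, 0] + rng.normal(size=60)
pcs = top_pcs(G, k=3)
print(marker_r2(y, G[:, 0], pcs))
```

If a marker is itself almost a proxy for cluster membership and the principal components do not fully capture that clustering, the marker term absorbs between-cluster phenotypic differences, which is one way an inflated R² can arise.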
Despite these limitations, the GWAS provided a first insight into the genetic basis of farmers' preferred traits in Kersting's groundnut. Further investigation on the whole genome assembly is required for a clear identification of the chromosome positions of single nucleotide polymorphisms in the species. In addition, given the narrow genetic base within landraces, the development of specialised mapping populations such as Multi-parent Advanced Generation Inter-cross (MAGIC) populations could be relevant for the accurate identification and mapping of quantitative trait loci (QTLs) in Kersting's groundnut, as reported in many self-pollinated crops including rice (Oryza sativa L.) [79] and cowpea [80]. In contrast to bi-parental populations (e.g. F2 and backcross populations, recombinant inbred lines, near isogenic lines and doubled haploids), MAGIC populations increase the recombination rate and genetic diversity, and reduce the extent of linkage disequilibrium (LD), giving the opportunity to detect more QTLs with a higher precision [81,82]. Cultivated landraces could serve as founder lines that could be mixed through inter-crossing to form a broader genetic base.
In addition to the GWAS, genomic selection models, using the stratified sampling technique, revealed moderate to high prediction accuracies for leaflet length, seed dimension traits, days to 50% flowering and days to maturity. The high prediction accuracy revealed by the stratified sampling technique could be explained by the existence of high relatedness among accessions. On the other hand, the within cluster sampling technique revealed very low to moderate prediction accuracies for all traits. This finding implies that the application of genomic selection for the improvement of the crop requires the development of bi-parental and specialised mapping populations. The utilisation of these populations having low population structure could maximize accuracy and selection gains and accelerate the deployment of improved Kersting's groundnut varieties with farmers' preferred traits.
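The contrast between the two sampling techniques can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: ridge regression on marker genotypes stands in for the genomic prediction model, "stratified" is taken to mean that every fold draws proportionally from each cluster, and "within cluster" is taken to mean that cross-validation is run separately inside each cluster. The function names, the penalty `lam` and the simulated data are hypothetical.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression prediction in dual form (n x n system)."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu_y)
    return mu_y + (X_test - mu_x) @ Xc.T @ alpha

def stratified_folds(clusters, k, rng):
    """Assign fold labels so each fold samples every cluster proportionally."""
    folds = np.empty(len(clusters), dtype=int)
    for c in np.unique(clusters):
        idx = rng.permutation(np.flatnonzero(clusters == c))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def cv_accuracy(X, y, folds):
    """Correlation between cross-validated predictions and observed values."""
    pred = np.empty(len(y), dtype=float)
    for f in np.unique(folds):
        test = folds == f
        pred[test] = ridge_predict(X[~test], y[~test], X[test])
    return float(np.corrcoef(pred, y)[0, 1])

def within_cluster_accuracy(X, y, clusters, k, rng):
    """Run k-fold cross-validation separately inside each cluster."""
    accs = {}
    for c in np.unique(clusters):
        m = clusters == c
        folds = rng.permutation(np.arange(m.sum()) % k)
        accs[int(c)] = cv_accuracy(X[m], y[m], folds)
    return accs

# Simulated illustration: 4 clusters of 20 accessions, 30 markers
rng = np.random.default_rng(1)
clusters = np.repeat([0, 1, 2, 3], 20)
X = rng.integers(0, 3, size=(80, 30)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(size=80)
print(cv_accuracy(X, y, stratified_folds(clusters, 5, rng)))
print(within_cluster_accuracy(X, y, clusters, 5, rng))
```

In a structured panel, stratified folds let the model exploit between-cluster differences, whereas within-cluster validation only measures how well markers rank accessions that are already closely related, which is consistent with the lower accuracies reported for that technique.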

Conclusion
In this study, the genetic diversity, marker-trait association patterns and possibility for accurate genomic selection within a west African collection of Kersting's groundnut are described. In total, 493 SNP markers were discovered, of which 113 showed a minor allele frequency >0.05. High mean heterozygosity and total gene diversity were observed within the species. The analysis of genetic diversity revealed four clusters of accessions significantly discriminated by seed coat colour, namely the white seeded, red seeded, black seeded and white with black eye seeded accessions. However, a low genetic diversity was observed within clusters. The population structure revealed great genetic differentiation across the agro-ecological regions of origin of the accessions. Further, the GWAS detected 10 markers associated with yield and related traits. Six of the markers showed clear consistency across environments while the remainder were environment-specific. The genomic selection analysis revealed moderate accuracy for leaflet length and seed dimension traits, and high prediction accuracies for 100 seed weight, days to 50% flowering and days to maturity. SNP markers identified in this study could be useful for marker-assisted selection in Kersting's groundnut breeding programmes. Further investigations are required regarding the creation of broader genetic diversity within landraces, development of specialized mapping populations and the assembly of the genome of Kersting's groundnut to enable appropriate association mapping with clear chromosome positions.