High-density DArT-based SilicoDArT and SNP markers for genetic diversity and population structure studies in cassava (Manihot esculenta Crantz)

Cassava (Manihot esculenta Crantz) is an important industrial and staple crop owing to its high starch content, low input requirements and resilience, which make it an ideal crop for sustainable agricultural systems and marginal lands in the tropics. However, the lack of genomic information on local genetic resources has impeded efficient conservation and improvement of the crop and the exploration of its full agronomic and breeding potential. This work was carried out to obtain information on the population structure and extent of genetic variability among local landraces conserved at the Plant Genetic Resources Research Institute, Ghana, and exotic cassava accessions, using Diversity Arrays Technology (DArT)-based SilicoDArT and SNP markers, in order to infer how the relatedness among the genetic materials can be used to enhance germplasm curation and future breeding efforts. A total of 10521 SilicoDArT and 10808 SNP markers with varying polymorphic information content (PIC) values were used. The average PIC was 0.36 for the SilicoDArT markers and 0.28 for the SNPs. Population structure and average linkage hierarchical clustering based on SNPs revealed two distinct subpopulations and a large number of admixed accessions. Both DArT platforms identified 22 landraces as potential duplicates based on Gower's genetic dissimilarity. The expected heterozygosity, which measures the genetic variation within each subpopulation, was 0.008 for subpop1, which consisted mainly of landraces, and 0.391 for subpop2, reflecting the homogeneous and admixed nature of the two subpopulations, respectively. Further analysis after removal of the duplicates increased the expected heterozygosity of subpop1 from 0.008 to 0.357. A Mantel test indicated a strong correlation (r = 0.970; P < 0.001) between the SilicoDArT and DArTSeq SNP genotypic data, suggesting that both marker platforms are robust systems for genomic studies in cassava.
These findings provide important information for efficient ex-situ conservation of cassava, future heterosis breeding, and marker-assisted selection (MAS) to enhance cassava improvement.

Introduction
Cassava (Manihot esculenta Crantz) is the third most important source of calories in the tropics after rice and maize, and millions of people worldwide depend on it [1,2]. Mostly grown in marginal ecologies, the crop is usually cultivated by smallholder farmers because it can grow and yield under unfavourable conditions such as poor soil fertility and low rainfall [1,3]. The starchy roots consist mainly of carbohydrates, and in some African countries the leaves are also eaten as a vegetable; they are a cheap but rich source of proteins, vitamins A, B and C, and minerals [3,4]. Cultivation of the crop continues to spread in Ghana and around the globe because its storage roots serve as the main raw material for industrial starch and alcohol production [5,6]. However, yields at the farm level remain marginal, partly because of the lack of improved varieties and farmers' use of low-yielding and environmentally sensitive lines [7]. New and diversified market demands for cassava in Ghana further call for breeding improved cultivars that meet specific domestic and industrial needs [8].
Germplasm banks and farmers' fields serve as reservoirs of crop genetic variability. These collections harbour genes with the potential to improve productivity and adaptation or tolerance to abiotic and biotic stresses [9,10], which is particularly relevant in the current context of climate change and global warming. It is therefore imperative to identify the true biodiversity in biological resources for efficient management, including conservation and the selection of genetically divergent accessions to optimize breeding programs [11]. Genetic diversity and population structure analyses are important for characterizing the natural selection history of, and genetic relationships among, accessions [12]. A comprehensive understanding of the genetic variability and relationships in available germplasm is a determining factor in efficient conservation, in designing breeding programmes and in achieving breeding objectives. Several reports have highlighted significant genetic variability within the cassava genepool for traits associated with yield, disease resistance and drought tolerance that can be exploited for crop improvement [3,8,13-16].
Diversity analysis is an important component of plant breeding and genetics, conservation and evolution [17]. Most cassava diversity analyses have been based on phenotypic characters [18-22], which are often unreliable because they are environmentally plastic [23-25]. Genomic techniques may therefore be useful for diversity assessment and selection. The value of molecular tools in plant genetic analyses and crop improvement cannot be overemphasized [26-29]. Molecular tools have proven more reliable in identifying duplicates among accessions during characterization, which is important for revealing genuine variability for breeding and for reducing space and maintenance costs at gene banks [30,31]. Molecular marker technologies such as random amplified polymorphic DNAs (RAPDs), allozymes, single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) are available for genetic studies of cassava from Ghana and other countries [32-35]. However, these conventional marker systems have various limitations, including lack of reliability and resolution (RAPDs); poor genome coverage, labour intensity and unsuitability for high-throughput genotyping (AFLPs, RFLPs); and lower cost-effectiveness or the need for prior sequence information (SSRs) [36,37]. These factors limit their applicability for many crops, especially 'orphan' crops and polyploid species [37]. DArT markers developed through next-generation sequencing (NGS) platforms are therefore a prime alternative for molecular studies, since they cover a wide section of the genome in a high-throughput and cost-effective manner [38].
DArT (Diversity Arrays Technology Pty Ltd) was developed as an ultrahigh-throughput, sequence-independent, cost-effective whole-genome genotyping technique that generates large numbers of markers covering the entire genome [36]. DArT markers have been applied successfully in genomic studies of many species, including those with large and complex genomes such as barley, sugarcane, wheat, oat and strawberry [37,39-43]. DArT markers are developed by using combinations of restriction enzymes to reduce genome complexity, followed by next-generation sequencing of the resulting complexity-reduced representations to identify DNA polymorphisms, including SNPs, yielding thousands of polymorphic loci in a single assay [38,44,45]. The DArT platform currently generates two types of markers: SilicoDArT and DArTSeq SNP markers. SilicoDArT markers are dominant and are scored for the presence (1) or absence (0) of a single allele, whereas DArTSeq SNPs are co-dominant markers [38,44].
Xia et al. [46] used DArT for high-throughput genotyping of cassava and its wild relatives and suggested PstI/TaqI and PstI/BstNI as the best complexity reduction methods. DArTSeq was recently used to develop a garlic core collection from the accessions held in the garlic germplasm bank in Cordoba, Spain, revealing that 31.5% of the accessions were genetically redundant [30]. The cassava germplasm studied here has been phenotypically characterized; nonetheless, redundancy may be expected [26,27]. Herein, DArT markers (SilicoDArT and DArTSeq SNPs) were used to analyse the genetic dissimilarities among a collection of cassava germplasm in Ghana and to examine the structure of the population. This will lay a foundation for efficient curation of cassava germplasm and parental selection in future breeding programs.
DArT procedure. The DArT arrays were produced from libraries prepared from PstI-based genomic representations [38]. The genomic representations were generated by digesting 100 ng of DNA samples with 2 U of PstI and a frequent cutter (BstNI or TaqI) (NEB) in a buffer containing 10 mM Tris-OAc, 50 mM KOAc, 10 mM Mg(OAc)2 and 5 mM DTT, as suggested by Xia et al. [38,46,47] for genome complexity reduction. Fragments were sequenced on a HiSeq 2500 (Illumina). Libraries were sequenced from one end in single-read sequencing runs [38]. SNP markers were identified and filtered algorithmically. The sequence data were analyzed using DArTsoft14, an automated genotypic data analysis program, and DArTdb, a laboratory management system. SilicoDArT markers were scored '1' for presence, '0' for absence and '-' for calls with a non-zero read count that was too low to score confidently as '1', while DArTSeq SNPs were scored '0' for the reference allele homozygote, '1' for the SNP allele homozygote and '2' for the heterozygote.
Marker quality parameters. Both marker systems were evaluated for PIC, reproducibility (%) and call rate (%).
PIC indicates the diversity of a marker in the population and its usefulness for linkage analysis, while reproducibility is the proportion of technical replicate assay pairs for which the marker score was consistent [48]. Call rate (%) was also used to eliminate markers with ≥5% missing data.
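The decoding, call-rate filter and PIC calculation described above can be sketched as follows. This is a minimal illustration, not DArT's own pipeline: the per-sample call strings, the 5% missing-data threshold and the PIC formula 1 - sum(p_i^2) (whose biallelic maximum of 0.5 matches the PIC ranges reported in this study) are assumptions, and the helper names (decode, keep_markers, pic_snp, pic_silico) are hypothetical.

```python
"""Decode DArT calls, apply a call-rate filter and compute PIC (sketch).

Assumptions: calls are one character per sample; '-' (or anything else)
is missing; PIC = 1 - sum(p_i^2) over the two states of a marker.
"""

# One-row DArTSeq SNP coding -> dosage of the SNP allele.
SNP_DOSAGE = {"0": 0, "1": 2, "2": 1}   # ref hom, SNP hom, heterozygote
# SilicoDArT coding -> presence (1) / absence (0).
SILICO_STATE = {"0": 0, "1": 1}


def decode(calls, table):
    """Translate call strings with `table`; unknown calls become None."""
    return [table.get(c) for c in calls]


def missing_fraction(values):
    """Share of samples without a usable call."""
    return sum(v is None for v in values) / len(values)


def pic_snp(dosages):
    """PIC of a biallelic SNP from SNP-allele dosages (0/1/2)."""
    called = [d for d in dosages if d is not None]
    p = sum(called) / (2 * len(called))      # SNP-allele frequency
    return 1 - (p ** 2 + (1 - p) ** 2)


def pic_silico(states):
    """PIC of a dominant marker from its presence frequency."""
    called = [s for s in states if s is not None]
    f = sum(called) / len(called)            # presence frequency
    return 1 - (f ** 2 + (1 - f) ** 2)


def keep_markers(markers, max_missing=0.05):
    """Drop markers whose missing fraction reaches max_missing (assumed 5%)."""
    return {name: vals for name, vals in markers.items()
            if missing_fraction(vals) < max_missing}


if __name__ == "__main__":
    snp = decode(["0", "1", "2", "2", "-"], SNP_DOSAGE)
    print(snp, missing_fraction(snp), round(pic_snp(snp), 3))
```

Mapping the heterozygote code '2' to a dosage of 1 matters: treating the raw codes as numbers would place heterozygotes beyond the SNP homozygote and distort allele frequencies.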
Genetic relationship analysis. Genetic relationships among the accessions were estimated using Gower's dissimilarity index for each set of DArT markers [49]. Accessions were declared potential duplicates if the dissimilarity between them fell within the threshold observed between replicated DNA samples. The "pvclust" package was used to generate the Gower's dissimilarity matrix, and the "cluster" package in R was used to construct average linkage hierarchical dendrograms for the SilicoDArT and DArTSeq SNP data [50,51]. The correlation between the two marker systems was determined with the Mantel test as implemented in the "vegan" package of the statistical software R, using 10,000 random permutations, and the "ggplot2" package was used to generate the Mantel test scatterplot [52,53].
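A minimal sketch of the two computations in this paragraph, Gower's dissimilarity and average linkage (UPGMA) clustering, is given below in plain Python rather than the R packages used in the study. It assumes that each marker is treated as a nominal variable (a locus contributes 0 if two accessions share the same call and 1 otherwise) and that missing calls (None) are excluded pairwise; the function names are hypothetical.

```python
"""Gower dissimilarity for marker calls and UPGMA clustering (sketch).

Assumption: every marker is a nominal variable; None marks a missing call.
"""


def gower(a, b):
    """Mean mismatch over loci scored in both accessions."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no locus scored in both accessions")
    return sum(x != y for x, y in shared) / len(shared)


def gower_matrix(genotypes):
    """genotypes: dict name -> list of calls. Returns (names, matrix)."""
    names = list(genotypes)
    return names, [[gower(genotypes[p], genotypes[q]) for q in names]
                   for p in names]


def upgma(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda item: item[1])  # closest pair
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            # size-weighted mean distance from the merged cluster to cluster k
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For SNP calls, this nominal coding treats a heterozygote simply as "different" from either homozygote; an ordinal dosage coding would instead score it as half a mismatch, a choice the text does not specify.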
Genetic diversity and population structure analysis. STRUCTURE v.2.3.4 [54] was used to analyse the genetic structure of the initial collection of 87 cassava accessions and of the 67 individuals remaining after removal of the potential duplicates. The number of hypothetical subpopulations (K) was estimated with STRUCTURE, which applies a Bayesian clustering approach to assign genetically similar cultivars to the same subgroups. The admixture model with shared allele frequencies was used to determine the number of clusters, with K ranging from 1 to 10. For each run, the burn-in period was set to 20,000 iterations, followed by 30,000 Markov chain Monte Carlo (MCMC) iterations, with no prior information on the origin of individuals. Longer burn-in or MCMC runs have been reported not to change the results significantly [55]. The ΔK method [55] was used to determine the most suitable value of K: the STRUCTURE results for the assumed numbers of populations (K = 1-10) were analysed online with STRUCTURE HARVESTER [56] to identify a distinct peak in the change of likelihood (ΔK) at the true value of K. Principal coordinate analysis (PCoA) of the DArTSeq markers was performed using PAST software v.3.14 [57].
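The ΔK statistic can be sketched as below. It assumes replicate STRUCTURE runs for each K, with L(K) = ln P(D|K), and evaluates the second difference from the run means, ΔK = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)); this is one common way of computing the statistic, not necessarily the exact arithmetic of STRUCTURE HARVESTER. The function names are hypothetical.

```python
"""Evanno-style Delta K from replicate STRUCTURE runs (sketch)."""
from statistics import mean, stdev


def delta_k(lnp):
    """lnp: dict K -> list of ln P(D|K) from replicate runs.

    Returns dict K -> Delta K for every K whose neighbours K-1 and K+1
    are present; at least two replicates per K are needed for stdev.
    """
    out = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            # |L''(K)| from run means: |L(K+1) - 2 L(K) + L(K-1)|
            curvature = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
            out[k] = curvature / stdev(lnp[k])
    return out


def best_k(lnp):
    """K with the largest Delta K."""
    scores = delta_k(lnp)
    return max(scores, key=scores.get)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested (here K = 1 and K = 10).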

Quality parameters of markers
A total of 31865 SNP and 32377 SilicoDArT markers were identified after application of the complexity reduction method, with call rates of 81-100% and 70-100%, respectively (S1 Fig). About 22516 SNPs and 27182 SilicoDArT markers were assigned to the 18 haploid chromosomes of cassava after alignment to the cassava_v61 and v8 reference models. For the dominant SilicoDArT markers, these ranged from 910 markers on chromosome 18 (chr18) to 2158 on chromosome 1 (Fig 1). A total of 10521 SilicoDArT markers (S1 Table) passed the quality filters, with 100% reproducibility and a 97.4% mean call rate. The selected 10521 markers were highly informative, with PIC values ranging from 0.2 to 0.5 and an average of 0.36 (Fig 2).
In addition, 10808 SNPs (S2 Table) with 100% reproducibility and call rate passed the quality filters and were selected for further analysis. The mean PIC of these SNPs was 0.28, lower than that of the SilicoDArT markers. About 20.65% of the SNPs fell in the lowest PIC range (0 to <0.10), while 32.42% fell in the highest range (0.4-0.5) (Fig 2). A genome-wide SNP density plot showed that the number of SNPs physically mapped per chromosome ranged from 1019 on chromosome 16 (chr16) to 2696 on chromosome 1 (chr1) (Fig 1).
Analysis of SNP types (Table 2) among the selected SNPs revealed that transitions (50.60%) were about as frequent as transversions (49.40%). Among the six SNP types (S2 Table), A/G transitions had the highest frequency (0.256), closely followed by C/T transitions (0.250), while G/C transversions were the least frequent (0.101).
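Classifying SNPs as transitions (purine to purine, A/G, or pyrimidine to pyrimidine, C/T) or transversions, and tallying the six unordered SNP types, can be sketched as follows. The (ref, alt) input format and the function names are assumptions for illustration.

```python
"""Transition/transversion summary for biallelic SNPs (sketch)."""
from collections import Counter

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine or pyrimidine swaps


def snp_class(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"not a biallelic SNP: {ref}/{alt}")
    return "transition" if pair in TRANSITIONS else "transversion"


def summarise(snps):
    """snps: list of (ref, alt). Returns (class shares, SNP-type shares)."""
    n = len(snps)
    classes = Counter(snp_class(r, a) for r, a in snps)
    types = Counter("/".join(sorted((r.upper(), a.upper()))) for r, a in snps)
    return ({c: v / n for c, v in classes.items()},
            {t: v / n for t, v in types.items()})
```

Unordered types are keyed alphabetically, so the G/C class reported in the text appears here as "C/G".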

Genetic relationships among the cassava accessions
Genetic relationships were estimated among the accessions through their genetic distances using the selected SilicoDArT and SNP data based on Gower's genetic dissimilarity.
The overall genetic distance based on the DArTSeq SNPs ranged from 0.00 to 0.41, with an average of 0.30 (S3 Table). Genetic dissimilarity among the IITA lines ranged from 0.01 to 0.40, with a mean of 0.33, similar to that among the improved varieties (0.34). The lowest average genetic distance was found within the landraces (0.20). The critical distance threshold for declaring two genotypes duplicates/clones was determined empirically from the replicated DNA samples, and 22 landraces, constituting 53% of the total landraces obtained from the genebank, were found to be potential duplicates. Debor, a known landrace, was found to be similar to 19 other landraces collected from different regions. The IITA lines were generally more closely related to the improved varieties than to the landraces. Average linkage hierarchical clustering grouped the accessions into two main clusters (C1 and C2) containing related cassava accessions with a common origin or shared parental lines (Fig 3). Most of the IITA lines (64%) grouped in C2 (green), while most of the landraces (90%) grouped in C1 (red). The improved varieties were evenly distributed between the two clusters.
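One way to turn a pairwise dissimilarity matrix and an empirically chosen threshold into groups of potential duplicates is sketched below. This is an assumed procedure, not necessarily how the study grouped its accessions: pairs at or below the threshold (for example, the largest dissimilarity observed between replicated DNA samples) are linked, and linked pairs are merged transitively with a union-find structure. The function name is hypothetical.

```python
"""Group potential duplicates from a dissimilarity matrix (sketch)."""


def duplicate_groups(names, dist, threshold):
    """Return sorted groups (size >= 2) of accessions linked by d <= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # follow parents to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)      # union the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Because merging is transitive, two accessions can fall into the same group even when their own distance exceeds the threshold (A and C in the test below), so flagged groups are worth checking pair by pair before roguing.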
With the SilicoDArT data, the genetic distance among the 87 accessions ranged from 0.00 to 0.61 (S4 Table). The average genetic dissimilarity among the accessions revealed by the SilicoDArT markers (0.38) was higher than that revealed by the SNP markers (0.30). A low mean genetic dissimilarity (0.24) was again found among the landraces. The same 22 landraces, with pairwise distances within the 0.00-0.02 threshold, were found to be redundant, confirming the SNP results. Ten of these 22 came from one region (UCC, Central Region, Table 1), reflecting the sharing of planting material among farmers within the vicinity. Generally, genetic distances between groups of accessions were higher with the SilicoDArT than with the SNP markers. The dendrogram obtained with the SilicoDArT markers also produced two main clusters (C1 and C2) with their subclusters (Fig 3). As with the SNP markers, the IITA lines mainly grouped in C2 (green), while the landraces (90%) grouped in C1 (red).

Diversity and population structure based on DArTSeq SNP markers
The SNP markers were used to estimate the genetic structure of the cassava population with the Bayesian clustering model implemented in STRUCTURE. The ΔK values (the rate of change in the log probability of the data between successive K values, relative to its standard deviation) estimated from the SNP markers showed a sharp peak at K = 2 (Fig 4), similar to the pattern observed by Hampton et al. [58], indicating that the optimum number of subpopulations is two. The population assignment test with all 87 samples shows the proportion of membership of each sample in the two clusters, as illustrated in the bar plot for K = 2 (Fig 5). The two subpopulations (subpop1 and subpop2) accounted for 35.4% and 64.6% of the membership, respectively (Table 3). The hypothetical founder population in subpop1 (red) is represented by landraces, while subpop2 (green) consists mainly of IITA and improved lines, consistent with the average linkage hierarchical clustering. Twenty accessions, mainly landraces from Ghana, were fully assigned to subpop1. These accessions corresponded to C1 in the dendrogram and were the least divergent accessions based on Gower's dissimilarity. Moreover, these 20 landraces remained together in one group even at a higher K (K = 9) (S2 Fig), indicating that they could be clones. On the other hand, 15 lines, comprising nine IITA lines (IBA070134, IBA9102324, IBA011368, TME 149, 1083724, IBA 011371, IBA30572, IBA061635, 96/1613), three improved varieties (Nyerikogba, Afisiafi, Eskamaye) and three putative landraces (UCC-2000-002, UCC-2001-184, UCC-2001-461), were entirely assigned to subpop2. The remaining 52 accessions were admixed, with ≥1% of their ancestry from each of subpop1 and subpop2. Most of the IITA lines (68%) were admixed.
These results agree with the average linkage hierarchical clustering obtained with both marker systems (Fig 3).
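The split into fully assigned and admixed accessions can be sketched from STRUCTURE's membership coefficients (Q). The sketch assumes the reading of the admixture criterion given above, namely that an accession is admixed when at least two clusters each contribute ≥1% of its ancestry; the dictionary input format, the threshold parameter and the function names are assumptions.

```python
"""Classify accessions from STRUCTURE membership coefficients (sketch)."""
from collections import Counter


def classify(q, min_ancestry=0.01):
    """q: dict accession -> membership proportions, one per cluster.

    'admixed' if two or more clusters each contribute >= min_ancestry;
    otherwise the accession is assigned to its dominant cluster.
    """
    labels = {}
    for acc, props in q.items():
        contributing = [v for v in props if v >= min_ancestry]
        if len(contributing) > 1:
            labels[acc] = "admixed"
        else:
            k = max(range(len(props)), key=props.__getitem__)
            labels[acc] = f"subpop{k + 1}"
    return labels


def tally(q, min_ancestry=0.01):
    """Number of accessions per label, e.g. to compare with 20/15/52."""
    return Counter(classify(q, min_ancestry).values())
```

A stricter cut-off (say, 10% minimum ancestry) would move weakly admixed accessions into the pure groups, so the admixed count depends directly on this parameter.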
The expected heterozygosity from the STRUCTURE analysis was used to define the average genetic distance between individuals within each subpopulation. The expected heterozygosity (Table 3) was 0.008 for subpop1 and 0.391 for subpop2, indicating the homogeneous nature of subpop1 and the diverse nature of subpop2. The net nucleotide distance between the two subpopulations (0.22) indicated moderate divergence between subpop1 and subpop2 (Table 3).
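Both statistics can be computed from cluster allele frequencies. The sketch below assumes STRUCTURE-style definitions: expected heterozygosity as the mean over loci of 1 - sum(p_a^2), and net nucleotide distance as a Nei-style net distance d_xy - (d_x + d_y)/2, where d_xy = 1 - sum(p_xa * p_ya). STRUCTURE's internal computation may differ in detail, and the function names are hypothetical.

```python
"""Expected heterozygosity and net nucleotide distance (sketch).

Input: for each cluster, a list over loci of allele-frequency lists.
"""


def expected_heterozygosity(freqs):
    """Mean over loci of 1 - sum(p_a^2)."""
    return sum(1 - sum(p * p for p in locus) for locus in freqs) / len(freqs)


def net_nucleotide_distance(fx, fy):
    """Mean over loci of d_xy - (d_x + d_y) / 2 (Nei-style net distance)."""
    total = 0.0
    for px, py in zip(fx, fy):
        d_xy = 1 - sum(a * b for a, b in zip(px, py))  # between clusters
        d_x = 1 - sum(a * a for a in px)               # within cluster x
        d_y = 1 - sum(b * b for b in py)               # within cluster y
        total += d_xy - (d_x + d_y) / 2
    return total / len(fx)
```

Algebraically, d_xy - (d_x + d_y)/2 = ½ Σ_a (p_xa - p_ya)², so the net distance is zero only when the two clusters have identical allele frequencies, while a cluster of near-identical clones drives its expected heterozygosity towards zero.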
Further population structure analysis was performed on the 67 accessions remaining after purging the potential duplicates. The optimum number of subpopulations was again two, with subpop1 made up mainly of landraces and subpop2 of IITA and improved lines, while 45 accessions were admixed (S3 Fig). The expected heterozygosity of subpop1 increased from 0.008 to 0.357, while that of subpop2 remained almost unchanged (0.368) (Table 3).
A principal coordinate analysis (PCoA) based on the pairwise Gower's genetic distance matrix was performed to depict the genetic divergence among the cassava lines with the two types of DArT markers. With the SNP markers, the first two PCoA axes explained 38.2% of the total genetic variation (Fig 6A), whereas with the SilicoDArT markers they explained 47.2% (Fig 6B). The distribution of the accessions was similar for the two marker systems and consistent with the average linkage hierarchical clustering (Fig 3) and the structure analysis (Fig 5). The landraces (black) clustered together, as did the IITA lines (green). Although the improved varieties (blue) showed wide diversity, they were predominantly clustered with the IITA lines.
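Classical PCoA (Gower's metric scaling) of a distance matrix can be sketched with NumPy as below. Two assumptions apply: the proportion of variation per axis is computed over the positive eigenvalues only, because Gower dissimilarities are not guaranteed to be Euclidean and can yield small negative eigenvalues, and PAST may use a different convention. The function name is hypothetical.

```python
"""Classical principal coordinate analysis of a distance matrix (sketch)."""
import numpy as np


def pcoa(dist):
    """Return (coordinates, share of variation per axis).

    Shares are computed over positive eigenvalues only (an assumption;
    non-Euclidean dissimilarities can yield small negative eigenvalues).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.full((n, n), 1.0 / n)   # centring matrix J
    b = -0.5 * centre @ (d ** 2) @ centre           # double-centred matrix
    vals, vecs = np.linalg.eigh(b)                  # ascending eigenvalues
    order = np.argsort(vals)[::-1]                  # largest first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * max(vals.max(), 1.0)
    coords = vecs[:, keep] * np.sqrt(vals[keep])
    return coords, vals[keep] / vals[keep].sum()
```

For a Gower matrix of accessions, `share[:2].sum()` gives the proportion explained by the first two axes, the quantity reported above; values may differ from PAST if negative eigenvalues or distance transformations are handled differently.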

Association among the two DArT markers systems
Comparison of the relationships among the accessions based on Gower's distance matrices derived from the SilicoDArT and DArTSeq SNP markers showed a strong association (r = 0.970; P < 0.001) between the two marker systems in the Mantel correlation test (S4 Fig). This indicates a good fit between the SilicoDArT and DArTSeq SNP data sets in assessing genetic diversity in the cassava germplasm.
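A permutation Mantel test of the kind reported here can be sketched in plain Python. It correlates the upper triangles of two distance matrices, then jointly permutes the rows and columns of one matrix to build the null distribution. The one-sided p-value (hits + 1)/(permutations + 1) follows the usual upper-tail convention and is an assumption about how the reported P was obtained; the function names are hypothetical.

```python
"""Permutation Mantel test between two distance matrices (sketch)."""
import random


def _upper(m, order):
    """Upper-triangle entries of m after reordering rows/columns by order."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=10_000, seed=1):
    """Return (r, one-sided p) for the association of d1 and d2."""
    ident = list(range(len(d1)))
    x = _upper(d1, ident)
    r_obs = _pearson(x, _upper(d2, ident))
    rng = random.Random(seed)
    perm, hits = ident[:], 0
    for _ in range(permutations):
        rng.shuffle(perm)  # relabel accessions in d2 only
        if _pearson(x, _upper(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns together, rather than shuffling individual matrix cells, keeps the permuted matrix a valid distance matrix and respects the non-independence of pairwise distances that share an accession.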

PLOS ONE
High-density DArT markers for genomic studies in cassava (Manihot esculenta Crantz)

Genome-wide marker discovery and quality analysis
Genome-level profiling of crop germplasm collections is a critical initial step in identifying duplicates and divergent parents for effective conservation and utilization in breeding programs. Cassava is used worldwide for domestic and industrial purposes; however, its classification and conservation in germplasm banks are challenging because of its phenotypic plasticity, further complicated by its asexual life cycle [25,59]. This study highlights the potential of highly informative and selective SilicoDArT and DArTSeq SNP markers for genomic studies in cassava, which might underpin conservation and future breeding efforts. A total of 10521 SilicoDArT markers and 10808 informative DArTSeq SNPs were used for genetic diversity and population structure analyses. Future cassava breeding programs will depend on large numbers of such high-throughput markers for effective selection and genome-wide association studies. The quality parameters of the selected markers were comparable with those reported for other plant species. The mean PIC of 0.36 for the dominant SilicoDArT markers was lower than the 0.42 reported in cassava and its wild relatives by Xia et al. [46] and the 0.41 found in sorghum [60], but higher than the 0.28 in Beta vulgaris [61] and 0.29 in macadamia [48]. The average PIC of the SNPs (0.28), on the other hand, was higher than the 0.228 reported in cassava [35], 0.21 in macadamia and 0.265 in durum wheat [62]. The average PIC of the SilicoDArT markers was greater than that of the SNP markers, similar to the report by Alam et al. [48] in macadamia, indicating that the SilicoDArT markers were more informative than the SNP markers. These high-density SilicoDArT and SNP markers may achieve better genome coverage by sampling a greater number of points across the genome, as marker density has been reported to correlate highly with gene density [38,63].
Earlier diversity studies on cassava used relatively small numbers of molecular markers (35 SSRs [33], 26 SNPs [35] and 4 RAPDs [64]); these high-density SilicoDArT and SNP markers may therefore be better suited to robust genomic and conservation activities in cassava. Additionally, the co-dominant inheritance of SNP markers may increase the utility of DArT platforms for population genetic analysis. Relative to other marker technologies, DArT markers are suitable for high-throughput work and are cost-effective [46,65]. As in previous studies using relatively few SNPs within genes involved in cyanogenesis (CYP79D2), starch metabolism (GBSSII) and defense pathways in cassava, transitions were about as frequent as transversions; in that work, the most frequent transition and transversion were A/G and A/T, respectively [35]. Higher DNA polymorphism is expected in outcrossing, inbreeding-sensitive crops like cassava, partly because of the inherently high number of loci maintained in a heterozygous state. This contrasts with reports in crops such as Camelina sativa [12], rubber tree [66] and Brassica napus [67], where transition SNPs substantially exceeded transversion SNPs.

Genetic relationship among accessions
The assessment of genetic diversity is a vital pre-breeding activity for crop improvement and efficient conservation of crop biodiversity. A genetic distance approach based on SilicoDArT and DArTSeq SNPs was successfully used to ascertain the relationships among the accessions and to reveal potential duplicates. Similar to an earlier report by Alam et al. [48], the average genetic distance among the cassava accessions was higher with the dominant SilicoDArT markers than with the SNPs (S3 and S4 Tables). Two major clusters were formed with both marker systems. The dendrograms (Fig 3) created by average linkage hierarchical clustering for both sets of markers grouped the landraces into C1 (red) and the IITA lines into C2 (green), reflecting their pedigrees.
Based on the markers, the highest average dissimilarity index was found among the improved lines, which were evenly distributed between the two clusters, probably because of their diverse pedigrees or origins. Ferguson et al. [68] found elite lines from ESC Africa and IITA breeding lines to be closely related and suggested that this could result from the movement of germplasm from IITA to ESC Africa through collaborations. Similarly, in Ghana, the known improved cassava varieties documented in the Catalogue of crop varieties released and registered in Ghana derive their pedigrees from IITA lines and/or local landraces, so it is not surprising that the improved lines were evenly distributed between the two clusters [69]. Unlike in crops such as maize, cassava landraces are often indistinguishable from improved clones and are often considered for release; hence, breeding has not significantly separated improved clones from landraces [68].
The landraces, on the other hand, had the lowest mean dissimilarity index because of the high level of duplication detected within the group. Both marker systems placed 22 landraces within the distance threshold of the replicated DNAs, and these were identified as potential duplicates. The lack of a formal seed system for cassava promotes the sharing of planting materials among farmers, as most farmers use their own planting materials (usually stem cuttings from the preceding crop) or source stem cuttings from neighboring farmers, especially for varieties with good culinary attributes [70,71]. This results in the renaming of accessions, leading to increased synonymy and homonymy, which may confound the true diversity within the accessions when relying on local/vernacular names alone [70,72,73]. Most of the landraces classified as potential duplicates were independently collected from different regions of Ghana and therefore came with different accession names. Although they had been morphologically characterized, the low resolution of morphological markers could not reveal the redundant accessions [25,59,74]. Debor has excellent culinary traits, notably mealiness after boiling and a relatively sweet taste, so it was not surprising that it was found to be redundant with 19 landraces. This aligns with the report by Rabbi et al. [71] on Debor and other cassava genotypes. The complementary results of the SilicoDArT and SNP markers provide sufficient evidence to cull redundant lines for efficient conservation and subsequent breeding.

Structure of the population
Population structure analysis provides helpful information for maintaining and monitoring the genetic diversity required for a robust breeding program [75]. Population structure analysis of the original 87 samples revealed only two major subpopulations (Fig 5), analogous to the report by Adjabeng-Danquah et al. [33], who studied some of these accessions with 35 SSR markers and found a clear genetic divergence based on the origin of the cassava genotypes. The grouping of the landraces by the Bayesian model agreed with the average linkage hierarchical clustering (Fig 3) and the principal coordinate analysis. The genetic structure of the population met our expectations based on the sources/pedigrees of the genotypes. The genotypes were originally collected from two different sources (1. landraces + improved = local; 2. IITA). Subpop1 consisted only of landraces, while the IITA and other genotypes were clustered in subpop2.
In addition to these two distinct subpopulations, a large number of admixed accessions was found in the cassava population studied, as in previous reports [12,33]. The expected heterozygosity was used to express the genetic diversity between individuals within each subpopulation (Table 3). The expected heterozygosity of subpop1 was very low (0.008), indicating a high degree of homogeneity among individuals in the subgroup. This was also reflected in the low level of admixture in subpop1 in the STRUCTURE results (Fig 5). As evidenced by the cluster analysis, the 22 landraces found to be redundant based on Gower's dissimilarity constituted subpop1; hence, the low genetic diversity among the genotypes in this subgroup was not surprising. This reflects both the exchange, among farmers within the country, of planting materials with good economic and culinary traits, which generates such duplicates, and the higher resolution of molecular markers compared with morphological ones [59,76,77]. In contrast, the average diversity between individuals within subpop2 was 0.391, indicating substantial divergence within that subgroup (Table 3). This level of differentiation within the group indicates that even among individuals clustered by genetic proximity, there is still high variability [78]. The high level of admixture in subpop2 again reflects the genetic variation within the group [62], and the high level of admixture among the IITA lines in subpop2 corroborates the report by Adjabeng-Danquah et al. [33]. The diverse backgrounds (sources) of the accessions in subpop2 relative to subpop1 could also influence the amount of variation among individuals in that subpopulation. The collections obtained from IITA are likely made up of accessions originating from various geographical areas in West Africa [68].
Most of the improved cassava varieties released in Ghana have parental sources from IITA, indicating the formal exchange of breeding materials across borders [69], which could lead to 'secondary' mixing of the gene pools of the collections through hybridization and thus to enhanced variation.
The net nucleotide distance (allele-frequency divergence among populations) of 0.22 showed moderate genetic differentiation between subpop1 and subpop2 (Table 3). Therefore, the genetic variation in the cassava germplasm is captured both within and between subpopulations, while a high level of similarity also exists within subpop1. Adjabeng-Danquah et al. [33] reported a high level of variation within cassava groups, similar to what was found here.
Further population structure analysis without the potential duplicates gave a higher average diversity within subpop1 (expected heterozygosity of 0.357 for subpop1 and 0.368 for subpop2), similar to the within-group average of 0.333 reported by Ferguson et al. [68]. This supports the earlier finding that a high level of duplication among the landraces caused the low expected heterozygosity in subpop1, and underlines the need to purge the duplicates. This seems to be a common phenomenon in gene banks, since earlier characterizations of collections were based on phenotypically plastic agro-morphological traits or on a limited number of DNA-based markers [24,25]. Similarly, a recent study found several duplicates within the IITA gene bank, and efforts are being made to purge them [68]. This information is particularly necessary for selecting individuals to serve as parents in breeding programs.
Both DArT platforms were very informative in revealing the genetic variation in the germplasm, as reported earlier [38,48]. This was confirmed by the high correlation in the Mantel test conducted to check the association between the two marker systems (S4 Fig). The consistency of the genetic relationships among the cassava accessions revealed by the SilicoDArT and SNP markers suggests that the two DArT marker systems are highly reliable for genetic diversity studies in cassava. Knowledge of the population structure is a step towards efficient germplasm conservation and parental selection to exploit heterosis in breeding.

Conclusion
In this study, high-density dominant SilicoDArT and codominant DArTSeq SNP markers were used to explore genetic diversity and population structure in a collection of cassava accessions. Both DArT marker types successfully revealed the parental relationships and the extent of diversity in the population, uncovering both genetic diversity and duplicates in the collection. There was a high level of redundancy among the local landraces (53%) compared with the accessions obtained from IITA. This genetic diversity could be the basis for developing new cassava cultivars with desirable characteristics while improving ex situ cassava germplasm conservation at the gene bank. In addition, our study identified two subpopulations that could be explained by the pedigrees of the genotypes. STRUCTURE analysis indicated that subpop2 (expected heterozygosity 0.391) was more diverse than subpop1 (0.008). Further analysis following the removal of the potential duplicates increased the expected heterozygosity of subpop1 from 0.008 to 0.357. Both DArT platforms appear to be inexpensive and robust options for genomic studies in cassava. The large number of highly polymorphic markers developed, together with knowledge of the population structure and genetic diversity of the cassava accessions, will be important for cassava germplasm curation, heterosis breeding, future marker-trait association studies and genomic selection.