Genetic diversity and association mapping in the Colombian Central Collection of Solanum tuberosum L. Andigenum group using SNP markers

The potato (Solanum tuberosum L.) is the fourth most important food crop in the world, and Colombia holds one of the most important collections of potato germplasm in the world (the Colombian Central Collection, CCC). Little is known about its potential as a source of genetic diversity for molecular breeding programs. In this study, we analyzed 809 Andigenum group accessions from the CCC using 5968 SNPs to determine 1) the genetic diversity and population structure of the Andigenum germplasm and 2) the usefulness of this collection for mapping qualitative traits across the potato genome. Genetic structure analyses based on principal components, cluster analysis and Bayesian inference revealed that the CCC can be subdivided into two main groups associated with ploidy level: Phureja (diploid) and Andigena (tetraploid). The Andigena population was more genetically diverse but less genetically substructured than the Phureja population (five weakly differentiated vs. three strongly differentiated subpopulations, respectively). Association mapping of qualitative morphological data using 4666 SNPs identified 23 markers significantly associated with nine morphological traits. The present study showed that the CCC is a genetically and phenotypically highly diverse germplasm collection, useful for association mapping to identify genes related to traits of interest and to assist future potato breeding programs.


Introduction
Solanum tuberosum L. is a herbaceous species that reproduces mainly vegetatively through tubers. It is distributed from the southwestern United States to south-central Chile, with centers of diversity in central Mexico and in the high Andes from Peru to northwestern Argentina [1]. Potato is the fourth most important food crop in the world after corn, rice and wheat [2]. It is consumed worldwide either as a non-grain staple or as a vegetable, and it has high nutritional value, providing carbohydrates, proteins, vitamins and minerals [3]. Solanum tuberosum contains two cultivar groups, the Chilotanum group comprising lowland tetraploid …
By combining molecular and morphological data from the potato germplasm of the CCC, it is possible to map simple or complex traits and subsequently identify candidate genes through genome-wide association studies (GWAS) or association mapping (AM). Such studies provide an efficient way to map quantitative trait loci (QTL) in natural populations or germplasm collections because they exploit historical recombination events and provide high mapping resolution [29][30][31]. The number of molecular markers required for GWAS and the resolution of QTL mapping are determined by the rate of linkage disequilibrium (LD) decay between loci across the genome [32]. Although LD decay in potato populations has been estimated previously, the reported values differ widely: 265 bp (base pairs) [22], 1 cM (centiMorgan) [33], 5 cM [34] and 10 cM [35]. This inconsistency is probably due to differences in the number, type and origin of samples and in the type and number of molecular markers used. It is therefore necessary to estimate LD in the population analyzed in this study.
In the present study, a genetic analysis of the CCC of S. tuberosum Andigenum group was conducted with SNP markers to evaluate its population structure and genetic diversity. The extent of linkage disequilibrium between pairs of SNP markers was also estimated to determine the utility of this germplasm and of the markers used for association-mapping studies. Accordingly, association mapping was conducted in tetraploid potatoes using morphological traits of stem, berry, tuber and flower.

Plant material
A total of 809 accessions of S. tuberosum group Andigenum from the CCC-CORPOICA were characterized (one clone randomly selected from the 16 clones grown per accession). The collection is conserved under field conditions in Zipaquira, Cundinamarca, Colombia (5°03′34.36″ N, 74°03′29.61″ W; 2,950 m altitude; average temperature 15 °C; relative humidity 75%). According to passport data, 675 accessions are classified as Andigena (83.5%), 85 as Phureja (10.5%) and 49 as Chaucha (6.0%). Six hundred and sixteen accessions were collected in different Colombian regions (76.1%), 75 came from other countries (9.3%) and 118 lack passport data (14.6%) (Fig 1, Table 1). Information on each accession is presented in S1 Table.

DNA extraction, genotyping and SNP marker selection
Fresh young leaves were collected from one randomly selected plant per accession. The material was lyophilized for two days at -50 °C and 0.20 mbar. Genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA). DNA concentration and quality were checked on a 1% (w/v) agarose gel and with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, USA). Genotyping was performed with the Infinium 8303 potato SNP array, the array available in 2013 [19,21]. The array was read on the Illumina HiScan SQ system (Illumina, San Diego, CA) at CORPOICA. GenomeStudio software (diploid and polyploid versions; Illumina, San Diego, CA) was used to call the genotype at each locus, assigning one of five possible genotypes (AAAA, AAAB, AABB, ABBB or BBBB) in tetraploid potatoes and one of three (AA, AB or BB) in diploid potatoes. The molecular assignment of samples as diploids was confirmed with available cytogenetic analyses of Phureja and Chaucha samples of the CCC reported by Guevara [36] and Uribe [37]. SNPs that could not be called or were monomorphic were discarded.
The remaining SNPs were filtered by removing markers with more than 20% missing data or a minor allele frequency (MAF) lower than 0.05. Genotypic data are provided in S2 Table.

Population structure and genetic differentiation
The population structure analysis was performed with the Bayesian model implemented in the software Structure [38], without a priori population information and using a tetraploid model …
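The two SNP filters described above (missing data and MAF) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes the GenomeStudio calls have been exported as allele dosages (0 to ploidy, with NaN for failed calls), and the function name `filter_snps` is hypothetical.

```python
import numpy as np

def filter_snps(dosage, ploidy, max_missing=0.20, min_maf=0.05):
    """Return a boolean mask of SNP columns that pass both filters.

    dosage: accessions x SNPs array of allele dosages (0..ploidy),
            with np.nan for missing calls (assumed export format).
    """
    dosage = np.asarray(dosage, dtype=float)
    # Fraction of accessions without a call at each SNP.
    missing_rate = np.mean(np.isnan(dosage), axis=0)
    # Allele frequency estimated from the mean dosage of called genotypes.
    p = np.nanmean(dosage, axis=0) / ploidy
    maf = np.minimum(p, 1.0 - p)
    # Monomorphic SNPs have MAF = 0 and are removed by the MAF filter as well.
    return (missing_rate <= max_missing) & (maf >= min_maf)
```

For tetraploid accessions `ploidy=4`; for diploid accessions `ploidy=2`. Filtering on dosage rather than on discrete genotype classes keeps the allele-frequency estimate comparable across ploidy levels.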

Morphological characterization and correlations among morphological, geographical and genetic data
Phenotypic data for fifteen qualitative traits of stem, berry, tuber and flower were used for the morphological analysis (Table 2). The Plant Genetic Resources team of CORPOICA recorded this information in eight different years (1995, 1996, 1997, 2004, 2006, 2009, 2010 and 2012). Correlations between morphological and genetic data were first estimated independently for each variable. The global correlation was then calculated, first using all variables and then using only the variables that were positively and significantly correlated.

Linkage disequilibrium
Linkage disequilibrium (LD) was calculated in each inferred population, using only SNPs with a known physical position on the potato genome version 4.03 [55]. To account for allele dosage in heterozygous genotypes, diploid and tetraploid data were analyzed following Vos et al. [56], using the Pearson correlation coefficient between each pair of SNP markers. LD decay was estimated from SNP pairs in significant correlation (p < 0.001), using an r² threshold corresponding to the 90th percentile [56] of the pairwise correlations in each population.
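The LD procedure above (Pearson r² between dosage vectors, significance filter, percentile threshold) can be sketched as below. This is a simplified sketch, not the exact implementation of the study or of Vos et al. [56]. Assumptions: the genotype matrix for one chromosome is complete (missing calls imputed or removed) and contains only polymorphic SNPs; p-values use Fisher's z normal approximation instead of an exact t-test; and the decay distance is read from a simple binned mean of r² rather than from a fitted decay curve. The function names are hypothetical.

```python
import math
import numpy as np

def pairwise_ld(dosage, positions):
    """Pearson r^2 and approximate p-values for all SNP pairs on one chromosome.

    dosage: accessions x SNPs array of allele dosages (complete, polymorphic).
    positions: physical positions (bp) of the SNP columns.
    """
    dosage = np.asarray(dosage, dtype=float)
    positions = np.asarray(positions)
    n, m = dosage.shape
    i, j = np.triu_indices(m, k=1)              # each SNP pair once
    r = np.corrcoef(dosage, rowvar=False)[i, j]
    r = np.clip(r, -0.999999, 0.999999)         # keep arctanh finite
    # Two-sided p-value from Fisher's z-transform (normal approximation).
    z = np.abs(np.arctanh(r)) * math.sqrt(n - 3)
    pvals = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    dist = np.abs(positions[i] - positions[j])
    return dist, r ** 2, pvals

def ld_decay_distance(dist, r2, pvals, threshold, alpha=0.001, bin_size=1_000_000):
    """Start (bp) of the first distance bin whose mean r^2 among significant
    pairs falls below the threshold; None if it never does."""
    keep = pvals < alpha
    d, r2_sig = dist[keep], r2[keep]
    if d.size == 0:
        return None
    bins = (d // bin_size).astype(int)
    for b in range(int(bins.max()) + 1):
        in_bin = bins == b
        if in_bin.any() and r2_sig[in_bin].mean() < threshold:
            return b * bin_size
    return None
```

In use, the population-specific threshold would be `np.percentile(r2, 90)` computed over all pairwise r² values of that population, which is how the text describes the 0.45 (Phureja) and 0.25 (Andigena) cutoffs.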

Association mapping analyses
Phenotypic data corresponding to 15 qualitative variables (…)

Genetic molecular analyses
The 809 accessions of the CCC were genotyped with 8303 SNPs using the Infinium SolCAP array. A total of 1584 markers were removed from the dataset: 1174 were monomorphic (14.1%), 405 could not be called (4.9%) and five had more than 20% missing data (0.1%). Genotype calling thus yielded 6719 high-confidence SNPs (81%), of which 751 with a MAF below 0.05 were also excluded, leaving 5968 useful markers (72%) (Table 3). Of these …
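The marker accounting above can be checked with a few lines of arithmetic. This sketch only re-derives the counts and percentages from the numbers in the text; the helper name `snp_accounting` is hypothetical and all percentages are relative to the 8303 SNPs on the array.

```python
def snp_accounting(total, removed, low_maf):
    """Counts and percentages (of the array total) at each filtering step."""
    high_conf = total - sum(removed.values())
    useful = high_conf - low_maf
    steps = dict(removed, high_confidence=high_conf, low_maf=low_maf, useful=useful)
    return {name: (count, round(100.0 * count / total, 1)) for name, count in steps.items()}

summary = snp_accounting(
    total=8303,
    removed={"monomorphic": 1174, "not_called": 405, "missing_over_20pct": 5},
    low_maf=751,
)
for name, (count, pct) in summary.items():
    print(f"{name}: {count} ({pct}%)")
```

The high-confidence and useful sets come out at 6719 (80.9%) and 5968 (71.9%), which the text reports rounded to 81% and 72%.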

Population structure and genetic diversity in the Colombian Central Collection
The population structure analysis of the CCC using Structure discriminated two main populations (K = 2) (Fig 2A, S1A Fig). This result was supported by the Neighbor-Joining cluster analysis (Fig 2B) and by the principal component analysis, in which the first three components explained 25.5% of the variability (Fig 2C). The first population, named Phureja, contains 133 accessions (16.4% of the CCC), of which 82 are classified as Phureja in their passport data, two as Andigena (And_4 and And_183) and 49 as Chaucha. The second population, named Andigena, comprises the majority of the CCC (83.6%): 673 accessions with Andigena passport data and three with Phureja passport data (Phu_47, Phu_119 and Phu_122) (Table 3, S1 Table). The percentage of polymorphic SNPs was 66.2% in Phureja and 99.7% in Andigena (Table 3). Genetic differentiation between the Phureja and Andigena populations was high (FST = 0.203, p < 0.001), and genetic variation was higher within populations (81%) than among populations (19%) (Table 4). High genetic variation within populations implies high genetic diversity. The CCC showed an excess of heterozygosity (FIS = -0.517, p = 1.000) and low gene flow (Nm = 0.98) (Table 4). Genetic diversity in the CCC was high (Ho = 0.355, He = 0.252) and was higher in Andigena (Ho = 0.516, He = 0.337) than in Phureja (Ho = 0.194, He = 0.167) (Table 3). The observed population structure supported the passport data distinguishing two main groups, Andigena and Phureja. Samples included in the Phureja population were characterized by presenting a … (Table 4). Genetic differentiation was supported by significant FST values (p < 0.001) among the subpopulations, ranging from 0.161 (Phureja_1 vs. Phureja_2) to 0.435 (Phureja_2 vs. Phureja_3) (S4 Table).
The distribution of genetic variation within and among subpopulations estimated by AMOVA indicated that 77% of the total genetic variation was found within subpopulations and 23% among subpopulations (Table 4). The Phureja population showed high genetic diversity, with an average Ho of 0.437, He of 0.267 and PIC of 0.279 (Table 3).
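The diversity statistics reported in this section (Ho, He and PIC) can be illustrated for biallelic SNPs coded as allele dosages. The exact estimators used in the study are not stated here, so this is a sketch under common textbook definitions, chosen as an assumption: Ho as the share of called genotypes that are neither nulliplex nor full dosage, He as Nei's gene diversity 1 - p² - q², and PIC following Botstein et al. for two alleles. The function name is hypothetical.

```python
import numpy as np

def diversity_stats(dosage, ploidy):
    """Per-SNP Ho, He (Nei's gene diversity) and PIC for biallelic SNPs.

    dosage: accessions x SNPs array of allele dosages (0..ploidy), np.nan if missing.
    """
    dosage = np.asarray(dosage, dtype=float)
    called = ~np.isnan(dosage)
    # Heterozygous: any dosage strictly between 0 and ploidy (NaN compares False).
    het = (dosage > 0) & (dosage < ploidy)
    ho = het.sum(axis=0) / called.sum(axis=0)
    p = np.nanmean(dosage, axis=0) / ploidy
    q = 1.0 - p
    he = 1.0 - p ** 2 - q ** 2              # equals 2pq
    pic = he - 2.0 * p ** 2 * q ** 2        # Botstein et al. PIC, two alleles
    return ho, he, pic
```

Under these definitions a tetraploid SNP in which every accession is simplex (AAAB) has Ho = 1.0 but He = 0.375, which illustrates how observed heterozygosity can exceed expected heterozygosity in tetraploid germplasm. For a biallelic SNP, PIC cannot exceed 0.375 (reached at p = 0.5).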
Andigena population. A total of 5901 SNPs (MAF > 0.05) were polymorphic in the Andigena population, and the analyses of these data subdivided it into five groups (K = 5) (Andigena_1 to Andigena_5) (Fig 4 and S1C Fig). The groups inferred by Structure were not clearly separated by the cluster analysis (Fig 4B) or by the DAPC, in which the first three principal components explained only 20.7% of the variation. Only one subpopulation (Andigena_1) was genetically differentiated from the other four (Fig 4C; S2 Table). The AMOVA showed that genetic variation was higher within subpopulations (93.5%) than among subpopulations (6.5%), indicating low genetic structure (FST = 0.06, p < 0.001), an excess of heterozygosity (FIS = -0.59, p = 1.000) and high gene flow (Nm = 3.91) (Table 4). FST values (p < 0.001) among subpopulations Andigena_2 to Andigena_5 were low, ranging from 0.031 (Andigena_3 vs. Andigena_5) to 0.080 (Andigena_2 vs. Andigena_3), whereas values between these subpopulations and Andigena_1 were high, ranging from 0.122 (Andigena_1 vs. Andigena_5) to 0.216 (Andigena_1 vs. Andigena_3) (S4 Table). The Andigena population showed high genetic diversity, with averages of Ho = 0.535, He = 0.319 and PIC = 0.269 (Table 3).
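The gene-flow estimates reported above are consistent with the standard island-model approximation Nm = (1 - FST) / (4 FST). The text does not state which estimator was used, so the sketch below assumes this relation (which is itself only approximate for polyploids); the function name is hypothetical.

```python
def gene_flow_from_fst(fst):
    """Effective migrants per generation under Wright's island model:
    FST ~ 1 / (4 Nm + 1), hence Nm = (1 - FST) / (4 FST)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must be in (0, 1]")
    return (1.0 - fst) / (4.0 * fst)

print(round(gene_flow_from_fst(0.203), 2))   # CCC, Phureja vs. Andigena
print(round(gene_flow_from_fst(0.06), 2))    # within Andigena
```

With FST = 0.203 this gives Nm of about 0.98, matching the CCC value; FST = 0.06 gives about 3.92, close to the reported 3.91, the small difference presumably reflecting rounding of FST. Because Nm falls as FST rises, low Nm accompanies strong differentiation.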

Morphological characterization of Andigena population
The MCA of morphological traits in 624 Andigena accessions distributed the total morphological variation over 73 dimensions, of which the first three explained 12.3% of the variation (Table 2). The first dimension was driven by tuber variables: general tuber shape (GTS), primary tuber skin color (PTSC) and primary tuber skin color intensity (PTSIC). The second was driven by berry color (BC), secondary tuber flesh color (STFC) and its distribution (DSTFC), and by all the flower variables (PFC, PFIC, SFC and DSFC). Finally, primary tuber flesh color (PTFC), secondary tuber skin color (STSC) and its distribution (DSTSC), together with stem color (SC) and berry shape (BS), contributed to the variation of the third dimension (Table 2).
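For readers unfamiliar with multiple correspondence analysis (MCA), the sketch below shows how principal inertias and the "% of variation per dimension" can be obtained from qualitative descriptors. It is a generic textbook MCA (correspondence analysis of the 0/1 indicator matrix via SVD), not the software used in the study; the function name and the toy data are hypothetical.

```python
import numpy as np

def mca_inertia(table):
    """Principal inertias of an MCA of categorical descriptors.

    table: n accessions x Q descriptors of category codes (strings or ints).
    Returns (principal inertias, % of total inertia per dimension, J),
    where J is the total number of observed categories.
    """
    data = np.asarray(table)
    blocks = []
    for k in range(data.shape[1]):
        levels = np.unique(data[:, k])
        # One 0/1 indicator column per observed category of descriptor k.
        blocks.append((data[:, k][:, None] == levels[None, :]).astype(float))
    z = np.hstack(blocks)                       # complete disjunctive table, n x J
    p = z / z.sum()
    r = p.sum(axis=1)                           # row (accession) masses
    c = p.sum(axis=0)                           # column (category) masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)      # standardized residuals
    inertia = np.linalg.svd(s, compute_uv=False) ** 2
    inertia = inertia[inertia > 1e-12]          # drop numerically null dimensions
    return inertia, 100.0 * inertia / inertia.sum(), z.shape[1]
```

With Q descriptors and J categories, MCA has at most J - Q non-trivial dimensions and a total inertia of J/Q - 1. Because the indicator coding spreads inertia over many dimensions, the raw percentages of the first few dimensions are typically small, so a value such as 12.3% for three dimensions is not unusual for MCA.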
The cluster analysis discriminated six morphological groups within the Andigena population (Fig 5). Although all groups had cream tuber flesh, most accessions in each group shared specific tuber traits (S5 Table). Group 1 (108 accessions) was characterized by compressed tubers with pale yellow skin and purple dots. Group 2 (59 accessions) had compressed tubers with dark purple skin, sometimes with scattered yellow spots, and cream flesh with a secondary purple color distributed in a narrow vascular ring. Group 3 (119 accessions) had compressed tubers with dark red skin. Group 4 (32 accessions) had compressed tubers with pale purple skin. Group 5 (159 accessions) had round tubers with dark purple skin and scattered yellow spots. Finally, group 6 (147 accessions) had compressed tubers with pale purple skin and scattered yellow spots. Group 4 had white flowers, whereas the other groups had dark purple flowers.

Correlations among morphological, geographical and genetic data
The Mantel test showed no correlation between geographical and morphological data (1.2%, p = 0.311) or between geographical and genetic data (4.2%, p = 0.111). However, a low but significant correlation (13.2%, p = 0.001) was found between all the morphological variables analyzed and the genetic data (Table 5). The correlation analysis was also carried out for each morphological variable independently. Of the 15 variables, three (BS, PTFC and SFC) were not correlated (p > 0.05), three (SC, STFC and DSTFC) were negatively correlated (p < 0.05) and the remaining nine were positively correlated (p < 0.05). The most strongly correlated variables were those related to flowers (PFIC: 18.0%, PFC: 12.9%, DSFC: 12.8%), general tuber shape (GTS: 9.4%) and primary (PTSIC: 12.8%, PTSC: 8.5%) and secondary (DSTSC: 7.5%, STSC: 7.1%) tuber skin color (Table 2).
The global correlation between morphological and genetic data using only the significantly correlated variables was 21.6% (p = 0.001) (Table 5). Although this correlation was low, the subpopulations identified with molecular markers shared tuber traits. For instance, the primary tuber skin color is dark purple in groups 1 and 2, pale yellow in group 3, pale purple in group 4 and dark red in group 5. However, the morphological and genetic groups did not completely match.
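The Mantel tests above compare two pairwise distance matrices (for example, morphological vs. genetic distances among accessions). A minimal permutation version is sketched below; it assumes Pearson correlation over the upper triangle and a one-sided test for positive association, which may differ from the settings used in the study. The function name is hypothetical.

```python
import numpy as np

def mantel_test(d1, d2, permutations=999, seed=0):
    """Pearson Mantel test between two square distance matrices.

    Returns (observed r, one-sided permutation p-value).
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)                 # each pair of accessions once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        # Permute rows and columns of d2 together so it stays a distance matrix.
        y = d2[np.ix_(perm, perm)][iu]
        if np.corrcoef(x, y)[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns, rather than individual cells, is what makes the test valid for distance matrices, whose entries are not independent. With 999 permutations the smallest attainable p-value is 1/1000 = 0.001.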

Linkage disequilibrium
Linkage disequilibrium between pairs of SNPs was estimated for the Phureja and Andigena populations; both the number of SNPs in LD and the extent of LD differed between them.
Linkage disequilibrium in Phureja. LD in Phureja was estimated using data from the entire population (133 accessions) and separately for the subpopulation Phureja_1. Subpopulations Phureja_2 and Phureja_3 were not analyzed because of their small sample sizes. The 2555 markers used in this analysis mapped to the 12 chromosomes of the genome, with a mean distance between markers of 22.7 Mb, ranging from 11.5 Mb (Chr. 2) to 34.2 Mb (Chr. 1). The Pearson r² for linked markers in the 133 Phureja accessions was 0.463, with 49.8% of the markers in significant LD; r² ranged from 0.440 (Chr. 1) to 0.496 (Chr. 12) (Table 6). Pairwise correlations among linked markers in significant LD (p < 0.001) were used to assess the extent of LD decay. The r² threshold was 0.45, corresponding to the 90th percentile of all pairwise correlations in the Phureja population. With this threshold, LD decayed within 3.5 Mb for linked markers in the Phureja population. Per chromosome, LD decay ranged from 2 Mb (Chr. 1, 4 and 11) to 9 Mb (Chr. 3 and 12) (Table 6).

Association mapping analyses
The marker-phenotype association analysis was performed with 4666 polymorphic SNPs in 463 tetraploid accessions of the CCC for which a complete phenotypic dataset was available. A total of 23 markers, with -log10(p) values ranging from 4.6 for STFC (solcap_snp_c1_12945) to 9.36 for PFC and PFIC (solcap_snp_c2_43970), were significantly associated with 9 of the 15 variables evaluated (Table 7). In addition, seven markers had p values below 0.01 and four below 0.001. Of these four markers, three (solcap_snp_c2_45693, solcap_snp_c2_23347 and solcap_snp_c2_43970) were associated with PFC and PFIC and one (solcap_snp_c2_45235) with STFC (Table 7).
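To make the -log10(p) scale above concrete, the sketch below runs a naive single-marker scan: for each SNP it tests the correlation between allele dosage and a numerically coded trait, which is equivalent to a simple linear regression of the trait on dosage. This is explicitly NOT the association model of the study: it applies no correction for population structure or kinship, which the Discussion notes is needed to control false positives. It also assumes polymorphic SNPs, a qualitative trait recoded as numbers (e.g. 0/1 for presence of a color class), and a Fisher z normal approximation for p-values. The function names are hypothetical.

```python
import math
import numpy as np

def single_marker_scan(dosage, phenotype):
    """-log10(p) per SNP from a dosage-phenotype correlation test.

    No population-structure or kinship correction (illustration only).
    """
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = y.size
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    r = (xc * yc[:, None]).sum(axis=0) / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.clip(r, -0.999999, 0.999999)
    # Two-sided p-value via Fisher's z-transform (normal approximation).
    z = np.abs(np.arctanh(r)) * math.sqrt(n - 3)
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return -np.log10(np.maximum(p, 1e-300))

def bonferroni_cutoff(n_markers, alpha=0.05):
    """One common genome-wide threshold on the -log10(p) scale."""
    return -math.log10(alpha / n_markers)
```

Bonferroni is shown only as one common choice of significance threshold; the threshold applied in the study may differ.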

Discussion
Growing food demand and climate change have increased the need for crop varieties with higher yield that are adapted to a changing environment [58]. Characterizing genebank collections is fundamental to plant breeding, because the genetic improvement of economically important traits depends on the genetic diversity available within the crop species and its wild relatives [59,60]. Modern elite gene pools could be created by exploring the genetic resources conserved in large ex situ germplasm collections to identify genes of interest and allelic diversity [61,62]. Highly polymorphic molecular markers identified in diverse germplasm can be used effectively to map genes or QTLs [62] and to assist plant breeding programs. In Colombia, the CCC contains potato accessions from different Colombian regions and several countries. Researchers from CORPOICA have selected accessions from the CCC with valuable traits such as resistance to drought, to several diseases and to insect pests. Information about the genetic diversity and population structure of the CCC and the identification of molecular markers related to traits of interest for potato breeding could speed … [18]. The analysis included only 97 diploid accessions, few of which are shared with the CCC-CORPOICA [18]. The accession numbers of the CCC-Universidad Nacional de Colombia were modified and do not correspond to those of the CCC-CORPOICA, which makes comparison between studies difficult. The present study is the first to use the majority of accessions of the CCC to assess its genetic variability, population structure and linkage disequilibrium. The information obtained will allow association-mapping studies to be implemented in this collection.

Genetic analyses
The development of high-throughput SNP arrays has made it possible to genotype germplasm of crops such as potato [20,19], tomato [63], barley [64] and rice [65], among others. In this study, the Infinium SolCAP 8K array was used to genotype accessions of the CCC and provided informative data, with 72% polymorphic loci. Previous studies of other potato germplasm collections reported similar levels of polymorphism with the same array: 77% [9], 74% [22], 61% [23], 67% [25] and 76% [66]. Some ascertainment bias could be expected when the SolCAP 8K is used to analyze populations such as the Colombian potato germplasm, because the array was designed from transcriptome data and EST databases of North American cultivars [19,21]. However, the high percentage of polymorphic markers, comparable to that of previous works using the same array in other germplasm [22,23], suggests that the array provided enough markers to represent the allelic composition of the CCC. A high number of polymorphic markers was expected given the large number of samples included [20]. This paper presents a robust analysis of the genetic diversity of the CCC using a high number of molecular markers distributed over the 12 chromosomes of the potato genome. A previous genetic study using only 97 diploid accessions and 42 SSRs covered a small part of the potato genome, with a mean coverage of about three markers per chromosome [18]. In general, most genetic studies in potato have used techniques that produce few molecular markers, such as SSRs [67][68][69], AFLPs [34,59,70] and RAPDs [71][72][73]. Information from different types of markers is not always comparable, because some are biallelic and others multiallelic [7,74]. However, the estimation of the genetic variability of a population improves as the number of markers increases [75]; the SolCAP 8K could therefore provide a better assessment of the genetic variability of the CCC.

Population structure and genetic diversity in the Colombian Central Collection
In this study, the molecular markers were useful for identifying mislabeled accessions [7,23]; some Andigena and Phureja accessions did not cluster according to their passport data. The failure to distinguish separate Phureja and Chaucha populations suggests a classification error in the CCC. According to Guevara [36], accessions of the CCC labeled as Chaucha are not triploid, as expected, but diploid (2n = 2x = 24), like the Phureja accessions [37]. Hence these accessions were probably misclassified as Chaucha and are in fact Phureja. The misclassification of accessions and errors in assigning samples to groups in the CCC could have several explanations. The common name used by farmers for the same type of potato probably changes from region to region; for example, in the department of Nariño in Colombia, farmers use the name Chauchas for potatoes similar to Phurejas. Another explanation could be a hybrid origin of these accessions: natural hybridization occurs between varieties in cultivated areas because potato farmers do not grow varieties separately [4,76].
Population structure and genetic diversity in Phureja and Andigena populations. The two inferred populations of the CCC have high genetic diversity and are genetically differentiated, with low gene flow between them, probably because of their difference in ploidy level [35]. The SolCAP array was also able to differentiate European [22] and American [23] potatoes by ploidy level. The diploid population (Phureja) showed high genetic differentiation: all the multivariate analyses supported the presence of three subgroups, and no genetic admixture was detected. The results also showed low gene flow, consistent with strong genetic differentiation, given that Nm is inversely related to the genetic differentiation among populations [77]. Human selection (e.g. by breeders and farmers) for tuber color and quality probably played an important role in shaping the current population structure of the Phureja group. However, a morphological evaluation of the Phureja potatoes of the CCC is necessary to support this hypothesis. The results for the Phureja population contrast with those of Juyó et al. [18], who found moderate population structure (FST = 0.09), high gene flow (Nm = 1.61) and only 9.64% of the variation among populations in diploid accessions of the CCC-Universidad Nacional de Colombia. The two studies differed in the molecular markers (number and type) and samples (number and origin) evaluated. The samples analyzed in the two works were not exactly the same: although the CCC-Phureja of the Universidad Nacional de Colombia conserves part of the accessions of the CCC-CORPOICA, the ID numbers did not match, and some accessions of the CCC-Universidad Nacional were collected recently. Juyó et al. [18] used SSR markers, which are considered more efficient than SNPs for identifying subpopulations because they are neutral and more alleles can be identified [78][79].
Nevertheless, the large number of SNP markers used in this study made it possible to identify three subpopulations among the Phureja accessions. Population structure is influenced by the joint effects of many factors, including the mating system, natural and artificial selection, mutation, migration and dispersal mechanisms, and drift [80,81]. In potato, selection by farmers and breeders for characteristics such as high yield, large tubers, low glycoalkaloid levels, desirable flavor, short cooking times and high nutritional value could affect the genetic structure [82][83][84].
The Andigena population shows genetic admixture, supported by high gene flow among populations [85]. The lack of population structure in tetraploid potatoes has been reported in other studies [35][86][87][88] and has been explained by sexual polyploidization, intervarietal introgressive hybridization and long-distance dispersal [5,89]. Although the Andigena population as a whole did not show population structure, a cluster (Andigena_1) with samples probably belonging to the Tuberosum group could be identified. The S. tuberosum group Tuberosum accessions of the CCC probably originated from landraces and breeding material from the United States and Europe [12]. Tuberosum potatoes differ from other Andigena potatoes in forming tubers under long days and in their adaptation to medium altitudes and to the subtropical climates of Europe, the United States and Asia [8,90].
High genetic diversity was found in both populations, consistent with other studies [18,68,89]. In this work, observed heterozygosity was higher than expected heterozygosity. Because potato is an outcrossing species, the level of inbreeding is expected to be low, and heterozygosity is therefore higher than expected. The high diversity of potato is explained by an evolution shaped by selection, migration, mutation, hybridization, polyploidization and introgression. In diploid potatoes, both wild and cultivated species are often self-incompatible (SI) [91,92]. Potato genetics thus favor the production of heterozygous plants, increasing genetic variability [1,35,81]. The PIC values suggest that the SolCAP SNPs are useful for analyzing diploid and tetraploid accessions and support the suggestion that genetic diversity in tetraploid potatoes has not been narrowed in spite of commercial breeding efforts [10,34]. Based on PIC values, the CCC (PIC = 0.437) is more diverse than European potatoes (PIC = 0.35), supporting the idea, reported by Bornet et al. [93] and Esfahani et al. [94], that South American potato populations are more diverse than European ones. According to these results, the CCC has a broad genetic basis with alleles that could be profitable for plant breeding [21]. In fact, studies of diploid accessions of the CCC-Universidad Nacional de Colombia have already detected markers related to resistance to Phytophthora infestans [95] and to sugar content and frying color [96].

Morphological characterization of Andigena population
The accessions of the Andigena population showed wide phenotypic diversity for the fifteen morphological traits; tuber shape, skin color and color intensity, together with flower attributes, were the most informative variables for discriminating the six Andigena groups. Previous works reported that the same variables were useful for differentiating potato accessions [97][98][99]. Tuber variables are the most useful descriptors for selecting potatoes for breeding programs [100]. Dark tuber skin and flesh color indicate the presence of phenolic compounds, which are considered health-promoting phytochemicals because of their antioxidant properties [101]. The wide variability in tuber color in the CCC suggests that it is a potential source of accessions with high levels of phenolic compounds; further characterization of the biochemical composition of the CCC is needed. Previously, Bernal et al. [97] morphologically analyzed 464 accessions of the CCC. They found seven groups instead of six and identified higher morphological variability than the present study. However, the same traits were reported as informative in the two studies, and the samples were grouped on the same tuber and flower characters. The difference in results could be due to the smaller number of variables used in this study. Additionally, the analysis of Bernal et al. [97] was based on one year of morphological records, whereas the present study used data recorded in eight different years. Our analyses showed that some descriptors, such as tuber and flower color and color intensity, changed over the years. The instability of morphological characters has also been reported in the evaluation of the CIP collection [5], suggesting that the selection of potato materials should not be based only on morphological data.
The characterization and selection of potato accessions should be complemented with molecular data, which have been reported to be more informative and neutral than phenotypic traits for establishing relationships among potatoes [62].

Correlations among morphological, geographical and genetic data
In this work, geographic distance was correlated with neither genetic nor morphological distance. Similar results were obtained in previous studies of potato collections using morphological data [102,103] and molecular data [7][104][105][106]. The lack of correlation is probably the result of human transport of tubers [107], which caused historical migrations of wild potato germplasm away from their regions of origin [23]. Morphological and genetic data were weakly correlated, as found in other potato populations [108][109][110]. The low correlation between genetic and morphological data is probably due to differences in selection pressure: non-adaptive molecular markers are usually not subject to natural or artificial selection, whereas phenotypic characters are subject to selection pressure and influenced by the environment [106,111]. This could explain why the groups identified with molecular and morphological markers did not match.

Linkage disequilibrium and association mapping analyses
Linkage between molecular markers and the polymorphisms underlying phenotypes is required for the association mapping of genes or QTLs controlling traits of interest [112]. The extent of LD can be affected by factors such as genetic drift, population structure and selection [113]. In association mapping studies, knowing the population structure is key to improving statistical power and decreasing the false-positive rate in gene discovery [76]. LD was analyzed separately for Phureja and Andigena, and LD levels differed between them. High levels of r² and many SNP pairs in significant LD were identified in both Phureja and Andigena. These results contrast with the study of Juyó et al. [18] in diploid potatoes, which detected no markers in significant LD, probably because of the number and type of markers used (SSRs). As expected, more linked than unlinked marker pairs were in LD, indicating that physical linkage strongly influences LD. These results indicate that the molecular markers analyzed in the CCC are suitable for association analysis [114].
To estimate LD decay in the Phureja and Andigena populations, r² thresholds of 0.45 (Phureja) and 0.25 (Andigena) were used. These values corresponded to the 90th percentile of the distribution of all pairwise Pearson correlations in each population. Vos et al. [56] found that the 90th or 95th percentiles are useful for estimating LD in potato. The different cutoff used in previous studies (r² = 0.1) does not allow direct comparison with them [22,34,35,95]. However, the LD decay values obtained here in tetraploid potatoes were similar to those reported in the potato germplasm analyzed by Vos et al. [56] (0.6-1.5 Mb). The r² values and the extent of LD across the genome differ among studies because of differences in population size, number and type of markers [115] and the regression methods used to measure LD [116]. Polyploid and outcrossing species generally exhibit low LD because recombination events occur more frequently in large, highly heterozygous populations [117]. In contrast, self-pollinated crops usually display LD over larger distances as a consequence of their mating system [34]. Based on its LD values, potato behaves like a self-pollinated crop even though it is an outcrossing species. The clonal propagation of potato limits the number of meiotic generations and, consequently, recombination events [33][34][35][118]. LD in Andigena and Phureja decayed slowly; previous works also reported slow LD decay for potato populations: 1 cM [33], 10 cM [35] and 5 cM [34]. It is not unusual to find differences in LD decay among populations with different breeding histories and human selection [119,120].
The LD decay value is useful for designing future GWAS because it makes it possible to estimate the minimum number of SNPs required for a successful study [115]. Because LD extends over long distances in both the Phureja and Andigena genomes, and the potato genome has a physical length of 844 Mb [121,122] and a genetic map length of 800 cM [123], association studies can be performed with a modest number of markers per unit of genetic distance; the same inference for potato was previously reported by D'hoop et al. [34] and Simko et al. [35]. These inferences about association mapping in the CCC were supported by the identification of molecular markers associated with morphological traits. In a GWAS of North American potatoes using the same SNP array, markers with minor effects were associated with traits such as total yield, eye depth, tuber shape and tuber length [27]. In the present work, four of the 23 associated markers had p-values below 0.001: three were associated with primary flower color and one with secondary color distribution in the tuber skin. The marker solcap_snp_c2_45235 (Chr. 10, position 58437496), associated with secondary color, mapped to the gene Sotub10g021050.1.1 (PGSC0003DMG400008137), which has a glucosyltransferase function. Some glucosyltransferases are involved in the production of anthocyanins, the pigments of tuber skin and flesh [124]. In addition, the same SNP (solcap_snp_c2_45235) is located close to two genes (PGSC0003DMG400013965, PGSC0003DMG400012891) recently reported by Endelman and Jansky [125] to be associated with skin and flesh color of potato tubers.
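The link between LD decay and marker number can be made concrete with a rough rule of thumb, assumed here for illustration and not stated in the source: with uniform coverage, at least one marker per LD-decay interval is needed, so the minimum is the genome length divided by the decay distance. The genome lengths (844 Mb, 800 cM) come from the text; the decay values are those cited (0.6-1.5 Mb from Vos et al. [56], 5 cM from D'hoop et al. [34]). The function name `min_markers` is hypothetical.

```python
import math

def min_markers(genome_length, ld_decay):
    """Rough lower bound on markers: one marker per LD-decay interval.

    Both arguments must use the same unit (e.g. Mb for physical length,
    cM for genetic length). Assumes evenly spaced markers and uniform LD.
    """
    if ld_decay <= 0:
        raise ValueError("ld_decay must be positive")
    return math.ceil(genome_length / ld_decay)

# Physical genome length from the text: 844 Mb.
for decay_mb in (0.6, 1.5):  # LD decay range cited from Vos et al. [56]
    print(f"{decay_mb} Mb decay -> at least {min_markers(844, decay_mb)} SNPs")
# Genetic map length from the text: 800 cM; 5 cM decay as in D'hoop et al. [34].
print(f"5 cM decay -> at least {min_markers(800, 5)} SNPs")
# Output:
# 0.6 Mb decay -> at least 1407 SNPs
# 1.5 Mb decay -> at least 563 SNPs
# 5 cM decay -> at least 160 SNPs
```

These bounds (roughly 560 to 1,410 SNPs on the physical scale) are well below the 4666 SNPs used for association mapping in this study. Real designs use denser coverage than this minimum because LD varies along chromosomes and markers are not evenly spaced.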
The SNP dataset produced in this study and the germplasm analyzed would make it possible to conduct association mapping studies and to detect markers or genes associated with traits of interest for potato breeding, such as resistance to pathogens and insect pests, tolerance to abiotic stresses and tuber quality. The function of the genes tagged by the associated markers should be validated, for example through genetic transformation. In addition, this genetic information could support conventional potato breeding programs through marker-assisted selection (MAS) and genomic selection (GS), accelerating the selection of potato materials and reducing the cost and time required to develop new potato varieties.

Conclusion
The present study is the first report of phenotypic and genotypic evaluations of the Colombian Central Collection of Solanum tuberosum using morphological and SNP molecular markers. The study identified high levels of genetic diversity and genetic differentiation in diploid and tetraploid potatoes. The CCC is a potential source of trait variation for genetic breeding programs. Additionally, the linkage disequilibrium analysis of the CCC showed that the Phureja and Andigena genomes contain a large number of SNP pairs in significant LD and exhibit slow LD decay, suggesting that marker-phenotype associations can be detected with a modest number of molecular markers. Overall, these results indicate that the CCC is a germplasm collection with a broad genetic base, suitable for association mapping studies aimed at identifying QTLs/genes associated with quality traits and biotic and abiotic stress tolerance.