Genetic variability in Brazilian Capsicum baccatum germplasm collection assessed by morphological fruit traits and AFLP markers

Capsicum baccatum is one of the main pepper species grown and consumed in South America. In Brazil, it is commonly cultivated by family farmers, mostly using the bishop's hat (locally cambuci) and red chili pepper (dedo-de-moça) genotypes. The objective of this study was to characterize 116 C. baccatum accessions from different regions of Brazil, based on morphological fruit descriptors and AFLP (Amplified Fragment Length Polymorphism) markers. Broad phenotypic variability among the C. baccatum accessions was detected with the morphological fruit descriptors. The Ward modified location model (Ward-MLM) discriminated five groups, based mainly on fruit shape. Six combinations of AFLP primers detected polymorphism in 97.93% of the 2466 identified bands, indicating high genetic variability in the accessions. The UPGMA clustering agreed with the Bayesian clustering analysis, and three large groups were formed, separating the wild variety C. baccatum var. praetermissum from the other accessions. There was no relation between genetic distance and geographical origin of the accessions, probably due to the intense exchange of fruits and seeds between farmers. Morphological descriptors used together with AFLP markers proved efficient in detecting the levels of genetic variability among the accessions maintained in the germplasm collections. These results are an additional source of information that can be exploited in C. baccatum breeding programs.


Introduction
The genus Capsicum (Solanaceae) is native to the tropical zones of Central and South America, and was one of the first genera to be domesticated, around 6000 B.C. [1]. The genus comprises approximately 38 described species, with great morphological variation, mainly in terms of fruit color, size, shape and levels of pungency [2]. Among these species, five are considered
For the experiments, the accessions were sown in 128-cell polystyrene trays containing the substrate Vivatto®. After 30 days, the seedlings were planted in a protected cultivation area of UEL, Londrina, Paraná, Brazil. The fruit morphology was characterized based on the descriptors proposed by the International Plant Genetic Resources Institute (currently named Bioversity International) [16]. Fourteen fruit descriptors were used, of which 10 were qualitative: anthocyanin spots or stripes, fruit color at intermediate stage, fruit color at mature stage, fruit shape, fruit shape at pedicel attachment, fruit blossom end appendage, fruit shape at blossom end, fruit cross-sectional corrugation, number of locules per fruit and fruit surface; and four were quantitative: fruit length, fruit width, fresh fruit weight, and fruit wall thickness.

Genotyping: AFLP markers
For molecular analysis, DNA was extracted separately from young leaves of five plants per accession. The samples were prepared using an automatic DNA extractor (Retsch MM 400), followed by extraction with the CTAB (cetyltrimethylammonium bromide, Sigma-Aldrich, Missouri-USA) method proposed by Doyle and Doyle [17], except that CTAB in the extraction buffer was replaced by MATAB (alkyltrimethylammonium bromide, Sigma-Aldrich, Missouri-USA). The DNA quality and integrity were assessed by electrophoresis on 1% agarose gel. The DNA concentration was determined with a NanoDrop 2000/2000c spectrophotometer (Thermo Scientific, California-USA).
The AFLP technique was applied according to the protocol proposed by Vos et al. [18], with modifications. The DNA extracted from five plants per accession was mixed proportionally. Approximately 700 ng of this DNA was double-digested with 1 U of MseI and 5 U of EcoRI (Thermo Scientific, California-USA) and ligated to the adapters EcoRI (0.5 μM) and MseI (5 μM) in a reaction containing: T4 DNA ligase (2 U); T4 DNA ligase buffer 1X; NaCl (0.05 M); BSA (50 μg/μL); and DTT (0.25 mM), up to a final volume of 10 μL. The program for the digestion-ligation consisted of: 37˚C for 4 h, 22˚C for 1 h and 70˚C for 10 min. The pattern of digestion-ligation products was visualized on 1% agarose gel. Once the digestion was confirmed, the digestion-ligation product was diluted 1:4 with ultrapure water.
Pre-selective amplification was performed using 3.5 μL of GoTaq® Green Master Mix (Promega, Winchester-USA), 0.58 μL of the pre-selective primers EcoRI+A and MseI+C (4.75 μM), 3.0 μL of the diluted restriction-ligation mixture and ultrapure water to a volume of 10 μL. The pre-selective amplification program consisted of: 1 cycle at 72˚C for 2 min, 20 cycles at 94˚C for 1 sec, 56˚C for 30 sec, 72˚C for 2 min and a final cycle at 60˚C for 30 min. Pre-selective PCR amplification was confirmed on 2% agarose gel and the amplified product was diluted 1:8 in ultrapure water. For the selective amplification, 12 combinations of selective EcoRI/MseI primers were initially screened for polymorphism and repeatability. The six most polymorphic combinations were chosen for fluorescent labeling

Data analysis
Phenotypic descriptors (qualitative and quantitative) were analyzed using the Ward-MLM (Modified Location Model) method, as proposed by Franco et al. [19], which allows the simultaneous analysis of qualitative and quantitative traits. The accessions were clustered with the CLUSTER and IML procedures of the SAS program [20]. The distance matrix for Ward's clustering was computed with the Gower algorithm [21]. The ideal number of groups was defined based on the criterion of the likelihood function, maximized according to the MLM method [20]. The differences between groups were analyzed by canonical variables and Mahalanobis' distance [20].
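The first stage of this procedure (a Gower dissimilarity matrix followed by Ward's clustering) can be illustrated with a minimal Python sketch. This is not the SAS code used in the study: the function name `gower_distance`, the toy descriptor values and the use of SciPy are assumptions made for illustration only, and the MLM stage is not reproduced. Note also that Ward's criterion is formally defined for Euclidean distances, so applying it to a Gower matrix is an approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gower_distance(quant, qual):
    """Gower dissimilarity for mixed data (illustrative helper).

    quant: (n, p) array of quantitative descriptors.
    qual:  (n, q) array of categorical descriptor codes.
    """
    quant = np.asarray(quant, dtype=float)
    qual = np.asarray(qual)
    # Quantitative traits: absolute difference scaled by the trait range.
    ranges = quant.max(axis=0) - quant.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant traits contribute zero distance
    d_quant = np.abs(quant[:, None, :] - quant[None, :, :]) / ranges
    # Qualitative traits: simple mismatch (0 if equal, 1 if different).
    d_qual = (qual[:, None, :] != qual[None, :, :]).astype(float)
    total = d_quant.sum(axis=2) + d_qual.sum(axis=2)
    return total / (quant.shape[1] + qual.shape[1])


# Hypothetical toy data: 6 accessions, 2 quantitative descriptors
# (fruit length and width, cm) and 2 qualitative codes (shape, color).
quant = np.array([[9.5, 1.5], [10.1, 1.4], [9.8, 1.6],
                  [2.5, 4.8], [2.8, 5.0], [2.6, 5.2]])
qual = np.array([[1, 0], [1, 0], [1, 1],
                 [3, 0], [3, 0], [3, 1]])

dist = gower_distance(quant, qual)
# Ward linkage on the condensed form of the Gower matrix.
tree = linkage(squareform(dist, checks=False), method="ward")
# Cut the tree into two groups for this toy example.
groups = fcluster(tree, t=2, criterion="maxclust")
print(groups)
```

In this toy example the three elongated accessions and the three wide accessions fall into separate groups; in the study itself the number of groups was chosen with the likelihood criterion of the MLM stage, not fixed in advance.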
To analyze the molecular data, Jaccard's distance matrix was calculated and then used for UPGMA (unweighted pair-group method using arithmetic averages) clustering. Data were analyzed with the software FAMD v. 2.3 [22] and FigTree 1.4.3. Based on the molecular data, Bayesian clustering was also performed using Structure software v. 2.3.4 [23], based on the method described by Evanno et al. [24], with 100,000 Markov chain Monte Carlo iterations and a burn-in of 10,000 iterations, in a model assuming mixed clusters (admixture) and correlated allele frequencies. The number of clusters was determined according to the method described by Evanno et al. [24], and the graphs were generated with the online interface Structure Harvester [25]. For the analysis of molecular variance (AMOVA) among accessions of the three genebanks, the software Arlequin 3.5 was used [26].
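As an illustration of the distance and clustering step, the following Python sketch computes Jaccard distances from a binary band matrix and builds a UPGMA tree with SciPy. It is not the FAMD workflow used in the study; the band matrix is hypothetical and the variable names are chosen for this example only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical AFLP scoring: rows = accessions, columns = bands
# (True = band present, False = band absent).
bands = np.array([
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 0, 0],
], dtype=bool)

# Jaccard distance = 1 - (shared bands / bands present in either accession).
jaccard = pdist(bands, metric="jaccard")
dist_matrix = squareform(jaccard)

# UPGMA corresponds to average linkage on the pairwise distances.
upgma = linkage(jaccard, method="average")
clusters = fcluster(upgma, t=2, criterion="maxclust")

print(np.round(dist_matrix, 2))
print(clusters)
```

Shared absences do not count toward similarity under the Jaccard coefficient, which is why it is commonly preferred for dominant markers such as AFLP.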

Morphological characterization of fruits
Wide morphological variability was observed among the accessions (Fig 1). Six different colors were observed in the mature fruit stage. The red color was predominant, with a total of 97 accessions, followed by orange (8), dark red (5), orange-yellow (4), lemon yellow (1) and pale orange (1) (Table 1). In the intermediate fruit stage, 60 accessions were green, followed by 53 orange and 3 yellow accessions. All fruit shapes proposed by IPGRI [16] were observed, but the elongated shape was predominant, with 46 accessions, followed by triangular, campanulate, almost rounded, and blocky shapes, with 35, 19, 10, and 6 accessions, respectively. The fruit surface was predominantly smooth, and all classes of fruit shape at pedicel attachment were represented, with prevalence of the obtuse shape. Fruit shape at blossom end was mostly pointed, followed by sunken, sunken and pointed, blunt and other. Most of the accessions had no appendage at the fruit tip, and had anthocyanin spots in the immature stage. The fruit cross-sectional corrugation varied from slightly corrugated to intermediate corrugated and corrugated, and the number of locules varied from one to four, with predominance of three locules.
For the quantitative fruit descriptors, wide variability among accessions was also observed (Fig 2). Fruit length varied from 0.93 to 13.64 cm (x̄ = 5.38 cm), while the width ranged from 0.40 to 5.90 cm (x̄ = 2.48 cm). For fruit weight and wall thickness, the variation was 0.33 to 34.21 g (x̄ = 10.62 g) and 0.01 to 0.46 cm (x̄ = 0.21 cm), respectively.
The log-likelihood function showed that the optimal number of groups was two or five, since the highest values were reached at these points (93.80 and 79.13, respectively) (Table 2; Fig 3). Group G1 consisted of 18 accessions, in which orange and red fruits predominated in the immature and mature stages, respectively, together with triangular shape and smooth surface (Table 1). Group G2, with 20 accessions, had fruits with mostly green and red color in the immature and mature stages, respectively, campanulate shape, and no appendage at the fruit tip. Group G3 comprised 37 accessions, with prevalence of red fruits in the mature stage and elongated fruit shape, while group G4 comprised 17 accessions, with predominantly green and red fruits in the immature and mature stages, respectively. Group G5, with 24 accessions, had mostly green and red fruits in the immature and mature stages, respectively, and elongated fruits with obtuse shape at pedicel attachment and pointed shape at blossom end. G2 and G5 clustered the heaviest fruits, and the fruits in G2, mostly campanulate, had the largest width (Fig 3). For fruit length, the longest fruits (9.75 cm) were grouped in G5, while G1 and G4 clustered the lowest values (2.57 and 2.82 cm, respectively). For wall thickness, the highest values were obtained for groups G1 and G2 (0.3 and 0.25 cm, respectively).
The analysis of canonical variables (CAN) showed that the first two variables explained 96.76% of the total variation (CAN 1 and 2 with 79.88 and 26.88%, respectively) (Fig 4). The groups G2, G3 and G5 were allocated separately, while groups G1 and G4 overlapped and had the smallest genetic distance (4.63) (Table 3). Groups G2 and G5 were the most distant from each other (69.42).
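To make the between-group distance concrete, the sketch below computes squared Mahalanobis distances between group centroids using a pooled within-group covariance matrix. This is one common formulation and is assumed here for illustration; the exact computation performed in SAS for Table 3 is not described in the text, and the trait values and group labels are hypothetical.

```python
import numpy as np


def mahalanobis_between_groups(x, labels):
    """Squared Mahalanobis distance between group centroids.

    Uses the pooled within-group covariance matrix (an assumption of
    this sketch). Returns the sorted group labels and a square matrix.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)  # sorted unique labels
    centroids = np.array([x[labels == g].mean(axis=0) for g in groups])
    # Residuals of each observation from its own group centroid.
    resid = x - centroids[np.searchsorted(groups, labels)]
    pooled = resid.T @ resid / (len(x) - len(groups))
    inv = np.linalg.inv(pooled)
    diff = centroids[:, None, :] - centroids[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)
    return groups, d2


# Hypothetical fruit length and width (cm) for three groups.
traits = np.array([
    [9.0, 1.5], [10.0, 1.4], [9.5, 1.8],   # group "a": long, narrow
    [2.5, 4.8], [3.0, 5.0], [2.6, 5.3],    # group "b": short, wide
    [3.0, 4.0], [3.4, 4.4], [2.9, 4.3],    # group "c": short, fairly wide
])
labels = np.array(["a"] * 3 + ["b"] * 3 + ["c"] * 3)

groups, d2 = mahalanobis_between_groups(traits, labels)
print(groups)
print(np.round(d2, 2))
```

In this toy example, groups "b" and "c" are much closer to each other than either is to group "a", which mirrors the kind of pattern summarized by a between-group distance table.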
The mean genetic distance among all accessions, estimated by the Jaccard coefficient, was 0.60. The classes with genetic distances of 0.6 to 0.7 and 0.5 to 0.6 had the highest frequencies (42.77 and 26.95%, respectively) (Fig 5). The genetic distance was smallest (0.32) between accessions UEL149 and UEL153 and greatest (0.8) between UEL182 and UEL105. When analyzing samples of the three genebanks separately, we found mean distances of 0.55, 0.52 and 0.54 among the accessions of the collections from UENF, Embrapa Clima Temperado and Embrapa Hortaliças, respectively. The distribution of the frequency classes was broad in all three genebanks (Fig 5). The dendrogram, obtained by UPGMA hierarchical clustering analysis, identified the formation of three large groups, with a clear separation of the accessions UEL179 and UEL123, which can be considered outliers (Fig 6). Group I consisted of 37 accessions, all from the genebank of Embrapa Hortaliças, originally from the states of Goiás, Distrito Federal, São Paulo, Santa Catarina, Paraná, Minas Gerais, and representative accessions from Peru (UEL187, UEL189 and UEL190), Bolivia (UEL204, UEL207 and UEL208), India (UEL205) and AVRDC (UEL196 and UEL197), plus three commercial cultivars: Topseed (UEL211 and UEL212) and Agroflora-Sakata (UEL213). Group II consisted of 37 accessions, including accessions from the UENF and Embrapa Clima Temperado genebanks, which were originally from the states of Rio de Janeiro, Rio Grande do Sul, Paraná, Santa Catarina, Pará, Mato Grosso, Minas Gerais
Based on simulations provided by Structure software and the ΔK method proposed by Evanno et al. [24], the optimal K was four (Fig 7). The results of the Structure analysis agreed with the UPGMA clustering. However, four accessions were classified as admixture for having an adhesion coefficient lower than 0.6 for all groups.
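The ΔK statistic can be sketched as follows. This is a simplified illustration, assuming that the mean and standard deviation of Ln P(D) across replicate runs are available for each K and that ΔK is taken as the absolute second-order difference of the means divided by the standard deviation at K. The function name `evanno_delta_k` and all log-likelihood values are hypothetical and are not the values obtained in this study.

```python
import numpy as np


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from replicate Ln P(D) values per K.

    Delta K is computed only for K values that have both neighbours
    (K - 1 and K + 1) and a non-zero standard deviation.
    """
    means = {k: np.mean(v) for k, v in lnp_by_k.items()}
    sds = {k: np.std(v, ddof=1) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / sds[k]
    return delta


# Hypothetical Ln P(D) values from three replicate runs per K.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4503.0, -4497.0],
    3: [-4100.0, -4101.0, -4099.0],
    4: [-4080.0, -4090.0, -4070.0],
    5: [-4075.0, -4095.0, -4055.0],
}

delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

The K with the largest ΔK marks the point where the log-likelihood curve changes slope most sharply relative to the run-to-run variation, which is why it is read as the most likely number of clusters.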
The analysis of molecular variance (AMOVA) of the domesticated accessions (C. baccatum var. pendulum) of the three genebanks detected higher variation within (83.18%) than among (16.82%) the collections.

Morphological characterization of fruits
The characterization of fruit morphology by the descriptors proposed by Bioversity International detected broad variability among the accessions studied. The variability in fruit color and shape and the use of new descriptors, not proposed in the original list (e.g. fruit surface and fruit shape at blossom end), evidenced the enormous phenotypic variability in fruits of C. baccatum accessions. Some accessions (for example, UEL139 and UEL157) also had more than one color before reaching the mature stage. Similar results were reported by Sudré et al. [13], who verified wide variability in color among Capsicum spp. accessions.
The wide phenotypic variability in fruits of species of the genus Capsicum was also reported in studies with C. chinense [12,27], C. annuum [28], C. frutescens [29,30], and C. baccatum [4,8,10]. The morphological variability observed in C. baccatum fruits may be related to the wide geographic distribution of the species in diverse climatic and environmental conditions, which enables the selection of genotypes more adapted to local conditions [7,31]. Our results demonstrate that some descriptors were essential for the distinction between groups, especially fruit shape, in agreement with the study of Baba et al. [12], which also mentioned this descriptor as essential for the distinction of C. chinense accessions. Some traits related directly to fruit shape were also essential for the separation of the groups.
The characterization and quantification of the phenotypic variability of the fruits are also highly relevant with a view to their conservation and use in plant breeding programs. The accessions of C. baccatum with longer and elongated fruits, which are predominant in groups G3 and G5 (e.g. UEL103, UEL107, UEL111, UEL113, UEL122, UEL129, UEL135, UEL136, UEL151, UEL156, UEL174, UEL175, UEL182, UEL205, and UEL214), can be included in breeding programs of red chili pepper (Fig 1). The red chili pepper is highly appreciated in Brazil, particularly in the South and Southeast, and is generally consumed fresh, in sauces or in the form of dehydrated flakes [11,32].
Large and long fruits [33] are generally more attractive for the fresh pepper market in Brazil, while smaller fruits with a higher dry mass content are more suitable for the dehydrated food industry. Another important trait is the wall thickness, since fruits with a thicker wall are more resistant to damage during post-harvest handling and have a fresher appearance than fruits with a thinner wall.
Other accessions, such as UEL105, UEL119, UEL123, UEL146, UEL158, UEL198, and UEL201, have small fruits with different colors during maturation. The attractive and aesthetic value of ornamental peppers is related to the color change during fruit maturation, as well as the different fruit shapes and sizes [34]. In Brazil, the market of ornamental peppers is extensive and therefore an alternative for small rural producers [14,35].
The differentiation of the groups formed by Ward-MLM was confirmed by the analysis of canonical variables and Mahalanobis' distance. This statistical procedure, which allows the combined analysis of quantitative and qualitative data, consists of two stages. In the first stage, the groups are defined by Ward's clustering algorithm applied to a Gower dissimilarity matrix [21], and in the second, the data are grouped by the Modified Location Model (MLM) analysis [19]. This methodology has been used to analyze morphoagronomic traits in several studies, e.g., for common bean [36], tomato [37] and Capsicum spp. [12,13,38]. In an evaluation of 56 Capsicum spp. accessions using the Ward-MLM procedure with 26 descriptors (15 qualitative and 11 quantitative), Sudré et al. [13] reported that the method allowed a separation of the species C. annuum, C. frutescens, C. baccatum, and C. chinense, confirming the importance of the simultaneous use of morphological and agronomic traits.

Molecular characterization
The high polymorphism generated by the six AFLP primer combinations clearly shows that these markers were highly efficient in identifying polymorphisms among accessions of C. baccatum. Similar results were found in analyses of the species C. baccatum and C. annuum by Krishnamurthy et al. [39], in which the authors found 93.96% polymorphic AFLP markers. In an evaluation of 226 accessions of a C. baccatum collection from different regions of South America [4], the percentages of polymorphism were 60% and 67% for accessions of C. baccatum var. baccatum and C. baccatum var. pendulum, respectively. These data strengthen the importance of genetic studies analyzing the collections maintained in genebanks.
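For reference, the percentage of polymorphic bands can be obtained from a binary band matrix as sketched below. The sketch assumes the usual definition, under which a band is polymorphic when it is present in at least one accession and absent in at least one other; the function name and the toy matrix are hypothetical. Under this definition, the 97.93% reported for this study corresponds to about 2415 of the 2466 scored bands.

```python
import numpy as np


def percent_polymorphic(bands):
    """Percentage of columns (bands) that vary among rows (accessions)."""
    bands = np.asarray(bands, dtype=bool)
    present = bands.sum(axis=0)  # number of accessions carrying each band
    polymorphic = (present > 0) & (present < bands.shape[0])
    return 100.0 * polymorphic.sum() / bands.shape[1]


# Hypothetical scoring of 4 accessions x 5 bands; the first band is
# present in every accession and is therefore monomorphic.
toy = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
])
print(percent_polymorphic(toy))

# Consistency check on the figures reported in the text.
n_polymorphic = round(0.9793 * 2466)
print(n_polymorphic, round(100 * n_polymorphic / 2466, 2))
```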
The UPGMA analysis of the AFLP data separated the accessions into three large groups (Fig 6), in agreement with the results of the Bayesian analysis (Fig 7). In the latter, accession UEL123 was grouped close to UEL179, although considered an admixture, suggesting genetic proximity between these accessions. The accession UEL157 has ornamental features and, due to its proximity to UEL179, may present interesting properties that can be explored in the improvement of ornamental varieties. In addition, the accession UEL179 belongs to the wild variety of C. baccatum (C. baccatum var. praetermissum). The discrimination between domesticated and wild accessions of C. baccatum was also observed in another study [4], in which the authors mentioned that the separation of C. baccatum var. praetermissum is much more evident than that of other C. baccatum varieties, such as the wild variety C. baccatum var. baccatum. The taxonomic classification of C. baccatum var. praetermissum is controversial in the scientific community. Previous studies [40,41,42] classified C. baccatum var. praetermissum as C. praetermissum, a species distinct from C. baccatum. This classification was supported by the studies of Moscone et al. [41] and Ibiza et al. [42]. On the other hand, in another study [43], it was classified as a variety of C. baccatum (C. baccatum var. praetermissum). In our results, the position of UEL179 in the dendrogram and in the Bayesian graphics provided evidence in favor of the treatment of C. baccatum var. praetermissum as a distinct taxonomic entity, i.e. C. praetermissum. However, since our analysis included mostly accessions from Brazil (eastern distribution), additional studies, including more representatives from the entire distribution range (eastern and western) of C. baccatum, are needed to support this suggestion.
The accessions clustered in each of the groups generated by AFLP markers indicated high genetic variability, whereas no relation between the morphological fruit descriptors and the geographical origin was identified. The absence of an association between geographical origin and molecular markers is most likely due to seed exchange between farmers and the unrestricted fruit transport between different regions of Brazil [12]. This result was corroborated by other studies on C. chinense [12,44] and C. annuum [45].
In a genetic diversity study of C. baccatum accessions [4], two large groups, separated according to the geographical origins (East and West), were identified. The eastern group corresponded to accessions native to Brazil and to eastern Argentina and Paraguay, whereas the western group contained accessions from Peru, Colombia, Chile, Bolivia, and western Argentina. It was observed that the genetic pool of these two areas is homogeneous and that this separation probably resulted from isolation by distance. Therefore, the authors suggest that the species C. baccatum had different origins of domestication, evolving in two main lines (eastern and western). In our study, almost all accessions were from the eastern region, which may explain the absence of relationships between the genetic data and geographic origin.
The lack of associations between the morphological and molecular data of the fruits indicates that the two analyses are complementary and both necessary for a reliable characterization of a genebank. Several studies with Capsicum spp. [10,12,46,47] highlighted the relevance of phenotypic and molecular characterization for an improved understanding of the variability. The combined analysis of morphoagronomic and molecular data provides important findings about the genetic basis of the analyzed accessions, evidencing the great potential of the collection as a source of genes of interest. Our findings can be used to improve the efficiency of conservation and management procedures of the C. baccatum genebank.

Conclusions
Wide genetic variability among C. baccatum accessions was detected by fruit traits and AFLP molecular markers, indicating the high potential of these accessions in pepper breeding programs.