Insight into the Wild Origin, Migration and Domestication History of the Fine Flavour Nacional Theobroma cacao L. Variety from Ecuador

Ecuador’s economic history has been closely linked to Theobroma cacao L. cultivation, and specifically to the native fine flavour Nacional cocoa variety. The original Nacional cocoa trees are presently in danger of extinction due to foreign germplasm introductions. In a previous work, a few non-introgressed Nacional types were identified as potential founders of the modern Ecuadorian cocoa population, but so far their origin could not be formally identified. In order to determine the putative centre of origin of Nacional and trace its domestication history, we used 80 simple sequence repeat (SSR) markers to analyse the relationships between these potential Nacional founders and 169 wild and cultivated cocoa accessions from South and Central America. The highest genetic similarity was observed between the Nacional pool and some wild genotypes from the southern Amazonian region of Ecuador, sampled along the Yacuambi, Nangaritza and Zamora rivers in Zamora Chinchipe province. This result was confirmed by a parentage analysis. Based on our results and on data about pre-Columbian civilization and Spanish colonization history of Ecuador, we determined, for the first time, the possible centre of origin and migration events of the Nacional variety from the Amazonian area until its arrival in the coastal provinces. As large unexplored forest areas still exist in the southern part of the Ecuadorian Amazonian region, our findings could provide clues as to where precious new genetic resources could be collected, and subsequently used to improve the flavour and disease resistance of modern Ecuadorian cocoa varieties.


Introduction
Theobroma cacao L. is the most economically important species of the genus Theobroma, a member of the Malvaceae family. It is a diploid perennial species (2n = 2x = 20) with a small genome ranging in size from 411 to 494 Mb [1]. Most T. cacao accessions are self-incompatible due to a gameto-sporophytic incompatibility system [2][3][4]. However, some varieties domesticated long ago, such as Criollo from Central America, Comun from Brazil and Nacional from Ecuador, as well as some of their hybrid forms, are self-compatible.
Traditionally, two main genetic groups, Criollo and Forastero, have been defined to classify cocoa accessions, based on morphological traits and geographical origins [5]. This classification also reflects the first commonly cultivated cocoa varieties: Criollo, first domesticated in Central America more than 2000 years ago, and a Lower Amazon Forastero variety (Amelonado type) domesticated in Brazil. Trinitario, a third group, corresponding to hybrids between Criollo and Forastero, was also recognised. However, the Forastero group also includes many other populations from all the Amazonian and Orinoco regions and presents a high diversity, as revealed by many genetic studies [6], [7], [8][9][10][11][12][13][14][15][16]. A new classification, identifying 10 genetic clusters, was more recently proposed [15].
The domestication of Criollo, originally cultivated by the Mayas in Central America, was previously studied [17]. The authors concluded that the genetic base of the 'ancient' Criollo variety, cultivated in Central America before foreign cocoa introductions occurred during the 18th century, was very narrow.
Cocoa cultivation in Ecuador is very ancient, dating back to the pre-Columbian age before the Spanish colonization of this territory. Pizarro, during his first voyage in 1526 along the South American coasts (nowadays Ecuador), found evidence of small plantations of an apparently native cocoa tree [18], [19]. It is very likely that the 'Nacional' Ecuadorian cocoa population existed for several centuries prior to the arrival of the Europeans [20], but its origin in the Ecuadorian coastal region has never been clarified. Two hypotheses have been put forward: according to Allen and Lass [21], the Nacional cocoa variety could have originated from a local wild population, that has nowadays completely disappeared along with the original forest cover of the region, or Nacional could have been introduced in the coastal region from the Amazonian area of Ecuador where wild cocoa is common [22][23][24]. There is no evidence that T. cacao and its products played any part in the lives of Ecuadorian coastal inhabitants during the Chorrera phase of the pre-Columbian Valdivia culture (2000 BC), contrary to the extensive amount of information available on this phenomenon with respect to Mesoamerican people, most notably the Mayas and Aztecs. In Ecuador, available information related to the history of Nacional cocoa cultivation is linked with the Spanish colonization history in this country.
Since the time when the first Spanish colonists began deforestation of the Ecuadorian coastal region, a large number of native cocoa trees have been reported [21][22][23][24][25], principally along the Guayas basin. Apparently, these colonists began to sow seeds of these native trees approximately 100 years after the discovery of America, and when the native Mesoamerican populations started to decline. The cocoa cultivation areas expanded, and the native Nacional variety rapidly became known worldwide due to its strong floral so-called ''Arriba'' aroma, exclusively generated by Nacional cocoa beans. The fine flavour quality of chocolate products obtained from Nacional cocoa beans has always been highly appreciated by chocolate manufacturers.
The native Nacional variety was the only one planted in Ecuador until the early 1890s, when foreign germplasm was first introduced in this country. In 1890, due to the quality traits of Nacional cocoa beans, Ecuador had a privileged position in the markets of Hamburg and London [26]. From 1910, foreign germplasm introductions progressively increased due to the appearance of two fungal diseases known as witches' broom (Moniliophthora perniciosa) and frosty pod (Moniliophthora roreri), which together devastated the native plantations.
A large genetic admixture between the native variety and foreign germplasm is currently found in modern Ecuadorian cocoa plantations [27]. The fine flavour cocoa aroma has decreased in this hybrid complex, and 25% of the Ecuadorian cocoa production was recently classified as 'bulk' cocoa by the International Cocoa Organization (ICCO). There is nowadays increasing demand for fine flavour cocoa, which presently represents 6.8% of the world cocoa production and for which Ecuador remains the main supplier [28].
In a previous work, a few non-introgressed Nacional types were identified within this hybrid population as potential representatives of the native Nacional variety [27]. Here we analyze their relationships with a wide range of wild cocoa genotypes covering a broad geographical range from upper and lower Amazonian regions [21], [22], [29], [30], [31], in order: a) to identify the putative centre of origin of Nacional, and b) to trace the domestication history of the Nacional cocoa variety.

Plant Material
A total of 176 individuals from different geographical origins were used for simple sequence repeat (SSR) analyses in this study (Table 1 and Table S1):
- Seven putative ancient Nacional variety individuals cultivated along the Ecuadorian coastal region [27].
- Three cultivated genotypes from the lower Amazonian region of Brazil (BA).
- Twenty-four Criollo (Cr) variety samples collected in Central America, from Venezuela to Mexico [32].
The germplasm collection and country of origin are reported for each accession in Table S1 and Fig. 1. The wild cocoa accessions from Ecuador and Peru were selected from the CRU living collection on the basis of the geographical area sampled along the Amazonian provinces [21][22] (Fig. 1b).

Molecular Markers
Eighty SSR markers were chosen for these analyses (Table 2). Seventy-four SSRs were isolated from expressed sequence tags (EST) and six from genomic sequences, and all were mapped [33][34][35]. They are distributed across all 10 cocoa chromosomes.

DNA Isolation and PCR Amplification
DNA samples were isolated from each individual as described previously [35]. For SSR analyses, PCR amplifications were performed as previously reported [35] and fragments were subsequently detected using a 'MegaBACE 1000' DNA analysis system (Molecular Dynamics/Amersham Life Science).

Statistical Analysis
Amplified SSR fragments were scored as alleles. The following genetic parameters were calculated for each population using GENETIX software [36]: unbiased expected heterozygosity [37] (HE), observed heterozygosity (HO), proportion of polymorphic loci at which the frequency of the most common allele does not exceed 95% (P0.95) and 99% (P0.99), and mean number of alleles per locus.
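As an illustration of these per-population parameters, the following minimal Python sketch computes them from diploid SSR genotypes. It is not the GENETIX implementation: it assumes that the unbiased estimator is Nei's small-sample correction, HE = 2n/(2n - 1) × (1 - Σ p²), and the function names, locus names and allele sizes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Statistics for one locus in one population.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid
    individual (at least one individual, no missing data).
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity with the assumed small-sample correction.
    he = (2 * n) / (2 * n - 1) * (1.0 - sum(p * p for p in freqs))
    return {"alleles": len(counts), "HO": ho, "HE": he, "max_freq": max(freqs)}


def population_summary(population):
    """population: dict mapping locus name -> list of genotypes."""
    stats = [locus_stats(g) for g in population.values()]
    k = len(stats)
    return {
        "HE": sum(s["HE"] for s in stats) / k,
        "HO": sum(s["HO"] for s in stats) / k,
        "A": sum(s["alleles"] for s in stats) / k,  # mean number of alleles per locus
        # A locus is polymorphic if its most common allele does not exceed the threshold.
        "P0.95": sum(s["max_freq"] <= 0.95 for s in stats) / k,
        "P0.99": sum(s["max_freq"] <= 0.99 for s in stats) / k,
    }


# Hypothetical data: one polymorphic and one monomorphic locus, four individuals.
demo = {
    "locus1": [(150, 150), (150, 154), (154, 154), (150, 154)],
    "locus2": [(200, 200)] * 4,
}
summary = population_summary(demo)
```

In this toy data set one of the two loci is monomorphic, so P0.95 and P0.99 are both 0.5 and the mean number of alleles per locus is 1.5.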
We also estimated allelic richness (Rs) for each population, using a rarefaction approach to correct for unequal sample sizes [38] with the program FSTAT V. 2.9.3 [39].
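The rarefaction idea can be sketched as follows. This is a minimal illustration, assuming the commonly used estimator in which the expected number of alleles in a subsample of g gene copies is Σ_i [1 - C(N - N_i, g) / C(N, g)], with N the total number of gene copies scored and N_i the count of allele i; it is not the FSTAT code, and the function name is hypothetical.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: number of copies observed for each allele at one locus.
    g: subsample size in gene copies (2 x the number of diploid individuals
       in the smallest population being compared).
    """
    total = sum(allele_counts)
    if not 0 < g <= total:
        raise ValueError("g must be between 1 and the number of gene copies scored")
    # For each allele, 1 - P(allele is absent from the subsample).
    return sum(1 - comb(total - c, g) / comb(total, g) for c in allele_counts)


# Hypothetical locus with two alleles, four copies each, rarefied to one individual.
rs = allelic_richness([4, 4], 2)
```

Rarefying every population to the same g is what makes values comparable between groups with very different sample sizes.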
Wright's F-statistics were estimated according to Weir and Cockerham [40] using the GENETIX software: FIS, an estimate of heterozygote deficiency or excess, calculated for the whole population sample and for each population, and FST, the proportion of variance due to population differentiation, calculated for the whole population sample and between pairs of populations; their significance was assessed using a 1000-fold permutation test.
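The logic of these statistics and of the permutation test can be sketched in Python for a single locus. This is a deliberately simplified illustration: it uses the basic definitions FIS = 1 - HO/HE and FST = (HT - HS)/HT rather than the Weir and Cockerham variance-component estimators used in the study, and all function names and data are hypothetical.

```python
import random
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies from a list of diploid (a, b) genotypes."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def gene_diversity(freqs):
    return 1.0 - sum(p * p for p in freqs.values())


def fis(ho, he):
    """Heterozygote deficit (positive) or excess (negative)."""
    return 1.0 - ho / he


def fst(pop1, pop2):
    """Simplified FST = (HT - HS) / HT for two populations at one locus."""
    f1, f2 = allele_freqs(pop1), allele_freqs(pop2)
    hs = (gene_diversity(f1) + gene_diversity(f2)) / 2
    mean = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in set(f1) | set(f2)}
    ht = gene_diversity(mean)
    return (ht - hs) / ht if ht > 0 else 0.0


def permutation_test(pop1, pop2, n_perm=1000, seed=0):
    """Shuffle individuals between populations to assess significance of FST."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = pop1 + pop2
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[: len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical populations fixed for different alleles.
pop_a = [(150, 150)] * 6
pop_b = [(154, 154)] * 6
observed_fst, p_value = permutation_test(pop_a, pop_b)
```

Permuting whole individuals, rather than single alleles, keeps each genotype intact, so the test addresses differentiation between populations without being confounded by departures from panmixia within them.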
As most mutations in SSRs involve the addition or subtraction of a small number of repeat units, according to the stepwise mutation model, population genetic differentiation was also estimated by RST [41]. RhoST, an unbiased version of RST in which allele sizes are transformed to standardize variances, was estimated for pairs of populations using GENEPOP software, WEB version 4.0.10 [42][43][44].
Analyses of genetic distance among pairs of populations [45] were carried out to determine the genetic similarity between the Nacional pool and the wild and cultivated genotypes using GENETIX.
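A pairwise distance of this kind can be sketched as follows. The text does not state which distance measure reference [45] corresponds to; the example below assumes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), purely for illustration, without any small-sample correction. Function and allele names are hypothetical.

```python
from math import log, sqrt


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus (same locus order),
    each mapping allele -> frequency in that population.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        return float("inf")  # no allele shared at any locus
    # The number of loci cancels out of the ratio, so sums can be used directly.
    return -log(jxy / sqrt(jx * jy))


# Hypothetical single-locus example: one fixed and one polymorphic population.
d = nei_standard_distance([{"A": 1.0}], [{"A": 0.5, "B": 0.5}])
```

Under this assumed measure, identical allele frequencies give a distance of 0, and the distance grows without bound as populations share fewer alleles, which is consistent with values above 1 for highly divergent groups.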
In addition, a dendrogram was constructed using a dissimilarity index (simple matching) and the neighbour-joining (NJ) method with 500 bootstrap replicates. Neighbour-joining cluster analyses were carried out using DARWIN software 5.0 [46].
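The dissimilarity matrix that feeds the neighbour-joining step can be sketched as below. This is a minimal illustration, assuming the simple-matching index for allelic data takes the form d = 1 - (1/L) Σ m_l / 2, where m_l is the number of alleles shared by two diploid individuals at locus l and L is the number of loci scored in both; it is not the DARWIN code, tree construction and bootstrapping are not shown, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def simple_matching_dissimilarity(ind_a, ind_b, ploidy=2):
    """Dissimilarity between two individuals.

    ind_a, ind_b: dicts mapping locus name -> tuple of alleles,
    or None when the genotype is missing at that locus.
    """
    loci = [l for l in ind_a if ind_a[l] is not None and ind_b.get(l) is not None]
    if not loci:
        raise ValueError("no locus scored in both individuals")
    matched = 0.0
    for locus in loci:
        # Multiset intersection counts shared alleles, including homozygous pairs.
        shared = Counter(ind_a[locus]) & Counter(ind_b[locus])
        matched += sum(shared.values()) / ploidy
    return 1.0 - matched / len(loci)


def dissimilarity_matrix(individuals):
    """individuals: dict name -> genotype dict. Returns {(name1, name2): d}."""
    return {
        (a, b): simple_matching_dissimilarity(individuals[a], individuals[b])
        for a, b in combinations(sorted(individuals), 2)
    }


# Hypothetical accessions scored at two loci.
accessions = {
    "acc1": {"locus1": (150, 154), "locus2": (200, 200)},
    "acc2": {"locus1": (150, 150), "locus2": (200, 204)},
    "acc3": {"locus1": (150, 154), "locus2": (200, 200)},
}
matrix = dissimilarity_matrix(accessions)
```

In a bootstrap analysis, loci would be resampled with replacement, the matrix recomputed and a tree rebuilt for each replicate, so that the support for each cluster can be counted.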
Finally, a paternity analysis was performed using the software CERVUS, version 2.0 [47], to identify the most likely ancestral population of the Nacional variety. An 80% confidence threshold was used for paternity assignment.

Patterns of Genetic Diversity Among Cocoa Accessions
A total of 463 alleles were detected at the 80 analysed SSR loci across all 176 individuals sampled (Table 2). A wide range of allele numbers was generally obtained for each locus (from 2 to 15), with an average of 5.78 alleles per locus. Genetic diversity parameters were evaluated in each group. As shown in Table 3, the mean number of alleles obtained per locus and per accession group corresponding to wild accessions ranged from 1.1 (BA) to 4.45 (LCT-EENa). The allelic richness, which takes the sample size into account, varied from 1.05 (CR) to 1.53 (LCT-EENa). For both parameters, the most polymorphic wild cocoa accessions were found in the central and northern Amazonian regions (A and B) of Ecuador (LCT-EENa and LCT-EENb).
Significant FIS values were observed in some groups (Table 3). The highest positive values were observed in the BA and Criollo groups, which correspond to self-compatible varieties. However, highly homozygous genotypes can also be identified in the other groups, such as the French Guiana, LCT-EEN and Nanay groups (Table S1).
The FIS value for the whole sample set and over all loci was 0.2402, with a confidence interval of 0.2155 to 0.2654 after a 1000-fold permutation test.
Genetic differentiation (FST) was estimated between pairs of populations (Table 4). Nearly all FST values were significant after permutation tests. Only three FST values were not significant: the two values between the LCT-EENa population from the northern part of the Ecuadorian upper Amazonian region and the populations from the central and southern regions of Ecuadorian Amazonia (LCT-EENb and LCT-EENc), and the value between the Pound and Nanay populations.
The FST and RhoST values between cocoa populations were very similar, with slight variations depending on the pair of populations (Table 5). Only for the Morona population did differentiation from the Scavina, VEN and Nacional groups appear higher when evaluated with RhoST than with FST.
The FST value for the whole multilocus sample set was 0.4979, with a confidence interval of 0.4782 to 0.5192 after a 1000-fold permutation test. The corresponding RhoST value for the whole sample set was 0.530.
The neighbour-joining tree (Fig. 2) clustered the majority of cocoa samples according to their geographical origins. Four main clusters were obtained in this analysis. The NJ tree (Fig. 2) highlighted the close relationship between the Nacional pool and the wild cocoa genotypes (LCT-EEN) sampled in the Amazonian region C of Ecuador (Fig. 1b). The accessions sampled in Peru along the Morona river were also genetically close to the Nacional pool; the Morona river region is geographically close to region C of Ecuadorian Amazonia. The largest genetic diversity, expressed by the proportion of polymorphic loci, the mean number of alleles per locus, and the allelic richness, was found in the LCT-EEN accessions from Ecuador, and particularly in those originating from regions A and B (Fig. 1b).

Genetic Distance between the Native Nacional Genotypes and Wild Populations Present in the Amazonian Regions
To refine the previous analysis, coefficients of genetic distance were calculated for pair-wise comparisons of the 14 cocoa groups [45]. A matrix of genetic distance values for all groups is presented in Table 4. The highest genetic distance (2.445) was obtained between the cultivated Criollo and the Lower Amazon group from Brazil (BA), while the most similar populations were LCT-EENa and LCT-EENb, with a genetic distance of 0.039. In the case of the Nacional pool, the lowest genetic distance was noted with the groups from the LCT-EENc region (0.250) and the Morona region (0.307). Subsequent analyses were carried out to determine the genetic distances between the Nacional pool and wild individuals collected along the neighbouring rivers within each Amazonian region of Ecuador (A, B, C), including all accessions collected in the upper Amazonian region of Peru (data not shown). This new approach allowed us to identify a few LCT-EENc samples (LCT-EEN85, LCT-EEN86 and LCT-EEN91), collected along the adjacent rivers Yacuambi and Nangaritza, as the wild cocoa trees genetically closest to the Nacional pool originally cultivated in the coastal region. A key feature of some of the wild cocoa trees from these adjacent rivers is their low heterozygosity level (e.g. 16% for LCT-EEN91). Note also that the Amazonian region C is geographically close to the Guayas basin (Fig. 1b) where the first Nacional cocoa plantations were established.

Potential Wild Ancestors of the Nacional Variety
A paternity analysis was performed to identify the most likely representatives of the ancestral population from which the Nacional variety originated. All putative tested parents were from the south Amazonian region C of Ecuador, closest to the Nacional genotypes.
Our results (Table 6) indicated that the LCT-EEN85, LCT-EEN86 and LCT-EEN91 genotypes were the most likely parents of Nacional individuals. These findings were clearly in line with our previous results, suggesting that the native Nacional pool likely descended from wild cocoa trees growing in the vicinity of the Nangaritza and Yacuambi rivers in the southern part of Amazonian region C.

Discussion
Throughout the history of cocoa cultivation in Ecuador, substantial genetic changes, including reproductive behaviour, have occurred as a consequence of foreign germplasm introductions and genetic admixture associated with natural and human selection. The first study of genetic diversity of modern Ecuadorian T. cacao accessions was undertaken by Lerceteau et al. [6], [7]. However, the origin of Nacional cocoa variety, its genetic relationships with wild cocoa trees from Amazonia, and its domestication history had yet to be clarified.
In this study, we used 80 SSR markers to analyse relationships between representatives of the putative founders of the Nacional variety and wild populations from Amazonia in order to identify its wild genetic origin and elucidate the domestication events that occurred before the first Nacional plantations were set up in Ecuador in the Guayas Basin.
The accessions studied here cover most of the geographic origins of known T. cacao accessions, except for the Upper Amazonian Brazilian accessions, which were poorly represented in this study, but for which no specific cluster was identified in the last proposed classification [15]. In our study, the diversity was first structured in four main groups, corresponding to different geographic origins, with subdivisions within some of these four main groups:
- The first group (cluster 1) included mostly accessions from the northern and central regions of Ecuadorian Amazonia (regions A and B), but without any structuring between them. This is in agreement with the very low genetic distance (0.039) and low genetic differentiation (FST of 0.0081 and RhoST of -0.0018) between these two groups.
- The second group (cluster 2) included most of the Peruvian accessions, i.e. mainly those collected in the northern region of Peru, associated with wild accessions collected in French Guiana and in the Orinoco region of Venezuela, and with some accessions from Ecuador. The cultivated genotypes from Lower Amazonia in Brazil were also included in this group.
The Upper Amazonian region has always been considered the primary centre of origin and diversity of T. cacao L. [5], [48]. The high genetic similarity observed between some wild cocoa accessions from the Upper Amazonian region (Parinari or Pound) and the group from French Guiana suggests that, during T. cacao evolution, the eastward expansion of cocoa populations across South America as far as French Guiana could have started from this Parinari Upper Amazonian region.
- The third group (cluster 3) included most of the accessions from the southern Amazonian region of Ecuador, the accessions collected in Peru along the Morona river, close to this southern Ecuadorian region, and the Nacional accessions. In this group, a close genetic relationship was found between some wild accessions from the southern part of Ecuadorian Amazonia (region C) and the Nacional pool from the coastal region. These results were supported by the genetic distance assessment, as well as by a parentage analysis. Our results suggest that the Amazonian region located in the Province of Zamora Chinchipe could be the centre of domestication of the Nacional variety at the origin of the first Nacional cocoa plantation that was set up on the Guayas riverbanks. Later, from this latter plantation, cocoa cultivation could have expanded along the Guayas tributary rivers (upper waters), which is now the 'Arriba' cocoa production region in the provinces of Guayas, Los Ríos and Manabí (Fig. 1b).
- The fourth group (cluster 4) included all Criollo accessions collected from Mexico to Venezuela, reflecting its narrow genetic base. In a previous study, some accessions from Colombia (EBC) were identified as being the closest to Criollo [17]. It has been suggested that Colombia is the centre of origin of Criollo. In our study, we did not have access to EBC Colombian accessions, but the higher genetic similarity and lower genetic differentiation (FST and RhoST values) observed between Criollo and accessions from the northern Amazonian region of Ecuador showed the same geographic trend as previously noted [17].
This migration and domestication hypothesis is supported by the geographical proximity between the first cocoa plantation set up along the Guayas riverbanks [26] and the wild cocoa collected along the Nangaritza, Yacuambi and Zamora riverbanks in the province of Zamora Chinchipe (Fig. 1b, region C). The morphological similarity (fruit and seed) between the Nacional variety and the native cocoa populations located in the Ecuadorian jungle close to the Amazonian cities of Archidona and Macas in region C, as previously reported [49], further supports this hypothesis.
Inbreeding coefficients (FIS) demonstrate a clear and significant excess of homozygotes in some groups. This is the case for the long-domesticated, self-compatible Criollo and BA varieties, but also for some wild genotypes such as those of the VEN, GUY and LCT-EEN groups. Highly homozygous wild genotypes were identified in each region sampled. This latter result suggests that self-compatibility alleles already existed in these wild populations and that natural selection could have already eliminated part of the genetic burden due to inbreeding, thus facilitating the process of domestication of some wild populations.
Most representatives of the native Nacional variety have a high level of homozygosity, probably associated with their self compatibility [50][51][52]. The question is whether the domestication process has fixed and selected, by selfing, highly homozygous genotypes, or whether natural selection, leading to increased homozygosity, was already under way in wild populations before this domestication event. A common domestication feature is the reduction of genetic diversity in crops relative to wild progenitors. The severity of genetic loss ascribed to bottleneck effects varies greatly among crop species. Some authors indicated that this reduction results from two major forces [53]: first, most domestication events are thought to have involved initial populations of small size (relative to wild ancestors) that contained a narrow level of genetic diversity; the second factor to have an impact on crop genomes is the selection in favour of agronomic traits that distinguish crops from their ancestors. Humans usually apply selection pressure on the ancestral gene pool to select for favourable traits, thus increasing or fixing favourable alleles at genes controlling these traits.
The particular characteristics of some cocoa varieties could have facilitated the selection and migration of specific cocoa materials. From this standpoint, the specific aromatic flavour of chocolate produced from Nacional cocoa beans, which can already be detected on the bean pulp, even without a fermentation step, could have been one of the criteria used by the primitive human communities to choose the cocoa mother tree for further seed sowing. Indeed, it is possible that travelling merchants transporting cocoa pods along the roads used the fresh pulp only for their own refreshment and nutrition, but without consuming the cocoa beans, thereby introducing the cocoa tree into a new environment [54]. An extensive flavour aroma study in wild genotypes combined with an evolutionary analysis of this trait variation could help to identify candidate genes responsible for flavour in the aromatic Nacional variety.
Many crop species have been domesticated for thousands of years. Archaeological evidence, based mainly on the presence of theobromine in pottery residue, revealed that cocoa was used in the early formative period in Mesoamerica, with dates ranging from 600 BC back to 1900 BC [55][56][57][58].
Cocoa domestication seems to have occurred much more recently in most of the other cultivating countries, even though wild cocoa populations exist in their areas; no archaeological evidence has been found so far in these countries. In Ecuador, information on the domestication of its native cocoa plantations dates back only four centuries.
There is no evidence of human dispersal of this species before the first cocoa plantations were planted by the Spanish in the Ecuadorian coastal region. Knowledge of the dispersal mechanism involved in the long-distance migration process is thus essential to explain our results. Several hypotheses have been put forward to explain the cocoa migration process, such as the transport of fruits or seeds by birds, animals or humans, but so far none of these has been formally confirmed. It was suggested that the Nacional cocoa variety arrived via the old Inca roads and was planted by the native people of that time along the coastal regions [49]. However, archaeological evidence suggests that these old roads were not built by the Incas but rather by pre-Columbian native people, who inhabited the coastal, Andean and Amazonian regions of Ecuador [59] thousands of years before the arrival of the Incas in the Ecuadorian regions.
Recently, a ceremonial formative site dating back to the 3rd millennium BC was discovered on the eastern slopes of the Andes in the southern part of Ecuadorian Amazonia. The cultural remains showed a high degree of development of these societies, with ceramics and marine shells (Spondylus and Strombus), which is evidence that commercial exchanges with coastal people occurred in this region [60], [61]. This site is located in the province of Zamora Chinchipe (region C), where the putative ancestors of the Nacional variety were identified. Archaeological evidence also showed that these exchanges went as far as a place presently known as La Cueva de los Tayos, which is located in the province of Morona Santiago [59].
Unfortunately, unlike the Mayas, who used hieroglyphs and represented cocoa, the more primitive people that inhabited Ecuador did not use writing symbols, so there is little evidence of cocoa domestication and use before the Spanish arrival. However, archaeological evidence of contacts and exchanges of products between pre-Columbian peoples from the coastal, Andean and Amazonian regions of Ecuador, dating back to 3000 BC [60], [62], could provide further explanations of the origin of cocoa trees in the Ecuadorian coastal region.
Based on our results, future cocoa expeditions should be carried out to confirm that the southern Amazonian area of Ecuador could be the centre of origin of the Nacional variety, and to collect new germplasm with the same Nacional flavour specificities adapted for fine flavour cocoa improvement in Ecuador.

Supporting Information
Table S1 Origin, collection, % heterozygosity and status of the 176 T. cacao accessions analysed in the present study. CATIE: Centro Agronomico Tropical de ...
Figure 2 caption (fragment): see Table 1 for code identification. doi:10.1371/journal.pone.0048438.g002