Morphological and genetic diversity of camu-camu [Myrciaria dubia (Kunth) McVaugh] in the Peruvian Amazon

Camu-camu [Myrciaria dubia (Kunth) McVaugh] is an important and promising fruit species grown in the Peruvian Amazon, as well as in Brazil, Colombia, and Bolivia. The species is valued for the high vitamin C content of its fruit. Large plantations have been established only in the last two decades, and a substantial part of the production is still obtained by collecting fruits from the wild. Domestication of the species is at an early stage; most farmers cultivate the plants without any breeding, or only through simple mass selection. The main objective of the study was to characterize morphological and genetic variation within and among cultivated and natural populations of camu-camu in the Peruvian Amazon. In total, we sampled 13 populations: ten wild populations in the Iquitos region and three cultivated populations in the Pucallpa region. To assess genetic diversity using seven microsatellite loci, we analyzed samples from eight to ten individual trees per population (n = 126). Morphological data were collected from five trees per population (n = 65). The analysis did not reveal statistically significant differences for most of the morphological descriptors. For wild and cultivated populations, the observed heterozygosity was 0.347 and 0.404 (expected 0.516 and 0.506), and the fixation index was 0.328 and 0.200, respectively. Wild populations could be divided into two groups according to the UPGMA and STRUCTURE analyses. For the cultivated populations, the approximate origin was determined. Our findings indicate a high genetic diversity among the populations, but also a high degree of inbreeding within the populations. This can be explained either by the isolation of these populations from each other or by the low number of individuals in some populations. This high level of genetic diversity can be exploited for the selection of superior individuals for further breeding.


Introduction
The Amazonian tropical ecosystem is characterized by a high biodiversity of plant species (approx. 55,000 species), of which there are more than 150 edible fruit-bearing species used by the local population. However, only a few of these currently have a significant economic importance [1].
Camu-camu [Myrciaria dubia (Kunth) McVaugh], a shrub or small tree from the family Myrtaceae, has become an economically important fruit species in recent decades [1]. This species grows naturally in seasonally flooded areas along rivers and oxbow lakes in the Amazon basin. Its economic importance lies in the high vitamin C content of its cherry-like fruits, which is reported to range from 877 to 3,133 mg per 100 g of pulp. Camu-camu can be considered one of the richest sources of ascorbic acid (vitamin C) of all plant species [2]. A higher ascorbic acid content (406-5,320 mg per 100 g of pulp) is found only in the kakadu plum (Terminalia ferdinandiana Exell), native to Northern Australia [3,4]. For centuries, the fruits of camu-camu have been collected from wild trees, while cultivation is relatively new and plantations have been established only in the last few decades [5].
There is little information about the genetic diversity of camu-camu. Most of the studies published to date, focusing on this species, mainly targeted the ascorbic acid content of the fruit [6,7,8,9]. However, knowledge about its morphological and genetic diversity is important for further domestication and breeding of new varieties, which can achieve a higher yield, higher vitamin C content, or higher resistance to pests and diseases. For breeding of new varieties, it is necessary to preserve sites with the highest genetic diversity and, thereby, to protect not only the valuable genetic material but also animal species that contribute to its distribution [10,11]. This knowledge can help us to decide which populations should be protected, how large the protected area should be, and how many plants should be in the population to avoid inbreeding. Knowledge of genetic diversity is therefore important for in situ conservation, gene mapping, and finding new lineages. Preservation of genetic material is currently common practice, and defining diversity within and between natural populations is the first step to implementing breeding programs [12].
Due to the large area and specific conditions in which camu-camu grows, genetic diversity would be expected to be high among populations originating in different geographical regions [13,14]. Here we aimed to characterize the morphological and genetic diversity of cultivated and natural populations of camu-camu in the Peruvian Amazon and compare the genetic diversity among and within these populations. The diversity was assessed by morphological descriptors and genetic analysis using microsatellite markers (SSR markers). Specifically, we asked: (i) is the genetic and morphological diversity high due to the distances between populations and the isolation of individual sites by the Amazonian forest; (ii) are the populations rather inbred due to their small size; and (iii) are the cultivated populations less diverse than wild populations.

Study site and sample collection
Plant samples were collected between July and September 2015 around the towns of Iquitos and Pucallpa. In Iquitos, samples were collected in cooperation with IIAP (Instituto de Investigaciones de la Amazonía Peruana), a national Peruvian research institution, at its experimental field "San Miguel" (GPS: -3.763404, -73.183840). IIAP granted permission to collect leaf samples for further analysis and to take morphometric measurements of selected trees in its research collection. The experimental field contained a collection of 115 different populations collected from the Loreto and Ucayali Departments of Peru. These plants were identical to the parent plants from the natural habitats, as they were directly transplanted from their original environment as young seedlings/saplings (i.e., the plants were propagated by vegetative offshoots and not by seeds). Hence, these populations were considered natural or wild for the purpose of this study; ten of them were chosen for sampling, with the aim of covering the largest possible area of natural occurrence of camu-camu in Peru (Fig 1). The (semi)domesticated populations, referred to in this study as cultivated, were sampled around Pucallpa on three plantations (farms) located near the Yarina Cocha lake (Fig 1), each considered as one population (Table 1). These samples were collected directly from farmers' fields with the permission of their owners. We encountered no obstacles to our sample collection and morphometric measurements. No samples were collected directly from the wild. In each population, we randomly selected five individual trees for observing major morphological descriptors, and eight to ten trees for leaf tissue sampling. In total, 65 trees from 13 populations were sampled for evaluation of morphological traits.
On each tree, we randomly selected five leaves, inflorescences, and fruits and measured the length and width of the leaves, petiole length, the number of flowers per inflorescence, the size and weight of fruits, the number and weight of seeds, and the weight of pulp. For genetic analysis, we collected 126 fresh leaf samples (96 from wild and 30 from cultivated populations), which were dried with silica gel and stored in plastic bags. We confirm that the field study did not involve endangered or protected species; M. dubia is a fruit tree widely cultivated in the Amazon region and is currently not covered under any protection.
DNA isolation. DNA was extracted from the dried leaves using a modified CTAB method [15]. Before the extraction, approximately 0.1 g of dried leaf per sample was homogenized in a crusher. We then progressed to the second phase of the CTAB method [16]. Homogenized plant material was added to tubes with 500 μl of CTAB buffer and 10 μl of mercaptoethanol. Samples were incubated for 30 min at 60˚C. Next, 500 μl of chloroform-isoamyl alcohol (24:1) was added, and samples were left at room temperature for 10 min. They were then centrifuged for 15 min at 9,000 rpm, and the supernatant was transferred to new tubes. Isopropanol (0.7 times the volume of the supernatant) was added, and the tubes were left in the freezer for 1 h at -18˚C. Samples were then centrifuged for 15 min at 14,000 rpm, the supernatant was discarded, and 500 μl of 70% ethanol was added. The samples were centrifuged again for 15 min at 14,000 rpm, the supernatant was removed, and the DNA pellet was dried at room temperature. The pellet was dissolved in 100 μl of TE buffer with 5 μl of RNase, the tubes were briefly mixed, and the concentration and purity of the DNA were measured on a Nanodrop spectrophotometer (Thermo Scientific).
Microsatellite analysis. In the GenBank database [17], several accessions of M. dubia DNA sequences were identified [MDI003 (EX151484.1); MDI004 (EX151485.1); MDI006 (EX151486.1); MDI007 (EX151487.1); MDI009 (EX151488.1); MDI010 (EX151489.1); MDI015 ...]
Table 1. Location of collected Myrciaria dubia populations and summary of genetic diversities within 13 populations of M. dubia, based on seven microsatellite loci. The table is divided into two parts: Wild (natural populations of camu-camu) and Cultivated (domesticated populations of camu-camu). For the Wild populations, the first name refers to the river on which the population is located; the second name is an oxbow lake of that river.
In the PCR reaction of M2, 0.5 μl of each primer was added to the total volume of 10.5 μl. PCR amplifications were performed in a T100 Thermal Cycler (Bio-Rad, USA) with the following profile: 95˚C for 2 min; followed by 30 cycles of 95˚C for 1 min, either 51˚C (M1) or 57˚C (M2) for 90 s, and 72˚C for 1 min; followed by a hold at 72˚C for 5 min. The PCR products were separated by electrophoresis in an ABI PRISM 3500 sequencer (Applied Biosystems, USA). A 1-μl aliquot of PCR product was mixed with 0.5 μl of GeneScan-500 LIZ (Applied Biosystems) and 12 μl of Hi-Di formamide (Applied Biosystems). Allele sizes were determined using GeneMarker version 2.4.0 (SoftGenetics, USA). A microsatellite locus was treated as missing data after two or more amplification failures. Null allele frequencies were calculated using the Brookfield 1 equation [18,19], and no null alleles were detected.
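As an illustration of the null allele check mentioned above, the following minimal Python sketch applies the commonly cited form of the Brookfield 1 estimator, r = (He - Ho) / (1 + He), to a single locus. The function name, the clipping of negative estimates to zero, and the example heterozygosity values are assumptions made for illustration; they are not taken from the study.

```python
def brookfield1(h_exp: float, h_obs: float) -> float:
    """Null allele frequency at one locus, Brookfield estimator 1.

    h_exp: expected heterozygosity (He); h_obs: observed heterozygosity (Ho).
    Negative estimates (heterozygote excess) are clipped to zero, which is
    an illustrative choice and not part of the estimator itself.
    """
    if not (0.0 <= h_exp <= 1.0 and 0.0 <= h_obs <= 1.0):
        raise ValueError("heterozygosities must lie between 0 and 1")
    r = (h_exp - h_obs) / (1.0 + h_exp)
    return max(r, 0.0)


# Hypothetical per-locus (He, Ho) pairs, for illustration only.
example_loci = {"locus_A": (0.60, 0.58), "locus_B": (0.45, 0.47)}
for name, (he, ho) in example_loci.items():
    print(name, round(brookfield1(he, ho), 3))
```

Estimates close to zero at every locus would be consistent with the absence of null alleles.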

Data analysis
We used several statistical methods to evaluate the morphological data. The sampled trees of each population were selected randomly across the experimental orchard, and five leaves, inflorescences, and fruits were sampled per tree. Because these parameters are highly variable within a single tree, we used a linear mixed-effects model to assess differences between wild and cultivated populations while limiting the effect of random within-tree sampling variation [20].
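The reason for treating the tree as a random effect can be illustrated with a simpler technique than the full mixed model: a balanced one-way random-effects ANOVA that splits the variance of a trait into a between-tree and a within-tree component. The sketch below is not the model used in the study; the function name and the leaf-length values are hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """Method-of-moments variance components for a balanced design.

    groups: one list of repeated measurements per tree (equal lengths).
    Returns (between-tree variance, within-tree variance, intraclass correlation).
    """
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with >= 2 trees and >= 2 measurements")
    grand = mean(x for g in groups for x in g)
    tree_means = [mean(g) for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in tree_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, tree_means) for x in g
    ) / (k * (n - 1))
    var_tree = max((ms_between - ms_within) / n, 0.0)  # clip negative estimates
    total = var_tree + ms_within
    icc = var_tree / total if total > 0 else 0.0
    return var_tree, ms_within, icc


# Hypothetical leaf lengths (cm): three trees, five leaves each.
leaf_lengths = [
    [7.1, 7.4, 6.9, 7.8, 7.2],
    [8.0, 8.3, 7.7, 8.1, 8.4],
    [6.5, 6.9, 7.0, 6.6, 6.8],
]
var_tree, var_leaf, icc = variance_components(leaf_lengths)
print(f"between-tree: {var_tree:.3f}, within-tree: {var_leaf:.3f}, ICC: {icc:.2f}")
```

A high intraclass correlation means that leaves from the same tree resemble each other, so they cannot be treated as independent observations when wild and cultivated populations are compared.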
To evaluate the genetic data, summary statistics for the SSR loci were calculated using FSTAT 1.2 [21]: the mean allelic richness (Rs), a metric that uses rarefaction to account for differences in sample size [21,22], and Weir & Cockerham's parameter f (F), a measure of deviation from random mating within a population [23]. Observed (Ho) and expected (He) heterozygosities were calculated using Arlequin [24], and deviation from Hardy-Weinberg equilibrium was tested based on 10,000 permutations in FSTAT.
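The relationship between these statistics can be sketched for a single locus as follows. This minimal Python example assumes diploid genotypes coded as allele-size pairs, uses Nei's unbiased estimator for expected heterozygosity, and approximates the fixation index by the simple ratio F = 1 - Ho/He; FSTAT and Arlequin may differ in detail, and the genotypes shown are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus in one population.

    genotypes: list of (allele1, allele2) tuples; None marks missing data.
    He uses Nei's unbiased estimator over 2N gene copies.
    """
    typed = [g for g in genotypes if g is not None]
    n = len(typed)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    h_obs = sum(a != b for a, b in typed) / n
    counts = Counter(allele for g in typed for allele in g)
    copies = 2 * n
    sum_p2 = sum((c / copies) ** 2 for c in counts.values())
    h_exp = copies / (copies - 1) * (1 - sum_p2)
    return h_obs, h_exp


def fixation_index(h_obs, h_exp):
    """Simple inbreeding coefficient F = 1 - Ho/He."""
    if h_exp == 0:
        raise ValueError("monomorphic locus: F is undefined")
    return 1 - h_obs / h_exp


# Hypothetical genotypes (allele sizes in bp) for four trees at one locus.
sample = [(100, 100), (100, 104), (104, 104), (100, 100)]
ho, he = heterozygosity(sample)
print(round(ho, 3), round(he, 3), round(fixation_index(ho, he), 3))
```

Applied to the population-level averages reported in this study, the simple ratio gives about 0.328 for wild populations (Ho = 0.347, He = 0.516) and about 0.202 for cultivated populations (Ho = 0.404, He = 0.506), close to the reported 0.328 and 0.200; small deviations are expected because Weir & Cockerham's f is not computed from averaged heterozygosities.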
Euclidean distances among all samples were also used to construct an unweighted pair group method with arithmetic mean (UPGMA) phenogram, calculated using PAST [25].
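The UPGMA procedure itself is simple enough to sketch in a few lines. The following Python example repeatedly merges the two closest clusters and updates distances as size-weighted averages, which is the defining step of UPGMA. It is a toy version operating on hypothetical two-dimensional coordinates, not a reproduction of the PAST analysis; the labels and coordinates are invented for illustration.

```python
from itertools import combinations
from math import dist


def upgma(points):
    """Cluster labelled points by UPGMA on Euclidean distances.

    points: dict mapping a label to a coordinate tuple.
    Returns (tree, merges): a Newick-like string and a list of
    (cluster_a, cluster_b, node_height) in merge order.
    """
    sizes = {label: 1 for label in points}
    d = {
        frozenset((a, b)): dist(points[a], points[b])
        for a, b in combinations(points, 2)
    }
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2
        new = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        # Distance to every other cluster is the size-weighted average.
        for c in sizes:
            if c in (a, b):
                continue
            d[frozenset((new, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / new_size
        # Drop all distances that involve the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[new] = new_size
        merges.append((a, b, height))
    return next(iter(sizes)), merges


# Hypothetical samples with made-up coordinates.
toy = {"A": (0, 0), "B": (0, 1), "C": (5, 0), "D": (5, 3)}
tree, merges = upgma(toy)
print(tree)
for a, b, h in merges:
    print(f"{a} + {b} at height {h:.3f}")
```

In practice, the coordinates would be replaced by per-sample vectors derived from the microsatellite genotypes.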
The number of genetic clusters (K) was estimated in STRUCTURE, and individuals were fractionally assigned to the inferred clusters. We applied a model that allows population admixture and correlated allele frequencies [26]. Ten replicates were run for each K from 2 to 6 (the user-defined number of clusters) to confirm the repeatability of the results. Each run comprised a burn-in period of 25,000 iterations, followed by 100,000 Markov chain Monte Carlo (MCMC) steps. The STRUCTURE output was parsed using the Structure-sum script in R [27], mainly to determine the optimal K value following the methods of Nordborg et al. [28] and Evanno et al. [29]. Cluster assignments were then aligned across replicate analyses in CLUMPP 1.1.2 [30] and visualized using DISTRUCT 1.1 [31].
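The ΔK statistic of Evanno et al. can be sketched as follows. This minimal Python example assumes the common implementation in which ΔK is the absolute second-order difference of the mean ln P(D) across replicates, divided by the standard deviation of ln P(D) at that K. It does not reproduce the Structure-sum script, and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Evanno-style delta K from replicate STRUCTURE log-likelihoods.

    lnp: dict mapping K to a list of ln P(D|K) values, one per replicate.
    Delta K is only defined for K values that have both neighbours K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # delta K is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical ln P(D|K) values for three replicates per K.
runs = {
    2: [-100.0, -102.0, -98.0],
    3: [-80.0, -81.0, -79.0],
    4: [-78.0, -77.0, -79.0],
    5: [-77.0, -78.0, -76.0],
}
for k, value in delta_k(runs).items():
    print(f"K = {k}: delta K = {value:.1f}")
```

Because ΔK needs both neighbouring K values, it cannot be evaluated for the smallest and largest K in the tested range.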

Morphological diversity
Morphological data were analyzed using analysis of variance (ANOVA) for the linear mixed-effects model (Table 3) and principal component analysis (PCA) (Fig 2). Using ANOVA, we detected no differences in the morphological data, except in the number of seeds, which was higher in the wild populations. The PCA was congruent with the ANOVA and confirmed the morphological similarity between wild and cultivated populations (Fig 2, S1 Table).

Genetic diversity
In total, 126 samples from 13 populations were analyzed using seven polymorphic SSR primers. All microsatellite loci were polymorphic, with 91 alleles identified (Table 1). The average number of alleles per locus was 3.4 ± 1.6. Allelic richness ranged from 1 to 6.5, with an average of 3.0 ± 1.2. Observed heterozygosity (Ho) varied from 0.137 to 0.527, with an average of 0.357 ± 0.128. Expected heterozygosity (He) varied from 0.218 to 0.680, with an average of 0.512 ± 0.185. The heterozygote deficit was significant in the majority of populations, with a high fixation index [f (F) = 0.304 ± 0.117] (Table 1). Differences in allelic richness, fixation indices (F and Fst), and observed vs. expected heterozygosity were not significant between wild and cultivated populations (S2 Table).
By UPGMA analysis, we found that the populations could be divided into two distinct groups and that each of these groups could be further subdivided into two subgroups (Fig 3). The division of the cultivated populations is interesting. According to our findings, the Y1 population belongs to the group of populations from the Putumayo River, whereas the Y2 and Y3 populations were assigned to the populations from the rivers Curaray (CU, Ct, CC), Tigre (TH), and Napo (NY). Populations from the rivers Napo (NN) and Itaya (IP) were surprisingly assigned to the group of populations from the Putumayo River, although they are geographically very distant.
The same division of populations was determined by PCoA (Principal Coordinate Analysis) (S1 Fig). With two genetic clusters (K = 2), the cluster depicted in blue is dominated by populations from the Curaray (CC, Ct, and CU), Tigre (TH), Itaya (IP), and Napo (NN, NY) rivers, while populations PC, Pc, and PM are from the Putumayo River (green). The cultivated populations from Pucallpa (Y2 and Y3) were assigned to the first genetic cluster (the Curaray, Tigre, Napo, and Itaya rivers), and population Y1 is genetically closely related to the populations from the Putumayo River (Fig 5).
When the populations were divided into three genetic clusters (K = 3), populations from the Putumayo River (PM, Pc, and PC) formed a separate genetic cluster. The rest of the populations were divided according to the river system into two groups (i.e. the Curaray River, with the prevailing cluster depicted in yellow, and the Napo, Tigre and Itaya rivers shown in green). The genetic composition of the cultivated populations showed that they were combined from only two areas (i.e., from the Putumayo River and the Napo, Tigre, or Itaya rivers) (Fig 5).
With five genetic clusters (K = 5), we were able to recognize four geographically distinct groups of populations along different rivers, especially because the populations occurring along the Napo, Tigre, and Itaya rivers were further divided. The cultivated populations Y2 and Y3 were assigned to the Itaya-Pelejo population. Cultivated population Y1 remained part of the "Putumayo" group (Fig 5).

Morphological diversity
The measured morphological characteristics showed low variability within and among populations. The wild and cultivated populations were not found to be significantly different, and we did not detect any differences among the populations. The only significant difference we detected was in the number of seeds per fruit, where the cultivated populations had a higher number of seeds per fruit. These results indicate a low degree of domestication of M. dubia in the Peruvian Amazon. It appears that the plants grown in plantations were neither selected nor bred, and that they possess phenotypic characteristics quite similar to those of wild plants. The first attempt at the cultivation of camu-camu in Peru was made in 1980, in a trial established near Iquitos [32]. Thus, the domestication process started only about four decades ago.
According to our results, the use of morphological descriptors for quantitative and qualitative traits of leaves had only little significance. Greater importance should be placed on the detection of fruit characteristics such as diameter, weight, and number of seeds, the weight of the pulp, and vitamin C content, which was not included in this study. It seems that cultivated populations of camu-camu were not yet highly selected according to fruit morphological traits, and, seemingly, there remains potential to select individuals with larger fruits from wild populations.
Fig 3. Population codes by river: Curaray (CC, Ct, CU); Itaya (IP); Napo (NN, NY); Putumayo (PC, Pc, PM); Tigre (TH); Yarina Cocha (Y1, Y2, and Y3) (for population locations, see Fig 1). https://doi.org/10.1371/journal.pone.0179886.g003

Genetic diversity
We are aware of a limitation of our study: the sample size per population was relatively low (8-10 trees per population), so the results might differ if the sample size were increased. This is because the estimated genetic diversity indices depend strongly on accurate estimates of allele frequencies within populations.
However, in contrast to the morphological characteristics, the microsatellite loci detected a high level of genetic diversity among and within populations. The expected heterozygosity was higher than the observed heterozygosity, and the fixation index was high. Similar results were obtained by Rojas et al. [14] and Koshikene [13], who also used microsatellite markers. In both cases, the values of expected and observed heterozygosity of wild populations were higher than in our study. For wild camu-camu populations in Brazil, Rojas et al. [14] found average expected and observed heterozygosities of 0.797 and 0.409, respectively. The average value of the fixation index was 0.377, and the average number of alleles across all loci was 12.7. Slightly lower values were determined by Koshikene [13] in Brazil. The lower values of heterozygosity and coefficient of inbreeding in our study can be explained by the smaller geographical area from which the samples were collected. In the two previous studies, the samples were collected in areas that covered the whole Amazonian region of Brazil. Despite the high diversity measured in our study (and in the previous two studies), the fixation index was high. In all three studies, the results can be explained in several ways. Possible explanations of the high genetic diversity are the relatively large distances between populations and the fact that populations located along main rivers are isolated from each other by upland tropical rainforest, which forms a barrier the species cannot cross. Migration of the species through the upland forest is improbable because the seeds are mainly dispersed by water and fish. Oxbow lakes, where the trees grow naturally, are relatively small, and the populations are dense. According to Rojas et al. [14], outcrossing and low gene flow took place, resulting in a higher fixation index. The bottleneck effect might also account for our findings in some populations: during unexpected events, the number of individuals in a population is reduced, so that many alleles are lost by the time the population is restored [33].
Franceschinelli et al. [34] performed similar research on Myrciaria floribunda, which is native to the wet forests of Central and South America. Allozyme markers were used to determine the genetic diversity. The fixation index was 0.153, which is lower than that reported here for M. dubia. This is likely because allozyme markers are less variable than SSR markers. Nevertheless, allozyme markers were still able to demonstrate higher values of homozygotes in the populations compared to our study. The higher fixation index was again explained by the bottleneck effect and low gene flow, and this confirms the results of our study.
The dendrogram shows that populations were divided into groups depending on the river watershed in which they were located. For example, population Napo-Núñez (NN) is located close to the confluence of the Napo and Amazon rivers. Population Itaya-Pelejo (IP) is located on the Itaya River, which also flows into the Amazon, and this confluence is not far from that of the Napo and Amazon. During the rainy season, some fruits were probably naturally transported from the Itaya River to the tree populations of the Napo River, and genes were mixed. This is a possible explanation of why populations NN and IP belong to the same group, and why population NN is not grouped with population NY from the same river. This explanation is supported by Rojas et al. [14], who obtained similar results. STRUCTURE divided the natural populations into two distinct groups. Populations PM, Pc, and PC, which lie on the Putumayo River, make up the first group. This river is geographically remote from the other populations, and upland tropical rainforest hinders its connection with them, because the seeds of M. dubia are primarily distributed by water during floods and pollen is not transmitted over longer distances. Populations from the second group are closer to each other, and their rivers are often connected during floods, which enables the transfer of genetic material.
The cultivated populations can also be divided into two groups according to their origin. The populations Y2 and Y3 originated in the catchment area of the rivers Curaray, Tigre, Napo, and Itaya, and were brought by farmers to the Pucallpa region. STRUCTURE software (K = 5) assigned the origins of these populations closer to Iquitos city, around the river Itaya. The Y1 population was assigned to the other group, and its origin lies in the area of the river Putumayo on the border between Peru and Colombia, flowing into Brazil. Further research will be necessary to confirm this hypothesis.
Genetic analysis revealed significant differences between wild populations. These observed differences in genetic diversity should be used to guide the preservation of wild genotypes in situ and ex situ, and the most promising genotypes should be stored in germplasm banks. For further studies focused on this species, we propose also collecting plant material from the peripheral areas of its occurrence (Ecuador, Bolivia, Venezuela, and Guyana). Plants from those marginal areas might be more resistant to unfavorable climatic conditions and could allow the cultivation of this species outside the area of its natural occurrence. We also recommend comparing the variability of populations from the Peruvian Amazon and Brazil to find populations with superior plants for further breeding. Fruit collection from wild populations leads to losses of genetic material because excessive collection reduces the number of seeds, and thus might reduce the number of plants in the population. These losses might affect the structure of these populations in the future and cause their degradation. Because the demand for M. dubia continues to rise, selection and breeding can allow easier and more efficient cultivation, and thus help prevent the destruction of the natural populations.

Conclusion
This study found low morphological variability within and among wild and cultivated populations of M. dubia. Of all the measured characteristics, only fruit parameters tended to differ. However, these differences could possibly be explained by the different environmental conditions in which the populations were grown, or by the limited amount of data collected. The evaluation of the fruits indicated that the cultivated populations chosen for our study have not yet passed through any process of domestication and had more or less the same morphological characteristics as the wild populations.
Seven of the eight microsatellite loci developed were polymorphic and showed a high level of genetic diversity within and among populations. Bayesian analyses divided the wild populations into two main groups (a group from the river Putumayo and a group from the rivers Curaray, Tigre, Napo, and Itaya), and were able to show us the origin of cultivated populations. For this reason, microsatellite primers developed within this study can be recommended for further population-genetic studies of M. dubia.
Our research showed that populations located in different watersheds also possessed a different genetic composition. The origin of each population assigned according to each watershed increased the genetic distance between the populations and, thus, overall genetic diversity. Genetic diversity among and within cultivated populations was also relatively high, thanks to the different origin and generative propagation of plants by the use of seeds. Transfer of seeds and seedlings by local inhabitants plays a crucial role in the preservation of this genetic diversity.
This diversification could be used in the future breeding of this species. The crossbreeding of the individual trees from different geographical locations that suffer from inbreeding, and are genetically different, could result in a heterosis effect, and the resulting hybrids will possess higher genetic diversity, which might be more suitable for growing in plantations.
S1 Table. Quantitative morphological descriptors for the wild and cultivated populations of camu-camu. Including the number of samples (n), mean, median, and standard deviation (SD) of each characteristic. Red-marked data with an asterisk were found to be statistically significant at probability level p = 0.05. (DOC)
S2 Table. Main coefficients of genetic diversity for wild and cultivated populations of camu-camu with two-sided p-values obtained after 10,000 permutations. Ho, observed heterozygosity; He, expected heterozygosity; F, fixation index; Fst, fixation index of a subpopulation relative to the total population. (DOC)