An integrative approach reveals five new species of highland papayas (Caricaceae, Vasconcellea) from northern Peru

The assignment of accurate species names is crucial, especially for those with confirmed agronomic potential such as highland papayas. The use of additional methodologies and data sets is recommended to establish well-supported boundaries among species of Vasconcellea. Accordingly, six chloroplast (trnL-trnF, rpl20-rps12, psbA-trnH intergenic spacers, matK and rbcL genes) and nuclear (ITS) markers were used to delimit species in the genus Vasconcellea using phylogeny and four DNA-based methods. Our results demonstrated congruence among different methodologies applied in this integrative study (i.e., morphology, multilocus phylogeny, genetic distance, coalescence methods). Genetic distance (ABGD, SPN), a coalescence method (BPP), and the multilocus phylogeny supported 22–25 different species of Vasconcellea, including the following five new species from northern Peru: V. badilloi sp. nov., V. carvalhoae sp. nov., V. chachapoyensis sp. nov., V. pentalobis sp. nov., and V. peruviensis sp. nov. Genetic markers that gave better resolution for distinguishing species were ITS and trnL-trnF. Phylogenetic diversity and DNA-species delimitation methods could be used to discover taxa within traditionally defined species.


Introduction
The family Caricaceae is composed of six genera containing 35 species that are distributed from southern Mexico to northern Chile [1,2]. Two of these genera, namely Carica L. and Horovitzia Badillo are monospecific. The former is considered the most economically important and is distributed in tropical and subtropical America [3], whereas the latter is endemic to Mexico [4]. Three additional genera are Cylicomorpha Urban, Jacaratia A. DC, and Jarilla Rusby. The first one has two species and is the only genus restricted to the African premontane wet forestst [2]. The second one comprises seven species distributed along South America [2]. The latter one encompasses three species distributed along Pacific Coast from northern Mexico to El Salvador [5]. The remaining genus, namely Vasconcellea Saint-Hilaire, is the largest one in this family encompassing 20 species and 1 hibrid (V. x heibornii) distributed mainly from Ecuador to Peru [3,6].

DNA sequencing and alignment preparation
Genomic DNA was extracted from leaf tissue using the NucleoSpin Plant II Kit (Macherey-Nagel, Düren, Germany) following the manufacturer's instructions. Six molecular markers were sequenced (ITS, matK, psbA-trnH, rbcL, rpl20-rps12, trnL-trnF). Each gene was amplified using polymerase chain reaction (PCR) with MasterMix (Promega, Wisconsin, USA) in the following reaction mixture: 10 ng of DNA and 0.25-0.5 pmol of forward and reverse primers for a total volume of 10 μl. The PCR protocols followed Bustamante et al. [17,27], and primer combinations are summarized in S1 Table. The sequences of the forward and reverse strands were determined commercially by Macrogen Inc. (Macrogen, Seoul, Korea). New generated sequences from the six markers were deposited in GenBank. These sequences and others obtained from GenBank (Table 2) were initially aligned with Muscle algorithms [28] and were adjusted manually with MEGA6 software [29].

Phylogenetic analyses
The phylogeny was based on concatenated data of the six molecular markers (65 sequences, Table 2). Selection of the best-fitting nucleotide substitution model was conducted using the program PartitionFinder [30] with six partitions (ITS, matK, psbA-trnH, rbcL, rpl20-rps12, trnL-trnF). The best partition strategy and model of sequence evolution were selected based on the Bayesian Information Criterion (BIC). The general time reversible nucleotide substitution model with a gamma distribution and a proportion of invariable sites (GTR + Γ + I) was selected for all partitions. Maximum likelihood (ML) analyses were conducted with the RAxML HPC-AVX program [31] implemented in the raxmlGUI 1.
To perform the GMYC delimitation method, an ultrametric tree was constructed in BEAST v.2.0.2 [40], relying on the uncorrelated lognormal relaxed clock, the GTR + Γ + I model, and a prior coalescent tree. Bayesian Markov chain Monte Carlo simulation was run for 50 million generations, and trees and parameters were sampled every 1000 generations. Log files were visualized in Tracer v.1.6 [34] for assessing the stationary state of parameters on the basis of the value of estimate-effective sample size (ESS). After removing 25% of trees as burn-in, the remaining trees were used to generate a single summarized tree in TreeAnnotator v.2.0.2 [40] as an input file for GMYC analyses. The GMYC analyses with a single threshold model were performed in R (R Development Core Team, http://www.R-project.org) under the 'splits' package using the 'gmyc' function (R-Forge, http://r-forge.r-project.org/projects/splits/).
To validate the outcomes of single locus species delimitation, a multilocus BPP was applied using the program BP&P v.2.0 [41,42]. The six markers were used as inputs for BPP under the A11 model (A11: species delimitation = 1, species tree = 1). Specimens were a priori assigned to species based only on the minimum number of species from the results of the phylogenetic analysis. Five variables (ε1~ε5) were automatically fine-tuned following the instructions of BP&P [41]. The prior distribution of θ could have influenced the posterior probabilities for different models [41]. Analyses were run with three different prior combinations [43]. Each analysis was run five times to confirm consistency between runs. Two independent MCMC analyses were run for 100,000 generations with the 'burn-in' = 20,000.

Nomenclature
The electronic version of this article in Portable Document Format (PDF) in a work with an ISSN or ISBN will represent a published work according to the International Code of Nomenclature for algae, fungi, and plants, and hence the new names contained in the electronic publication of a PLOS article are effectively published under that Code from the electronic edition alone, so there is no longer any need to provide printed copies.
In addition, new names contained in this work have been submitted to IPNI, from where they will be made available to the Global Names Index. The IPNI LSIDs can be resolved and the associated information viewed through any standard web browser by appending the LSID contained in this publication to the prefix http://ipni.org/. The online version of this work is archived and available from the following digital repositories: PubMed Central, LOCKSS.

Molecular phylogeny
In the phylogeny of Vasconcellea species, the analysed data matrix included a total of 5208 base pairs (bp) (723 bp for ITS, 1089 bp for matK, 736 bp for psbA-trnH, 1362 bp for rbcL, 802 bp for rpl20-prs12, and 496 bp for trnL-trnF) from 65 individuals. Phylogenetic trees obtained from the ML and BI analyses confirmed the monophyly of the genus Vasconcellea and its sister relationship to the genus Jacaratia (Fig 2). Our multilocus phylogeny also molecularly con-  distinguishing interspecific relationships among several species of Vasconcellea, indicating that these species share different common ancestors depending on the marker, which is understood under hybridization scenarios [44]. In addition, the genetic divergence comparisons of the six markers showed that there is not a minimum threshold (p-distance) for distinguishing genetic species in Vasconcellea (Table 3) confirming genetic discordance. For instance, V. microcarpa (Jacq.) A. DC. and V. omnilingua (V.M. Badillo) V.M. Badillo are identical when comparing matK, psbA-trnH, and trnL-trnF; but different when comparing ITS, rbcL, and rpl20-rps12 (Table 3).

Species delimitation
The species-delimitation methods based on genetic distance (ABGD, SPN) and coalescence (GMYC, BPP) showed incongruent results for the six genes (Fig 2, Table 4). Among these methods, the highest number of species was delimited by the BPP analyses (25), whereas the most conservative results were obtained from GMYC (16 ± 10) (S5 and S6 Figs, S2 Table). Moreover, similar species numbers resulted from the ABGD (22 ± 6) and SPN (23 ± 5) analyses. The additional species delimitation by ABGD, BPP, and SPN was mainly due to the split of the clades composed of V. pubescens/V. sprucei, V. stipulata, Vasconcellea sp. 1, Vasconcellea sp. 2, Vasconcellea sp. 3, Vasconcellea sp. 4, and Vasconcellea sp. 5 (Fig 2). However, the split of these clades was not supported by the posterior probabilities obtained from BPP analyses (S3 Table). Although there were incongruent results in species number among different methods, the genetic distance methods (ABGD, SPN) and the multi-locus coalescent species validation (BPP) showed similar species numbers with those obtained in the phylogenetic analyses (Fig 2, Table 4). Regarding the six molecular makers, the highest number of species was delimited for the spacer ITS (27 ± 4) and the intergenic trnL-trnF (27 ± 6), whereas the lowest numbers were obtained for the genes matK (16 ± 10) and rbcL (17 ± 4) and the intergenic rpl20-prs12 (12 ± 5) ( Table 4). These low species numbers were a consequence of the merging of several species that have similar sequences with null or very low genetic distance between these markers as a consequence of hybridization events (Table 3).
Vasconcellea carvalhoae D. Tineo & D.E. Bustam., sp. nov. (Fig 4) [ Diagnosis. Dioecious tree to 4 m tall that is very similar morphologically to V. sprucei/V. pubescens, but differing in the sister phylogenetic relationship with these species. The sequence divergence between V. carvalhoae and V. sprucei/V. pubescens is 0.36% for the ITS region.
Etymology. The specific epithet 'carvalhoae' honours Fernanda A. Carvalho for her valuable contributions to the understanding of the Caricaceae in the bioinformatics era.
Ecology and distribution. The species is known from the area around Pomacochas (54 9'45"S 77˚58'12"W) in the province of Bongará, Amazonas, Peru. It is found in the wild in montane areas at 2236 m elevation. Plants not cultivated. Remarks. Vasconcellea carvalhoae is highly similar in morphology to V. sprucei/V. pubescens, growing in sympatry. However, these species are distinguished by their elongated ovoid berry and lack of pubescence (S4 Table). Phylogenetically, V. carvalhoae is also a sister to the clade composed of V. sprucei/V. pubescens. They are genetically different species based on a 0.36% divergence in the ITS region. Diagnosis. Dioecious tree to 12 m tall having ovoid berries with acuminate apex as distinguishing features. Species very similar morphologically to V. weberbaueri, but differing in the larger inflorescence and the wider leaflets at the base and in the sister phylogenetic relationship with this species. The sequence divergence between V. chachapoyensis and V. weberbaueri is 1.1% for the ITS, 0.6% for matK, and 0.3% for trnL-trnF.
Description. Dioecious tree to 12 m tall ( Fig 5A); bark light brown, covered with prominent leaf scars; stipules absent. Latex white milky. Leaves membranaceus, alternate, crowded at top of tree, palmately compound, deeply lobe ( Fig 5B); petiole 50 to 70 cm long; leaflets 5 to 7, glabrous and bright green above, lighter green below with purple red stripes and stipules in the
Etymology. The specific epithet 'chachapoyensis' is derived from the province where the samples were collected.
Etymology. The specific epithet 'pentalobis' refers to the diagnostic feature of the central leaflets that is composed of five lobes (four laterals and one central lobe).
Etymology. The specific epithet 'peruviensis' is derived from the country where the samples were collected.
Morphologycally, the species of Vaconcellea reported from Peru can be distinguished by the following taxonomic key:

Discussion
The assignment of accurate names for species is crucial, especially for those with confirmed agronomic potential as highland papayas. The taxonomy of these species, which are members of the genus Vasconcellea, has been mostly based on morphological characters and multilocus phylogeny, including detailed species descriptions and precise distribution maps [4,5,7,45]. However, the use of additional methodologies and data sets is recommended to establish wellsupported boundaries among species [17,21]. Accordingly, six molecular markers have been used to delimit species in the genus Vasconcellea using phylogeny and four DNA-based methods. Although incongruence among some of these methods was observed in our analyses, genetic distance (ABGD, SPN), a coalescence method (BPP), and the multilocus phylogeny supported 22-25 different species in Vasconcellea, including five new species from northern Peru.

Integrative approach
Our six loci phylogeny resulted in topology incongruence mainly to single loci phylogenies. However, multilocus sequence data are pivotal for the establishment of robust species delimitations [46,47]. Incomplete lineage sorting, horizontal gene transfer, gene duplication and loss, hybridization, or recombination are probable explanations for this discordance [48]. Our data provided molecular evidence of hybridization, but natural processes such as introgression, chloroplast capture, selection, differentiation, mutations, and human selection might have all played a part generating evolving hybridizing species complexes in Vasconcellea [14,20,48,49]. According to our multilocus phylogeny, 24 species (including 5 new species) were molecularly confirmed in Vasconcellea and are distributed in two main lineages, although previous studies grouped them into 3 clades [9,44,50]. One of these lineages, labelled clade 1 by d'Eeckenbrugge et al. [44], is composed of six taxa, including three new species, V. badilloi, V. chachapoyensis, and V. pentalobis. The restricted distribution of this lineage confirms its endemism to southern Ecuador and northern Peru with a high level of introgression history and sympatry [44]. The other lineage is composed of 18 species, including two new species, V. carvalhoae and V. peruviensis. This lineage contained clades 2 and 3 of d'Eeckenbrugge et al. [44], which are composed of specimens from different taxa but belong to sympatric populations with high morphological diversity related to hybrid segregation and phenotypic plasticity [14]. Strikingly, these two evolutionary lineages in Vasconcellea are molecularly well distinguished clades that can be considered two different genera. However, the lack of additional diagnostic features suggests that further analyses (e.g., anatomical observations on the basis of ultrastructure of vegetative and reproductive tissues and chemotaxonomic evaluations) must be accomplished before recognizing them as separate genera.
Regarding the genetic distance methods, similar results to those from the multilocus phylogeny were obtained by ABGD and SPN when delimiting Vasconcellea species. The additional putative species identified with these methods mainly resulted from the split of V. chachapoyensis, V. pentalobis, and V. peruviensis with the markers ITS, psbA-trnH, and trnL-trnF. This might suggest that these species encompass cryptic lineages [17] as a consequence of the initial hybridization process, but these splits are not supported by the multilocus phylogeny and BPP analyses dismissing crypticism.
In the coalescent methods, the presence of gene flow due to the high hybridization levels in different species of Vasconcellea had negative impacts, particularly on the GMYC model [51]. The GMYC model usually produces false positives and complex false positives when delimiting different taxa that have low or high magnitudes of gene flow, respectively [52,53]. Nevertheless, the validation of BPP supports the status of the species recognized by the multilocus phylogeny (posterior probabilities, pp 0.61-0.99, S3 Table) and did not support those split or merged taxa by GMYC (pp lower than 0.29, S3 Table). Moreover, the additional species delimited by BPP is an unidentified Vasconcellea from Peru (IV17, IV18, IV20, IV21) which lacks of support to be considered a different entity. Therefore, sistership between this species and V. stipulata is not confirmed, suggesting that they might be conspecific. Additional specimens of this unidentified taxon should be sequenced to confirm its taxonomical status. The performance in empirical studies of the genetic and coalescent methods tends to undersplit and oversplit species, respectively [53][54][55][56]. However, our results suggest that ABGD and BPP are appropriate for determining diversity in Vasconcellea by recognizing those well-supported clades delimited by the multilocus phylogeny.
In our study, V. pentalobis and V. peruviensis were morphologically distinguished by their pentalobed central lobes and monoic inflorescence, respectively. Traits related to leaf (petiole length), female flowers (color, stigma shape), seeds (shape and texture), and fruits (length) slightly differentiated the other three new species. For instance, V. badilloi and V. chachapoyensis have divided stigmas and conical protuberances on seed surfaces, while V. carvalhoae has an entire stigma and rounded projections on seed surfaces. The absence of robust morphological distinction traditionally occurs in organisms that lack complex structures such as fungi or algae, but the high phenotypic plasticity and hybridization scenarios in Vasconcellea might explain this absence in these plants [9,44]. This morphological indistinctiveness among some Vasconcellea species was overcome by the application of molecular methods in plant taxonomy. In addition, our multilocus data, ABGD, and BPP analyses suggested conspecificity between V. goudotiana/V. sphaerocarpa as well as V. pubescens/V. sprucei. Therefore, V. sphaerocarpa (García-Barr. & Hern-Cam, 1958 in Badillo [7]), and V. sprucei (Badillo [57]) might be synonymized with V. goudotiana (Triana & Planch, 1873 in Badillo [7]) and V. pubescens (de Candolle [58]), respectively, on the basis of the principle of priority. However, further studies should delimit the relationships of those taxa including analyses of new material collected from type localities. Although several chloroplast and nuclear sequences have been used for assessing inter-and intraspecific relationships among species of Caricaceae [2,9,14,20], only ITS and trnL-trnF intergenic showed better resolution for distinguishing species based on phylogeny and species delimitation methods. This suggests that initial screening regarding the diversity of Vasconcellea should include amplification of these markers. The segregation of five new species confirmed that phylogenetic diversity and DNA-species delimitation methods could be used to discover taxa within traditionally defined species [15,17,59].

Conclusions
The use of an integrative approach to analyse diversity, including DNA-based delimitation methods, allowed the establishment of boundaries among species with morphological diversity, such as Vasconcellea, and thus provided support for the description of new taxa or validated the taxonomic uncertainty of other Vasconcellea members. Our results demonstrated that the congruence among different methodologies applied in this integrative study (i.e., morphology, multilocus phylogeny, genetic distance, coalescence methods) are more likely to prove reliably supported species boundaries. Therefore, ABGD, BPP, and multilocus phylogeny are pivotal when establishing species boundaries in Vasconcellea.