A comparative approach for species delimitation based on multiple methods of multi-locus DNA sequence analysis: A case study of the genus Giraffa (Mammalia, Cetartiodactyla)

Molecular data are now commonly used in taxonomy for delimiting cryptic species. In the case of giraffes, which were treated as a single species (Giraffa camelopardalis) during half of a century, several molecular studies have suggested a splitting into four to seven species, but the criteria applied for taxonomic delimitation were not fully described. In this study, we have analysed all multi-locus DNA sequences available for giraffes using multispecies coalescent (MSC: *BEAST, BPP and STACEY), population genetic (STRUCTURE, allelic networks, haplotype network and bootstrapping, haplowebs and conspecificity matrix) and phylogenetic (MrBayes, PhyML, SuperTRI) methods to identify the number of species. Our results show that depending on the method chosen, different taxonomic hypotheses, recognizing from two to six species, can be considered for the genus Giraffa. Our results confirm that MSC methods can lead to taxonomic over-splitting, as they delimit geographic structure rather than species. The 3-species hypothesis, which recognizes G. camelopardalis sensu strico A, G. giraffa, and G. tippelskirchi, is highly supported by phylogenetic analyses and also corroborated by most population genetic and MSC analyses. The three species show high levels of nucleotide divergence in both nuclear (0.35–0.51%) and mitochondrial sequences (3–4%), and they are characterised by 7 to 12 exclusive synapomorphies (ES) detected in nine of the 21 nuclear introns analysed for this study. By contrast, other putative species, such as G. peralta, G. reticulata, G. thornicrofti or G. tippelskirchi sensu stricto, do not exhibit any ES in the nuclear genes. A robust mito-nuclear conflict was found for the position and monophyly of G. giraffa and G. tippelskirchi, which is interpreted as the result of a mitochondrial introgression from Masai to southeastern giraffe during the Pleistocene and nuclear gene flow mediated by male dispersal between southern populations (subspecies G. g. giraffa and G. g. angolensis).

Introduction Biologically, speciation implies reproductive isolation through barriers preventing or limiting gene flow between populations [1]. Over the process of genetic differentiation, reproductively isolated populations may accumulate distinct phenotypic features that facilitate their recognition as different species. However, separated populations facing similar selective environments often converge phenotypically and show no visible differences (see Fišer et al. [2] for a review on cryptic species), which complicates their recognition as distinct species.
The development of DNA sequencing techniques over the last three decades allowed for studying species delimitation with molecular data in order to uncover cryptic taxa. Mitochondrial genes, and in particular the COX1 gene (cytochrome c oxidase subunit 1), have been intensively used for species delimitation [3,4]. However, numerous molecular studies have revealed that the mitochondrial tree may deviate from the species tree. Indeed, the maternal inheritance of the mtDNA genome can be misleading for species delimitation in mammals, because females and males have usually different dispersal behaviours (female philopatry versus male dispersal) [5,6], and because interspecific hybrid females are generally fertile, whereas hybrid males are often sterile (Haldane's rule), facilitating mitochondrial introgression between closely related species [7][8][9]. To overcome these limitations, most recent taxonomic studies dealing with the delimitation between cryptic mammal species have focused on multi-locus datasets [10][11][12], as the use of multiple independent DNA markers has been shown to provide a strong and reliable signal for deciphering relationships among closely related taxa [13,14]. However, interpreting the results from multi-locus datasets can be difficult, especially when the DNA markers show low genetic variation or conflicting relationships between them. These difficulties have led to the development of a plethora of new methodological approaches for multi-locus species delimitation [15,16], which may be subdivided into three categories: (1) phylogenetic methods, (2) multispecies coalescent (MSC) approaches, and (3) population genetic methods (Table 1). Phylogenetic methods were not originally developed for studying species delimitation, but the species monophyly criterion has been widely used since the origin of molecular taxonomy [17]. For multi-locus datasets, several phylogenetic approaches can be considered: the concatenation of all markers into a supermatrix (although this approach has been widely criticized [18]), the separate analyses of the markers, or more sophisticated methods, such as � BEAST [19] or Super-TRI [20]. Based on the coalescent theory, some authors have suggested that species can be delimited without monophyletic gene trees [21]. The incorporation of the coalescent model [22] in certain software (e.g. � BEAST [19], BPP [23] and STACEY [24]) enabled the inference of species limits from multi-locus data by accounting for incongruences among gene trees in the presence of incomplete lineage sorting [19]. MSC approaches often require prior assignments of samples to populations or taxa and are hence restricted to the validation of proposed delimitations [25]. Population genetic approaches are generally applied to detect "cryptic substructure" between groups showing very similar phenotypes. The program STRUCTURE [26] is probably the most popular approach for Bayesian clustering using multi-locus data. It has recently gained new interest as the clusters identified with STRUCTURE can be used as preliminary hypothesis for assigning individuals to populations or taxa, which represents the first step of most MSC analyses [27]. In addition, geographic clusters detected with STRUCTURE are often interpreted (perhaps wrongly [28]) as reproductively isolated populations, which may constitute a strong argument in favour of a division into several species (e.g., Brown et al. [29]). Allele-sharing methods, such as haplowebs [30] and conspecificity matrix (CM) [31], can be also used to detect reproductively isolated populations.
The systematics of giraffes is a controversial issue, since at least nine different hypotheses of species delimitation were proposed on the basis of morphological characters and, more recently, molecular data (see Table A in S1 Appendix). The existence of several giraffe species was first proposed by Geoffroy Saint-Hilaire [32], who noted that differences in coat pattern, horn shape and skull can be used to distinguish the Nubian giraffe (from the Sennaar region in Sudan) from the Southern giraffe (from the Cape region). Thomas [33] proposed another arrangement in two species, in which Nubian and Southern giraffes were assigned to Giraffa camelopardalis, whereas the reticulated giraffe was treated as a full species, Giraffa reticulata. Lydekker [34] shared this view, but recognized 12 subspecies in G. camelopardalis and two in G. reticulata. However, Dagg and Foster [35] indicated that phenotypic features are highly variable between and within populations, and recognized therefore a single species, G. camelopardalis. Subsequently, this point of view was accepted by most other taxonomists, despite persisting controversy regarding the number of subspecies [36,37]. However, the taxonomy of giraffes has been challenged by recent genetic studies: based on the analyses of mitochondrial sequences and 14 nuclear microsatellite loci, Brown et al. [29] proposed a minimum of six species, corresponding to Giraffa angolensis, G. giraffa, G. peralta, G. reticulata, G. rothschildi, and

MSC
Jones [24] Bayesian method implemented in BEAST 2 [65] for the inference of a "species or minimal clusters tree" (SMC) under the birth-death-collapse tree prior and without the requirement of a guide tree.  [38] and Winter et al. [12] suggested a division into four species, i.e., G. camelopardalis, G. giraffa, G. reticulata and G. tippelskirchi, based on multi-locus analyses of 7 and 21 nuclear introns, respectively. However, the four-species hypothesis proposed by Fennessy et al. [38] has previously elicited concerns and controversy (see Bercovitch et al. [39]).
In this study, we reanalysed all multi-locus data available for the nine giraffe subspecies (i.e., camelopardalis, angolensis, antiquorum, giraffa, peralta, reticulata, rothschildi, thornicrofti and tippelskirchi) using various phylogenetic (MrBayes, PhyML, SuperTRI), population genetic (STRUCTURE, allelic networks, haplotype network and bootstrapping, haplowebs and conspecificity matrix) and MSC ( � BEAST, BPP, STACEY) methods. Our intention was to provide sound scientific evidence about the number of species of Giraffa by comparing different methods currently used for molecular species delimitation that rely on different species concepts (Phylogenetic species concept [40], Genetic species concept [41] and Genealogical species concept [42]). Such a strategy is especially important for taxa of conservation concern like giraffes, since the application of a single species concept has been shown to "overlump" or "oversplit" species, which can entail negative consequences for conservation management [43,44].
Our five main goals were (1) to test if the different methods converge towards the same conclusion or if they support divergent taxonomic hypotheses, (2) to examine if one hypothesis is more supported by the analyses than the others (comparative approach of species delimitation), (3) to understand why some methods or models can lead to taxonomic over-splitting, (4) to know if available molecular data are sufficient to conclude on the number of species, and (5) to determine which data, methods and operational criteria are relevant for delimiting species with molecular data.

Nuclear and mitochondrial datasets used for the analyses
Seven giraffe datasets were generated for our analyses using the sequences available in the NCBI nucleotide database: 1. the mtDNA-G507 dataset, which contains a mitochondrial fragment covering the whole cytochrome b (Cytb) gene and the 5'part of the control region (length = 1742 nucleotides [nt]) for 507 individuals (listed in Table A  6. the nuDNA-G137O3 dataset, in which the nuDNA-G137 dataset was aligned to the three outgroup species mentioned above (length = 17276 nt); 7. the nuclear haplotype dataset, named nuDNA-GH274, which was inferred from the nuD-NA-G280 dataset in DNASP v5.0 [45] using the "Generate Haplotype Data File" option under non-consideration of gaps/ missing data and subsequent exclusion of outgroup taxa (length = 1362 nt; it contains only the sites found to be variable between haplotypes).
The datasets were aligned automatically using the MAFFT algorithm implemented in Geneious R10 (Biomatters, Auckland, New Zealand) and subsequently verified by eye. All alignments generated for this study were deposited in Open Science Framework at https://osf.io/9wv86/.

Phylogenetic analyses
The mtDNA-GH82O3 and nuDNA-G137O3 datasets were analysed with probabilistic methods. Bayesian inferences were conducted in MrBayes v3.2.6 [46] by calculating the posterior probabilities (PP) after 10 7 Metropolis-coupled MCMC generations with tree sampling every 1000 generations and a burn-in of 25%. Maximum Likelihood (ML) analyses were performed with PhyML v3.1 [47] and Bootstrap percentages (BP) were calculated after 1000 replicates. The GTR+I+G substitution model was applied for both methods, as suggested by the Likelihood calculations in jModeltest [48] based on the Akaike information criterion [49].
Bayesian analyses were also performed for each of the 21 introns using the model of DNA substitution selected under jModeltest ( Table 2).

SuperTRI analyses
The lists of bipartitions obtained from the Bayesian analyses (.parts and .tstat files) for each nuclear marker were transformed into a weighted binary matrix (MRP, matrix representation with parsimony) for supertree construction using SuperTRI v57 [20]. Here, each binary character corresponds to a node, which was weighted according to its frequency in one of the 21 lists of bipartition. Thereby, the SuperTRI method accounts for principal as well as secondary signals, given that all phylogenetic hypotheses found during the Bayesian analyses are represented in the weighted binary matrix used for supertree construction. The reliability of the nodes was assessed using three measures: supertree bootstrap percentages (SBPs) were obtained from PAUP � v4b10 [50] after 1000 BP replicates of the MRP matrix of 24749 binary characters generated by SuperTRI v57; mean posterior probabilities (MPP) and reproducibility indices (Rep) were directly calculated on SuperTRI v57. In the nuclear tree (Fig 1A), we chose to indicate the number of markers supporting each node of interest (NRep) rather than the Rep value, which represents the ratio of the number of markers supporting the node to the total number of markers [20].

STRUCTURE analyses
Giraffe haplotypes were reconstructed from the nuDNA-G137O3 dataset for each of the 21 introns by applying the PHASE v2.1 algorithm implemented in the software DNASP v5.0 [45], allowing for recombination and reducing the output probability threshold of conserved regions (CT) from 0.9 by default to 0.6.
Bayesian analyses of genetic admixture were run in STRUCTURE v.2.3.4 [26] to identify genetically homogeneous groups of individuals (populations of origin, K). The haplotype information deduced by PHASE for each of the 21 introns was used to code individuals sharing the same genotype with a unique integer in the input file [51]. The analyses were done as recommended by Gilbert et al. [52], i.e., number of MCMC generations = 200 000 and burnin = 100 000 generations for K = 1-10 clusters. We applied several combinations of ancestry model, allele frequency and supporting information (Popdata) like the assignment of the subspecies (population identity/ POPID) or sampling location (LOCPRIOR model) for each individual. We tested two ancestry models, since we do not know whether studied populations were discrete or had an admixed ancestry. Moreover, the identification of the most probable number of clusters (K) might be further affected by the choice of the allele frequency model. By default, the software assumes correlated allele frequency among populations caused by migration and shared ancestry [53]. Since past admixture was expected between giraffe populations, this model may represent the appropriate choice. However, several runs were conducted under the independent allele frequency model, as it might be more powerful to detect highly distinct populations [54].
We also tested two settings for lambda (λ), the parameter specifying the distribution of allelic frequencies in each population: the default setting (λ = 1) and an estimated value of λ (λ = 0.45), calculated during a run comprising 20 iterations for K = 1. Runs were performed  G. tippelskirchi (subspecies tippelskirchi and thornicrofti); G: G. giraffa; Th: G. thornicrofti; "-": not found; " � ": outgroups excluded. https://doi.org/10.1371/journal.pone.0217956.t002 Species delimitation within the genus Giraffa based on multi-locus DNA sequences  Table B in S2 Appendix), as this option is recommended when only a weak signal is present in the markers [55]. All analyses were replicated 20 times. The most likely number of distinct groups for each run was identified by means of STRUC-TURE HARVESTER [56]. Thereby, the optimal K was determined using two approaches: (1) the ΔK method of Evanno et al. [57], which recognizes the most likely number of distinct clusters by the largest ΔK value, calculated by the rate of change in the log probability of data between successive K values; and (2) the "plateau "method of Pritchard et al. [51], where the log probability of the data (ln Pr (X|K) was plotted against a range of K values, and the optimal K was selected as the point at which the plot curvature plateaus. A regression curve and gridlines were added to the diagrams generated by STRUCTURE HARVESTER to help in determining the point of plateau.
To assess the reliability of the results, CLUMPAK [58] was used to display the barplots from K = 1 to 10 for each of the 20 iterations by means of the implemented software DIS-TRUCT [59].

Analyses of nuclear haplotypes
The nuDNA-GH274 dataset (nuclear haplotypes inferred from the concatenated matrix of 21 introns and 137 giraffes) was used to construct a median-joining network [60] using PopART v1.7 [61]. The robustness of haplotype clusters was evaluated by bootstrapping (1000 replicates) under PAUP � v4b10 [50] using either the Maximum Parsimony (MP) method (heuristic search, faststep option) or the Neighbor-Joining (NJ) method (GTR+I+G model), and under the Maximum Likelihood (ML) criterion using RAxML on CIPRES [62] (http://www.phylo. org). PopART was also used to construct a median-joining network for each of the 21 introns. For six introns (i.e. CTAGE5, NUP155, OTOF, PLCE1, RASSF4 and SOS1), individuals with missing allele(s) were not considered in the analysis in order to avoid any distortion of the results.
Allele sharing among individuals was investigated for each of the 21 introns using the haploweb approach [30] by building raw haplowebs with the online tool HaplowebMaker (https:// eeg-ebe.github.io/HaplowebMaker/). Haplowebs were constructed without the consideration of singletons (alleles encountered only once in the whole dataset, "remove singletons" option) and default settings for the other parameters. In the networks, curves are illustrating alleles found co-occurring in heterozygous individuals. Thereby, a group of alleles linked together by heterozygotes represents an exclusive allele pool, the corresponding groups of individuals is called a field for recombination (FFR) [63]. Based on the allele-sharing information per marker provided by HaplowebMaker, a conspecificity matrix [31] was built using the online tool CoMa (https://eeg197ebe.github.io/CoMa/) with no calculation on heterospecific pairs (option 1), i.e. no value was given to absence of sharing. In this matrix, the conspecificity score for node recovered with significant support in the Bayesian analysis (PP � 0.9), as well as for other nodes discussed in the text, the two values above indicate the Posterior Probability with MrBayes (PP) and the Bootstrap Percentage obtained from the Maximum Likelihood analysis (BP). The three values below were obtained from the SuperTRI analyses of the 21 introns: from left to right: Supertree Bootstrap Percentage (SBP), Mean Posterior Probability (MPP) and the number of markers supporting the node (NRep). The symbol ''-" indicates that the node was not found monophyletic in the analysis, and the letter ''X" indicates that an alternative hypothesis was supported by SBP > 50. The exclusive synapomorphies (including indels; i: insertion; d: deletion), representing fixed substitutions among members of a group, are listed for the nodes discussed in the text. (B) Bayesian tree of the 82 mitochondrial haplotypes detected for Giraffa reconstructed from a fragment covering the complete Cytb gene and the 5' part of the control region (1776 characters) and rooted with Bos, Ovis and Okapia (not shown). For each node supported by PP � 0.95, the BP value obtained from the Maximum Likelihood analysis is indicated. Fixed substitutions among members of a group (exclusive synapomorphies) are listed for the nodes supported by PP � 0.95 and for uncommon mitochondrial haplotypes. https://doi.org/10.1371/journal.pone.0217956.g001 Species delimitation within the genus Giraffa based on multi-locus DNA sequences PLOS ONE | https://doi.org/10.1371/journal.pone.0217956 February 13, 2020 each pair of individuals is the number of markers for which these individuals belong to the same FFR. The conspecificity matrix was illustrated as a heatmap using the hierarchical clustering method implemented in the BioVinci data visualization software (BioTuring, San Diego, CA) and the UPGMA method for the associated dendrogram.
We estimated the species-tree phylogeny using the coalescent algorithm implemented in BEAST v.2.4.4 [65] in order to consider an alternative to the traditional concatenated phylogenetic approach (see Kubatko and Degnan [66] for caveats concerning concatenation). Inferences were based on the nuDNA-G274O6 dataset using an a priori assignment at the level of individuals, i.e. by assigning for each of the 137 giraffes two alleles for the 21 introns. We assumed an uncorrelated lognormal molecular clock for all 21 loci. For each marker, we selected the best suited substitution model inferred in jModeltest [48] (Table 2). Analyses were run with 2x 10 8 generations, with trees sampled every 5000 steps. The .log files were analysed with Tracer v1.7 [67] to assess the convergence of model parameters (effective sample size [ESS] > 200). The species tree was summarized as a Maximum Clade Credibility tree in TreeAnnotator v.1.10 [68] after discarding 25% as burn-in.
The nuDNA-G274 and nuDNA-G274O6 datasets were further used for species delimitation analyses using the STACEY template implemented in BEAST v2.4.4. STACEY represents an improvement to the DISSECT model [69] to infer a "species or minimal clusters tree" (SMC) under the birth-death-collapse tree prior and without the requirement of a guide tree. The tips of the SMC tree represent minimal clusters of individuals that may be collapsed to a single putative species, if branches are shorter than a specified length (collapse height) [24]. A first run was conducted without taxonomic a priori assumptions by assigning two alleles per gene and individual. For the other run, each individual was assigned to one of the six taxa (6S hypothesis) that were found monophyletic with at least one of our phylogenetic analyses: G. camelopardalis sensu stricto C (including only the three subspecies camelopardalis, antiquorum and rothschildi), G. peralta, G. reticulata, G. giraffa (including the two subspecies angolensis and giraffa), G. tippelskirchi sensu stricto and G. thornicrofti. Analyses were done as suggested in the manual, i.e. using a relative death rate of 0.0 for the tree prior, a lognormal distribution with a mean of 4.6 and a standard deviation of 2 to the growth rate prior and a uniform distribution for the relative death rate prior with a lower bound of -0.5 and an upper bound of 0.5. The dataset was partitioned by the 21 genes, with independent strict clock models and individual assignment of the best suited substitution model to each gene ( Table 2). Each analysis was run for 2.5 x 10 8 generations and convergence of parameters was assessed in Tracer v1.7 [67]. Subsequently, the most supported number of distinct clusters was estimated using SpeciesDelimitationAnalyser v1.8.0 [24] by analysing the species trees with a burn-in of 25% and the default collapse height of 0.0001.
Species delimitation analyses with BPP v3.2 were based on a reduced dataset comprising only 66 giraffes due to software limitation. Seventy-one individuals were excluded from the original dataset using the three following criteria: (1) 14 individuals with missing data, (2) 39 individuals sharing the same haplotype and (3) 18 individuals characterized by a long terminal branch in the Bayesian tree. We analysed the support for five taxonomic hypotheses: two species, with two possible geographic patterns (2Sa and 2Sb hypotheses), three species (3S hypothesis), i.e. G. camelopardalis sensu stricto A, G. giraffa and G. tippelskirchi, four species (4S hypothesis), i.e. G. camelopardalis sensu stricto B, G. giraffa, G. reticulata, and G. tippelskirchi, and five species (5S hypothesis), i.e. G. camelopardalis sensu stricto C, G. giraffa, G. peralta, G. reticulata, and G. tippelskirchi (see results for more details). First, we applied the A00 algorithm, the simple MSC model with the species tree fixed to explain the acceptance proportions of MCMC moves [23] under the default gamma prior values G (2,2000) for the tau (τ, root divergence time) and theta (θ, genetic difference among taxa). Then, we assessed the support for each putative species using the A11 algorithm [64]. The three species model priors (SMP 1, 2 and 3) were tested. The analyses were run for 500 000 generations followed by a burn-in of 10%. Convergence between runs was checked for fine tune acceptance proportions between 0.15 and 0.7, as well as ESS > 200.

Nuclear and mitochondrial pairwise distances
The nuDNA-G137 dataset (16966 nt) and mtDNA-GH82 dataset (1742 nt) were used to calculate pairwise distances in PAUP � v4b10 [50] (see Tables A and B in S4 Appendix). For the nuclear dataset, we performed calculations considering the five taxonomic hypotheses mentioned above. For the mtDNA dataset, we primarily performed calculations based on the three main mitochondrial haplogroups, named N, E and S depicted in Fig 1B, but considered also the five possible hypotheses of species delimitation described in the Multispecies coalescent analyses section. The results of separate analyses of the 21 introns showed that none of them supports the monophyly of G. camelopardalis s.s. B and that G. reticulata is found monophyletic only for ACP5, but with insignificant support (PP = 0.03). By contrast, G. tippelskirchi is independently supported by four genes: COL5A2 (PP = 0.96), CTAGE5 (PP = 0.64), RFC5 (PP = 0.75) and UBN2 (PP = 1); and all individuals of this taxon share seven molecular signatures in the UBN2 gene (Fig 1A). The taxa corresponding to G. camelopardalis s.s. A and G. giraffa are the most robust and reliable nodes within Giraffa (Fig 1A, Fig Y in

SuperTRI analyses
The SuperTRI analyses of the 21 introns are highly informative for relationships within Giraffa (S2 Appendix). Indeed, only six nodes are supported by MPP > 0.1 and NRep � 2 ( Fig 1A):

STRUCTURE analyses
Our Bayesian population structure analyses were carried out on alleles inferred for 21 introns and 137 giraffes (0.5% of missing data). We tested different models (admixture versus no admixture, independent versus correlated allele frequency), with and without supporting priors on the subspecies (POPID) or on the geographic origins of the individuals (LOCPRIOR), as well as two values of lambda, fixed (λ = 1) or estimated (λ = 0.45) ( Table 3). For each run, the most likely number of distinct groups (K) was determined using both ΔK and "plateau" methods [57,51].
Using the ΔK method of Evanno et al. [57], 58% of the STRUCTURE analyses (14 / 24) resulted in the highest ΔK value for the separation into two clusters (K) corresponding to a North/ South dichotomy and the comparisons between DISTRUCT barplots indicated differences in the affiliation of both tippelskirchi and thornicrofti giraffes to either the northern or the southern group (Table 3; 2Sa and 2Sb hypotheses). The highest ΔK value for three distinct clusters was obtained for 25% (6 / 24) of the analyses (Table 3), supporting the 3S hypothesis. Finally, the separation into four K clusters was supported by four analyses (17%, Table 3).
Using the "plateau" method of [51], we found that K = 3 is the most probable number of clusters for 12 STRUCTURE HARVESTER diagrams (50% of the 24 analyses), whereas the highest support for four clusters could only be found in 8% of the analyses (2 / 24) (S3 Appendix). For other diagrams, it was difficult to determine at which K the plateau is reached: for 29% of the analyses (7 / 24), it was not possible to choose between K = 3 and 4; for 8% of the analyses (2 / 24) it was not possible to choose between K = 2 or 3; and for 4% of the analyses (1 / 24) it was not possible to choose between K = 2, 3 or 4. We detected incomplete clustering (i.e., 1-3 "foreign" alleles in the cluster, or less than three alleles not included into the cluster) for the following taxa: G. camelopardalis s.s. A (ACP5: one thornicrofti allele; DDX1 and RFC5: two alleles outside); G. giraffa (DDX1: two reticulata alleles; NOTCH2: one tippelskirchi allele; SOS1: two alleles outside; USP54: two thornicrofti alleles); G. tippelskirchi (ACP5: one allele outside; SOS1: two giraffa alleles); G. camelopardalis s.s. B (USP54: three reticulata alleles); and G. reticulata (ACP5 and USP54: three alleles outside). The patterns found for the six other introns (CCT2, MACF1, NUP155, OTOF, PLCE1, RASSF4) do not fit any of the tested taxonomic hypotheses.

Analyses of nuclear haplotypes
The haplowebs constructed for each of the 21 nuclear introns are shown in Fig 4. For the six introns CCT2, MACF1, NUP155, OTOF, PLCE1, and RASSF4, the co-occurrence of alleles Table 3  Species delimitation within the genus Giraffa based on multi-locus DNA sequences Species delimitation within the genus Giraffa based on multi-locus DNA sequences The conspecificity matrix built from the conspecificity scores obtained through the construction of the haploweb for each of the 21 introns (Fig 4; Tables V-Ap in S6 Appendix) is shown in Fig 5. Here, the number of independent markers supporting the hypothesis of the conspecificity for a respective pair of individuals is visualized by various nuances of red: the highest score (21 out of 21) is shown in dark red, whereas the lowest score (0 of 21) is illustrated in white. The conspecificity matrix depicts three dark red rectangles corresponding to the three species G. camelopardalis s.s. A, G. giraffa and G. tippelskirchi (3S hypothesis).

Multispecies coalescent analyses
We constructed a MSC species-tree from the nuDNA-G274-O6 dataset using � BEAST. The topology is similar to the supermatrix topology of Fig 1A, with maximal support (PP = 1) for G. camelopardalis s.s. A, G. giraffa and G. tippelskirchi. However, the monophyly of G. camelopardalis s.s. B, G. camelopardalis s.s. C and four subspecies (antiquorum, peralta, reticulata, and thornicrofti) was also highly supported (PP = 1) in the MSC tree (Fig A in S5 Appendix). The subspecies tippelskirchi was found monophyletic, but with low PP support (= 0.39).
The analyses based on STACEY showed highest support for five distinct giraffe species, i.e., G. camelopardalis s.s. C, G. giraffa, G. peralta, G. reticulata and G. tippelskirchi, a pattern found in 87% of the trees. Other hypotheses of species delimitation were less supported: the 4S hypothesis (G. camelopardalis s.s. B, G. giraffa, G. reticulata and G. tippelskirchi) was found in 7% of the trees; whereas the 6S hypothesis, which recognizes G. camelopardalis s.s. C, G. giraffa, G. peralta, G. reticulata, G. tippelskirchi sensu stricto, and G. thornicrofti, was found in 6% of the trees. Similar results were obtained when outgroup sequences were excluded (data not shown).
Species delimitation analyses based on BPP provided maximal support (PP = 1) for all species recognized according to the 3S, 4S, and 5S hypotheses (depicted in Fig 6). The same results were found with the three species model priors ( Table A in

Phylogenetic analyses of the mitochondrial fragment
The Bayesian tree reconstructed from the mtDNA-GH82O3 dataset (1776 nt) is shown in Fig  1B. It shows the existence of three major geographic haplogroups: northern (N), eastern + southeastern (E), and southwestern (S) giraffes.
The N haplogroup is supported by both Bayesian and bootstrap analyses (PP = 1; BP = 93). It includes all haplotypes detected for G. camelopardalis s.s. A, as well as one divergent recognized are distinguished by different colours. Individuals characterized by a rare allele (in the subspecies) are highlighted with a black frame. The numbers of mutations between alleles are indicated on the branches. https://doi.org/10.1371/journal.pone.0217956.g003 Species delimitation within the genus Giraffa based on multi-locus DNA sequences haplotype of G. tippelskirchi (TIP15, EU088334) sequenced by Brown et al. [29] for nine individuals from Kenya (Athi River Ranch) (see details in Table A   Species delimitation within the genus Giraffa based on multi-locus DNA sequences BP = 99) and rothschildi (PP = 1; BP = 83). The subspecies camelopardalis is found polyphyletic. The reticulated giraffes constitute a polyphyletic assemblage: although most of them are grouped together (PP = 1; BP = 92) as the sister group of the divergent haplotype TIP15 (EU088334) of G. tippelskirchi (PP = 1; BP = 94), the haplotype RET8 sequenced by Fennessy et al. [38] is closely related to rothschildi (PP = 0.89; BP = 46), and the haplotype RET9 (EU088321) sequenced by Brown et al. [29] appears as the sister group of all other northern haplotypes.
The E haplogroup comprises giraffes from eastern and southeastern Africa (PP = 0.99; BP = 87). It contains members of two putative species, G. tippelskirchi and G. giraffa, and can be further divided into three subgroups corresponding to "Masai I", "Masai II", and the subspecies giraffa. The interrelationships between the three subgroups are unresolved. The Masai I subgroup (PP = 1; BP = 95) contains Masai giraffes (subspecies tippelskirchi) from Kenya and Tanzania. The Masai II subgroup (PP = 1; BP = 89) includes Masai giraffes (subspecies tippelskirchi) from Kenya and Tanzania, as well as giraffes of the subspecies thornicrofti from northern Zambia (Luangwa Valley National Park). The third subgroup represents the subspecies giraffa (PP = 1; BP = 99) and includes giraffes from southern Zambia, northern Botswana, northeastern Namibia, Zimbabwe and South Africa.
The S haplogroup contains exclusively individuals of the subspecies angolensis from Namibia and central Botswana. Its monophyly is less supported than the two other mitochondrial haplogroups (PP = 0.37; BP = 60). Our analyses provide a moderate support (PP = 0.94; BP = 65) for an early divergence of the S haplogroup.

Nuclear and mitochondrial pairwise distances
The alignment of 21 nuclear introns was used to calculate pairwise distances between giraffes (Fig 6 and Table B in S4 Appendix). The results show that the mean distance between G. camelopardalis s.s. B and G. reticulata is 0.14% and the mean distance between G. camelopardalis s.s. C and G. peralta is 0.07%, which is significantly smaller than other interspecific distances involving G. camelopardalis s.s. A, G. giraffa and G. tippelskirchi (comprised between 0.35 and 0.51%).
For the mtDNA alignment, we calculated pairwise distances between 82 haplotypes. Three haplotypes (TIP15, RET8 and RET9) were excluded from the analysis due to their grouping outside of their assigned taxon in the phylogenetic tree ( Fig 1B). The distances between the haplogroups identified in Fig 1B are summarized in Table A in S4 Appendix and Fig 6. There are three major haplogroups: haplogroup N = northern (= G. camelopardalis s.s. A); haplogroup E = Masai I, Masai II, and southeastern (= subspecies giraffa); and haplogroup S = southwestern (= subspecies angolensis). The mean distances between these three haplogroups are comprised between 3.07 and 4.16%. Within haplogroup N, the distances between G. camelopardalis s.s. B and reticulata range from 1.29% (ROTH3 versus RET3) to 2.19% (PER2 versus RET13). Within haplogroup E, we found similar distances between Masai I, Masai II and southeastern haplotypes, i.e., between 1.17% (TIP1 versus GFA7) and 2.12% (TIP5 versus GFA9). Within haplogroup S, the distances range from 0 to 0.96% (ANG12 versus ANG16).

Population genetic analyses support the 3S hypothesis
The assessment of population genetic structure has become indispensable in evolutionary biology and conservation to reveal hidden biodiversity. Among freely accessible software provided for this task, STRUCTURE [26] is the most commonly used program, with 17473 citations in Fig 6. The five molecular hypotheses for giraffe taxonomy. The five taxonomic hypotheses that received some support from our analyses on giraffes show the existence of two species, with two possible geographic patterns (2Sa and 2Sb hypotheses), three species (3S hypothesis), i.e. G. camelopardalis sensu stricto A, G. giraffa and G. tippelskirchi, four species (4S hypothesis), i.e. G. camelopardalis sensu stricto B, G. giraffa, G. reticulata, and G. tippelskirchi, or five species (5S hypothesis), i.e. G. camelopardalis sensu stricto C, G. giraffa, G. peralta, G. reticulata, and G. tippelskirchi. In the first column are drawn the geographic distributions of giraffe species for each of the five taxonomic hypotheses. In the second column are summarized the results obtained from STRUCTURE analyses. Barplots were illustrated with DISTRUCT (1 = peralta, 2 = antiquorum, 3 = camelopardalis, 4 = rothschildi, 5 = reticulata, 6 = tippelskirchi, 7 = thornicrofti, 8 = giraffa, 9 = angolensis) and number of analyses supporting each taxonomic hypothesis (in total 24, see Table 3) is indicated beneath barplots. In the third column are illustrated the results obtained in the different haplotype analyses, including the network analysis (Y = yes, the species represents a cluster; N = no, the species is not found as a cluster), the bootstrap values obtained with the phylogenetic analyses based on the Maximum Parsimony (MP), Distance (NJ) and Maximum Likelihood (ML) criterion ("X": support < 50) and the conspecificity matrix (CoMa) (Y = yes, the species is supported by the analysis; N = no, the species is not supported by the analysis). In the fourth column are shown the support values provided by the three Multispecies coalescent (MSC) methods, i.e. BPP, STACEY and � BEAST. In the fifth column are listed the results Web of Science (January 2019). Using Bayesian inference, STRUCTURE is a model-based clustering method to detect population structure and assign individuals to K populations [26]. However, many published results based on STRUCTURE are not reproducible because the genotypes were not available or the parameters used for the analyses were not fully detailed by the authors [52,70].
The program STRUCTURE was previously used to infer genetic structure in giraffe populations, using either genotypes from 14 microsatellite loci of 381 individuals [29] or phased alleles of seven introns for 105 giraffes [38] or rather the extended dataset of 21 introns for 137 individuals [12]. Brown et al. [29] suggested the existence of at least six species, but the optimal K was not determined using either the method of Evanno et al. [57] or that of Pritchard et al. [51], and their results are not reproducible, because the microsatellite data were not made available. According to Winter et al. [12], "K = 4 shows four well resolved groups and is supported as best fitting number of clusters by several statistical methods", but they did not provide any details on the model and method used for their STRUCTURE analyses. Using the same dataset, comprising allelic information of 21 nuclear introns for 137 giraffes, we tested 16 different models under STRUCTURE in order to shed more light on giraffe population structure. Considering the method of Evanno et al. [57], 58% of the analyses provided support for two distinct populations of origin (K = 2), 25% for three distinct clusters (K = 3), and only 17% confirmed the result obtained by Winter et al. [12], i.e. K = 4.
The selection of the appropriate K using the method of Pritchard et al. [51] partly confirmed previously mentioned difficulties to determine the point of plateau [53,57]. We clearly recognize K = 3 as the optimal clustering for 50% of the analyses. For other analyses, it was difficult to identify at which K the plateau is reached (K = 2 or 3?; K = 2, 3 or 4?; K = 3 or 4?; K = 3, 4 or 5?; Table 3, see Figs A-X in S3 Appendix).
Selecting the best suitable model for STRUCTURE is far from simple, especially for taxa with a wide distribution range like giraffes. The choice of an admixture model with correlated allele frequency seems appropriate for populations of East Africa, where hybrids between individuals from divergent populations were previously described (see below). However, such a model may be more questionable for isolated populations, such as the subspecies peralta. In order to better estimate the optimal value of K under STRUCTURE, we recommend therefore for future users of the program to test different combinations of model parameters, to estimate the value of λ, and to make comparison between optimal K estimated with either the ΔK method [57] or the "plateau" method [51]. Using this approach and taking into account that the ΔK method can be biased towards K = 2 [70] and that the smallest value of K is preferred when several values of K give similar estimates of log Pr (X | K) [51], we concluded that K = 3 is the most likely hypothesis for 88% of the analyses (highlighted in grey in Table 3).
Our network and bootstrap analyses of the 274 nuclear giraffe haplotypes (21 introns, 137 giraffes), the networks and haplowebs of the 21 introns, as well as the conspecificity matrix also highly support a division into three divergent haplogroups, representing the three species G. camelopardalis s.s. A, G. giraffa, and G. tippelskirchi (Figs 2, 3, 4 and 5). obtained in the phylogenetic analyses including the markers supporting each taxonomic hypothesis in the separate analyses of 21 introns and mtDNA, as well as the support values obtained from supermatrix (PP = posterior probability; BP = bootstrap percentage) and SuperTRI analyses (SBP = SuperTri Bootstrap Percentage; MPP = Mean Posterior Probability; Rep = Reproducibility Index) ("-": not found). In the sixth column are detailed the mean pairwise distances between individuals of the same taxon calculated using either nuDNA data (concatenation of 21 introns, above) or mtDNA (below) (Since all the mitochondrial sequences of the subspecies giraffa belong to haplogroup E, they were considered as tippelskirchi for distance comparisons; see Discussion for more details on mtDNA introgression). In the seventh column are shown the distribution maps of bovid genera with a similar geographic pattern of speciation. https://doi.org/10.1371/journal.pone.0217956.g006 Species delimitation within the genus Giraffa based on multi-locus DNA sequences

Phylogenetic analyses support the 3S hypothesis
In the nuclear tree reconstructed from the concatenation of 21 introns (Fig 2), four putative species were found to be monophyletic: G. giraffa, G. tippelskirchi, G. camelopardalis s.s. A and G. reticulata. However, the two latter mentioned taxa obtained weak ML bootstrap support (BP = 69 and 81, respectively). To further investigate phylogenetic relationships, we conducted separate Bayesian analyses for all markers and summarized the results with the SuperTRI method [20]. Within Giraffa, the analyses showed that only four nodes can be considered as reliable (SBP = 100; MPP > 0.15; Nrep > 4): G. camelopardalis s.s. A (grouping together northern and reticulated giraffes), G. giraffa (southern giraffes), G. tippelskirchi (southeastern giraffes), and G. giraffa + G. tippelskirchi (Fig 2). All these nodes are supported by the separate analyses of several independent introns (between four and seven), which explain why MPP values are significantly higher than for all intraspecific relationships (between 0.15 and 0.22 versus between 0 and 0.03). By contrast, the SuperTRI analyses provided no support (MPP = 0; Nrep � 1) for the existence of both G. camelopardalis s.s. B and G. reticulata. The monophyly of G. reticulata was found by only ACP5, but with insignificant support (PP = 0.03).

Multispecies coalescent approaches show further geographic structure
Two MSC methods, � BEAST and BPP, showed strong support (PP = 1) for the 3S hypothesis, in which three species can be distinguished, i.e., G. camelopardalis s.s. A, G. giraffa, and G. tippelskirchi. However, STACEY analyses provided support for further species delimitation, i.e., the 5S hypothesis (87%). The five taxa, G. camelopardalis s.s. C, G. giraffa, G. peralta, G. reticulata, and G. tippelskirchi, are also highly supported by both � BEAST and BPP analyses (PP = 1). As recently pointed by Sukumarana and Knowles [71] and Jackson et al. [72], it appears that multispecies coalescent methods delimit structure, not species. In agreement with that, it is important to note that only two of the five putative MSC species can be diagnosed by molecular signatures (Fig 1), i.e. the ones assumed by the 3S hypothesis: G. tippelskirchi is characterised by seven exclusive synapomorphies (ES), all found in the UBN2 gene, which are shared by 19 individuals; and G. giraffa is characterised by 12 ES detected in four independent genes and shared by 61 individuals. For the three other taxa of the MSC 5S hypothesis, we did not detect any fixed mutation in the 21 nuclear introns. This means that the populations of G. camelopardalis s.s. C, G. peralta, and G. reticulata have never been completely isolated genetically. Their grouping into G. camelopardalis s.s. A is however supported by eight ES detected in five independent genes and shared by 57 individuals. The 3S hypothesis is therefore strengthened by the criterion of genetic isolation, as the detection of ES in the three species G. camelopardalis s.s. A, G. giraffa, and G. tippelskirchi indicates that their populations were reproductively isolated during enough time, allowing for the fixation of diagnostic mutations in all individuals.

Interspecies relationships within Giraffa
According to the fossil record, contemporary giraffes first appeared during the Pleistocene around 1 Mya [73], a hypothesis also supported by molecular dating estimates [74]. All candidate species to root the tree of giraffes are highly distant taxa: Okapia, which is the only other extant genus of the family Giraffidae, separated from Giraffa during the Middle Miocene (around 15.2 Mya); other ruminant families, such as Bovidae, Cervidae, Moschidae and Antilocapridae, diverged from Giraffidae at the transition between Oligocene and Miocene (around 23.4 Mya) [74]. The rooting of the giraffe tree can be therefore misleading due to a long branch attraction (LBA) artefact (for a review see Bergsten [75]) between the distant outgroup and one of the longest branches of the ingroup. This problem explains the highly variable root position in our mitochondrial analyses: with MrBayes, the first haplogroup to diverge is either S (Fig 1B, PP = 0.37; BP = 60) or E (if the two bovid species are excluded as outgroup taxa, data not shown, PP = 0.55); with BEAST, haplogroups E and S are found to be sister-groups (PP = 0.74), as in the mitochondrial tree of Fennessy et al. [38].
The nuclear dataset provided more signal for resolving basal relationships within Giraffa. As indicated in Fig 1, our phylogenetic analyses supported a sister-group relationship between G. giraffa and G. tippelskirchi (PP = 0.82; BP = 71). This node was found monophyletic with 6 independent markers (C1orf74, DDX1, COL5A2, SAP130, USP33 and USP54). By comparison, SuperTRI analyses clearly showed that the two other hypotheses (either G. camelopardalis s.s. A + G. giraffa or G. camelopardalis s.s. A + G. tippelskirchi) are less supported (MPP � 0.09; NRep � 2 markers). All these results agree therefore with a deep North/ South dichotomy within Giraffa.

Evidence for introgressive hybridization between giraffe species
The comparison between the mtDNA tree based on 82 giraffe haplotypes and the nuclear tree reconstructed from 21 introns sequenced for 137 giraffes reveals a robust conflict for the evolutionary history drawn from maternal and biparental markers (Fig 1). Some mito-nuclear conflicts can be simply explained by recent hybridization between sympatric or parapatric taxa (species or subspecies), resulting in the transfer of the mitochondrial genome from one taxa to the other, a process referred to as mitochondrial introgression [6][7][8][9][10][11].
A first case of potential hybridisation is represented by the mitochondrial haplotype TIP15, which constitutes the sister-group of the main haplogroup of reticulated giraffes (Fig 2B), from which it differs by a distance of only 1%. The nine Masai giraffes possessing this haplotype were collected in southern Kenya (Athi River Ranch) [29], where wild populations of tippelskirchi and reticulata can sometimes hybridize [76]. We suggest therefore that introgressive hybridization can account for the transfer of the mitochondrial haplotype TIP15 from reticulata to tippelskirchi. The allelic networks of the 21 nuclear introns suggest also past nuclear introgression, this time from tippelskirchi to reticulata, as two individuals of reticulata, ISC04 and RETWil2, are characterized by several rare alleles identical or similar to those found in tippelskirchi: in ACP5 (only for RETWil2), COL5A2 (only for ISC04), CTAGE5 and DDX1 (both individuals) (Fig 3).
The second case of mitochondrial introgression concerns the haplotype RET8 detected in one reticulated giraffe from the Nürnberg Zoo [38]. Its grouping with Rothschild's giraffes may be explained by interbreeding between reticulata and rothschildi either in zoos [77] or in the wild, as field observations have documented the occurrence of reticulata X rothschildi hybrid phenotypes in Kenya [78]. Unfortunately, these hybrid individuals or populations were not yet studied for nuclear genes.
The mitochondrial haplotype RET9, which was detected by Brown et al. [29] in a single reticulated giraffe (accession number: EU08821), is intriguing because it is divergent from all other sequences of haplogroup N. We propose two hypotheses to explain its divergence. The first hypothesis assumes the retention of ancestral haplotypes in wild populations of reticulated giraffes; it will be confirmed if identical or similar haplotypes are discovered in other reticulated giraffes. Another hypothesis implies that the sequence EU088321 is problematic, either because it contains multiple sequencing errors or because it is a nuclear sequence of mitochondrial origin (Numt) [79]. Obviously, further investigations are needed to solve this issue.
The most important and interesting mito-nuclear discordance concerns giraffes from eastern and southern Africa. In the nuclear tree (Fig 2A), these giraffes are divided into two geographic groups corresponding to two different species: giraffes from southern Africa (South Africa, Namibia, Botswana, and southern Zambia) belong to G. giraffa, whereas eastern giraffes (southern Kenya, Tanzania, and northern Zambia) belong to G. tippelskirchi. These two species are not monophyletic in the mitochondrial tree: G. giraffa is polyphyletic, because members of the two subspecies giraffa and angolensis are not grouped together; whereas G. tippelskirchi is paraphyletic, due to the inclusive position of the subspecies giraffa (southeastern giraffes). To interpret these conflicting results, it is crucial to remember that basal relationships within Giraffa are not reliable in the mitochondrial tree, due to a high genetic distance towards outgroup taxa (see above for explanations). Taken this in mind, it can be hypothesized that the three species identified with nuclear data were characterized by three different ancestral mitochondrial haplogroups: N for G. camelopardalis s.s. A, E for G. tippelskirchi, and S for G. giraffa. According to this hypothesis, we can further propose that the common ancestor of southeastern populations of G. giraffa (subspecies G. g. giraffa) acquired a mitochondrial genome from G. tippelskirchi (haplogroup E) by introgressive hybridization between parapatric populations. Using a calibration at 1 ± 0.1 Mya for the common ancestor of giraffes [73,74], we estimated that the introgressive event occurred around 420 kya (see Fig C in S2 Appendix), i.e. during one of the most important glacial periods of the Pleistocene. In sub-Saharan Africa, glacial periods were generally characterized by the contraction of forest areas and the concomitant extension of open areas, such as savannahs and deserts. In addition, river levels were lower, facilitating dispersals and the colonization of new areas. Since Pleistocene environments were more stable in subtropical southern East Africa than in tropical East Africa [80], we suggest that some Masai giraffes migrated around 420 kya from East Africa to southern East Africa, promoting secondary contacts between G. tippelskirchi and G. giraffa, and therefore the mitochondrial introgression of haplotype E into G. g. giraffa. In the latter subspecies, the ancestral haplotype S has been completely replaced by the new haplotype E. By contrast, the ancestral haplotype S has been maintained in southwestern populations of the subspecies G. g. angolensis. The absence of haplotype E in southwestern giraffes suggests that female giraffes were not able to disperse from East to West and reciprocally. Important biogeographical barriers may have been the Kalahari Desert during glacial periods of the Pleistocene, and the Okavango Delta associated with Palaeo-lake Makgadikgadi during interglacial periods. However, nuclear data support gene flow mediated by dispersing males between eastern (G. g. giraffa) and western populations (G. g. angolensis) of southern giraffes. Female philopatry and male biased dispersal are classically observed in mammal species [81]. In giraffes, such different sexual behaviours can be explained by nursery herds, which consist of several females and their offspring [82], and by solitary males, which spend a lot of time to find receptive females. Thereby, males may often have to migrate over long distances to successfully pass on their genes [83]. In this regard, we can assume that males are generally more willing than females to take the risk of overcoming biogeographic barriers, such as deep rivers or large deserts. Markers from the Y chromosome should be sequenced to further address our biogeographic scenario involving a better dispersal capacity for males than females.

Conclusion for giraffe conservation management
The species is the most important taxonomic unit for conservation assessments and for the establishment of justified management plans [84,85]. Giraffes are currently considered as a single species by the IUCN [37], but its status has recently moved from Least Concern to Vulnerable due to a population decline of 36-40% over three generations. Even though, the situation seems to have improved for some populations (e.g. giraffa [86]; peralta [87]) in the course of enhanced conservation management, population numbers of most subspecies continue to decrease [37].
Our taxonomic study indicates that the conservation status should be separately assessed for the three species G. camelopardalis s.s. A (northern giraffes), G. giraffa (southern giraffes) and G. tippelskirchi (Masai giraffes). According to population estimations of the IUCN [37], the southern species G. giraffa, has recently increased by 168% and hence fall into the category "Least Concern"; the East African species G. tippelskirchi has decreased by � 50% over a period of three generations and hence should be listed as "Vulnerable"; the northern species G. camelopardalis s.s. A has decreased by � 70% over the past 30 years and with only 20 000 individuals left in the wild, it should be listed under the category "Endangered" (according to Criterion A1 [88]).