New Insights into Samango Monkey Speciation in South Africa

The samango monkey is South Africa's only exclusively forest dwelling primate and represents the southernmost extent of the range of arboreal guenons in Africa. The main threats to South Africa's forests and thus to the samango are linked to increasing land-use pressure and increasing demands for forest resources, resulting in deforestation, degradation and further fragmentation of irreplaceable habitats. The species belongs to the highly polytypic Cercopithecus nictitans group which is sometimes divided into two species C. mitis and C. albogularis. The number of subspecies of C. albogularis is also under debate and is based only on differences in pelage colouration and thus far no genetic research has been undertaken on South African samango monkey populations. In this study we aim to further clarify the number of samango monkey subspecies, as well as their respective distributions in South Africa by combining molecular, morphometric and pelage data. Overall, our study provides the most comprehensive view to date into the taxonomic description of samango monkeys in South Africa. Our data supports the identification of three distinct genetic entities namely; C. a. labiatus, C. a. erythrarchus and C. a. schwarzi and argues for separate conservation management of the distinct genetic entities defined by this study.


Introduction
The geographical distribution of the arboreal guenon Cercopithecus albogularis ranges from central and eastern to southern Africa where it occurs in different evergreen forest types including rainforest, Afromontane and riparian forests, as well as swamp and coastal forests [1]. The species belongs to the highly polytypic Cercopithecus nictitans group [2] which is sometimes divided into two species C. mitis and C. albogularis. Groves [3,4] recognises both species and uses the classification of C. albogularis for individuals distributed from Ethiopia to South of the animals as well as the sample collection, transport and storage. Animals were not removed from the wild. They were captured and released at the same location.

Sample collection
A total of 72 samango monkey blood and tissue samples were collected from five populations in Southern Africa (Table 1). All individuals included in the morphological analyses were also included in the microsatellite analyses for all five populations, with the exception of Hogsback, where data was unavailable for a subset of captured individuals. Samples were collected from the southernmost C. a. labiatus population in Hogsback (HB; -32.59526, 26.95675), and C. a. erythrarchus from two coastal populations in Cape Vidal (CV; -28.12329, 32.55638) and Sodwana Bay (SDB; -27.54692, 32.66939), a population along the Escarpment in Magoebaskloof (MK; -23.88859, 29.99582) and from two troops from the northernmost population in the Soutpansberg mountains (SM; -23.037881, 29.441949). See 1 for the geographic distribution of sampled populations and their acronyms. Trap-door metal cage traps (1.25m x 0.6m x 0.6m) were placed on the ground and were baited with either oranges or apples. Once secured in a trap, the samango monkeys were anesthetised with an intramuscular injection of a tiletamine/ zolazepam (Zoletil, Virbac, South Africa) combination at an estimated dose of 3.5-5 mg/kg. Three milliliters of blood was collected with a syringe and needle from the caudal saphenous vein and divided into serum and EDTA blood tubes (Vacutainer,BD, United Kingdom). The blood samples were placed on ice for up to 6 hours, after which they were frozen at-20°C until analysis. Several full length hairs were plucked out from between the shoulder blades of each monkey and placed in plastic Ziploc bags. Small (3mm in diameter) skin and cartilage samples were obtained from the pinna of each monkey using a normal leather punch. The punch was cleaned and disinfected (F10sc disinfectant, Health and Hygiene (Pty) LTD, South Africa) between samples. The skin and cartilage samples were placed in 1.5ml eppendorf tubes containing 95% ethanol. DNA was extracted using the Qiagen DNeasy Blood and Tissue Kit (GmbH, Germany) following the extraction protocol as outlined by the manufacturer. Each animal was HB = Hogsback) are shown in relation to indigenous forests (DWAF 2004)  assigned to a gender and placed in one of three age classes, namely juvenile, sub-adult or adult based on tooth eruption where sub-adults have at least one permanent canine and adults have all four permanent third molar. Standard morphological measurements were taken (see below) before the individuals were placed in a darkened cage until they had recovered from the anaesthetic to be released back into the wild. A qualified veterinarian registered with the South African Veterinary Council (registration number D97/4000) and experienced in primate capture and anaesthesia was present at all the capture sites, administered the anaesthetic agents and ensured that animals were fully recovered prior to release.

Morphological measurements
For 101 individuals, body mass (BM, kg) was obtained using a digital hanging scale (25kg Sportsman's digital scale, Rapala, USA), whilst the following standard measurements were obtained (in cm) using both a tape measure and calliper: neck circumference (NC), head and body length from nose-tip to base of tail (HB), tail length (TL), hind foot length, without claw (HF), ear length (EL), ear width at base (EW), canine length (CW) and nipple length (NW). Full body pictures of individuals from different sites were also taken. Measurements made using a tape measure were converted using simple linear regression equations. Apart from nipple length which was only measured in females individuals, variables were log-transformed and analysed statistically for sexual dimorphism and age variation using 2-way Analysis of Variance (ANOVA) in the largest population from Hogsback (n = 67). Owing to significant variation between the three age classes, data from adults were compared between all populations using principal component analysis (PCA). All statistical analyses were carried out using the programme PAST version 1.91 [13].

Hair analysis
In order to assess variation in hair characteristics among the different samango populations sampled we analysed individual guard hairs of adult individuals only. As the Hogsback population was sampled at different months of the year (S1 Table) we chose five individuals per each study month to also allow for analysis of seasonal variation of hair characteristics. The number of adult individuals included for the hair analysis varied as it was depended on how many adults were captured. For each individual monkey we analysed five hairs. We analysed 34 adult individuals (20 females, 14 males) totalling 170 hairs. Hair samples were analysed with the naked eye under a 20 Watt 240 Volt halogen lamp. Hair length was measured using an ordinary ruler with millimetre divisions. Five hairs originating from the same individual were fixed on the base onto a white piece of paper using transparent tape. Analysis under a dissecting microscope proved to be unsatisfactory as the hairs did not fit in the field of view in their entire length. Furthermore, the bright LED light source of the microscope made it much more difficult to detect the subtle colour differences. Hair characteristics analysed included: colour of hair base and tip, number of bands, colour of bands, number of bands per colour, width of bands and total hair length. Bands were counted starting with the first visible light band from the base of the hair. Band length was scored using three categories: bands equal, light/dark band longer and light/dark band double the length. Hair colour was scored using three different categories: white, light yellow and dark yellow. These categories were established after comparing hairs from all different geographic locations (Fig. 2) in order to identify the range of colours (shades and extremes). Furthermore hairs of all three colour categories were fixed next to each other with transparent tape onto the white analysing sheet, constantly visible during hair analysis. In order to statistically test specifically for species/subspecies differences in hair characteristics, three geographical groups were formed: Inland (Soutpansberg Mountain and Magoebaskloof), Coast (Cape Vidal and Sodwana Bay) and Hogsback (remained on its own). All statistical analysis was carried out using the software R version 3.0.1. Full body in situ photographs of individuals from the three populations were compared to examine differences in pelage colouration.

mtDNA sequence analysis
The purified DNA was used as a template to amplify mitochondrial gene fragments using the Cyt B primers L14724 cgaagcttgatatgaaaaaccatcgttg and H15149 aaactgcagcccctcagaatgatatttgtcctca [28] and 16S primers 16SA cgcctgtttaacaaaaacat and 16SB ctccggtttgaactcagatca [29,30]. Polymerase chain reaction (PCR) was carried out using a 25 μl reaction volume and was conducted with Thermo Scientifics' DreamTaq Green PCR master mix which has a 1 x buffer containing 10 mM TrisHCl (pH 9.0), 50 mM potassium chloride (KCl) and 0.1% Triton X100. The final reaction conditions were as follows: 1 X PCR buffer, 2.5 mM MgCl 2 , 200 M of each dNTP, 5 pmol of each of the forward and reverse primer, 1 unit (U) Taq DNA polymerase and 20 ng genomic DNA template. PCR cycling conditions were as follows: Initial denaturation at 95°C for 5 min, annealing at 52°C for 50 sec and extension at 72°C for 1 min, 30 cycles of denaturation at 94°C for 30 sec, annealing at 50°C for 50 sec and extension at 72°C for 1 min. A final extension step of 72°C for 20 min concluded the cycling. After the initial PCR amplification, products were purified according to the Exo/Sap amplicon purification method described by Werle [31]. Cycle sequencing of the PCR products obtained was performed with the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) according to the manufacturer's protocol. Sequencing products were purified with the ZR DNA Sequencing Clean-Up Kit (Zymo Research) and sequenced on an ABI 3130 genetic analyser in forward and reverse. The raw sequence data were analyzed using the ABI Prism DNA Sequencer software v3.4.1.

Analysis of nuclear genetic diversity
MICRO-CHECKER [32] was used to detect possible genotyping errors, allele dropout and non-amplified alleles (null alleles) for each microsatellite locus. Population genetic analyses were carried out at two scales: firstly, with populations defined as recognised subspecies (C. a. labiatus and C.a. erythrarchus), and secondly as four populations, largely reflecting collection localities, that were identified from spatially independent multivariate analyses of the two currently recognised subspecies. The latter populations are Hogsback (HB), Sodwana Bay and Cape Vidal combined (SDBCV), Magoebaskloof (MK) and Soutpansberg Mountains (SM). To estimate the levels of genetic diversity within populations, the mean number of alleles per locus (N A ), observed heterozygosities (H o ), expected heterozygosities (H E and uH E ) and deviations from Hardy-Weinberg (HW) proportions were calculated using MS TOOLKIT [33] and GEN-ALEX version 6.5 [34,35]. Linkage disequilibrium between pairs of microsatellite loci within each population and locus was evaluated using GENEPOP 4.1.4 [36,37]. Associated probability values were corrected for multiple comparisons using Bonferroni adjustment for a significance level of 0.05. To estimate levels of divergence among populations, spatially explicit and spatially independent analyses were carried out. The former included the estimation of various fixation and differentiation indices by permutation, and F ST -and R ST -based analyses of molecular variance (AMOVA) [38], that were carried out in GENALEX [34,35]. Spatially independent analyses included assignment tests; principal coordinates analysis (PCoA), factorial correspondence analysis (FCA) and Bayesian cluster analysis in STRUCTURE.

Population and regional genetic structure
The level of genetic differentiation was determined among and within the currently recognized subspecies, and the four distinctive populations. The genetic relationship between populations and individual assignments of samples was inferred via a Bayesian clustering analysis using the statistical programme STRUCTURE version 2.3.3 [39]. The programme was run without prior population information (option USEPOPINFO = 0, no LOCPRIOR), to ensure that the pre-defined populations were in agreement with the genetic data. STRUCTURE was initially run for all 72 samples with 20 replicates from K = 1-9, with a run-length of 1 million repetitions of Markov chain Monte Carlo (MCMC), following the burn-in period of 100,000 iterations. A second STRUCTURE run that included only C. a. erythrarchus (n = 35) samples was run with the same settings for K = 1-5. The K with the greatest increase in posterior probability (ΔK) [40] was identified as the optimum number of sub-populations using STRUCTURE HAR-VESTER [41]. CLUMPP [42] generated the average of these 20 runs for the appropriate K, and the results were visualised in DISTRUCT [43].

Analysis of mtDNA data
Alignment and vetting of sequence data were carried out using Bioedit v7.0.9.0 [44] and FinchTV v 1.4 (Geospiza Inc.), and published sequences for closely related species were included in alignments to confirm amplification of the correct target region. Standard population genetic diversity and differentiation indices (number of haplotypes, haplotype diversity, h, nucleotide diversity, π, the average number of nucleotide differences, k and φ ST ) were calculated using DNAsp v 5.10.01 [45] based on combined mitochondrial sequence data. Maximum Parsimony (MP), Maximum Likelihood (ML) and Bayesian MCMC phylogenetic analyses were conducted for each gene region separately, and for combined datasets. Maximum Parsimony (MP) haplotype networks [46] were generated in Network v 4.6.1.0 (www.fluxusengineering.com) and Splitstree v 4.10 [47] (10 000 bootstrap replicates) using the default settings. Statistical model selection was carried out for each dataset in MEGA v 5.0 [48] and jModeltest [49]. ML phylogenetic trees were generated in MEGA v 5.1 [48], with branch support evaluated using bootstrap-resampling (10 000 replicates). Bayesian phylogenetic trees were estimated using MCMC in MrBayes Version 3.1.2 [50] (with one million generations, trees sampled every 100 generations) and BEAUti/BEAST v2.1.1 [51] (twenty million MCMC generations, priors as in Hart et al. (2012) and the whole tree constrained with a lognormal distribution around 6 My, mean M = 1.79, standard deviation S = 0.1), with a ten to 25% burnin. Divergence date estimates were inferred using a Bayesian approach implemented in BEAST [52]. These analyses were run separately for the two gene regions and for a combined (concatenated) dataset. The results of four independent BEAST runs were checked for adequate mixing and convergence using Tracer 1.5. Once convergence was achieved, BEAST tree files were combined using LogCombiner, summarized using TreeAnnotator 1.5.3 and visualized in FigTree 1.3.1. CorelDraw was used to edit the haplotype networks and phylogenetic trees.

Morphometric analyses
As expected, age explained a significant proportion of morphometric variance (8-54%SS) in all variables except for ear length ( Table 2). Significant sexual dimorphism was present in all variables except for ear length and width, neck circumference and head and body length ( Table 2). Significant interaction (age-sex) terms for all variables except ear length ( Table 2) signalled divergent growth projectories in males and females associated with secondary sexual dimorphism and progressively greater increases in most male morphometric variables following puberty. This is exemplified by plots of mean body mass for both sexes in the three age classes; sexual dimorphism is negligible in juvenile and subadults but very marked in adults (S1 Fig.). Based on these significant differences between age and sex classes, for subsequent analyses of population variation we excluded juveniles and subadults and analysed males and females separately. From PCA of seven log-transformed morphometric variables, both males and females showed Soutpansberg animals to be distinctly larger-sized (higher PC1 scores) than other populations, with Magoebaskloof (in females) being somewhat intermediate ( Fig. 3A and 3B; S2 Table). Soutpansberg animals tended to have disproportionately longer tails than in other populations, as indicated by positive scores for PC2 on the scatterplots and positive loadings for tail length on PC2 compared with negative loadings for other variables ( Fig. 3A and B).    Table 3). A Pearson's Correlation analysis of hair characteristics revealed a significant correlation between hair length and number of bands (t = 8.2603, df = 168, p-value = < 0.0001) and we found that, except in two hairs, the total number of bands was always an even number (2,4,6,8,10) and that the number of black bands and light bands was always equal in numbers (Fig. 3C). The base and tip of the hairs were found, without exception, to have the same colour in all individuals across study sites, with the base being white and the tip being black. An analysis of variance (ANOVA) of hair length showed significant differences between the three study sites (df = 2, f = 4.52, p-value = 0.0122). A post hoc Tukey HDS test showed that the Inland and Coast populations differed significantly in hair length (p-value = < 0.01); hair lengths between Hogsback and Coast as well as Hogsback and Inland did not differ significantly (Fig. 3D). Using the Hogsback population only for analysis of seasonal variation in hair length (ANOVA), we did not find significant differences of hair length between the different sampling months (df = 2, f = 0.41, p-value = 0.665). Furthermore, a t-test showed no significant differences between Hogsback males and females regarding hair length (t = -0.56, df = 70.36, p-value = 0.5801). When comparing full body in situ photographs of individuals from the three populations, pelage colouration differences are clearly visible (Fig. 2B to 2D  Genetic analysis: Populations defined as currently recognized subspecies Analysis of nuclear and mitochondrial genetic diversity. In the C. a. labiatus (Hogsback, HB) samples, only two loci significantly deviated from HWE (loci D12S67 and D2S1326), and two loci were monomorphic (loci D11S956 and D6S311). Among C. a. erythrarchus individuals, the pattern was strikingly different, with 11 loci deviating from HWE. Each pair of loci in each population was tested for linkage disequilibrium (420 tests), and 81 of these were significant. However, only eight were significant within C. a. labiatus, and only three locus pairs showed significant evidence of linkage in both populations (loci D5S1466 and D10S1432, D10S1432 and D2S1326, D5S1457 and D2S1326). Genetic diversity was lower among C. a. labiatus than among C. a. erythrarchus in terms of, among others, the number of alleles, heterozygosity and mtDNA haplotype diversity ( Table 4). The mean number of alleles per locus over the two populations ranged between 1.5 and 8, with an overall mean of 4.17 over loci and populations. All 37 C. m. labiatus (HB) samples exhibited at least one of 15 private alleles found at 11 loci in that population. Additionally, at least one of the 50 private alleles found among C. m. erythrarchus samples, i.e. alleles that were not found among C. m labiatus (HB) samples, was exhibited by all individuals (n = 35) from this population. This pattern indicates strong divergence and restricted gene flow between the currently recognized subspecies.
Spatially explicit analyses. Various fixation indices (F ST , G ST ), corrected (standardized) fixation indices (G" ST ) and a pure estimate of differentiation (D EST ) were calculated to estimate genetic divergence between the two putative subspecies. Pairwise estimates of population differentiation were all high and highly significant ( Table 5, P = 0.0001). The maximum possible value for G ST , given the dataset was G STmax = 0.33±0.04. Locus D6S311 consistently showed the strongest signal of differentiation, and locus s was the only locus that did not show significant (P<0.05) differentiation among the two subspecies (locus s P = 0.479) based on G ST . Corrected F ST (F' ST ) and R ST were estimated during AMOVA [53]. The F ST -based AMOVA showed that 25% of the variance in allele frequencies was explained among populations, and 14% among individuals. F STmax was 0.467 and F ST was estimated at 0.25 (P = 0.0001), giving an F'ST of 0.526. F IS (0.188) and F IT (0.391) were also high and highly significant (P<0.001), indicating strong population structure. The R ST -based AMOVA generated a highly significant (P<0.001) R ST -value of 0.157, with 16% of variance distributed among the two subspecies. Estimates based on the combined mtDNA sequence data for the two subspecies also indicated significant divergence (Table 5). Interestingly, these estimates all increased when populations were defined at a finer scale i.e. collection localities. Spatially independent analyses. Bayesian cluster analysis in STRUCTURE clearly identified the two subspecies, with members of each subspecies being highly likely to belong to that subspecies based on their multi-locus genotypes ( Fig. 4A and 4B). Posterior probabilities (Ln) using Bayesian admixture analysis were calculated for K = 1-9, with K = 2 being identified as the most likely true K value based on STRUCTURE HARVESTER results (S2 Fig.; K = 2 displayed the greatest posterior probability). Results indicated two distinct clusters for C. a. labiatus and C. a. erythrarchus (Fig. 4A and 4B). All 72 individuals were correctly assigned to their subspecies based on their multi-locus genotype (S3 Fig.), indicating that the two taxa are highly distinctive.
Multivariate Principal Coordinates and Correspondence analyses. C. a. labiatus individuals from Hogsback formed a tight, distinctive cluster in the principal coordinate's analysis and are clearly divergent from all C. a. erythrarchus individuals (S4 Fig.), and the forest and village troops appear to be genetically homogenous based on their multi-locus genotypes. Samples of the latter subspecies, however, appear to show higher levels of structure, and fall into three groups that largely reflect their sampling locality (S4 Fig.). Factorial correspondence analysis conducted in GENETIX showed a very similar pattern to the principle coordinates analysis in that C.a. labiatus individuals from Hogsback formed a tight, distinct group, and C.a. erythrarchus from other sampling localities were also distinct from each other, although they were clearly more similar to each other than to HB samples (S5 Fig.). However, one Soutpansberg Mountain sample (SMBT6) appears to be more closely related to individuals from Magoebaskloof (S4 Fig.).
Phylogenetic trees. Two distance measures based on microsatellite data were used to construct UPGMA phylogenetic trees of individuals with bootstrapping over loci (1000 replicates): DSW and Cavalli-Sforza and Edwards Dc (chord distance, S6 Fig.). Both methods produced similar results that reflected the multivariate analyses above. All C. a. labiatus individuals from Hogsback form their own distinctive clade (Clade A, S6 Fig.). The Soutpansberg Mountain and Magoebaskloof C. a. erythrarchus individuals appear to be very closely related (Clades C and D

Fine-scale analyses: Populations defined as collection localities
Results based on the full microsatellite dataset of 72 individuals suggest that further structure exists among the C. a. erythrarchus individuals sampled. A second STRUCTURE analysis was therefore run that omitted C. a. labiatus, which is clearly very distinct from all C. a. erythrarchus populations. Settings for this STRUCTURE analysis were 100 000 burnin, 1 million MCMC, admixture model, no LOCPRIOR, and 20 replicates for K = 1 to K = 5. STRUCTURE-HARVESTER identified the most likely number of clusters as K = 2 (Fig. 4C). Structure results for the 20 runs (1 million) were averaged using CLUMPP, and visualized using DISTRUCT. The two C. a. erythrarchus clusters strongly reflect their collection localities in the Soutpansberg Mountains, and in the coastal Sodwana Bay and Cape Vidal populations (Fig. 4C). However, those samples collected in Magoebaskloof are not as distinctive. The observed population structure appears to reflect coastal and inland forms of C. a. erythrarchus, with restricted gene flow between them. However, samples from Magoebaskloof (MK) resemble hybrids in that most are equally likely to have originated in either of the aforementioned coastal and inland clusters.
HWE, LD, genetic diversity, private alleles. Population genetic analyses were repeated with populations defined as the three C. a. erythrarchus clusters identified by STRUCTURE, and the Hogsback C.a. labiatus population. Where deviations from HWE remained the same for C. a. labiatus, three loci were monomorphic in the Soutpansberg Mountain (SM, loci D6S311, D10S1432, D11S956) population, and two deviated from HWE (D4S243 and D1S207). Loci D10S1432, D1S207 and D5S1466 deviated from HWE in the MK population, and D11S956 was monomorphic. Loci D2S1326, D10S1432, D5S1466, D10S611, D8S1106 and D12S67 deviated from HWE in the SDCV population. 37 of the 840 pairwise linkage tests were significant, but no locus pair showed significant linkage in all four populations. Over all four populations, GENEPOP identified six pairs of loci as significantly linked (S3 Table). Although 171 pairwise comparisons were affected by missing data and monomorphic loci, it does not appear that linkage disequilibrium will influence these results. Across all four populations and all loci, the mean number alleles (Na) was 3.4 and mean observed heterozygosity (Ho) was 0.46 (Table 4). Observed heterozygosity and number of alleles was highest in the SDB and CV population. The highest haplotype and nucleotide diversity was found among C. a. labiatus individuals from Hogsback (Table 4) and the lowest among C. a. erythrarchus from Magoebaskloof ( Table 4). All C. a. labiatus individuals still exhibited at least one of the 12 private alleles in that population, but the each of the C. a. erythrarchus clusters also exhibited private alleles: SM exhibited one private allele (exhibited by one individual, SMN8), MK had five private alleles (at least one of which is exhibited by all samples). The combined SDB and CV populations exhibited 20 private alleles, and all individuals except one (SDB 3) exhibited at least one of these private alleles.
Spatially explicit analyses. Most indices of overall population structure increased compared to the previous analysis comparing only the two subspecies (Table 5).
AMOVA-based estimates of population differentiation. Overall, F ST based on AMOVA was 0.29 (P<0.001), F STmax was 0.50, and corrected F ST , therefore, equals 0.58. Also, 29% of variance is explained among the four populations, and 9% among individuals. Pairwise F ST values estimated during AMOVA (Table 6) show that the HB population is consistently significantly and highly differentiated from all other populations, and that within C. a. erythrarchus, Soutpansberg Mountain population is most different from the Sodwana Bay and Cape Vidal population. The analogous analysis of the two subspecies produced an F ST of 0.24 (P = 0.0001),

Spatially independent analyses
Population assignment test. Overall, 99% of individuals were correctly assigned to their population. The only exception was one individual from Magoebaskloof (MK5), which was assigned to the Soutpansberg Mountain population.
Phylogenetic analysis of mtDNA data. A total of 911 bp of mtDNA sequence data (416 bp of cyt b and 495 bp of 16S; Genbank accession numbers KP120559 to KP120615) were generated in 28 samples (HB = 7, SDB = 7, CV = 5 and SM = 4 and MK = 5). Each gene region contained five haplotypes, and when analysed together, there were a total of eight haplotypes (haplotype diversity = 0.87). MP-based haplotype networks largely reflect the pattern of divergence estimates from analyses of population differentiation (Fig. 5). Haplotypes are largely restricted to single populations, and are therefore not likely to represent numts (nuclear copies of mitochondrial genes). Interestingly, some individuals from Hogsback (C. a. labiatus) share 16S haplotypes with individuals from the SM population (C. a. erythrarchus). The pattern of reticulate evolution reflected in the haplotype networks cannot be represented as a bifurcating phylogenetic tree, but is evident as reduced nodal support in the ML and Bayesian trees (S7-12 Figs.). The combined ML analysis placed C. a. labiatus as the basal lineage separate from the C. a. erythrarchus. In addition, ML analysis did not favour a monophyletic C. a. erythrarchus group but rather split the samples into sister groups as shown in S9 Fig. This pattern was identical to that of the cyt b Bayesian tree (S11 Fig.). However, when 16S is incorporated into analyses, the shared haplotypes between HB and SM changes the tree topology (S12 Fig.). The Bayesian tree constructed in BEAST (S13 Fig.) reflects the complexity of the relationships within C. a. erythrarchus and the reticulate nature of these relationships in that it places the SM population as sister to CV. It also clearly shows the distinctiveness of the HB, MK and SDB populations.

Morphological variation
Morphometric variables in isolation, could not distinguish the two genetic lineages; C. a. labiatus and C. a. erythrarchus described by Meester [8] followed by Grubb [2]. Instead, the Soutpansberg (SM) and northern Escarpment (Magoebaskloof, MK) populations currently classified as C. a. erythrarchus are distinctly heavier and larger in most body metrics than both the Hogsback (Eastern Cape, HB) population of C. a. labiatus as well as the coastal forest populations of C. a. erythrarchus from Sodwana Bay (SDB) and Cape Vidal (CV) in KwaZulu-Natal. Body size in many animals is known to be strongly influenced by environmental factors resulting in well established "rules" of geographic variation, e.g. towards increasing body size at higher latitudes and elevations (Bergman's Rule) [54]. Whilst Bergmann's Rule might explain the larger body size of SM animals compared with those from coastal populations (SDB, CV), the data from HB are anomalous since both the high elevation, high latitude and similar climatic conditions to the Soutpansberg of this site would predict the largest or at least equal body size. It may be that nutritional constraints or other environmental factors limit body size in the Hogsback population. We found clear pelage colour polymorphism in South African samango monkeys and were able to identify three distinct geographic colour morphs: Hogsback, Inland (Soutpansberg, Magoebaskloof) and Coast (Cape Vidal, Sodwana Bay). This finding supports the presence of three sub-species as first proposed by Roberts [10] and as currently accepted by Groves [3]. Within the three geographic colour morphs we found that the HB individuals (currently classified as C. a. labiatus) show hardly any yellow colouration, neither on the back nor in the ischial region (observations also made by both Roberts [10] and Groves [3,4] for C. a. labiatus) and clearly setting them apart from Inland (SM, MK) and Coast (SDB, CV) individuals. The most pronounced difference between Coast individuals and all the other populations found was their generally lighter pelage colour, the prominent and comparatively extensive yellow wash on the back and the orange ischial region. These results are very similar to those of Roberts [10] and Groves [3] who found that the coastal C. a. erythrarchus individuals had the most conspicuous reddish colouration in the ischial region. The overall lighter appearance of the pelage in Coast individuals found in our study could be explained by the fact that in 46% of Coast hairs analysed the light bands were equal to the length of the dark bands as opposed to Hogsback and Inland individuals where black bands were either double the length (Hogsback) or longer than light bands (Inland), resulting in an overall darker appearance. The conspicuously black arms found in both Hogsback and Inland individuals further increase the effect of this dark appearance. Roberts [10] and Groves [3] found that the main colour difference between C. a. schwarzi and C. a. erythrarchus is that the ischial region is buffy yellow (Roberts) or reddish (Groves) in schwarzi but not as reddish as it is in coastal C. a. erythrarchus (Roberts). This description compares well to the pelage colouration we observed for the Inland individuals which had a yellow rather than orange (Coast) ischial region. The Inland population was made up of individuals from Magoebaskloof and the Soutpansberg as they showed the same hair characteristics. Morphological results from our study could not be compared to Meester [8] and Grubb [2] as they do not give any other reasoning rather than distribution to delineate the two subspecies, erythrarchus and labiatus.

Nuclear genetic diversity and population structure
Across the whole dataset of 72 individuals, the number of alleles detected per microsatellite locus ranged from two to 11, with a mean of four, and a mean expected heterozygosity (H E ) of 0.5. The total H E was 0.59. When the dataset was subdivided to reflect the two currently recognised subspecies, C. a. labiatus and C. a. erythrarchus, the former showed lower levels of genetic diversity in terms of number of alleles and heterozygosity (H E = 0.38) and fewer deviations from HWE relative to the latter (H E = 0.62). The H E value for C. a. labiatus falls into the lower range of what has previously been reported for Neotropical primate species [55], for example, among others, Capuchin monkeys (Cebus apella, H E = 0.378) and Squirrel monkeys (H E = 0.239). However, among C. a. erythrarchus, genetic diversity estimates are among the highest described for primate species, for example red howler monkeys (Alouatta seniculus, H E = 0.638) and brown woolly monkeys (L. lagotricha, H E = 0.54). The strong deviations from HWE observed among C. a. erythrarchus samples seem to be as a result of population subdivision i.e. the Wahlund effect, as inbreeding and strong genetic drift would have resulted in lower heterozygosity and numbers of alleles [55]. Also, spatially independent analyses detected some structure among samples belonging to C. a. erythrarchus. In terms of genetic structure and divergence between the currently recognised subspecies, we found strong evidence for isolation based on mutually exclusive private alleles in each subspecies i.e. all the individuals belonging to each taxon exhibited alleles that were not found in the other taxon. This pattern was reflected in the highly significant pairwise estimates of population differentiation (e.g. G ST = 0.146, P<0.001; φ ST = 0.55), which were as high, or higher, than those reported for currently recognised primate species [55]: Spatially independent analyses of the nDNA dataset consistently identified strong subdivision between the two currently recognized subspecies, and also within the C. a. erythrarchus samples. The finer scale structure within this taxon co-reflect the collection locality of individuals in the Soutpansberg Mountains and Magoebaskloof (inland), and coastal populations in Sodwana Bay and Cape Vidal. Bayesian clustering analysis indicates that the coastal and inland forms are distinctive based on their multi-locus genotypes, but that the Magoebaskloof population that falls geographically between the coastal and Soutpansberg populations, resembles both inland and coastal forms, indicating a possible "hybrid zone" between these two population units.
Expected heterozygosity within these C. a. erythrarchus sub-populations ranged from 0.34 in the Soutpansberg Mountains to 0.58 in the coastal population (Sodwana Bay and Cape Vidal). Importantly, population differentiation estimates based on the four predefined populations increased compared to analyses comparing only the two currently recognised subspecies. This, together with the results of the finer scale spatially independent analyses e.g. the assignment test (99% correct assignment) and Bayesian clustering, indicate strong structure among coastal and inland forms of C. a. erythrarchus, and exceptionally strong differentiation between these populations and that of C. a. labiatus.

Mitochondrial population structure
Mitochondrial DNA analysis generated discordant patterns from nuclear DNA in terms of reconstructed relationships among groups. In contrast to both phenotypic descriptions and nuclear DNA, the HB and SM individuals were found to be closely related based on 16S and the SM and coastal populations (SDB and CV) were more similar based on cyt b. The coastal and MK populations were differentiated into two separate groups, and MK was consistently distinctive from all other populations. Discordance between mtDNA and patterns observed for nuclear genomes has been widely reported and has been found to be the highest in mammals [56]. Discordance may be due to inadequate data, homoplasy, nucleotide composition or hybridisation [57]. Discordant gene trees have been previously reported in Colobine monkeys where the authors attributed their findings to ancestral hybridisation [57]. In this study, discordance is most likely due to geographic isolation of HB (C. a. labiatus) and SM (C. a. erythrarchus) populations followed by secondary contact which could occur either via hybridisation, due to human mediated interventions or due to sex-biased dispersal. There is evidence that females are philopatric while males leave their troops before sexual maturity at 6-8 years old [58][59][60]. However, further sampling would have to be conducted in the intervening areas to properly ascertain whether clinal variation occurs between these two populations. In this case, a hybrid zone between these populations may be evident.

Phylogeography
Paleoclimatic data of the late Quaternary (last 150.000 years) was used by Lawes [9] to explain the current distribution of samango monkeys and forests in South Africa. Cercopithecus a. labiatus is suggested to represent the first dispersal event from East Africa along Afromontane forests down into South Africa before the last glacial maximum (LGM) whereas C. a. erythrarchus is thought to have entered South Africa in a second dispersal event, after the LGM within the last 12.500 years, primarily along the expanding coastal belt forest on the Mozambique seaboard. Cercopithecus a. labiatus was until then in all likelihood the only representative of the species present in southern Africa prior to and during the last glacial maximum [9]. Our study shows that labiatus and erythrarchus diverged about 1.7 Mya (1.6-2.6 Mya) during the mid-Pleistocene. During this period the subtropical African climate periodically oscillated between markedly wetter and drier conditions, with step-like increases of variability and aridity near 2.8 Ma, 1.7 Ma, and 1.0 Ma and also suggesting more varied and open habitats after 1.8 Ma [61]. These wet-dry oscillations will have had a marked influence on the extent and continuity of forest habitat on the continent and DeMenocal [62] suggests that they offered discrete opportunities for ecologic fragmentation and genetic isolation. The subspecies labiatus was likely isolated by retreating forests during one of the drier Pleistocene periods, indicating that some forest refugia remained south of the Umfolozi system, not only throughout the LGM but also throughout the mid Pleistocene. There is also no genetic indication of secondary hybridisation with later radiations of the species which suggests that a geographic barrier of some sort, such as the Umfolozi system, as suggested by Lawes [9], will have prevented subsequent radiations dispersing further south. Our data supports the theory of separate radiation events of the species into southern Africa. However, the suggestion of only two recent radiation events suggested by Lawes [9] is possibly, considering our results, a bit too simplistic. The first radiation of what we currently refer to as labiatus must have, according to our results, occurred prior to the mid-Pleistocene and it thus seems more likely that several older Pleistocene radiation events occurred before the LGM. There is also no reason to assume that gallery forests of river systems (e.g. Limpopo system) connecting forests did not play a role in the radiation of the species. Our results indicate that there are genetic as well as morphological differences between the coastal and inland erythrarchus populations in South Africa. Future sampling of inland and coastal populations from Zimbabwe (inland) and Mozambique (coast and inland) further north will show if this might be a general pattern within the species range and also give additional insights about possible radiation routes and the number of radiation events.

Taxonomic conclusions
Our study gives the most comprehensive insights to date into the taxonomic description of samango monkeys in South Africa. Spatially independent analyses of the nDNA dataset and data based on pelage colour consistently identified strong subdivision between the two currently recognized subspecies, and also within the C. a. erythrarchus samples. The reticulate evolutionary pathway evident in the mtDNA haplotype networks, discordant from nDNA, likely represents an ancient pattern of hybridisation or introgression among samango monkey populations, and highlights the complex evolutionary history of this group. Differences in morphology and nDNA markers, however, suggest that these populations are on distinct evolutionary trajectories. Our data thus support the two recognised subspecies namely; C. a. labiatus and C. a. erythrarchus. Within C. a. erythrarchus two lineages should be proposed. The Soutpansberg is currently classified as C. a. erythrarchus by Friedmann and Daly [12] and is geographically not explicitly mentioned in any samango subspecies descriptions by either Roberts [10] or Groves [3]. However, as Magoebaskloof geographically falls under the distribution range described for C. a. schwarzi by Roberts and Groves and as the Soutpansberg and Magoebaskloof show the same pelage characteristics, the Soutpansberg population should more accurately be classified as C. a. schwarzi. Whereas, the costal populations (CV and SDB) may be assigned to C. a. erythrarchus (type locality Inhambane on the coast of Mozambique). However, the taxonomy of the Magoebaskloof population remains unresolved. This population is highly divergent based on mtDNA sequence data, but is relatively uniform genetically (nDNA) and phenotypically (morphometrics and pelage colour) to the SM population (C. a. schwarzi). However, it is unclear whether this population can be attributed to C. a. schwarzi or if it is a hybrid zone or a distinct lineage. Therefore, the above hypothesis should be tested by further sampling in intermediate localities and adjacent countries to properly ascertain whether clinal variation occurs throughout the range of these taxa, or if they are isolated, as it currently unclear whether these lineages are distinct species or sub-species. Additional markers (such as X-and Y-chromosomes and mobile elements) are also required to fully resolve the taxonomy of this species and test the validity of C. mitis versus C. albogularis. The final taxonomic outcome is dependent on choice of species concept. According to the Phylogenetic Species Concept (PSC), the three lineages are diagnosable (on three independent characters) and therefore represent species. The PSC has gained widespread acceptance although its application to large mammals has been recently questioned by some conservationists and defended by systematists [63][64][65][66][67][68][69][70]. Regardless of the taxonomic outcome, our data argue for separate conservation management of the three distinct genetic entities defined by this study, as Evolutionarily Significant Units (C. a. erythrarchus and C. a. labiatus) and Management Units (coastal and inland populations of C. a. erythrarchus) following the definitions of Moritz [71]. Distinct genetic entities need to be conserved to protect the loss of genetic diversity, as this diversity is an essential part of biodiversity conservation.  Table. Hairs analysed from the five sampling sites detailing the sampling month, the number of males and females, the total number of individuals and the total number of hairs analysed per site with a total of five hairs analysed per individual.