Combining molecular and landscape tools for targeting evolutionary processes in reserve design: An approach for islands

The importance of targeting ecological and evolutionary processes in reserve design has been widely acknowledged in the literature but rarely implemented on islands. Using Socotran reptiles as models, we aim to relate richness of widespread and restricted-range species directly with landscape variables and to compare the impact of setting conservation targets for lineages versus species. Socotra Island is a UNESCO Natural World Heritage Site, containing high levels of endemism in relation to its area, especially of reptiles, the vertebrates with the most comprehensive available genetic data. We predicted the occurrences of reptile species using distribution models and used a novel approach to interpolate maps of spatial phylogenetic patterns. Patterns of intra and interspecifc diversity and differences between spatial outputs of lineage and species richness were related to eco-geographic variables. We evaluated differences in target achievement for each conservation unit within protected areas (PAs) under the current Zoning Plan (ZP) using gap and reserve design analyses. Although intraspecific richness was strongly correlated with interspecific richness, differences in their spatial distribution reached ~30% in some areas. Differences were more pronounced for wide-ranging than restricted-range taxa. Gap analysis indicates that most conservation units are under-represented in sanctuaries and that intra and interspecific richness were significantly higher outside PAs. This work will guide local-scale conservation planning as the ZP is due to be re-evaluated. This is one of the few studies on islands using genetic data from an entire class of vertebrates to incorporate lineage diversity in reserve design. This study provides an alternative methodological framework for supporting the use of landscape and genetic tools in reserve design, circumventing the use of phylogenetic distances and deterministic spatial interpolation of lineage diversity that can be widely applied to other systems.


Introduction
In face of extensive anthropogenic land cover changes, habitat loss, climate change and accelerating extinction rates, conservation planning needs new and improved tools to prioritize the allocation of scarce resources to maximize successful conservation efforts [1]. Prioritization of areas for conservation to date has mainly focused on representing biodiversity, i. e. including targeted portion of each conservation unit within a Protected Area, rather than enabling biodiversity persistence [2]. Representativeness has been achieved by spatial analysis of quantitative distribution data to design reserve networks that offer a comprehensive representation of biodiversity features (commonly species as conservation units).
The entire range of biodiversity cannot be sampled, and much remains unknown. Therefore, better known taxa or landscape variables are commonly used in reserve design to assess how well a planning unit (i.e. conservation area or grid cell) represents the biodiversity of a wider area [3,4]. Sampling all planning units is time consuming and costly, especially in remote areas. Species distribution models (SDMs), through relating location data to environmental variables, offer an important tool for overcoming the problem of geographically sparse and spatially biased distribution data, allowing to calculate representativeness more precisely [5]. To maximize persistence, the second aim of reserve design, knowledge on genetic diversity and geophysical variables should be taken into account to preserve the evolutionary potential of populations [6]. Genetic diversity is the basis for populations to adapt to environmental changes, disease outbreaks and other extinction risks [7]. Recognizing diversity below species level and incorporating it into modelling frameworks will enable conservation planners to identify areas where populations would potentially persist [6]. Incorporating knowledge on geophysical variables is important for quantifying landscape barriers to gene flow and meta-populations dynamics [6]. Understanding how landscape characteristics structure populations is essential for managing the genetic diversity of species and populations under threat through identifying mechanisms of isolation [8]. Contemporary landscape variables (e.g. land use) can play a role in shaping patterns of genetic diversity at fine geographic and temporal scales, while geological features and historical landscape changes (e.g. rock types) affect genetic diversity across broader spatial and temporal scales [9]. Genetic diversity acts as a reservoir for future adaptations and neutral evolution, and therefore preserving it will protect some of the processes that maintain and generate biodiversity [10]. On islands, the probability of loss of genetic diversity through drift and other neutral processes is generally higher, due to greater isolation, exposure to stochastic catasthopic events, and generally smaller population sizes [7,11]. Thus, identifying spatial patterns of genetic diversity and levels of differentiation among populations is important for guiding the assignment of conservation units and the planning of protected areas, particularly on islands [7]. However, identification of spatial patterns of intraespecific diversity is generally addressed through projecting the pairwise genetic distances between sampled individuals into mid-points between sample locations to interpolate maps of phylogenetic patterns [12]. The practice of mid-point projection is inadequate [13], because it relies on deterministic interpolation methods and assumes that pairwise genetic information is located halfway between the samples, which is unlikely to be the case especially on islands where habitat variables can vary substantially over short distances.
Genetic data can shape conservation priorities, and even predictions of protected areas (PAs) under climate change [14]. Yet, in practice, genetic diversity is often overlooked in biodiversity conservation planning [2], and reserve design studies accounting for intraspecific genetic variability in terrestrial systems are scarce, especially on island systems (e.g. [15,16]). Yet, islands host the highest levels of endemic diversity and extinction rates [11], and genetic variation is most unique, especially on large or remote islands [17]. Consequently, islands represent a large proportion of the global biodiversity hotspots [18]. Some studies have addressed included intraspecific genetic variability through using measures of phylogenetic diversity (e.g. [19]), but this approach has been questioned in case of unresolved phylogenetic relationships of taxa, and because the link between extinction probability and functional diversity, as well as with phylogenetic diversity, is controversial [20]. On islands, it is common for very closely-related radiating taxa to occupy distinct functions in the ecosystem [21], exposing the inadequacies of this framework for island systems. Globally, the distribution of intraspecific genetic diversity follows similar patterns to species richness, with increased diversity in the tropics [22], and global biodiversity hotspots offer good representation of species evolutionary history, both at the clade and species level [23]. However, at finer spatial scales, areas of high species richness are not always concordant with hotspots of genetic diversity. Therefore setting conservation priorities to maximize species representation may not be adequate for preserving genetic diversity within species [24].
The goal of the present work is to address prioritization in reserve design by combining molecular and spatial analyses circumventing the use of phylogenetic diversity and avoiding the caveats of deterministic methods in the interpolation of phylogenetic patterns. This approach was tested on the reptiles of Socotra Island, a highly suitable model system due to the characteristics of the geographic location and taxonomic group. The Socotra Archipelago (a Yemeni governorate; Socotra hereafter) is one of the most inaccessible and biodiversity rich areas worldwide [25]. It comprises four islands, of which Socotra Island is the main one, with maximum length of approximately 130 km, width of 40 km, and an area of circa 3800 km 2 (Fig 1). Located in the Horn of Africa biodiversity hotspot, it is a UNESCO World Heritage Natural site since 2008 and considered one of the world's most irreplaceable protected areas for conservation [26]. Yet, it is also a key under-surveyed area, with limited coverage of genetic data both in terms of data availability and taxonomic coverage [22]. Its conservation Zoning Plan (ZP) classified some areas as sanctuaries (Fig 1; 1.9% of the island's surface) following the Biosphere Reserve Concept, and allocated other areas (26.7%) where exploitation of resources is fully or partially permitted [27]. Studies have called for the ZP to be revised to take into consideration new distribution and systematic data [28,29]. A UNEP-GEP project carried out in collaboration with the local government, has as one of its main goals "improved design and management effectiveness of the existing network of PAs". Therefore this study will serve as the basis for selecting the best new areas to protect reptile diversity on Socotra Island. In this study we aim to: 1) identify and compare spatial patterns of species and lineage richness, determining the effect of landscape variables on these patterns; 2) assess the adequacy of the existing ZP for conserving reptile diversity; and 3) compare reserve designs based on species and lineage diversity. With this novel framework we show the importance of incorporating intraespecific diversity in reserve design.

Study area
Socotra Island (12˚30'; N 54˚E) is located in the Arabian Sea, about 250 km east from the Horn of Africa (Fig 1). Its climate is governed by the Intertropical Convergence Zone and related monsoon cycles. Its vegetation is mainly associated to arid ecosystems, with some differences between the limestone and granitic areas [30].
Socotra has a continental origin from a Gondwana fragment that split from Oman as a result of the opening of the Gulf of Aden 18 million years ago [31]. The granitic complex on Socotra Island, that includes the Haggeher mountains, was always above sea-level (Fig 1). But the limestone plateaus were formed during the Paleo-Eocene and were below sea-level during interglacial transgressions on the western side of the island [32]. Reptiles constitute the most important vertebrate fauna of Socotra. While there are no amphibians, probably only one endemic mammal, and 10 endemic bird species, all 29 native reptile species are endemic to the archipelago, 23 to Socotra Island. The reptile community of Socotra was recently barcoded (using the mitochondrial cytochrome c oxidase 1, COI, region), providing data on their intraspecific genetic diversity [28].

Sampling
Genetic samples and distribution data of the 23 reptile species were collected during expeditions to Socotra Island between 2007 and 2014 (see Vasconcelos et al. 2016 [28] for further details). Using a 10x10 km square grid covering Socotra island, 103 stations (detailed in S1 Table in Supporting Information) were sampled with transects gathering 1299 genetic samples (tail tips) and 2597 presence records (one sample/record per individual). All samples used in this study were collected with appropriate permits from local authorities.

Phylogenetic analyses
Phylogenetic analyses were carried out to determine the number of lineages present in each species based on Sanger sequencing data. Lineages were defined as a sequence or group of sequences within a species which optimized set of nodes defined the transition to inter-specific processes based on the Bayesian implementation of the General Mixed Yule Coalescent Model (bGMYC), detailed below in the lineage delimitation analyses. The datasets used for this analysis consisted of 23 alignments (one per species) of 663 base pairs of the COI gene (number of sequences per species detailed in S2 Table), downloaded from GenBank (accessions KU567304-KU567683; details in Vasconcelos et al. 2016 [28]). This marker was shown to detect intra-specific diversity and to have a good phylogenetic signal despite its reduced length [28]. Phylogenetic analyses could not be performed on three of the species [Myriopholis filiformis (Boulenger 1899), Pachycalamus brevis Günther 1881, and Xerotyphlops socotranus (Boulenger 1889); see [33] for all naming authorities] because they were represented by a single haplotype. After removing identical sequences, alignments were performed with the online application of MAFFT v.7 (http://mafft. cbrc.jp/alignment/server/). Removal of identical haplotypes is necessary because they may cause problems with downstream bGMYC analyses [34]. Best-fit models of sequence evolution were inferred in jModeltest v.2.1.3 [35] with the Akaike Information Criterion (AIC) for model selection.
An ultrametric tree was produced for each alignment using Bayesian Inference implemented in BEAST v1.8.0 [36] to feed the bGMYC analyses, because it allowed to accommodate the uncertainty associated with the estimation of a phylogeny [37]. The bGMYC R package [37] was subsequently used to determine the number of deep mitochondrial lineages per species using an R-library version 3.3 [38]. Two individual runs of 5x10 7 generations were carried out, sampling at intervals of 10,000 generations. Preliminary tests were performed to study models and piors. Models and prior specifications applied were as follows (otherwise by default): model of sequence evolution as calculated by jModeltest [35]; constant size coalescent tree prior; uniform strict clock prior (0-100); random starting tree; base substitution prior Uniform (0,100); alpha prior Uniform (0,10) [39]. Posterior trace plots and effective sample sizes (ESS) of the runs were monitored in Tracer v1.5 [40] to ensure convergence. The results of the individual runs were combined in LogCombiner discarding 10% of the samples and the maximum clade credibility (MCC) ultrametric tree was produced with TreeAnnotator (both provided with the BEAST package). All information related to each species alignment including model selected, number of variable and parsimony-informative positions, and the results of bGMYC are presented in S2 Table.

Lineage delimitation analysis
Deep mitochondrial lineages within Socotran reptiles were objectively identified and delimited using the General Mixed Yule-Coalescent Model implemented in a Bayesian framework (bGMYC) [41,42]. This model detects the most likely tree depth (in the single-threshold approach) at which the pattern of tree branching shifts between a Yule process (reflecting inter-specific phylogenetic structure) and a coalescent process (reflecting intraspecific structure). Because inference of lineages relies on point estimates of the topology and branch lengths, the associated phylogenetic error could decrease the accuracy of delimitation results. Therefore, we assessed uncertainty in phylogenetic tree estimation and model parameters with a Bayesian implementation of the GMYC model, which integrates over these potential sources of error via MCMC simulation [37]. We used the R-package bGMYC to calculate marginal posterior probabilities of lineage limits from the posterior distribution of ultrametric trees reconstructed with BEAST. A post-burn-in sample of 250 trees resampled from that posterior was used to calculate the posterior distribution of the GMYC model. We ran the bGMYC analysis for 100,000 generations with a burn-in of 10,000 generations.

Spatial modelling and analysis
SDMs were generated for 20 of the 23 reptile species of Socotra Island, using Maxent v3.3.3 [43]. The initial dataset included a total of 1774 presence records and 14 landscape variables (LVs; S3 Table). Presence location records were obtained from bibliographic sources and fieldwork observations (detailed in Vasconcelos et al. 2016 [28]), and were mapped in ArcGIS v10 (ESRI). LVs were extracted from previous studies [30,32,44], freely available websites (http:// srtm.csi.cgiar.org) or derived from other LVs (detailed in S3 Table). LVs data came from Digital Elevation Models, local weather stations, geological maps, and habitat cover data at a spatial resolution of approximately 100 m (0.0009˚) to match the resolution of the location records. All habitat variables are presented as percent cover in each cell, except for linear water features (wadis), for which the Euclidean distance of each cell from the closest wadi was calculated using the 'Euclidean Distance' tool in ArcGIS.
A total of 10 model replicates were run with random seed which allows a different random training/testing data partition in each run. For species with <20 location records, we ran either eight (for N<20) or four (N<10) replicates. In each run, 30% of the data were set aside for model testing using bootstrapping. After testing test different values of regularization and number of features, models were run with the default settings and features. The area under the curve (AUC) of the receiver-operating characteristics (ROC) plot was used to measure model fit (good if >0.75). Standard deviation between individual model replicates was used as an indication of prediction uncertainty. The final models included eight LVs with the lowest absolute correlation scores (R 0.75) and 657 presence records (available at figshare DOI: 10.6084/ m9.figshare.4978223) after attempting to minimize the autocorrelation by removing locations that were clustered in the tridimensional LVs PCA space using the 'near 3D' function in Arc-GIS. SDMs could not be generated for three of the 23 species (Myriopholis filiformis, M. wilsoni (Hahn 1978), and Xerotyphlops socotranus) because they had too few number of location records ( 5), therefore only grids with location records were used.
The minimum training threshold was used to transform probability of occurrences into binary maps, representing the predicted suitable range of each species. This thresholding uses all training points and is preferred when data quality is high. Moreover, when designing PAs, it is more appropriate to use a less restrictive threshold [45]. We calculated the true skill statistics (TSS) as an alterative measure of accuracy of the binary models. TSS ranges from -1 to 1 and if values are higher than 0 indicates a performance better than random [46]. To determine the suitable range of each lineage for species with more than one lineage, PHYLIN R-package [13] was used to spatially interpolate genetic distances from phylogeographic data with kriging [13], circumventing the projection of pairwise genetic distances into mid-points between sample locations [13]. The output of PHYLIN was transformed into binary data, using 0.5 probability as threshold, and intersected with the binary SDMs of the respective species to keep only areas with high probability of occurrence in lineage binary models. In total, 45 lineage suitability maps were obtained. Those, together with the species binary SDMs, comprise the dataset for 57 biodiversity features. SDM outputs were overlaid in ArcGIS to generate richness maps for all species, all lineages, as well as wide-range and restricted-range biodiversity features (that occupy less than 30% of the Socotra Island area, following recommendations in [ [47]), resulting in a total of six layers of relative richness. Differences between species and lineage richness output maps (all species/lineages, wide-range and restricted-range species/lineages) were calculated in ArcGIS. Relative richness maps were reclassified into binary SDMs using as threshold the upper quartiles values (75th percentiles). To identify landscape variables associated with high levels of species/lineage diversity, the diversity values of each grid cell in the richness layers for total, wide-range and restricted-range features were directly correlated against the values of the same grid cells for the 14 available LVs, as well as latitude and longitude using the Pearson correlation coefficient (ρ) in R [38].

Gap analysis
A gap analysis was carried out to assess how much the different categories of the ZP encompass biodiversity features and richness [48,49]. The ZP map [27] was georeferenced and the four categories were digitized into polygons. These categories present increasing levels of protection: 0 = general use, allowing a significant level of habitat modification; 1 = resource use reserve, resource use hereafter, allowing a moderate level of sustainable habitat modification; 2 = national park, allowing sustainable use of natural resources; 3 = nature sanctuaries, sanctuaries hereafter, requiring special permits to enter (Fig 1). Category 1 to 3 were considered as Protected Areas. This layer was converted into a grid with the same spatial resolution as the SDMs (~100 m) and intersected with the 57 binary SDMs of all biodiversity features in ArcGIS to calculate the number of cells that fall within each ZP category. Percent inclusion of each species/lineage suitable range within each category was calculated through dividing the number of cells within each category by the total number of grid cells in the binary SDM. This analysis was repeated for all species, lineages and restricted-range biodiversity features richness SDMs. Richness of wide-range biodiversity features was not included in the gap analysis because of their lower conservation value.
Representation targets (percentage of suitable range that should fall within PAs) were calculated according to biodiversity feature attributes, including threatened status, level of protection, and distribution ranges. Following the approach used in previous studies [50][51][52], Critical Endangered (CR) species should be entirely within a PA and higher proportions of representation should be achieved for restricted-range biodiversity features. A minimum target of 10% was set for the feature with the widest range and 100% for the narrowest, then targets of features with intermediate geographic ranges were log-linearly interpolated between these extremes. Only one of the study species is threatened, Hemidactylus dracaenacolus Rösler & Wranik 1999, and classified as Critically Endangered (CR), so a 100% target was set for this species. Four additional restricted-range species are considered Data Deficient (DD), and for those a plus of 30% of representation target was also set.
We did not consider phylogenetic diversity because most of the phylogenetic relationships of Socotran reptiles with other taxa are still unresolved. Additionally, examples of niche divergence and ecological diversification were already detected among closely-related reptiles in Socotra [53,54], suggesting a detach between branch lengths and functional diversity that would hamper a phylogenetic diversity approach. Biodiversity features were considered to be protected if percent cover within PAs was higher than the set representation target. Representation targets for relative richness of total species/lineages and restricted-range features were set as 100% of the 75th percentiles, and were considered to fail target achievement if below this threshold. Target achievement for each biodiversity feature was calculated as a percentage of the sum of the proportion of cells within PAs in respect to the representation target.

Reserve design
Two reserve design analyses were carried out separately for species and lineages, using Zonation v4.0 [55], a software designed for spatial conservation prioritization that is frequently used in reserve design studies [56]. This software uses a gradient-like iterative heuristic to produce a sequential removal of grid cells throughout the ZP for achieving optimal solutions [55]. Target-based planning was chosen as removing rule because the goal was to find the best solution in which the maximum number of lineages met different conservation targets for both runs. In order to generate spatial aggregation within the solution, the rule 'only remove from edges' was selected for both runs. The Boundary Length Penalty, which devalues reserve structures with a high extent of edge, was chosen as the method for inducing reserve network aggregation, because some biodiversity features have fragmented distributions and it is one of the best methods for generating aggregation for management purposes [55]. The warp factor was set to one with an aggregation level of 0.4 based on several runs with different values. A cost layer was added to both analyses, so that grid cells with main roads and urban areas (with a buffer radius of 112 m or 1 km, respectively) were given a high cost when selected (100), secondary roads and resource use areas a middle cost (50), national parks a cost of 25 and sanctuaries a minimum cost (1).
Grid cells with highest rank have highest conservation value. The minimum set of grid cells with higher rank in the final solutions, which assured that all species or lineages were represented and achieved the targets, were selected for reclassifying the solution into binary files.

Phylogenetic and spatial analyses
In total, we identified 45 lineages (mean = 2.0±1.3 lineages/species; range: 1-6). The species with highest number of lineages (6) was Hemidactylus pumilio (Boulenger 1899), while 11 of the 23 species only had one lineage (S2 Table). The number of lineages identified per species with bGMYC was always within the range of number of entities identified using different distance and tree-based techniques in Vasconcelos et al. (2016) [28]. Theoretical variogram models fitted heuristically to the empirical variogram obtained from PHYLIN showed the best representation of the genetic structuring in space.
SDMs performed well and had high discrimination and predictive abilities (mean AUC train = 0.90±0.06 [range: 0.73-0.99], mean AUC test = 0.83±0.006 [range: 0.73-0.98]) and low standard deviations ( Table 1). The TSS mean values (S4 Table) are also good (mean TSS = 0.40), even for restricted-range taxa (mean TSS = 0.48). The LVs with highest contributions to all species' SDMs were shrubland, slope and woodland ( Table 1). The LVs with higher contributions to the SDM of the CR gecko H. dracaenacolus are temperature, total solar radiation and slope, and to the DD trogonophid P. brevis are distance to wadis and garrigue cover. The LVs with higher contributions to the SDMs of the remaining species are highlighted in Table 1, and LVs correlated with richness are detailed in Table 2 (see following section). On average 37.0±26.5% (range: 0.7-89.2%) of Socotra Island was predicted to be suitable for the different reptile species and lineages ( Table 3). The species with the smallest predicted suitable range is H. dracaenacolus, while Mesalina balfouri (Blanford, 1881) has the largest predicted range (Table 3).

Spatial pattern of species and lineage diversity
Even though total species richness was strongly positively correlated with lineage richness (ρ = 0.93, P<0.0001), differences between outputs reached circa 30% in some areas (Fig 2). The area with the highest overall species richness was located in the Diksam Mountains, whilst highest levels of overall lineage richness were found in the northeast and east of the island (Fig  2). Slope was most frequently the highest correlated LV with levels of richness, followed by ruggedness (highly correlated with slope) and total solar radiation ( Table 1). The richest areas for both total species and lineages were positively associated with solar radiation (ρ = 0.32 and 0.22, respectively, P<0.0001 for both), and negatively associated with garrigue-like habitat (basal alluvial grasslands and shrublands with Dactyloctenium robecchii, Justicia rigida and Croton socotranus, including mangroves; garrigue hereafter), ruggedness, and slope (ρ = -0.17 and -0.26, -0.34 and -0.24, -0.40 and -0.29, respectively, P<0.0001 for all; Table 1). The areas with the highest differences between species and lineage richness maps were located around the Zahr Basin, in the central plateaus, where relative species richness was considerably higher than lineage richness, and in the extreme east tip of the island, where relative lineage richness was higher (Fig 2). Differences in outputs were strongly negatively associated with longitude (ρ = -0.51, P<0.0001), and, unlike outputs for total species and lineages, also to shrubland (ρ = -0.27, P<0.0001), and positively associated with geology (ρ = 0.28, P<0.0001) and evapotranspiration (ρ = 0.26, P<0.0001; Table 1).
Spatial differences were much stronger when considering wide-range features than when considering restricted-range features (maximum of 43% versus 17%, respectively; Fig 2). While wide-range features showed similar spatial patterns as total features, restricted-range features showed the greatest differences between species and lineage richness in the Haggeher and Diksam Mountains, where relative species richness was considerably higher, and on the eastern part of the island, where the opposite pattern was evident. Restricted-range diversity (for species and lineages, respectively) was 5 to 7 folds more positively correlated with moisture and precipitation, and negatively correlated with temperature, shrubland and distance to Table 2. Correlation tests. Correlation values (ρ) of richness of total, wide-range and restricted-range species/lineages, and of the differences between species and lineage richness, with latitude (lat), longitude (lon) and 14 landscape variables (described in S3 Table). Absolute ρ>0.20 are highlighted in bold. All correlations were significant (P<0.05).  Enabling persistence in reserve design: An approach for island wadis than wide-range diversity (Table 1). On the other hand, wide-range diversity was more strongly negatively correlated with garrigue than restricted-range diversity (Table 1). Differences in wide-range outputs followed the same association trends with LVs as differences between total species and lineages, while differences in restricted-range outputs were not significantly associated with either one of the LVs (Table 1).

Gap analysis and reserve design
Almost all biodiversity features, except for one restricted-range species, Pristurus insignoides (Arnold 1986) and two species with higher levels of threat, the CR H. dracaenacolus, and the Near Threatened (NT) Hemidactylus granti (Boulenger 1899)-had very low (below 5%) representation within sanctuaries (   Fig 1), including mean (thick line), first and third quartile (box), 1.5 interquartile range (whiskers) and outliers (dots). Categories include species richness (sp, in red) and lineage richness (ln, in blue) for total (black thick lines) and restricted-range biodiversity features (restr., grey thick lines). There was almost no difference in the representation of richness within categories 1 to 3 of the ZP (Fig 3). Nevertheless, upper quantiles are higher in sanctuaries for restricted-range species than in general use areas (Fig 3 and Table 2). Except for the latter, upper quantiles of richness representation within category 0 and 3 were similar and always below 7% of target achievement. In fact, target achievements for binary richness were all below 98% in any ZP category, and therefore did not reach assigned targets (Fig 3 and Table 2).
An optimal solution for designing new sanctuaries to protect a targeted proportion of all species and lineage distributions was achieved using the reserve design software (Fig 4). The binary outputs for species and lineages encompassed 12.94 and 18.62% of the island area, of which 13.22 and 8.10% were already included in sanctuaries, and 47.00 and 65.92% in national parks, respectively. On the other hand, 37.35 and 24.97%, and 2.43 and 1.03% of the species and lineages outputs intersected resource or general use areas, respectively. These values were significantly different between outputs (Chi-square = 6871.25, d.f. = 3, P<0.001).

The importance of considering intraspecific diversity for characterizing spatial patterns
We identified and compared spatial patterns of species and lineage richness, showing that, even though total species and lineage richness are strongly positively correlated, differences between outputs reach circa 30% in some areas of Socotra Island, especially for wide-ranging biodiversity features. This highlights the importance of including lineages in conservation planning. Similarly, other studies found that conservation priorities for rainforest vertebrates in northeast Australia and large African mammals set to maximize species representation do not adequately represent patterns of intraspecific genetic diversity [14,24].
We determined the effect of environmental heterogeneity on inter and intraspecific patterns of diversity. We found that, overall, areas with highest reptile species and lineage richness are characterized by low habitat ruggedness, gentle slopes and high total solar radiation. Reptiles are ectotherms and depend on exposure to the sun for thermoregulation. In places with very steep slopes and high ruggedness sun exposure is limited. In addition, only reptiles adapted to climbing vertical surfaces could occupy those areas. Therefore observed patterns of Enabling persistence in reserve design: An approach for island diversity follow the expected habitat preferences of reptiles based on their ecological and physiological requirements [57]. Detailed studies on thermal and habitat ecology of the reptile species of Socotra should follow to learn more about the possible ecological and physiological drivers and constrains to intra-and interspecific diversification, similarly to what was found in other island reptiles [58]. In this study, high levels of richness are mainly associated with mountainous landscapes and areas that remained environmentally stable across geological times. Theory proclaims that species should show higher levels of genetic diversity in stable areas where suitable conditions were maintained across glaciation cycles due to long-term persistence of populations and consequent opportunities for diversification and population substructuring [59]. As such these refugia areas, with their higher levels of species and genetic diversity, are important for maximizing the persistence of biodiversity, particularly under global environmental change [60], and for guiding conservation planning. The pattern of increased species richness in mountain regions has been described for other islands and island taxa, including Cabo Verde plants [61], Madeiran beetles [62], and Cabo Verdean reptiles [15]. The elevational gradient on islands, associated with steep changes in climatic and habitat variables, is a strong selective pressure and an ecological barrier that promotes long-term diversification. Mountains act as islands-within-islands, with greater isolation by distance over smaller areas than lowland ecosystems, which results in higher speciation rates [61]. For example, in Socotra, the two sister taxa pairs P. insignis (Blanford 1881) and P. insignoides, and H. dracaenacolus and Hemidactylus granti [63] occupy different elevational ranges [34]. This phylogeographic pattern was also recovered for those species pairs when using only the COI marker [28]. The same pattern was also detected in closely-related reptile taxa in adjacent mainland Arabia [39,64].
Areas of highest lineage richness are also located in the northeast and east of the island, and largely coincide with the oldest rocks of the basement complex and Jurassic-Triassic, dated at around 476 and 145-252 million years ago (Mya), respectively [65]. Lineage richness is associated with similar landscape variables as species richness. However, lineage richness is also strongly negatively correlated with garrigue around the Zahr Basin, a shrubland habitat type found on limestone areas associated with dry climates and oceanic influence with high tolerance to salt. The distribution of this habitat on the island generally coincides with the most recent Quaternary rocks on Socotra Island that were submerged under water during interglacial periods [32]. These submersions likely resulted in population bottlenecks [66], reducing genetic diversity, that were followed by more recent population expansions, which are still observed in the star-shaped-patterns of the genetic network of most reptile species in Socotra [67,68]. Thus, older and geologically more stables areas of the island appear to have allowed the diversification and longer maintenance of lineage diversity in Socotran reptiles, including recent diversification, as seen for Hemidactylus [69]. Hence, the areas with the highest differences between species and lineage richness outputs were located around the Zahr Basin, where relative species richness is considerably higher than lineage richness, and in the extreme east tip of the island, where the opposite pattern is detected. Because the geological ages of areas of the island that are associated with high lineage richness, but not species richness, have a broad west-east pattern, longitude is, in turn, strongly negatively associated with differences between species and lineage richness.
Spatial differences are much stronger when considering wide-range than restricted-range biodiversity features. In general, wide-range diversity richness patterns correspond to overall diversity richness patterns, while restricted-range diversity shows a different pattern and landscape variable associations. Most LVs with important dissimilar associations with wide-range versus restricted-range diversity are water related (high moisture and precipitation, low distance to wadis and avoidance of dry shrublands). This highlights that many restricted-range species and lineages are specialists, adapted to the rare conditions of higher water availability on this arid island, as well as the importance of temperature for the diversification of reptiles [70]. Similar patterns were observed in other taxa from desert environments, such as in birds and bats [71,72].

The effectiveness of representing intraspecific diversity
Several biodiversity features have reduced predicted suitable ranges, potentially occupying less than 10% of the island's area, including the CR H. dracaenacolus, the NT H. granti and the DD P. brevis. This highlights the need to assess the adequacy of the existing ZP for conserving reptile diversity. Even though most biodiversity features are included within PAs, we found that additional sanctuaries (the only areas offering high levels of protection) need to be designed in order to protect a higher percentage of lineage distributions and to cover areas with highest species and lineage richness of reptiles (the main vertebrate group on Socotra). Contrary to the patterns shown in studies performed on other islands across the world [15,49,56], all individual species and lineages are well-represented within at least one of the categories of PAs in Socotra Island. Nevertheless, almost all biodiversity features have very low representation within sanctuaries, and twelve of them are better represented outside PAs than in sanctuaries. Moreover, although the suitable range of all species and lineages fall within PAs, substantial parts of the ranges of some species (e.g. P. brevis) are located partially outside national parks or sanctuary areas. Consequently, construction of infrastructures allowed in the resource use area could affect their protection in the future, especially taking into account human population growth and increasing demand for natural resources. In fact, a similar proportion of areas with the highest values of total and restricted-range biodiversity features is represented in general use areas and sanctuaries, even though the ideal situation would be for sanctuaries to encompass the richest areas, especially of restricted-range lineages. Similar to patterns observed in other parts of the world, under the existing reserve system, both average species and lineage richness is higher outside than inside PAs [73,74]. Higher richness outside PAs cannot be attributed to a low coverage of PAs on Socotra Island because this area is exceptional in that national parks and sanctuaries occupy 73% of its surface, but instead to the fact that the planning units chosen to be protected were not optimized for vertebrates.
Our reserve design study shows that using intraspecific diversity improves the results relative to the use of species diversity. Achieving targets for species requires similar number of grid cells than for lineages (12.94 versus 18.62% of the island area), however the overlap of proposed reserve areas for species with present sanctuary and natural park areas is proportionally 19% lower that with lineages, while overlap with general use areas is 2.4 fold higher. We therefore recommend the allocation of new sanctuaries for fauna, additional to the existing ones, according to our reserve design results based on lineage richness patterns (blue areas in Fig 4) because they maximize representativeness of all biodiversity features for reptiles and largely coincide with many important bird areas [29]. Lineage sanctuary areas identified in our study are generally located in close proximity to existing sanctuaries, with several new proposed areas partially falling within the boundaries of existing sanctuaries (Fig 4). Proximity to existing PAs is an important consideration because it will reduce PAs implementation and management costs, as well the renegotiation efforts with tribal chiefs of those regions. The next step will be to integrate data from other groups that could present different patterns from vertebrate taxa (e.g. invertebrates), and to use genetic markers with faster mutation rates (e.g. microsatellites) to assess connectivity (in terms of gene flow patterns) between proposed PAs.
In conclusion, we show that genetic diversity, here measured as number of lineages, represents distinct spatial patterns from the commonly used measure of species richness, which disregards evolutionary potential. Our approach to reserve design, which is based on spatial patterns of lineage diversity, enables the inclusion of the evolutionary processes of diversitification and speciation. Moreover, we highlight the importance of considering restricted-range species/lineages because their spatial patterns of richness and associations with landscape variables vary from those of wide-ranging biodiversity features and overall species/lineage richness patterns. Our proposed approach to reserve design can be readily applied to other archipelagos and terrestrial systems with available intraspecific-level genetic information.
Supporting information S1 Table. Details on sampled stations. The codes, region name and coordinates (latitude and longitude in WGS 84) are given for each station.