Species Delimitation and Morphological Divergence in the Scorpion Centruroides vittatus (Say, 1821): Insights from Phylogeography

Scorpion systematics and taxonomy have recently shown a need for revision, partially due to insights from molecular techniques. Scorpion taxonomy has been difficult with morphological characters as disagreement exists among researchers with character choice for adequate species delimitation in taxonomic studies. Within the family Buthidae, species identification and delimitation is particularly difficult due to the morphological similarity among species and extensive intraspecific morphological diversity. The genus Centruroides in the western hemisphere is a prime example of the difficulty in untangling the taxonomic complexity within buthid scorpions. In this paper, we present phylogeographic, Ecological Niche Modeling, and morphometric analyses to further understand how population diversification may have produced morphological diversity in Centruroides vittatus (Say, 1821). We show that C. vittatus populations in the Big Bend and Trans-Pecos region of Texas, USA are phylogeographically distinct and may predate the Last Glacial Maximum (LGM). In addition, we suggest the extended isolation of Big Bend region populations may have created the C. vittatus variant once known as C. pantheriensis.


Introduction
Scorpions are an ancient and widespread arthropod order well known for their medical importance as venomous arachnids [1]. Less known is their importance as model organisms for ecological research comprising key components of desert food webs [2]. These desert scorpion species have also been shown to exhibit ecomorphological specialization upon specifc habitats and possess morphological adaptations to unique edaphic substrates such as sand [2][3][4]. These edaphic specialist species illustrate the role of environmental effects upon scorpion morphological divergence and speciation. Other orogenic features such as mountain ranges can also produce profound effects upon scorpion species diversification and, until recently, these isolation effects were not fully understood. For example, geographic heterogeneity has been shown to differentiate the singular Buthus occitanus into several cryptic lineages [5]. Other recent studies have shown the importance of mountainous terrain and riverine barriers on the diversification of scorpions [6][7][8][9]. These recent studies also illustrate the impact of molecular taxonomy in revealing patterns of diversity unrepresented through traditional morphological analyses.
The current appreciation of scorpion diversity underscores the need for multiple lines of evidence to establish species delineation in scorpions. The importance of accurate species delineation is of paramount importance in the medically important Buthid family as accurate identification is needed for medical treatment of envenomation. One example of taxonomic ambiguity is illustrated in the Buthid genus Hottentotta. Sequence analysis of this genus, with the mtDNA Cytochrome Oxidase I, has challenged its current taxonomy as the COI sequences suggest a paraphyletic relationship in this genus' mtDNA [10].
The Buthid genus Centruroides, widely distributed in the Western Hemisphere and well known for its medical importance to humans, has also confounded scorpion taxonomists [1,11,12]. Species of this genus can exhibit considerable intraspecific morphological variability leading to taxonomic confusion [13,14]. For example, Centruroides exilicauda (Wood, 1863) in Baja California Sur, Mexico not only exhibits dramatic size variation from small, mainland individuals to gigantism on offshore islands near La Paz, but color variation ranging from pale forms north of La Paz to striped populations south of La Paz [13,15]. An example of taxonomic uncertainty within Centruroides, occurred when C. sculpturatus (Sonora, Mexico and Arizona, USA) was synonymized into C. exilicauda (Baja California) [13,16]. Valdez-Cruz et al. [17] unravelled this taxonomic mess with venom and molecular data, separating C. sculpturatus from C. exilicauda once again.
In the United States of America, not only has taxonomic uncertainty existed in the western Centruroides species, but also in the eastern Centruroides vittatus (Say, 1821). This species was separated into three different species (C. vittatus, C. chisosarius, & C. pantheriensis) after morphological investigation, but later synonymised as C. vittatus [18][19][20]. Both C. chisosarius (Gertsch,[18]) and C. pantheriensis (Stahnke,[19]) were originally described from the Big Bend region of Texas. C. chisosarius was recognized as exhibiting dark spots on its carapace with darker pigmentation on the tergites [20]. C. pantheriensis was the most morphologically distinct, with a pale color and resembling the more medically significant Arizona C. sculpturatus [19]. C. vittatus, in the eastern portion of its geographic range exhibits dark coloration with dorsal metasomal stripes [20,21].
Due to the confusing taxonomic history and the documented morphological diversity within this species, employing a phylogeographic approach to studying C. vittatus lineages should illustrate how pieces of evidence from diverse datasets can assist with species delimitation. We conducted phylogeographic, morphometric, and Ecological Niche Modelling analysis (ENM) of Centruroides vittatus to further understand the evolution of the C. pantheriensis variant within C. vittatus. Within our phylogeographic analyses, we conducted additional tests of several alternative tree topologies (hypotheses generated through initial review of results) based upon significance of Bayes factors [22][23][24].
As stated above, C. chisosarius, and C. pantheriensis are currently recognized as color variants of C. vittatus; yet, several questions remain regarding these variants. First, both were initially described and regarded as inhabitants of the Texas Big Bend region. Does C. vittatus from this area represent a unique phylogeographic clade? That is, do all color variants represent morphological variants within a distinct regional clade? The placement of the C. pantheriensis variant within such a clade appears possible as this variant appears to be restricted to the Big Bend region. In addition, is the pale color variant (C. pantheriensis) associated with a clade exhibiting deeper divergence (earlier evolutionary separation) when compared to other phylogeographic clades within the species? The Big Bend region possesses higher scorpion species diversity than other Texas regions [25] and may be the result of a long evolutionary history within the region. Furthermore, are there additional morphometric characters that distinguish the pale C. pantheriensis color variant from other C. vittatus populations? Is the pale form associated with a specific environmental habitat within C. vittatus' geographic range? Investigating these questions from a phylogeographic perspective can provide insight into the evolution of morphological variation and species delimitation in this genus.

Study Species
Centruroides vittatus encompasses a large geographic range within the U.S.A. that includes Texas, Oklahoma, New Mexico, Colorado, Kansas, Nebraska, Missouri, Arkansas, and Louisiana as well as sections of several states in the United Mexican States (Figure 1). This species, as do most Centruroides species, comprise errant or wandering scorpions that do not construct a burrow and commonly invade human habitations [26,21]. Throughout its geographic range, C. vittatus is commonly found in diverse ecological habitats, but in populations across the northern and eastern geographic distributions it appears to prefer dry, rocky south facing slopes or glade areas. Human introduction of this scorpion appears to also have created additional populations outside its known geographic range [21].

Field Collections
C. vittatus individuals were field collected throughout its US geographic range or obtained from collections through other personnel ( Figure 1 & Table S1). All necessary permits were obtained for the described field studies. After field collection, all samples were stored in 95% alcohol at 220uC. GPS coordinates were identified with a hand-held unit (Magellan GPS 315, Santa Clara, CA, USA). The appropriate populations sampled were identified through distribution records published in Shelley & Sissom [21]. As this scorpion inhabits a variety of habitats across its geographic range, a population can be difficult to define. Based upon previous scorpion dispersal estimates, we designated sampling sites less than 20 km apart as a single historic population [25]. This pooling of samples was conducted in the northern portion of the scorpion's geographic range where initial analyses indicated little genetic separation among sites. Voucher specimens were deposited in the Zoology collection at Arkansas Tech University. Completely pale C. pantheriensis individuals as described by Stahnke [19] were identified from our collections to test the hypothesis of a distinct clade that includes these individuals. For all phylogenetic analyses C. sculpturatus, C. exilicauda, C. gracilis, C. infamatus, C. suffusus mitochondrial Cytochrome Oxidase I (COI) sequences were obtained from GenBank for use as outgroups (Table S1). These Centruroides species inhabit the northern regions of Mexico along with C. vittatus and represent potential sister species to C. vittatus [14].
In addition, we amplified and sequenced an anonymous nuclear sequence (noncoding genomic locus) from a RAPD investigation for C. vittatus microsatellites [32][33][34]. From 24 cloned RAPD fragments, we selected a 728-bp region (Locus 1075) that showed sequence variability in an initial sequence survey among populations. This region was considered a unique locus as it showed no homology with any mitochondrial or nuclear regions in a GenBank BLASTN search. We developed PCR primers to amplify an internal 554 bp region from Locus 1075: Forward 59-GAA GGG CAG GTT TTC CTG TT-39 & Reverse 59-CAT TGC ACA AGT TCG TGA GG-39. This primer combination only produced amplicons from C. vittatus, but not C. sculpturatus DNAs.
Each PCR reaction for COI and Locus 1075 was performed in 25-mL aliquots with the following ingredients: 10-mL total genomic DNA, 2X Taq Buffer (150 mM Tris-HCl pH 8.5, 40 mM (NH 4 ) 2 SO 4 , 3.0 mM MgCl 2 . 0.2% Tween 20 ), 1 mM for each dNTP, 0.5 mM of each primer, 6.25 units REDTaq DNA polymerase (Sigma Chemical Co.), 1.6% Dimethyl sulfoxide, 0.6% BSA, and 1.6% Formamide. The cycling conditions consisted of an initial denaturation period of five minutes at 94uC followed with 30 one-minute cycles of 94uC, 50uC annealing, 72uC extension, and a final seven-minute extension at 72uC. After PCR products were verified with agarose electrophoresis in a 0.9% agarose concentration, they were GeneCleaned (Bio 101, Inc.). Forward and reverse DNA sequencing was performed with PCR primers for both sequences at the UAMS DNA Core Sequencing Facility on an Applied Biosystems 3100 Genetic Analyzer, Big Dye Terminator Chemistry, Kit version 1.1 (Foster City, CA, USA). For COI, two additional internal primers along with the previous PCR primers were employed to provide additional sequencing products for a more complete sequence contig: COI- After sequencing, all trace files were reviewed by eye and all ambiguous bases removed from further analysis. Alignment of the sequence data was conducted with Clustal X and Geneious Pro 3.7 [35,36]. After the initial alignment, all COI sequences were converted into their amino acid sequences to verify if any internal stop codons existed. All sequences were deposited in GenBank with the following accession numbers for COI: EF122605-EF122704, EU404114-EU404118, and EU381046-EU381110. GenBank accession numbers for Locus 1075 sequences are: EF122705-EF122787 & JF419172-JF419238. As recombination within mtDNA is reported for scorpions [37], we conducted an analysis for recombination detection with the online version of GARD with default settings [38]. This program can both detect recombination sites and recombinant sequences.

Phylogenetics
Aligned COI DNA sequences were entered into MODELTEST version 3.7 in HyPhy, and the model of nucleotide sequence evolution (GTR+I+G, -lnL = 7806.20) was chosen with the Akaike (AIC) criteria [38][39][40]. We analyzed these sequences with Maximum Likelihood and Bayesian methods. Maximum likelihood analysis was completed in PAML version 3.14 [41] as it performs phylogenetic analysis with explicit models of nucleotide evolution [41][42][43][44]. A Neighbor Joining tree was created in PAUP for the likelihood analysis. In addition, 1000 Maximum Likelihood bootstrap repetitions were conducted with the PhyML plugin for Geneious 3.8.5 [45]. Bayesian analyses were conducted with MrBayes 3.1.2 [46] with these parameters: four separate Metropolis-coupled Monte Carlo Markov chains, random starting trees with 20 X 10 6 generations with samples taken every 100 generations, and 25% of the resultant trees removed as burnin. We produced a 50% majority-rule consensus tree with nodal posterior probability support from the four runs post burn-in. Model parameters for the Bayesian analyses were the same as those in the Maximum likelihood analysis. The average standard of split frequencies was examined to determine if they dropped to a low, convergent value below 0.005. We also reviewed the outputs from the Bayesian analyses with TRACER v1.5 [47] to evaluate the robustness of the Bayesian analyses with respect to burn in, Effective Sample Size, stationary distribution, and posterior.

Population Statistics
As scorpion population divergence was considered to be potentially minor, additional analyses were conducted to better understand population structure and evolution. Analyses that consider population level processes such as a multitude of haplotypes in populations and recombination encompass parameters that may not be considered in strict phylogenetic analyses [48][49][50]. Haplotype network analysis was conducted on both COI and Locus 1075 sequences in TCS with 90% and 95% connection limits, respectively [48]. We lowered the COI connection limit to 90% as the 95% limit separated individuals from one population into two smaller networks. Any network loops that caused ambiguities were resolved according to Pfenninger & Posada (2002) [51].
To further explore patterns in our data, we conducted several population genetics statistics. These analyses were conducted with Arlequin 3.01 [52]. Populations were grouped into six regional groups based upon clade separations from previous Parsimony, Likelihood, and haplotype network analyses: Northeastern populations (east KS, MO, AR, LA, east TX, and OK); Laredo, TX; Aguirre Springs, NM; central populations combining Trans-Pecos and Central TX (west TX, NM, west KS, CO, and NE); Big Bend National Park, TX; and Hueco Tanks, TX area populations (Hueco Tanks and Chinati Hot Springs). As no clear geographic evidence exists to separate the scorpion populations into geographic regions and mountainous terrain appears to isolate them, we considered the regional groups based upon distinct networks created through the mitochondrial and nuclear haplotype network analyses as robust. In addition, other phylogeographic studies with species in the same geographic region also show similar population structure [53][54][55]. These analyses were conducted with scorpion populations to determine if any evidence of recent expansion and non-neutrality of DNA sequences existed in these regional groups. To test this hypothesis, Fu's F s and Tajima's D were calculated in Arlequin 3.01 [52,[56][57].
Significant negative values of these statistics indicate nonneutrality and population expansion: Fu's F s below a p-value of 0.02 indicate population expansion [52,57].  Fig. 4. and results for identification). Asterisked individuals (*) represent those identified as the C. pantheriensis variant with double asterisked clades (**) as those clades with all C. pantheriensis variants. Individuals or clades marked with a ''+'' symbol represent those specimens we identified as completely pale forms. Each color represents a general collection region: see results for further details. The inset box shows C. vittatus female morphologic diversity in four populations. The population designations are the same as in the Bayesian phylogenetic tree. doi:10.1371/journal.pone.0068282.g002

Divergence Dating
To further investigate migration and date population separation, coalescent analyses were conducted with Nielsen's MDIV and *BEAST v.1.6.1. Nielsen's MDIV was implemented in the Suite of Nucleotide Analysis Programs (SNAP Workbench) [58][59][60]. MDIV has been employed to estimate divergence and migration rates of single genes with a Bayesian model [58,[61][62][63]. MDIV is limited to two models of nucleotide substitution: HKY and Infinite sites. We followed the recommended HKY model for our analysis. COI sequences from selected population pairs were entered with a Hasegawa, Kishino, Yano (HKY) model of nucleotide substitution, 5 X 10 6 cycles for the Markov chain length, a burn-in time of 5X10 5 generations, and Mmax and Tmax values of 5 and 10, respectively. In MDIV, M = migration rate, T = divergence time, and TMRCA = time since the most recent common ancestor. Each population pair was run multiple times, changing the random starting seed each time to produce a more robust analysis. The output from each run was viewed graphically with MS Excel to determine credibility intervals for each population pair (M and H, respectively). Population pairs were reanalyzed with higher Mmax or Tmax values if the graphs did not indicate equilibrium in these values. The estimate T div was calculated with the formula T div = TH/(2m). Here H and T, the scaled divergence specify H and T (time), were estimated with MDIV; a m value of ,1% divergence per million years for scorpion mtDNA COI was obtained from rate estimates from the Mediterranean Mesobuthus scorpion genus as it represents a robust rate estimate for this mitochondrial gene [29,64].
The divergence date estimates with the BEAST software were produced with similar parameters to Bayesian analysis done in MrBayes but increased generation time (30X10 6 generations & 20% burnin) [47,65]. In these estimates, we reduced the outgroup species to Centruroides sculpturatus, C. gracilis, and C. infamatus. This analysis estimates several parameters (phylogeny & divergence dates) using a relaxed clock model. We calibrated the clock model with two dates: a Pleistocene divergence (1.561 mybp) of the Aguirre Springs, NM population as well as the mean estimate (6,00062,000 ybp) of the Hypsithermal climatic interval that may have affected the scorpion's distribution in its Northeastern range limits. Individuals from these populations were constrained into two clades with a normal distribution in the nodes with a Yule tree prior. C. vittatus has been introduced into areas outside its geographic range and exists in a wide range of climatic conditions; therefore, tree calibration with geologic or geographic barriers may be inappropriate for this species. The Pleistocene divergence of west Texas populations from those in New Mexico is documented in several species that inhabit this region: velvet ants (Dilophotopsis concolor) [66], snakes (Diadophis punctatus) [54], taran- tulas (Aphonopelma sp.) [55], and flightless longhorn cactus beetles (Moneilema armatum) [67]. We averaged the Pleistocene divergence dates from these species for our 1.5 mybp estimate. We chose the second calibration date of the Hypsithermal warming interval as it allowed an eastward expansion of many arid adapted species into the Interior Highlands of Missouri, Kansas, and Arkansas approximately 4,000 to 8,000 years before present [68][69][70][71][72][73]. The Hypsithermal interval calibration represents a well-studied eastward expansion period and has been associated with a potentially singular, rapid northeast expansion of species such as C. vittatus [68][69][70][71][72][73]. A pattern of recent, rapid expansion of scorpion populations coincident with a Hypsithermal expansion was evident in the phylogenetic analyses with very limited haplotype distribu-tion across the Interior Highlands. In addition, we conducted two other separate analyses in BEAST: an analysis with a singular calibration date at the Hypsithermal expansion (6,000 ybp) and another with no constraints but with the 1% scorpion COI mybp sequence divergence. The first dated run with the two calibration points is considered; however, we include the two additional results as supplementary materials to show variation in our calibration estimates. All Bayesian outputs produced through BEAST were also reviewed in TRACER for robustness in a similar manner to the Mr. Bayes simulations. We summarized the resultant trees in TreeAnnotator v1.6.1 to create a 50% majorityrule consensus maximum clade credibility tree.

Hypothesis Testing
We tested several alterative phylogenetic tree topology hypotheses with Bayes factors in the BEAST software through comparing constrained versus unconstrained clades [22][23][24]. First, we tested if any evidence exists for a C. pantheriensis clade. This variant was recognized in the Big Bend region and may represent an ecomorph restricted to this region. We identified those individuals that represent C. pantheriensis and constrained these samples as a single clade in BEAST. Second, we tested if the two New Mexico populations separated by the Tularosa basin (NM1 & NM2) could exhibit an alternative phylogenetic relationship in the same clade instead of their location in two distant clades in the unconstrained tree ( Figure 2). The White Sands formation in the Tularosa Basin has created rapid divergence between lizard taxa in spite of its recent age of 6000 years [74]. Third, we tested if populations in Big Bend National Park could exhibit a tree topology that places them with populations further east in Black Gap WMA (TX16) and Seminole Canyon (TX22). Ecological Niche Modeling analysis suggests optimum environmental contiguity between Big Bend populations and those within 200 km east. Lastly, we tested if C. vittatus populations could be placed into Eastern versus Western populations as suggested from a previous allozyme analysis of 15 populations [75]. This work placed C. vittatus populations into two distinct clades: a Western clade with those in Big Bend National Park and west from Guadalupe National Monument (generally west of 104 degrees longitude) and a Eastern clade east of the Texas Trans-Pecos region. In all these analyses, a Bayes factor is calculated as twice the -lnL harmonic mean difference between constrained and unconstrained analyses post burnin with differences above 10 suggesting strong evidence for hypothesis rejection [76,77]. The parameters for these analyses were equivalent to other previous phylogenetic analyses conducted in the BEAST program.   Figure 6. A tree created in BEAST to show regional clade divergence. Letters represent regional network clades as in Figure 4 with northeastern populations further divided into M1 (populations proposed to have been affected through the Hypsithermal expansion), M2 (Louisiana population), M3 (east central Texas population) and M4 (southeast Texas population). Node values represent divergence times in million years with MDIV divergence times in parentheses (see Table S2 for further MDIV divergence statistics and

Morphological Data
We separated individual scorpions into regional population groups delineated from the haplotype network analysis for morphological measurements (see Protocol S1 for a detailed measurement discussion). The scorpions measured included samples from across the range of C. vittatus in both the United States and adjacent states of Mexico. We also identified two population groups that consisted of the C. pantheriensis variant individuals. Individuals for these analyses were obtained from these museum arthropod collections: the American Museum of Natural History, the California Academy of Sciences, and the Arkansas Tech Museum of Zoology. We measured 356 males and 333 females, then entered each dataset into the NCSS 2007 statistical package (NCSS, Inc.) for Principal Components Analysis (PCA) and Discriminate Function Analysis (DFA) [78]. Although the PCA was able to discriminate among several groups, it could not clearly distinguish population partitions; therefore, we conducted Discriminate Function Analysis (DFA) as it allows prior group prediction then tests the robustness of each individual as a member of the predicted group. Initially, the first three factor residuals for each individual incorporating .75% of the variability from the PCA was entered into the DFA analysis; however, we switched to initial morphometric measurements as they created a better fit between predicted to actual categories. Additionally, we explored base 10 log transformation of the data but the transformations also yielded a lower fit between predicted to actual category. For this statistical analysis, we created 17 male    32  18  20  54  19  17  33  21  15  8  39  9  15  12  9 12 333 Each row represents actual groups whereas the columns represent predicted groups. Groups identified through the TCS analysis are shown in parentheses. The reduction in classification error (47.5%) shows the accuracy of the DFA compared to random classification. See Protocol S1 for further information concerning population designations and statistical analyses. doi:10.1371/journal.pone.0068282.t003 predicted groups and 16 female predicted groups. We also tested the robustness of the DFA through separation of 301 male and female individuals from several random groups.

Ecological Niche Modeling
We conducted Ecological Niche Modeling in MaxEnt 3.3.3a to determine suitable contemporary habitat and paleoclimatic distributions [79]. The 19 environmental layers were taken from the WorldClim data set (for a 30 arc-second resolution, http:// www.worldclim.org) and clipped into ArcGIS 10 (ESRI, Inc.) to encompass the entirety of the C. vittatus range [80]. The projected paleoclimatic distribution was produced with a matching climate dataset representing the Last Glacial Maximum (LGM), 21,000 years before present. This data set was created from clipping climate layers available in the Community Climate Model [81]. Both environmental data sets (14 climate layer clips -Protocol S2) and GPS positions for 96 scorpion collection localities were entered into MaxEnt for Ecological Niche Modeling. In the MaxEnt program, we conducted five replicate runs with these parameters: default convergence threshold, maximum iterations (1500), and 25% of the sites for model training [82,83]. We evaluated the model with the Area Under the Receiving Operating Characteristic (ROC) curve (AUC) that varies from a random prediction of 0.5 to 1 for maximum prediction. We chose 5% as the threshold for the continuous probability produced from the program for suitable climate conditions [82,83]. The contribution of each climatic variable was assessed through several variables: the percent contribution (the increase in gain of the model for an environmental variable), the permutation importance (the random permutation of a climatic variable to determine the degree the model depends upon the climatic variable), a jacknife of the environmental variables (to determine how well the model operates with only a specific environmental variable, and how well the model operates with the variable omitted and other variables included compared to all environmental variables in the model). The sample sites for the model were determined through field collection localities, museum collection locality data from the morphological data analysis, and collection records from Shelley & Sissom [21]. We identified GPS coordinates for the museum collections from collection locality notes and verifying locations through Google Earth 6.0 (https://earth.google.com). Any collection sites with unidentifiable or uncertain locality data were removed from this analysis.

Phylogenetic
We amplified and sequenced 161 COI samples. Each sample produced a sequence of 1450 nucleotides with 111 total haplotypes. The program GARD detected no recombination within the COI or Locus 1075 sequences. The Parsimony analysis with 1451 characters identified 1072 constant characters, 100 uninformative characters, and 279 informative characters. As the parsimony consensus tree reflected Maximum Likelihood and

Population Statistics
In the COI haplotype network analysis, with a 90% confidence limit of 24 steps, 13 networks were created that mirrored clades in the phylogenetic analyses (Figure 2 & Figure 3). We present three COI networks out of the 13 networks created from 106 haplotypes ''E''), but exhibits no more than three mutational steps between haplotypes ( Figure 5). The small haplotype numbers from the Northeastern populations mirrors the network created with COI and supports the rapid expansion into this region. Haplotypes from the central TX populations were distributed most frequently across the network with south TX haplotypes in three of the four subnetworks. Haplotypes from Northeast and Big Bend regional populations exhibited a strong association to their respective regions with few haplotypes in other subnetwork regions. Lastly, none of the haplotypes associated with the C. pantheriensis variant were in a separate sub network, but scattered throughout the Big Bend network.
The population diversity analysis in Arlequin for the COI data showed fewer haplotypes with little separation among haplotypes within the Northeastern region of the geographic range, indicating possible bottlenecks and recent expansion (Table 1). Moreover, Tajima's D and Fu's Fs supports the trends presented above, for the COI data. Significant negative values occur in populations in the northeast and Big Bend (Table 1).

Population Divergence and Hypothesis Testing
Both the BEAST and MDIV coalescent analyses support the trend of southwestern populations exhibiting greater divergence than those in the north ( Figure 6). Here, we report the results from our first BEAST analyses with the two calibration dates because estimates with a single date near a terminal node may not accurately date deeper divergent nodes [84]. We present estimates from the two other divergence analyses in the Supplementary data section (Table S3). The terminal clades from the BEAST analyses with substantially deeper origin times than the LGM of 21,000 years are populations adjacent to the Rio Grande River (''J & K ''-Laredo, Falcon, & Seminole; Chinati HS-''G'') ( Figure 6). The deeper clade divergence times from the calibrated BEAST analysis were markedly shorter than those calculated from the MDIV program, but those for more recent divergence (i.e., Northeastern expansion) fell within the ranges of both methods. MDIV divergence times (T div ) ranged from 78,000 years for AR/north Texas populations to 900,000 years between the Hueco Tanks/ Chinati populations in the southwest. BEAST divergence dates for these populations placed divergence at 7,000 for AR/north Texas to approximately 42,000 ybp for the Hueco Tanks/Chinati populations. All four alternative hypotheses tested with Bayes Factors indicated strong values for rejection (Table 2).

Morphological Data
We measured 356 males and 333 females for all morphometric analyses (Protocol S1). The PCA scree plot showed all morphometric measurements for both males and females except for pectine teeth number were equally represented in the first PCA factor; pectine teeth number composed the bulk of the second PCA factor. For the Discriminate Function Analyses (DFA), the trial separation of 301 males and females yielded 153/167 (91.2%) males predicted correctly and 130/134 (97.0%) females predicted  correctly. For the subsequent DFA analysis, all predicted groups in both male and females were generally placed in actual group categories with greater predicted to actual placement in Trans-Pecos populations (Table 3 & 4). Populations in this region designated as containing C. pantheriensis (G & F in Table 3 & 4), exhibited distinct morphometric identities from each other suggesting little evidence beyond color variation, to create a single, unique C. pantheriensis designation. Categories representing more northern populations (''E'' & ''M'') and those in south Texas (''K'' & Brownsville), generally exhibited a lower predicted to actual group relationship. We used F values to measure the significant impact of removing characters from the analysis [78]. For males, the three most important variables for separation were movable finger length (removed F-value: 10.82), chela width (removed F-value: 9.64), and chela depth (removed F-value: 8.74). These F values all represent F-probabilities #10 26 . All other characters exhibited removed F-values of ,6.50. For females, the three most important variables were the following: chela width (removed F-value: 9.29), carapace length (removed Fvalue: 6.48), chela depth (removed F-value: 5.72). These F values also all represent F-probabilities #10 26 . All other characters for females exhibited F values of ,4.25.

Ecological Niche Modeling
The ecological niche modelling (ENM) analysis shows environmental conditions are favourable for C. vittatus across much of its geographic range (Figure 7a). In five separate runs, the average training AUC value from 1500 replicates was 0.885 (sd 0.005; range 9.26-9.38) with the following top predictors: mean temperature of the warmest quarter (18.4%), mean annual temperature (15.6%), mean temperature of the driest quarter (14.1%), temperature seasonality (9.9%), mean temperature of the coldest quarter (9.8%), and precipitation seasonality (7.6%). The jacknife test for variable importance showed 'mean temperature of the driest quarter' as the most important variable when used alone. The most optimal conditions for this species exist in the Big Bend region of Texas with a second optimal area further east in Louisiana. Paleoclimatic modelling in MaxENT suggests two separate refugia for this species: one in Chihuahua, Mexico and another in Nuevo León and Tamaulipas, Mexico (Figure 7b).

Discussion
Our Bayesian phylogenetic tree ( Figure 2) shows strong support for the monophly of C. vittatus and generally corresponds to the Cuban Buthid phylogeny produced by Teruel et al. [85]. The Bayesian tree indicates strong support for C. gracilis as the sister species to C. vittatus and support for the C. exilicauda/C. sculpturatus separation proposed by Valdez-Cruz et al. [17]. Within C. vittatus, all phylogenetic trees showed marked separation among populations in the Trans-Pecos region and other western populations with reduced separation among populations in the northern and eastern regions of the C. vittatus geographic range ( Figure 2).
C. pantheriensis appears to be a morphological variant within C. vittatus. The individuals we identified as the C. pantheriensis variant are all found in clades centered in the Big Bend region and further west, and are generally associated with the Rio Grande River basin. The Bayes factor hypothesis test provides strong evidence against the existence of a C. pantheriensis clade (Table 1). Although superficial evidence exists for a separate C. pantheriensis clade, close inspection of the Bayesian tree indicates that within the Big Bend clade (''F''), several individuals exist within the clade that do not correspond to the C. pantheriensis variant. In addition, the Black Gap clade (''A'') is geographically adjacent to Big Bend, yet the Bayesian tree places these clades in markedly different branches. These individuals from the Black Gap clade were all initially identified as those belonging to the C. pantheriensis variant. However, all the individuals we identified for the phylogenetic analysis as the colorless, pale form of C. pantheriensis were those from the Big Bend, Chinati HS, or Hueco Tanks populations. These populations form a separate clade in the C. vittatus phylogeny. The nuclear dataset also supports the separation of Big Bend regional populations ( Figure 5). Therefore, we cannot exclude the possibility that the colorless, pale form is the result of a unique Big Bend and western Rio Grande basin phylogeographic history and may exhibit incipient speciation.
It is likely that the historic factors that created the Big Bend regional clade produced this unique pale morphological variant along with the other C. vitttaus variants observed in this clade, e.g., C. chisosarius. Morphometric analysis (DFA, Table 3 & 4) suggests that populations in this region are distinct from each other and neither create a separate Big Bend cluster nor C. pantheriensis cluster. This result, when coupled with climate data (ENM), suggests these populations evolve morphological distinction, even in similar environmental conditions. The DFA markedly separates the morphologically similar Aguirre Springs and Oliver Lee populations from each other and also from the pale, colorless individuals in the Hueco Tanks population. These populations also exhibit phylogenetic distinction from each other, yet all are within 100 km. In addition, ENM shows contiguous optimum environmental conditions exist in the Big Bend region and populations further east, yet historic factors appear to over ride contemporary environmental conditions by creating distinct phylogenetic breaks among clades in adjacent populations to the east. This result is noteworthy as the Big Bend region is reported to contain the highest scorpion species diversity in Texas [25]. This high diversity and population isolation is likely a consequence of the region's topographic complexity [14]. The scorpion fauna in the Big Bend region and adjacent states in Mexico are more likely to show endemism due to scorpion's low vagility [14], and our results suggest even errant species such as C. vittatus evolve marked isolation among populations despite little environmental heterogeneity among populations.
The paleoclimate reconstruction for C vittatus shows optimal environmental conditions are predicted to have been restricted to a large area in the southern Rio Grande valley and one further west in Chihuahua, Mexico (Figure 7b). This western and eastern separation is mimicked in other southwestern US desert adapted organisms [83,86]. However, divergence dating suggests several populations existed prior to the 21,000 LGM date and these scorpions were not restricted to such refugia ( Figure 6 & Table S2). The earlier divergence date of several populations also suggests the C. pantheriensis variant arose independent of population age and does not appear to be associated with an early divergence date. Although our MDIV estimates are greater than BEAST dates, they are within the estimates for beetle divergence intervals in the same region [67,87]. MDIV divergence estimates, that include some migration, can indicate deeper divergence than other models that assume no migration between population pairs [58,88]. Furthermore, MDIV may be more appropriate for species that experienced Pleistocene divergence, i.e., those with finite population ages [88]. It is important to note that credibility intervals created in MDIV analyses show wide ranges; however, we present these results to suggest scorpion populations may have existed in much of their range for many years with historical population divergence and recent expansion into the most northeastern portions of their current geographic range.

Centruroides vittatus Phylogeography and Species Delimitation
The larger phylogeographic analysis of C. vittatus suggests many populations expanded after the LGM, but populations along the Rio Grande river existed outside predicted refugia ( Figure 6). In addition, distinct phylogeographic breaks occur in the Trans-Pecos and Big Bend regions, as well as within the central Texas and Northeast populations (Figure 4 ''E'' vs ''M''). In this species, the populations along and adjacent to the Rio Grande valley in the western section of the scorpion's geographic range show the most unique and complex phylogenetic relationship to each other with reciprocal monophyly among populations and separation into markedly distinct clades and haplotype network networks in spite of similar environment and limited geographic distance (Figures 4 & 5). Bayes factor testing (hypothesis 2: single origin of Aguirre Springs and Oliver Lee populations in the Tularosa basin), ENM results, and divergence dating suggests the Tularosa basin was a barrier and these populations independently expanded into these areas after the LGM. The rejection of the third Bayes factor hypothesis (single origin in similar ecological niche of Big Bend area populations) suggests contemporary climate conditions have little effect in homogenizing populations even with optimum environmental conditions. The morphometric data also support the separation of Trans-Pecos populations, as these populations (D, F, G, & I- Table 3 & 4) exhibited the highest predicted to actual numbers. Interestingly, populations south of the Trans-Pecos (Laredo, & Falcon Lake-K & Brownsville) also within the region of a LGM refugium exhibit lower predicted to actual numbers with individuals placed in surrounding and nearby regional populations.
Populations in the Northcentral (''E'') and Northeast (''M'') appear to originate at approximately the date given for the LGM, but the direction of expansion occurred independently into the range extremes ( Figure 6). Most northeastern populations in the Interior Highlands of Arkansas, Missouri, and Oklahoma appear to have expanded rapidly coincident with the Hypsithermal expansion of prairie species into this region. Both the mitochondrial COI and nuclear 1075 data support many populations in this region sharing the same haplotype with little genetic separation among populations (Figure 3, 4,5). This rapid expansion is also verified through the significant Fu's F and Tajima's D statistics. No other regional category exhibits significant values for both statistics ( Table 1). The morphometric analysis also shows the two northernmost regions (''E'' & ''M'') exhibit the lowest actual to predicted numbers. When viewed with respect to the ENM modelling, we conclude morphometric similarity among the populations in these regions is due to rapid expansion rather than shared environment. The fourth Bayes factor hypothesis test rejects the division of C. vittatus populations into eastern and a Trans-Pecos cluster as proposed in Hedgecock [75]. Our phylogenetic analysis suggests a more complex population subdivision across the vittatus geographic range with greater divisions among the Tran-Pecos populations. Lastly, the Wichita Mountain population (''L'') is distinct in all analyses. In the Bayesian tree (Figure 2), it appears as a unique clade: within the BEAST divergence trees (Figure 6.), it appears as a sister clade to the Northeastern population clade (''M''). This population may represent an independent Great Plains expansion, also recovered in the Nightsnake [89].
As seen in other scorpion phylogeographic studies, C vittatus exhibits distinct clades that represent isolation and population divergence. Although restricted to a specific geographic region, we find no support for a distinct C. pantheriensis clade. However, several populations within the C. pantheriensis variant show phylogenetic and morphological distinction and may fall within the parameters for a cohesion species, i.e., populations that exhibit common ancestry, are genetically exchangeable, and ecologically interchangeable [90,91]. We refrain from delimiting these populations as distinct species as further data from venom analysis and intervening populations are lacking and these data would strengthen any support for further species delimitation [17]. We also stress that a multifactoral approach is important for scorpion species delimitation as scorpion population isolation and speciation appears to be the result of several interwoven factors.

Conclusions
We conclude the pale C. pantheriensis variant is due to C. vittatus' unique phylogenetic history in the Big Bend region. In this region, these scorpions exhibit marked phylogenetic and morphological separation despite similar environmental conditions among populations. We show errant scorpions such as C. vittatus exhibit a surprising and diverse phylogeographic structure. Our results suggest the Texas Big Bend and Trans-Pecos region house divergent scorpion populations and may represent a region significant for scorpion speciation. This study argues for further phylogeographic research to understand scorpion diversity in this region and the evolution of the morphometric diversity within the Centruroides genus. Additional sampling throughout the Trans-Pecos is needed to characterize the diversity and extent of the Big Bend C. vittatus clade.