Population and Demographic Structure of Ixodes scapularis Say in the Eastern United States

Introduction The most significant vector of tick-borne pathogens in the United States is Ixodes scapularis Say (the blacklegged tick). Previous studies have identified significant genetic, behavioral and morphological differences between northern vs. southern populations of this tick. Because tick-borne pathogens are dependent on their vectors for transmission, a baseline understanding of the vector population structure is crucial to determining the risks and epidemiology of pathogen transmission. Methods We investigated population genetic variation of I. scapularis populations in the eastern United States using a multilocus approach. We sequenced and analyzed the mitochondrial COI and 16S genes and three nuclear genes (serpin2, ixoderin B and lysozyme) from wild specimens. Results We identified a deep divergence (3–7%) in I. scapularis COI gene sequences from some southern specimens, suggesting we had sampled a different Ixodes species. Analysis of mitochondrial 16S rRNA sequences did not support this hypothesis and indicated that all specimens were I. scapularis. Phylogenetic analysis and analysis of molecular variance (AMOVA) supported significant differences between northern vs. southern populations. Demographic analysis suggested that northern populations had experienced a bottleneck/expansion event sometime in the past, possibly associated with Pleistocene glaciation events. Conclusions Similar to other studies, our data support the division of northern vs. southern I. scapularis genetic lineages, likely due to differences in the demographic histories between these geographic regions. The deep divergence identified in some COI gene sequences highlights a potential hazard of relying solely on COI for species identification (“barcoding”) and population genetics in this important vector arthropod.

Multiple DNA sequences have been used to examine the evolution and population genetics of I. scapularis [5][6] [7][8] [9][10] [11] [12].With the sequencing of the I. scapularis genome, it has become much easier to screen for novel genetic markers for this species [12].The taxonomic history of I. scapularis has been somewhat contentious, with some researchers claiming that it is actually a species complex consisting of I. scapularis in the Southern USA and I. dammini in the north [13] [14][15] [16][17] [7][8] [18][19] [20], While scientific consensus has for the most part rejected this interpretation, it is clear that there are significant genetic and demographic differences between northern and southern populations of this tick [21][3] [12].To address this issue, we investigated genetic variation in I. scapularis populations in the eastern United States using a multilocus approach in which we sequenced and analyzed the mitochondrial COI and 16S genes, and three nuclear genes (serpin 2, ixoderin B and lysozyme).

Field tick collections
Tick samples were collected during 2006-2012 by flagging with a 1 m 2 canvas cloth.Samples were catalogued, surface-disinfested, and extracted immediately or stored at 280uC until extraction (see below).I. scapularis adults or nymphs were collected from populations from Wisconsin, New Hampshire, Pennsylvania, Mississippi, North Carolina and New York (Table 1, Figure 1).GPS coordinates were inputted into an online GPS generation program to generate a map of the collection sites on a map of the United States (http://www.gpsvisualizer.com;map image from the public domain [http://nationalmap.gov]).No ethical clearance was required to conduct research on invertebrate ectoparasites.All samples were either submitted by private collectors or collected by the authors after obtaining appropriate permissions.

DNA extraction
Prior to DNA extraction, each tick sample was individually surface-disinfested with 95% ethanol for 15 s, 10% bleach for 60 s, washed sequentially in 3 baths of sterile nuclease-free water, and dried on autoclave-sterilized filter paper in a sterile petri plate.Each adult tick was bisected and one half archived at 280uC, while the other half was used for DNA extraction.Nymphs were extracted in their entirety rather than bisected.Bisected samples were frozen briefly (30 min/280uC), macerated with a sterile micropestle, and the DNA extracted according to the respective manufacturer's guideline.Genomic DNA was extracted from legs or small fragments stored in 95% ethanol from the ''unknown'' samples (MSU A-D).The ethanol was completely evaporated from the samples prior to DNA extraction.Genomic DNA was extracted from samples using either the DNeasy Blood and Tissue kit (Qiagen 69506) or the GenElute Bacterial Genomic DNA kit (Sigma NA2110), following the manufacturer's guidelines.DNA concentration was determined with a Nanodrop spectrophotometer and adjusted to 5 ng/ul prior to use in PCR.For comparative purposes, genomic DNA was extracted from egg masses from 3 individual females derived from the Wikel colony, which was used to generate the I. scapularis whole genome shotgun sequences on Vectorbase.

Amplification of mitochondrial DNA (cytochrome oxidase c subunit I and 16S)
Primers COI907F and COI907R (Table 2) were designed to amplify a 907 bp fragment of the I. scapularis cytochrome c oxidase subunit I gene (GenBank Accn# ABJB010748661.1).Primers were designed using Primer3 [22].Samples were initially screened using 10 ul reactions contained 1 ul genomic DNA, 0.2 ul each forward and reverse primer (10 mM each), 5.0 ul of 2X Taq Master mix (New England Biolabs, M0270), and 3.6 ul of sterile nuclease-free water.Amplification conditions were as follows: 95uC/5 min, 35 cycles of 95uC/30 s, 60uC/30 s, 72uC/60 s, and a final extension of 72uC/10 minutes.Samples that were positive by PCR were amplified for sequencing in 50 ul reactions using Phusion High-Fidelity DNA Polymerase (New England Biolabs, M0530) as follows: 5 ng of DNA were mixed with 1 ul of dNTP mix (10 mM each), 1 ul each of forward and reverse primers (10 mM), 0.5 ul ( = 0.01 Units) Phusion polymerase, 10 ul HF buffer, and 35 ul sterile nuclease-free water.Samples were amplified as follows: an initial melting cycle at 98uC /30 s, and cycled 35 times at 98uC/30 s, 60uC/30 s, and 72uC/30 s, followed by a final extension of 72uC/10 minutes.PCR products were prepared for sequencing by treating with ExoSAP-IT (USB, Affymetrix Cat# 78201) following manufacturer's guidelines.The samples were sequenced in both directions on an ABI 3130/ Genetic Analyzer.
Several samples had COI sequences that were highly divergent (see results).A subset of these samples was sequenced at the 16S rRNA gene for comparison to the previously described haplotypes of I. scapularis [3][9] [8].We amplified a 460 bp fragment using primers 16S+1 and 16S-1 (Table 2), using a modified step-down protocol to eliminate nonspecific products: 94uC, 4 cycles of 94uC/15 s, 58uC/15 s, 68uC/30 s, 4 cycles of 94uC/15 s, 54uC/ 15 s, 68uC/30 s, and 27 cycles of 94uC/15 s, 50uC/15 s, 68uC/ 30 s, with a final extension of 68uC/5 minutes.Three samples each from the two divergent COI clusters were sequenced, along with one ''typical'' haplotype of I. scapularis each from Mississippi and Pennsylvania.To confirm that the samples from the two divergent clades were not actually from different species, samples (leg or body fragment) of known species identity were received and analyzed in a blinded manner for cloning (Strataclone PCR cloning kit, Stratagene #240205) and sequencing.All sequences were aligned and compared to the 16S sequences from Rich et al

Amplification of nuclear genes
I. scapularis is known to harbor a maternally-inherited endosymbiotic Rickettsia [23] [24].All samples in this study were assayed for the rickettsial outer membrane protein A (rompA) [25] and all tested positive (data not shown).Since there have been studies suggesting that mitochondrial evolution might be affected by endosymbionts [26][27], we chose to sequence three additional nuclear genes from 4 of the 6 populations assayed for COI.Primers were designed to amplify fragments of three putative innate immune genes, identified through blastn homology to the I. scapularis genome and EST sequence data on NCBI and Vectorbase.The three genes were: 1) a salivary gland-specific lectin (ixoderin B, Vectorbase ID IscW_ISCW013797; GenBank NW_002845951.1);2) a lysozyme (Vectorbase ID Isc-W_ISCW020680; GenBank NW_002737421.1);and 3) a serine protease inhibitor (serpin 2 precursor, Vectorbase ID Isc-W_ISCW011017; GenBank NW_002630218.1).Fragments (between 500 and 600 bp) were amplified and sequenced from the Wisconsin, Mississippi, New Hampshire, and North Carolina specimens, and from 5 Wikel colony females.

Post-sequencing data analysis
For each sample, the aligned sequences were checked against trace files using MEGA 5.2.1 [28].The consensus of both strands was generated using jEmboss [29].The consensus sequences were aligned with MUSCLE 3.5 [30] and trimmed to overlapping regions that were as follows: 755 bp for COI, 360 bp for 16S rRNA, 413 bp for ixoderin B, 551 bp for serpin 2, and 396 bp for lysozyme.

Confirmation of sequences by blastn and the BOLD Identification system tool
Aligned and trimmed sequences were run through blastn (Blast2.2.28+) against all NCBI nucleotide (nt) databases [31].Additionally, sequences were run through the BOLD identification system (IDS) identification engine [32].Sequences that had less than 98.0% blast identity match to I. scapularis were submitted to the BOLD IDS to assess how closely the sequences matched archived I. scapularis COI sequences.The BOLD IDS runs a global alignment of all queries against a COI-specific Hidden Markov Model (HMM), and then compares the results against the reference library.The IDS engine provides species designations if the query is less than 1% divergent from a reference sequence, and genus designations if there is less than 3% divergence.Each sample is used to generate a phylogenetic tree and identification determined based on the tree output.
When several of the Mississippi sequences did not produce a species level identification (based on the BOLD criteria, species level identification is not provided for samples that are more than 1% divergent from a reference sequence), we compared our sample sequences to I. scapularis COI sequences from other studies in the GenBank database (Table S1), including the complete coding sequence from the I. scapularis whole genome shotgun sequencing project from which the original COl primer sequences were designed, sequences from Ixodes scapularis in Canada, and a sequence from I. scapularis used by Noureddine et al 2011 as an outgroup to I. ricinus [33] [34].These were aligned with our sequences and trimmed to 338 bp (overlapping region for all sequences) for phylogenetic analysis.
To confirm our divergent Mississippi sequences were I. scapularis and not a sister species, we also sequenced the 16S gene from a subset of the divergent Mississippi samples and compared them phylogenetically to 16S rRNA sequences found on GenBank (Table S2).

Phylogenetic analysis
Phylogenetic analysis was conducted using MrBayes version 3.2.1 [35].MrBayes uses a Markov Chain Monte Carlo (MCMC) integration method to estimate posterior probabilities.The evolutionary model implemented for the nucleic acid sequences was estimated using jModelTest [36].Best models for each gene were as follows: TPM3 (three parameter model) for mitochondrial genes COI and 16S (gamma = 0.1540 and 1.360, respectively); Kimura (K80) for lysozyme (gamma = 0.0940), ixoderin B (no gamma), and model TrN for serpin 2 (equal base frequencies, pinversion = 0.8870).The MCMC was run until the average standard deviation of split frequencies was below the stop value of 0.01 (using the stoprule option), sampled every 100 generations, with the default of 25% burn-in specified.A 50% majority rule consensus tree was constructed from the remaining 75% of the trees.MrBayes was also used to generate a phylogenetic tree from the concatenated sequences from the 3 nuclear genes.Since each of the genes had different substitution model conditions suggested by jModelTest, the different model conditions were specified for each gene (partition) in MrBayes.For the COI gene, the matching fragment of the complete mitochondrial genome sequence of Ixodes ricinus (GenBank Accn# JN248424) was used as outgroup.Ixodes pacificus haplotype SSCP CA_2 (GenBank Accn# AF309009) was used as outgroup for the mitochondrial 16S rRNA gene.No suitable outgroup sequences were available for the nuclear genes and these trees were thus left unrooted.
Figtree version 1.3.1 was used to view trees produced by MrBayes (http://tree.bio.ed.ac.uk/software/figtree/).When the 16S rRNA sequence from the blinded sample ''MSU-A'' (I.brunneus) was determined to be more divergent than the chosen root (I.pacificus), the tree was left unrooted.A second Bayesian analysis was run with just the 16S rRNA sequences from I. scapularis samples and the outgroup I. pacificus.

Population structure, neutrality, diversity, and genetic distance analyses
For the COI gene-based population genetic analysis, Arlequin 3.0 [37] was used to partition genetic variation between Northern USA (NY, WI, NH, PA) and Southern USA (MI, NC) populations using analysis of molecular variance (AMOVA).

Results
In total, we obtained clean bi-directional COI sequences from 89 out of 94 specimens assayed from 6 populations (four northern USA: PA, WI, NY, NH; two southern USA: NC, MS).Samples that had clean sequences in only one strand were excluded from phylogenetic analyses.After sequencing, each sample was confirmed to be Ixodes scapularis by blastn homology to GenBank and against the BOLD IDS (described in the Methods).Several of the Mississippi sequences were not assigned species-level identification on the BOLD IDS (Table 3).

Phylogenetics
Phylogenetic analysis was conducted on each gene, and on concatenated sequences of the three nuclear genes.In the COI tree, many of the MS samples clustered into two clades that were highly divergent from the other sequences (3%-7%) (Table 3, Figure 2).This amount of divergence is significantly greater than the BOLD ,1% or ,3% divergence criterion for assigning species or genus level designation.We next confirmed that the sequence divergence was specific to our Mississippi samples by comparing all our samples to other COI sequences that were available on GenBank (Table S1, Figure S1).
To address whether we had misidentified some of the Mississippi samples, we cloned and sequenced a fragment of the 16S rRNA gene from a subset of divergent samples as well as known conspecifics from Mississippi.When we compared the sequences phylogenetically, it became evident that 1) there were several divergent haplotypes in Mississippi, 2) although COI clade separation was supported by 16S analysis, our divergent samples clustered with I. scapularis sequences from Genbank, and 3) though highly variable, both clades of I. scapularis 16S sequences were closer to each other than to the other congeneric sequences (Figures 3 and 4, Table S2).
To clarify these results, we sequenced 3 additional nuclear genes (ixoderin B, lysozyme and serpin 2).We performed phylogenetic analysis on individual genes (Figures S2, S3, and S4) and also on concatenated nuclear sequences for all samples for which all three sequences were available (Figure 5).In the concatenated analysis,  most of the Mississippi samples clustered together, apart from the other samples.This clustering was primarily due to the very strong support for the Mississippi clade in the lysozyme gene tree (Figure S3), which was not reflected in the other nuclear genes (Figures S2  and S4).

Diversity measures and demographics
We identified 58 unique haplotypes in our COI data.The Shannon index for each population in our COI dataset ranged from 1.906 to 2.659 (Table 4).We analyzed the heterogeneity of alleles within the population using the haplotype genetic diversity index (h and uh) to estimate the probability that two haplotypes are different.The unbiased diversity probabilities ranged from 0.934 to 0.980, indicating high probabilities of haplotype diversity within populations (Table 4).In addition, we calculated the evenness to be between 0.7768 and 0.9299.Collectively these data indicate that, while the haplotype diversity was high, haplotypes were more or less evenly distributed within each population.In the southern USA, COI sequences did not deviate statistically from neutrality based on Tajima's D test and Fu and Li's D* and F* tests (Table 5).However, in the Northern USA, Tajima's D and Fu and Li's D* and F* were significantly negative.Negative values for these statistics are often indicative of directional selection or population expansion.Population expansion was further supported by the mismatch distributions and site frequency spectra.The mismatch distribution for the Northern USA showed a peak that was consistent with population expansion (Figure 6A).The site frequency spectra exhibit an excess of singleton mutations, also consistent with population expansion (Figure 7A).For the southern USA, the mismatch distribution and site frequency spectra are not consistent with an expansion event (Figures 6B and 7B).
Nuclear genes indicated a more variable evolutionary history.Serpin 2 did not deviate from neutrality in any population assayed (Table 5).For lysozyme, Tajima's D and Fu and Li's D* and F* were significantly negative in WI and NC.Results for ixoderin B were not consistent between tests, although southern samples were significantly negative suggesting possible directional selection on this gene (Table 5).

Population structure
Analysis of molecular variance (AMOVA) was conducted on each gene.Obtained sequences were highly variable (Table 6).For all genes, the majority of variation was explained by variation between individuals within populations (41.6%-89.5%).However, our analysis also showed that collection region (North or South) explained significant variation for most tested genes.For the mitochondrial COI gene, and for 2 nuclear genes (lysozyme and serpin 2), a significant proportion of molecular variation was explained by Northern vs.Southern location (COI: 16.8%, P = 0.03; lysozyme: 37.5%, P = 0.026; serpin 2: 6.9%, P = 0.05).Although not significant, the trend for ixoderin B was similar (14.6%,P = 0.059) (Table 6).

Population genetic distance
Mantel tests were conducted on COI, and the three nuclear genes, but no significant isolation by distance was detected.Due to   A high I indicates high diversity and evenness, while I = 0 indicates a single haplotype in a population.Equitability (Eh) ranges from 0-1, with 1 representing complete evenness in a population.Haploid genetic diversity index (h and uh) estimates the probability that two haplotypes will be different.doi:10.1371/journal.pone.0101389.t004the limited number of populations sampled, it is likely there is not enough power to detect a statistically significant relationship.

Discussion
It has been long established that there are differences between northern and southern populations of I. scapularis in the United States [43].At one time, they were considered by some groups to be two distinct species, based on biology and collection location (I.scapularis in the south and I. dammini in the north) [20].Northern ticks have a 2-year life cycle and feed primarily on mammalian hosts, while the Southern ticks complete their lifecycles in a year or less depending on environmental conditions and feed primarily on lizards [44].In addition, nymphs of southern populations of I. scapularis rarely bite humans and seem to quest in or beneath leaf litter instead of several inches above ground like those in northern areas [45][46][47] [48][49] [50].In 1993, Oliver et al designated I. dammini as a junior synonym of I. scapularis based on reciprocal crosses, assortative mating experiments, and morphometric criteria [18].Subsequently, molecular studies confirmed that, while some data support distinct northern and southern lineages (using both mitochondrial 16S and 12S rRNA genes, and nuclear ITS1 and ITS2 regions of the ribosomal genes), there was insufficient evidence for isolation of the two morphotypes [19][9][14] [3][12].
Similar to other studies, our mitochondrial and nuclear gene data support the division of Ixodes scapularis into several distinct lineages.We identified one clade that occurred only in southern population samples, and another clade that occurs throughout the northern and southern collection region.Our results suggest that northern and southern populations have significantly different demographic histories.Data from previous studies using 16S sequences have been used to infer that I. scapularis possibly originated in the southern USA, followed by migration of a small founder population into the north when the Laurentide ice sheet (Pleistocene era) receded [51] [12].Our COI sequence data support this hypothesis.We detected statistical signatures of population bottleneck and expansion in northern populations, but not in southern populations (Figures 6 and 7, Table 5).Population bottlenecks and resultant expansions coincident with Pleistocene glaciation events have been detected in other vector taxa [52][53] [43].We speculate that demographic history may explain genetic differences between northern and southern I. scapularis populations.
Our results spotlight a potential hazard of relying solely on COI for species identification (''barcoding'') and population genetics.The COI gene has been shown to be a powerful tool for identification of species or cryptic species [54][55].There have been criticisms about dependence on a single gene for identification, particularly mitochondrial genes, in part because the evolution of mitochondrial genes may be influenced by endosymbionts [27] [26].Because of the extent of the sequence divergence in our Mississippi samples, we might have concluded (on COI sequences alone) that we had sequenced 2 different species, or even 2 different genera.However, further confirmation with additional genes revealed that, while these samples were very divergent at the COI locus, they were, in fact, I. scapularis.
One explanation for the deep COI variation we observed in MS populations might be due to the presence of cryptic diversity within I. scapularis.A cryptic subspecies would be one that is morphologically similar to another, not reproductively isolated, but may have some genetic difference(s) that can phylogenetically separate the two.In I. uriae (seabird tick), Kempf et al 2009 compared microsatellite data (ecological timeframe) to COIII sequence data (evolutionary timeframe) to determine the timeframe of cryptic race divergence within Ixodes uriae and concluded that there was evidence of a cryptic race complex correlated with seabird hosts [56].This geographical/host-associated variability was not previously detected with more slowly evolving genes such as cytB and nuclear ITS-2, suggesting that the evolution of these variants was a recent occurrence, and possibly may have arisen multiple times [56].Whether our data indicate a similar cryptic race complex within I. scapularis will require more scrutiny of a much larger dataset (i.e. more individuals from many more populations from southern locations in the United States).
Another possible hypothesis to explain the COI divergence of southern populations of I. scapularis in the United States may be that the cytoplasmically-linked Rickettsia might play a role.Due to maternal inheritance, mitochondrial lineages can hitchhike with cytoplasmically inherited endosymbionts, leading to structuring of mitochondrial sequences that do not correlate with population differentiation [57] [26].Although the frequency of the I. scapularis Rickettsia endosymbiont has been reported in ranges from 40-100% [23][24], our own samples were 100% infected.However, in this study we did not sequence genes from the Rickettsia endosymbiont, so we do not have genetic data on the endosymbiont populations to support or refute the involvement of the Rickettsia in host COI divergence.It is unclear what the effect of the symbiont is on I. scapularis biology and evolution.We are currently conducting analyses of the endosymbiont population genetic structure.These data will be overlaid on vector population structure to possibly address whether or not this (or other symbionts) are affecting tick population structure.
Other hypotheses that might possibly explain the COI variation include nuclear genes of mitochondrial origin (''numts''), paternal mitochondrial leakage or mitochondrial heteroplasmy.''Numts'' are instances of mitochondrial DNA that have been inserted into the nuclear genome and are therefore under different selection pressures [58].We did not find that our COI sequences matched anything outside of the mitochondrial DNA when compared to the I. scapularis genome.Paternal mitochondrial leakage has been described in several systems including Drosophila simulans, in which 0.66% of 4092 offspring contained paternal mitochondrial COI [59], but has not been described in ticks.Relatively high levels of mitochondrial heteroplasmy have been observed in the tick Amblyomma cajennense [60], but the phenomenon has yet to be described in Ixodes.
Further in-depth population studies into more populations dispersed throughout the eastern seaboard and states east of the Mississippi would appear to be in order.In addition to the markers  we have described here, there are others that could, collectively provide a more robust data set.With the advent of next generation sequencing and overall cost of sequencing becoming more affordable, it will be possible to conduct multi-gene sequencing studies on a population-wide scale.

Figure 1 .
Figure 1.I. scapularis collection sites.GPS coordinates of collection sites were plotted onto a public domain map of the United States (http:// nationalmap.gov/)using an online GPS generation program (http://www.gpsvisualizer.com).Blue dots denote the collection sites, while the red dot denotes the approximate midpoint between two Mississippi collection sites used for the Mantel test.Red solid line represents approximate extent of historical Pleistocene glacial coverage.doi:10.1371/journal.pone.0101389.g001

Figure 3 .
Figure 3. I. scapularis 16S Bayesian phylogeny with placement of MS unknowns.Tree includes all four MS unknowns and was left unrooted.Tree B was rooted to I. pacificus ( = brown) and included only sequences confirmed by blastn to be Ixodes scapularis.Pink = Mississippi samples; purple = Pennsylvania; red = specimens were blinded until after sequence confirmation.Numbers at nodes represent posterior probability values and branch length corresponds to number of substitutions.doi:10.1371/journal.pone.0101389.g003

Figure 4 .
Figure 4. I. scapularis 16S Bayesian phylogeny compared to other I. scapularis sequences from Genbank.Tree is rooted to I. pacificus ( = brown) and included only sequences confirmed by blastn to be Ixodes scapularis.Pink = Mississippi samples; purple = Pennsylvania; red = specimens were blinded until after sequence confirmation.Numbers at nodes represent posterior probability values and branch length corresponds to number of substitutions.doi:10.1371/journal.pone.0101389.g004
h p = #mutations (theta) per site from pi (nucleotide diversity); and h S = theta per segregation site.Bold P-values indicate statistical significance based on 100,000 replicates with no recombination, and normalized by their standard deviations.Significant positive D/D*/F* indicates balancing selection or bottleneck; significant negative D/D*/F* indicates directional or expansion.doi:10.1371/journal.pone.0101389.t005

Figure 6 .Figure 7 .
Figure 6.Mismatch distribution of observed frequencies of pairwise differences among I. scapularis COI sequences and expected frequencies under the neutral model of evolution given the null hypothesis of no population change or population expansion.A) Northern populations (WI, NY, NH, PA).B) Southern populations (MS, NC).doi:10.1371/journal.pone.0101389.g006

Table 2 .
PCR primer sequences used in this study.

Table 3 .
Genbank blastn and BOLD IDS best matches for Mississippi samples.

Table 4 .
Haplotype diversity of COI by population.

Table 6 .
Analysis of molecular variance (AMOVA).Bold font in P-value column signifies statistical significance.Fct tests the genetic variability found among Northern versus Southern regions, Fsc tests genetic variability found among populations within Northern or Southern regions, and Fst tests genetic variability found among populations.doi:10.1371/journal.pone.0101389.t006