Nuclear Markers Reveal Predominantly North to South Gene Flow in Ixodes scapularis, the Tick Vector of the Lyme Disease Spirochete

Ixodes scapularis, the tick vector of the Lyme disease spirochete, is distributed over most of the eastern United States, but >80% of all Lyme disease cases occur in the northeast. The role that genetic differences between northern and southern tick populations play in explaining this disparate distribution of Lyme disease cases is unclear. The present study was conducted with 1,155 SNP markers in eight nuclear genes; the 16S mitochondrial gene was examined for comparison with earlier studies. We examined 350 I. scapularis from 7 states covering a representative area of the species. A demographic analysis using Bayesian Extended Skyline Analysis suggested that I. scapularis populations in Mississippi and Georgia began expanding 500,000 years ago, those in Florida and North Carolina 200,000 years ago and those from Maryland and New Jersey only during the past 50,000 years with an accompanying bottleneck. Wisconsin populations only began expanding in the last 20,000 years. Analysis of current migration patterns suggests large amounts of gene flow in northern collections and equally high rates of gene flow among southern collections. In contrast there is restricted and unidirectional gene flow between northern and southern collections, mostly occurring from northern into southern populations. Northern populations are characterized by nymphs that quest above the leaf litter, are easy to collect by flagging, frequently feed on mammals such as rodents and shrews, commonly attach to people, and about 25% of which are infected with B. burgdorferi. If there is a genetic basis for these behaviors, then the patterns detected in this study are of concern because they suggest that northern I. scapularis populations with a greater ability to vector B. burgdorferi to humans are expanding south.


Introduction
The black-legged tick, Ixodes scapularis is the main vector of Borrelia burgdorferi, the bacterium that causes Lyme disease, the most prevalent vector borne disease in the United States. There are more than 30,000 new cases reported yearly, and possibly ten times that number of cases actually occur [1]. The black-legged tick also transmits the pathogenic agents that cause human granulocytic anaplasmosis [2], Powasson encephalitis [3] and human babesiosis [4]. Ixodes scapularis ticks have a wide distribution throughout eastern North America, with populations found along the Atlantic seaboard from Florida to Nova Scotia, and from the Atlantic coast west to the 100 th meridian [5]. But risk of Lyme disease spirochete transmission is not uniform throughout this range. Lyme disease cases are focused in the northeastern U.S. (from Maryland to Maine) and in the Midwest (Minnesota-Wisconsin), but are rare or absent in the southeastern U.S. [6,7]. Interestingly, the behavior of the nymphal stage of I. scapularis differs dramatically between the northern and southern populations [8]. In the northern U.S., nymphal I. scapularis quest above the leaf litter, are easy to collect by flagging, frequently feed on mammals such as rodents, commonly attach to people, and about 25% of questing nymphs are infected with B. burgdorferi. In the south, nymphal I. scapularis are difficult to collect by flagging, rarely attach to rodents or people, appear to be more common on reptiles than mammals, and B. burgdorferi infection is extremely rare in questing ticks [9]. Correspondingly, Lyme disease is much more common in the northern U.S. than in the southern U.S. [10].
These and additional morphological differences were considered sufficient to classify northern and southern ticks as distinct species, I. dammini and I. scapularis, respectively [11]. However later studies failed to observe any reproductive barrier or chromosomal variation and I. dammini was reduced to a junior synonym of I. scapularis [12]. Discrete genetic differences were observed at the protein and DNA level, but were not sufficient to discriminate northern from southern ticks [11][12][13]. Later studies analyzed the mitochondrial 16S rRNA gene [12][13][14][15][16][17] and revealed the presence of two distinct clades, one that extends throughout the continental U.S., known as "Clade A" [16] or "All American Clade" [14] and another clade found exclusively in the southern regions, known as "Clade B" [16] or "Southern Clade." [14] In addition much greater 16S diversity was consistently observed among southern ticks [12][13][14][15][16][17].
The migration pattern of I. scapularis ticks has been investigated in several other studies. Some of them focus in specific areas, such as those in Virginia [18] and New York [19], while others have chosen to look at specific genes, especially the 16S mitochondrial gene [20,21] for signature migration patterns. In a review of the last 30 years of tick population genetic studies [22], the breeding structure of I. scapularis ticks is described as highly structured on highly mobile hosts [12,14,16,21,23,24]. Recently we amplified and sequenced nuclear genes in I. scapularis using sequence information from the genome project (https://www.vectorbase.org/ organisms/ixodes-scapularis) for primer design [17] and to assess the density of Single Nucleotide Polymorphisms (SNPs). We sampled 10 ticks from each of 4 collections from New Jersey, Virginia, Georgia, and Mississippi and analyzed the sequences of 9 nuclear genes and the mitochondrial 16S gene. SNPs were found to be extremely abundant (1 SNP /14 bases). A very preliminary population genetic analysis based on frequencies of 372 SNPs in these 9 genes showed that the ticks fell into three genetic groups. Northern collections from New Jersey and Virginia formed a homogeneous group with low genetic diversity, whereas ticks collected from Georgia and Mississippi formed two groups, each with high genetic diversity. It was also noted that northern ticks appeared to be migrating south but there was little evidence for gene flow in the reverse direction. More recently, the mitochondrial cytochrome oxidase I (COI) and 16S genes and three nuclear genes (serpin2, ixoderin B and lysozyme) were sequenced from field collected northern and southern I. scapularis [25]. This study also detected a divergence in the mitochondrial gene sequences from some southern specimens. Phylogenetic analyses and analysis of molecular variance (AMOVA) [26] supported significant differences between northern vs. southern populations.
We were intrigued by the patterns indicated by these nuclear marker studies [17,25] because if there is a genetic basis for the nymphal behaviors that differ between northern and southern populations then northern I. scapularis populations with a greater ability to vector B. burgdorferi to humans appear to be expanding south. To assess this possibility, we herein describe a study with sample sizes increased to fifty ticks/collection and the numbers of states sampled increased to seven to cover a more representative area of the distribution of I. scapularis. Larger samples sizes allowed us to perform a demographic analysis using Bayesian Extended Skyline plots [27] to assess the evolutionary history of I. scapularis in the U.S and enabled us to perform a non-equilibrium analysis of gene flow [28]. Three genetic groups were recovered when all nuclear genes were analyzed together. Analysis of migration rates and patterns revealed a pattern of very limited gene flow between northern and southern populations with the majority of migration occurring from northern populations into the south.

Materials and Methods
Collection sites were selected in an opportunistic fashion throughout the natural range of I. scapularis. Ticks from New Jersey, Wisconsin and North Carolina were collected in 2009 by dragging sheets of cloth across vegetation ("flagging"). Flagging was also used in 2011 to collect from Maryland and in 2012 from Mississippi and Florida. Ticks in the 2012 Georgia collection were taken off of dead deer at a hunter check station. A minimum of 50 ticks were analyzed in each of the seven collections. For Virginia, New Jersey and Mississippi only 40 ticks were collected, since 10 were previously used in another study [17]. No specific permissions were required for these locations/activities, all ticks were collected by collaborators on public property and did not involve the use of live vertebrate hosts.
The counties where collections occurred are mapped in Fig 1. For this study and in keeping with all previous population genetic studies of this species [12][13][14][15][16][17] New Jersey, Maryland and Wisconsin are considered northern while North Carolina, Georgia, Mississippi and Florida are southern.
Nine genes were used for this study: Ixolaris 2A, Defensin, Heat shock protein 70 b2, Trospa, Ferredoxin-glutamate synthase, Dihydropyrimidine dehydrogenase, Serotonin 4 receptor, Acetylcholinesterase and 16S. Extensive details on gene selection, DNA isolation, primer design, PCR conditions, and sequencing protocols are published [17]. Sequences were aligned using CLUSTALW (www.genome.jp/tools/clustalw) and visually inspected to insure correct alignment along codons. The phase unknown sequences of each of the 9 genes were entered in "fas" format into DnaSP v5 [29], where each was converted with the PHASE program [30] into a pair of phase-estimated sequences. DnaSP v5 was used to calculate S, the number of SNPs appearing in more than one sequence, η, the overall number of SNPs (S including singletons) and H, the numbers of unique alleles. π is the average observed number of pairwise differences per nucleotide between all pairs of sequences. θ, calculated from η, is the average expected number of pairwise differences per nucleotide between all pairs of sequences. θ also estimates the approximate product of effective population size (N e ) and the neutral mutation rate per site per generation (μ) and therefore estimates the approximate maximum expected amount of genetic variation in a natural population. Tajima's D [31] is the normalized difference between π and θ. A negative Tajima's D indicates fewer pairwise differences between sequences than expected and can arise from removal of variation through purifying selection or through a recent population expansion. A positive Tajima's D indicates more pairwise differences between sequences than expected and can arise from balancing selection or through a recent population contraction.
All aligned haplotypes were compared using RAxML v. 8 [32] to identify identical haplotypes and determine haplotype frequencies for mismatch analysis [33] and AMOVA [26] analyses by ARLEQUIN version 3.01 [34]. A FORTRAN program was written to extract SNPs from each gene and place these in a format for analysis by STRUCTURE 2.3.4 [35] which was then used to estimate the numbers of genetic clusters (K) of ticks on the basis of their genotypes at the 1,155 SNPs. Population identity was not used as a prior. The log probability of the data (X) was estimated for each value of K (conditional probability = (X|K)). For K = 1-14, the burnin period was 100,000 while the number of MCMC replicates after burnin was 900,000 [36]. Each run was repeated 10 times for each value of K for all 8 nuclear genes combined. Repeating the analysis with 20 repetitions for K = 1-5 did not change the estimate of K. The ΔK method [37] estimated the most likely value of K. STRUCTURE also quantifies the likelihood that each tick belongs to each of the K clusters and assigns ticks to a cluster based upon this information. The program DISTRUCT [38] was used to display membership likelihood coefficients. Cluster membership was represented as different colors, and individual ticks were depicted as vertical bars partitioned into segments that correspond to membership coefficients in each of the seven collection sites.
Maximum clade credibility trees were estimated for the 8 nuclear genes combined and for the 16S using the Bayesian Evolutionary Analysis Sampling Trees (BEAST v1.8.1) package [27] and by MrBayes 3.2.3 [39] (http://mrbayes.sourceforge.net/download.php). Results were visually compared to assess whether MrBayes and BEAST produce similar trees. The Bayesian Evolutionary Analysis Utility (BEAUti v1.8.1) in BEAST [27] generated ten million coalescent trees based upon the Kingman Coalescent Constant Size model [40]. The (HKY) substitution model [41] was used with empirical nucleotide frequencies. A maximum clade credibility tree was then derived from these many trees using TreeAnnotator (v1.8.1) in http://beast.bio.ed.ac. uk/treeannotator in BEAST with 3% of trees used during burnin. FigTree v1.4.2 in BEAST was used to view the maximum clade credibility tree and to display the posterior probability (PP) for each branch.
Extended Bayesian Skyline Plot (EBSP) analysis [27] was also performed with BEAST. EBSP is a relatively new variable-dimension Bayesian phylogenetic method for inferring nonparametric population size changes over geological time from multiple loci [42]. The EBSP estimates N e changes through time by directly inferring N e as a function of time using sequence data. We used EBSP to assess the evolutionary history of I. scapularis.
The 8 nuclear gene sequences were next entered as a group for analysis of θ and the effective migration rate 4N e m among collections using LAMARC 2.1.10 [28,43]. In LAMARC, the immigration parameter M equals m (the immigration rate per generation) divided by μ. If M = 1, the rate that new alleles enter a population through immigration is equal to the rate new alleles arise by mutation. The effective migration rate 4N e m was calculated between a donor and recipient population as M multiplied by the recipient population θ. A 4N e m < 1 suggests that immigration is insufficient to offset the effects of genetic drift so that populations diverge in allele frequencies while a 4N e m > 1 suggests that there is sufficient immigration to offset the effects of genetic drift to maintain homogeneity among populations. 4N e m values are displayed as arrows on the collecting site map (Fig 1) to indicate the direction and rate of gene flow between two collections. When 4N e m 1, no arrow was displayed, but when 1 < 4N e m 3.0 a thin open arrow is displayed. Thick open arrows indicate when 4N e m ! 3. Green arrows signify gene flow within the south or within the north while red arrows indicate north to south gene flow (south to north gene flow was never detected). Table 1 provides the full name of each of the eight nuclear genes and their VectorBase accession number. Also listed are the region(s) of the gene examined, the length of the regions, the VectorBase coordinates for the beginning and end of each region and the GenBank accession numbers. The number of nucleotides in coding versus intronic regions is listed as well. Table 2 lists the number of sequences obtained for each gene in the overall study and in each of the seven collections. S is the number of SNPs appearing in more than one sequence and η is the overall number of SNPs (including singletons) while H is the number of unique alleles. π is the average observed number of pairwise differences per nucleotide. θ, calculated from η, is the average expected number of pairwise differences per nucleotide. Tajima's D is the normalized difference between π and θThe significance of D is indicated by Ã p<0.05, ÃÃ p<0.01, ÃÃÃ p<0.001. The D ratio is only calculated for exons and is D for replacements substitutions divided by D for silent substitutions.  (Fig 2A and 2B). The average numbers of SNPs detected were greater among southern collections in all genes except Hsp and TROSPA (Fig 2C). Similarly SNP density was greatest in southern collections in all genes except Hsp and Ser (Fig 2D). With the exception of Ace all of Tajima's D values in Table 2 are negative implying removal of variation through purifying selection or through a recent population expansion. Purifying selection is unlikely since the absolute value of most (41/54) of Tajima's D (Nonsynonymous/Synonymous) ratios are > 1. Recent population expansion is a more likely explanation and is supported in subsequent analyses. Table 3 lists the numbers of unique alleles identified among all collections in the 8 nuclear markers and in the 16S gene. Also listed are the numbers of alleles that occurred in common between northern and southern collections and the number that were unique to southern or northern collections. There were from 0.70-3.2 fold more alleles detected among southern as compared to northern collections. Table 3 also shows that, with the exception of hsp, only a small percentage of the alleles were shared between northern and southern collections, a pattern consistent with limited gene flow.

Genetic diversity
AMOVA analyses of the 16S sequences (Table 4A) show that 11.8% of the variance arose between northern and southern collections while 34.5% arose among collections in the north and in the south. Most (53.7%) of the variation arose within individual collections. In contrast, AMOVA analyses of individual nuclear genes (Table 4B) showed that on average 15.5% of the variance arose between northern and southern collections but this varied from 0-62% among genes while only 6.0% arose among collections in the north and in the south and this varied from 1-22% among genes. Most (78.5%) of the variation arose within individual collections. A mismatch distribution is the frequency distribution of the observed number of differences between pairs of haplotypes [33]. A unimodal distribution with a small range in numbers of mismatches is indicative of populations that have gone through a recent demographic expansion. In contrast, a multimodal distribution involving a large range in the number of mismatches is indicative of a population at demographic equilibrium reflecting highly stochastic shape of gene trees [33]. Fig 3 shows that mismatch distributions among the four southern collections (in red) are multimodal with 0-57 mismatches while the northern collections (in blue) have no more than 11 mismatches and are unimodal with the exception of New Jersey which has two large peaks but without a large range of mismatches possibly suggesting two expansions. Thus southern I. scapularis populations appear to be at a demographic equilibrium while northern populations have gone through one or in the case of New Jersey two recent N is number of sequences obtained for each gene in the overall study and in each of the seven collection sites. S is the number of SNPs appearing in more than one sequence and η is the overall number of SNPs (including singletons) while H is the number of unique alleles. π is the average observed number of pairwise differences per nucleotide. θ, calculated from η, is the average expected number of pairwise differences per nucleotide. Tajima's D is the normalized difference between π and θ. The significance of D is indicated by *p<0.05 **p<0.01 ***p<0.001.
The D ratio is only calculated for exons and is D for replacements substitutions divided by D for silent substitutions.    demographic expansions. Demographic expansion was also supported by analysis of Tajima's D ( Table 2).

Phylogenetic relationships among I. scapularis collections
Phylogenetic relationships among all of the 16S sequences generated in the present study and using Ixodes ricinus as an outgroup were assessed using BEAST. A clock rate for the 16S arthropod gene was set at 1.06% /million years (My) [44]. The maximum clade credibility coalescent tree appears in Fig 4. Only clades with a posterior probability (PP) ! 0.90 are indicated. Ixodes ricinus sequences appear in green (clade A). Sequences collected exclusively in the south are colored red (e.g. clade C-E) while those only collected in the north are blue. Any high order branches that join both north and south sequences are black (clades B and F). The I. scapularis clade B has a PP = 0.95, the next three clades C-E are exclusively southern (red) in composition with PP = 0.99. Clade F (PP = 0.91) contains the remainder of all sequences and is a mixture of ticks collected in both the north and south. Note that this tree implies historical, but not necessarily current, unidirectional introgression of southern ticks into the north since no blue sequences appear in southern clades C-E. The topology of the maximum clade credibility tree and PP are extremely similar to those generated by MrBayes (tree not shown). Maximum clade credibility coalescent trees were next derived for the combined nuclear genes but without outgroups since these genes have not been sequenced in other acari (Fig 5). This tree, as with the 16S tree, also has three southern branches A, C, D. Clade A has a PP of   (PP = 0.94). This tree implies little to no current gene flow between northern and southern ticks.
Extended Bayesian Skyline Plot (EBSP) [42] was performed with all nuclear genes combined following the online tutorial provided by J. Heled (http://beast-mcmc.googlecode. com/svnhistory/r3615/trunk/doc/EBSP). The authors demonstrate the essential role of multiple loci in recovering population size dynamics. Multi-locus data recover past bottlenecks that cannot be characterized by analysis of a single locus. [42] EBSP are read from right to left for each of the collections and the period during which the Laurentide Ice Sheet existed is indicated in gray (Fig 6). Ixodes scapularis populations in Mississippi and Georgia began expanding 500,000 years ago, those in Florida and North Carolina 200,000 years ago and those from Maryland and New Jersey only during the past 50,000 years with an accompanying bottleneck. Wisconsin populations only began expanding in the last 20,000 years. This demographic analysis is consistent with a hypothesis that I. scapularis arose as a species in the southeastern U.S., expanded further south to Florida and North to the Carolinas and only within the last 50,000 years became established in the Northeast. Northern Midwest and Canadian populations only became established as the Laurentide Ice Sheet retreated.

Genetic analysis of individual ticks
The program STRUCTURE was used to estimate the numbers of genetic clusters (K) of ticks on the basis of their 16S haplotypes ( Fig 7A) and for all 1,155 SNPs in the combined nuclear genes (Fig 7B). The ΔK method identified three genetic groups in the 16S dataset. Ticks in the first group (green bars) were mostly collected in Wisconsin, Maryland, and North Carolina. Ticks in the second group (red bars) were mostly ticks from New Jersey and Mississippi. Ticks in the third (blue) genetic group are mostly from Georgia and Florida. Ticks from the green group are mostly northern but do appear in the south while ticks from the blue group are mostly southern and do not appear in the north. On the other hand ticks from the red group occur exclusively in New Jersey and Mississippi. These results suggest historical unidirectional and bidirectional flow of mitochondrial genomes. Blue southern ticks appear in other southern states. Green ticks appear frequently in southern collections. Red ticks appear in both the northern and southern collections. A different pattern emerges from analysis of the combined nuclear genes which reflects current gene flow. The ΔK method again estimated three genetic groups. Green bars represent ticks primarily from Maryland, New Jersey, and Wisconsin while red bars represent ticks primarily from the four southern collections. The third blue genetic group consists primarily of ticks from Georgia, with some blue group ticks appearing in Florida, Mississippi and North Carolina. Two blue group ticks appear in Maryland and some northern ticks have genes from the red group. Conversely northern green cluster genes appear variously in southern collections. These analyses suggest, in agreement with Fig 5, that there is limited current flow of nuclear genes between northern and southern collections.

Nonequilibrium migration patterns among I. scapularis populations
Rates and directions of gene flow were estimated for all nuclear genes combined using LAMARC. Table 5 lists the average 4N e m among northern collections (6 values), among southern collections (12 values), from north to south (12 values) and south to north (12 values). When analyzing all nuclear genes together, the average 4N e m among northern collections was 6.42 reproductive migrants per generation while there were 5.27 reproductive migrants per generation among southern collections. 4N e m was reduced between southern and northern collections with 1.51 migrants from the north to the south/generation and only 0.07 migrants from the south into the north/generation. Considering all nuclear markers together there was 21 fold greater migration from north to south relative to the reverse direction.

Discussion
Earlier studies on the population genetics of I. scapularis consistently reported greater genetic diversity in southern versus northern collections. This trend was noted earlier for the 16S mitochondrial genes [14][15][16] and including the 12S [14], and CO1 [25] mitochondrial genes. The same trend was also noted for three nuclear genes (serpin2, ixoderin B and lysozyme) [25] and for the same nuclear markers used in the present study [17]. The same trend was found in the current study and, along with Figs 4 and 6, recapitulates the common observation of greater genetic diversity in the regions of ancestral origin of a species. Table 3 shows that only a small percentage of alleles are shared between northern and southern collections. This trend is also seen with the maximum clade credibility coalescent tree ( Fig 5) and in Fig 7 wherein northern genetic group genes appeared in the south but few southern genetic group genes appear in the north. This same trend with nuclear genes was reported earlier [17]. The AMOVA results are similar in this and earlier studies in which genetic differences in nuclear genes between north and south accounted for 15.5% in the present study and accounted for 20% [25] and 18% [17] in earlier studies. Genetic differences in nuclear genes among northern and among southern collections accounted for 6% of the variation in in the current study and for 15% [25] and 6% [17] in previous studies. In all three studies, the differences between north and south were greater than the differences among northern or among southern collections. However the variance patterns in with the 16S differ from patterns in earlier studies which found 33.6% [15] to 16.8% [25] of variance arose between north and south while 27% [15] to 10.6% [25] of variance arose within northern and southern collections. This may reflect differences in sampling location in some states. For example Norris et al [14] sampled from Currituck Stokes county in coastal North Carolina(NC), Qiu et al [15] sampled from Currituck and Washington County in eastern NC, Sakamoto et al [25] sampled from Stokes county in western NC [25] and the current study sampled from Gates Co. (Fig 1) near both Currituck and Washington County in eastern NC. Ecological Units vary enormously from east to west in NC; extending from "Tidewater" in eastern NC to "Blue Ridge" in western NC [45,46]. North Carolina ticks appeared more similar to northern genetic groups in the STRUCTURE analysis of the 16S (Fig 7).
Mismatch patterns were previously examined in northern and southern I. scapularis collections [15,25]. These studies also found a unimodal distribution with a maximum of 4 mismatches in northern I. scapularis collections and detected a multimodal distribution with up to 17 mismatches in southern collections [15]. The mismatch patterns among CO1 genes in southern collections were multimodal with mismatches as high as 61 but were unimodal with up to 15 mismatches in northern collections [25]. Thus southern I. scapularis populations appear to be at a demographic equilibrium while northern populations have gone through a recent demographic expansion. This trend is in agreement with the Tajima's D analysis in the current and previous [25] studies which also suggested recent population expansion. This conclusion is also supported by the EBSP analyses wherein southern tick populations have been expanding over the last 500,000 years while the northern populations only began expanding within the last 20,000 to 50,000 years.
The maximum clade credibility coalescent 16S tree implies historical unidirectional introgression of southern ticks into the north and agrees in almost all respects with the published trees estimated by Maximum Parsimony, Distance/Neighbor Joining, Maximum Likelihood and Bayesian techniques in all previous studies [12][13][14][15][16][17]25]. In contrast, the maximum clade credibility coalescent tree for the combined nuclear genes implies very little current gene flow between northern and southern ticks.
Taken as a whole these results suggest a scenario in which I. scapularis arose as a species in the southeastern U.S. Populations in Mississippi and Georgia began expanding into Florida and North Carolina half a million years ago and into Maryland and New Jersey only 50,000 years ago and with an accompanying population bottleneck. Wisconsin populations only began expanding in the last 20,000 years as the Laurentide Ice Sheet retreated. During the northward expansion and accompanying population bottleneck, a barrier to gene flow arose such that very little bidirectional gene flow occurred between northern and southern ticks. This barrier must have existed over an extended period of time (and may still exist) in order to account for the small percentage of the alleles shared between northern and southern collections and for the patterns of isolation implied by the nuclear maximum clade credibility coalescent tree. During this time I. scapularis nymphs may have evolved away from an ancestral tendency to quest at ground level towards questing above the leaf litter. At the same time they may have switched from the ancestral tendency to seek and feed on reptiles towards locating and feeding primarily on mammals. Eventually the gene flow barrier seems to have partially broken down such that there is now gene flow albeit unidirectional from northern populations into the south.
This scenario assumes that questing above the leaf litter and host preference are both genetic traits which are polymorphic in natural populations. In fact there is no data to support this idea. These behaviors may instead be plastic in response to different environmental conditions in northern and southern habitats.
While our study supports a model of current restricted, largely unidirectional gene flow, many biological questions remain unanswered. The principal vectors of Lyme disease spirochetes in the eastern US are nymphal I. scapularis. By questing above the leaf litter, they frequently contact humans and rodents, thereby serving as an excellent bridge vectors passing Lyme disease spirochetes from their rodent and soricid reservoirs to people [47]. In contrast, southern nymphs quest mainly below the leaf litter and feed mainly on reptiles [48,49], which are considered sink-hosts, therefore interrupting the transmission of spirochetes. Could this behavior impact rates of migration and gene flow between populations in certain areas?
Apparently, the highly successful populations of I. scapularis in the northeastern United States are favored by temperate environmental conditions and abundant rodent, bird, and large mammal hosts. Earlier studies [20,21] and the current study demonstrate that these ticks are spreading to the Midwestern United States and the southern United States. They are also spreading northward to Canada [50][51][52][53], perhaps on birds. The spread of northern vectorial competent ticks to the southern United States would be of public health concern, if questing and host finding patterns have a strong genetic component, as opposed to these behaviors being merely temporal adaptations to extant local environmental conditions. More intensive geographic sampling of I. scapularis especially in Virginia, the Carolinas and Georgia will be needed to delineate the geographic regions where barriers and corridors for gene flow are occurring.
Supporting Information S1 Table. Membership coefficients (inferred ancestry) for individual ticks as estimated in STRUCTURE [35]. Colors correspond to those in Fig