Analysis of the Population Structure of Anaplasma phagocytophilum Using Multilocus Sequence Typing

Anaplasma phagocytophilum is a Gram-negative obligate intracellular bacterium that replicates in neutrophils. It is transmitted via tick-bite and causes febrile disease in humans and animals. Human granulocytic anaplasmosis is regarded as an emerging infectious disease in North America, Europe and Asia. However, although increasingly detected, it is still rare in Europe. Clinically apparent A. phagocytophilum infections in animals are mainly found in horses, dogs, cats, sheep and cattle. Evidence from cross-infection experiments that A. phagocytophilum isolates of distinct host origin are not uniformly infectious for heterologous hosts has led to several approaches of molecular strain characterization. Unfortunately, the results of these studies are not always easily comparable, because different gene regions and fragment lengths were investigated. Multilocus sequence typing is a widely accepted method for molecular characterization of bacteria. We here provide for the first time a universal typing method that is easily transferable between different laboratories. We validated our approach on an unprecedented large data set of almost 400 A. phagocytophilum strains from humans and animals mostly from Europe. The typability was 74% (284/383). One major clonal complex containing 177 strains was detected. However, 54% (49/90) of the sequence types were not part of a clonal complex indicating that the population structure of A. phagocytophilum is probably semiclonal. All strains from humans, dogs and horses from Europe belonged to the same clonal complex. As canine and equine granulocytic anaplasmosis occurs frequently in Europe, human granulocytic anaplasmosis is likely to be underdiagnosed in Europe. Further, wild boars and hedgehogs may serve as reservoir hosts of the disease in humans and domestic animals in Europe, because their strains belonged to the same clonal complex. In contrast, as they were only distantly related, roe deer, voles and shrews are unlikely to harbor A. phagocytophilum strains infectious for humans, domestic or farm animals.


Introduction
Anaplasma phagocytophilum is a Gram-negative obligate intracellular bacterium that replicates in neutrophils [1]. It is transmitted via tick-bite and causes febrile disease in humans and animals. The main vectors of A. phagocytophilum are Ixodes ricinus in much of Europe, I. scapularis and I. pacificus in North America and I. persulcatus in Eastern Europe and East Asia [2].
Human granulocytic anaplasmosis (HGA) was first recognized in the USA in 1994 [3] and has reached its largest reported incidence since it became a notifiable disease in this country with 2,575 cases in 2011 [4]. Although increasingly detected, it is still a rare disease in Europe [5,6]. Underreporting, differences in host range or a reduced virulence in humans of the A. phagocytophilum strains circulating in Europe might account for this phenomenon. HGA also occurs in China [7,8]. However, the person to person transmission [9] and a mortality rate of 26.5% [10] initially described in Chinese patients, were later on found to be caused by a novel bunyavirus [7,11,12].
Clinically apparent A. phagocytophilum infections in animals are mainly found in domestic animals such as horses [13], dogs [14] and cats [15] as well as in farm animals such as sheep, cattle and goats [16]. Since A. phagocytophilum is not transmitted transovarially, at least in Ixodes spp. ticks [17], it is generally thought to depend on reservoir hosts to complete its life cycle. Whereas the white-footed mouse (Peromyscus leucopus) is a well characterized reservoir species in North America [18,19], the situation is less clear in Europe. Although A. phagocytophilum has been detected in a wide range of wildlife animals in Europe including wild ruminants, wild boars, red foxes and small mammals, their respective reservoir competence has not been fully established [20].
Evidence from cross-infection experiments indicating that A. phagocytophilum isolates of distinct host origin are not uniformly infectious for heterologous hosts has led to several approaches for molecular strain characterization [21,22]. The target genes that have been used most often comprise the 16S rRNA, groESL, ankA, msp2 and msp4 genes [22]. Unfortunately, the results of the studies performed to date are not always easily comparable because different gene regions and fragment lengths have been investigated. In contrast, multilocus sequence typing (MLST) offers the advantage of a standardized approach, a universal nomenclature and a free data exchange via the internet [23]. Therefore, we chose to develop a MLST scheme for A. phagocytophilum.
A total of 380 A. phagocytophilum positive samples from humans, animals and I. ricinus ticks from Europe were included in our data set. 11 additional samples originated from the USA. The MLST data we generated were used to analyze the population structure of A. phagocytophilum, to assess the concordance between MLST, 16S rRNA and ankA gene-based typing, to test whether particular A. phagocytophilum strains are associated with distinct host species and to examine which wildlife species may be potential reservoir hosts for granulocytic anaplasmosis in humans, domestic and farm animals in Europe.

Ethics Statement
The samples from humans, domestic and farm animals were obtained as part of routine diagnostic evaluations [24,25,26,27]. All samples were de-identified. Written informed consent was obtained form the patients and owners, respectively. A part of the ovine samples was collected for the Q-fever surveillance program of the federal state of Lower-Saxony [24]. The red foxes were killed for the official rabies monitoring program of the federal state of Brandenburg [28]. All other wild large animals were shot by professional hunters during the annual hunting seasons [24]. Hedgehogs from a captive population in the federal state of Baden-Württemberg and from hedgehog caretaking stations in various areas of Germany were bled from the saphenous vein or died naturally [29,30]. None of the hedgehogs was sacrificed for this study. For Germany, permission to trap rodents using snap traps was given by the District Government Stuttgart, Germany [31]. For the United Kingdom, protocols for the handling and sampling of wild small mammals were approved by the University of Liverpool Committee on Research Ethics and were conducted in compliance with the terms and conditions of licenses awarded under the UK Government Animals (Scientific Procedures) Act, 1986 [32,33].

MLST Analysis
For our MLST analysis seven housekeeping genes (pheS, glyA, fumC, mdh, sucA, dnaN, atpA) each located at least 10 kb apart in the A. phagocytophilum HZ genome (GenBank accession number CP000235) were chosen. Sequencing revealed that all of them contained more synonymous than non-synonymous mutations. Fragments of an in-frame length of approximately 400 bp (without primer sequences) were amplified by nested PCR. 2 ml of DNA were used as a template in a 50-ml reaction mixture containing 50 mM KCl, 20 mM Tris-HCl (pH 8.4), 2 mM MgCl 2 , 0.2 mM deoxynucleoside triphosphates, 0.4 mM concentrations of each primer and 0.2 ml (1 U) of Taq DNA polymerase (Invitrogen, Karlsruhe, Germany). PCRs were performed by using a GeneAmp PCR System 9700 (Applied Biosystems, Darmstadt, Germany) under the following conditions: an initial denaturation at 94uC for 3 min, followed by 40 cycles of denaturation at 94uC for 30 s, annealing at 54uC for 30 s, extension at 72uC for 30 s and a final extension at 72uC for 10 min. Because of nucleotide polymorphisms a set of multiple primers had to be developed for amplification (Table S2). The nucleotide sequences of the primers are shown in Table S3. Nested PCR products were directly sequenced bidirectionally using a 3130 Genetic Analyzer (Applied Biosystems) and a BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems). Forward and reverse sequences were aligned and quality controlled using TraceEdit Pro 1.2.1 (Ridom, Würzburg, Germany). Different sequences of a given locus were ascribed a unique, but arbitrary allele number and each unique combination of alleles was assigned a sequence type (ST). In 383 samples, all seven housekeeping genes could be amplified. However, in 3 roe deer, 2 sheep, 1 dog, 1 red deer and 1 hedgehog samples the DNA was depleted before completion. The pheS, glyA, fumC, mdh, sucA, dnaN and atpA sequences were assigned GenBank accession numbers KF242733 to KF245413. The A. phagocytophilum MLST profiles and isolate databases are available at http://pubmlst.org/aphagocytophilum/.

16S rRNA and ankA Gene Sequences
Partial 16S rRNA and complete ankA gene sequences of 256 samples were determined as part of previous studies [24,25,27,28,29,30]. In 135 additional samples, the A. phagocytophilum 16S rRNA gene was amplified and sequenced as described earlier [34,35]. Only in 130 of these, nested PCR amplification and sequencing of partial ankA sequences was possible, because the DNA was used up in 4 hedgehog and 1 tick samples. The composition of the PCR reaction mixture was the same as described above. The following PCR conditions were used: an initial denaturation at 94uC for 3 min, followed by 40 cycles of denaturation at 94uC for 30 s, annealing at 52uC for 30 s, extension at 72uC for 1 min (first amplification) and for 30 s (nested amplification), respectively and a final extension at 72uC for 10 min. Primers SLO fo 1 (59-GGG ATR AGT GCR GTG CAG YAT-39) and SLO re 1 (59-ACT GCR GCM GCT ARA GGR CT-39) were used in the first amplification step and primers SLO fo 2 (59-TTA CGC TGT RRT RGC ATR GAC-39) and SLO re 2 (59-AWR GWT CCS KYA GGA GYA TTT A-39) in the nested amplification step. Nested PCR products were directly sequenced bidirectionally as described above. Sequences were edited and assembled with the SeqMan program 8.02 of the DNASTAR package (Lasergene, Madison, WI). The 16S rRNA and ankA gene sequences are available at GenBank under the accession numbers KF242536 to KF242732, KF286395 and JQ347533 to JQ347591.

Allele-based Analyses
The allele-based analyses were carried out by using eBURST (http://eburst.mlst.net/) and START2 (http://pubmlst.org/ software/analysis/start2/). Clonal complexes were defined by sharing identical alleles at five of the seven loci with at least one other member of the group. The standardized index of association [36] was calculated in order to detect linkage disequilibrium.

Sequence-based Analyses
For phylogenetic analyses the program MEGA 5.1 [37] was used. Sequences were aligned by ClustalW applying the IUB matrix (16S rRNA gene) or codon-aligned applying the PAM (Dayhoff) matrix (concatenated allele sequences, ankA). Tree construction was achieved by the maximum-likelihood (ML) method using the Tamura-Nei model and a gamma distribution with the use all sites option. Bootstrap analysis was conducted with 1,000 replicates. Net average distances between concatenated allele gene clusters and ankA gene clusters were computed using the Jukes-Cantor matrix and applying the complete deletion option.

Structure Analysis
The program STRUCTURE 2.3.4 [38] was used to examine population clustering among the A. phagocytophilum strains investigated. A dataset of 518 variable sites was created consisting of all single-nucleotide polymorphisms found in the seven housekeeping genes (variable indels were not included in this dataset). With these single-nucleotide polymorphisms, we examined the number of clusters (K) represented in the data from 1 to 9, using the admixture model. We applied a burn-in period of 10,000 followed by 20,000 replications. Each value of K (1-9) was run five independent times. The program STRUCTURE HARVESTER 0.6.93 [39] was used to analyze these results and to estimate what number of clusters is best supported by the data applying Evanno's DK [40]. The program DISTRUCT 1.1 [41] was used to visualize the clusters.

Analyses for Recombination
Four separate analyses were used to test for recombination in each of the seven housekeeping genes and ankA. These were GENECONV 1.81 (available at http://www.math.wustl.edu/ ,sawyer/geneconv/index.html) [42], the pairwise homoplasy index (PHI) [43], maximum x 2 [44] and the neighborhood similarity score (NSS) [45]. These last three methods were applied using the PhiPack package [43]. Using the eight datasets of sequences aligned by ClustalW as described above, regions with indel polymorphisms and sequences containing ambiguous sites were removed first. Additionally, when two or more sequences were identical, only a single version was retained. In the GENECONV analyses, recombination was assessed using only silent site polymorphisms. The number of mismatches allowed between recombining fragments (gscale) was set to 0 and putative recombinant regions were determined for inner fragments using 10,000 random permutations. A window size of 50 nucleotides was used in the PHI analyses. In the maximum x 2 tests, a fixed window size of 2/3 the number of polymorphic sites was applied. p values were estimated based on 1,000 permutations for PHI, maximum x 2 and NSS with significance at a ,0.05.

Analyses for Selection
To examine selection, the non-synonymous (dN) and synonymous (dS) rate ratios (dN/dS) [46] for each of the seven housekeeping genes and ankA were assessed using START2. For ankA we also looked for evidence of site-specific selection in this gene. To do so, the CODEML program in PAML 4 [47] was applied. Using the tree topology generated from our MLST analysis, two site-specific models that allow dN/dS rate ratios (v = dN/dS) to vary among codons were applied. These models were M1a (nearly neutral), which only allows site classes for v to vary between 0 and 1 and M2a (positive selection), which allows an additional site class for v .1 [48]. A likelihood ratio test (LRT) comparing these two models to a x 2 distribution (degrees of freedom = 2) was carried out to determine whether there is significant evidence for positive selection in the ankA gene. The Bayes empirical Bayes method (BEB) as implemented in CODEML was used to identify sites that have evolved under positive selection [49].

Allele-based MLST Analysis
In 8 samples the DNA was depleted before completion. In 383 of the 391 samples, all seven housekeeping genes could be amplified. However, the analysis was hindered by the fact that numerous chromatograms contained double peaks, a phenomenon that was most prominent in samples from roe deer, red deer and European bison. Double peaks were almost exclusively restricted to variable sites and were generally present in more than one of the seven alleles of a given sample. A total of 99 samples showing ambiguous nucleotides had to be excluded from the allele-based MLST analysis. Thus, only in 284 cases a sequence type (ST) could be ascribed. In total, 90 different STs were detected. 67 (74%) of them were found only once and thus were unique. The distribution of STs among the different host species is shown in Table 1. The allele frequency and the number of polymorphic sites for each locus are shown in Table 2.
STs can be grouped into clonal complexes by similarity of their allelic profile [23]. In our data set, we defined a clonal complex by sharing identical alleles at five of the seven loci with at least one other member of the group. The relationship between the STs was analyzed using eBURST (http://eburst.mlst.net/), which identified eight clonal complexes ( Table 3). 223 of 284 (79%) A. phagocytophilum strains belonged to a clonal complex. However, with the exception of clonal complexes 1, 2 and 5, the other clonal complexes contained less than 10 strains. The most prominent clonal complex 1 contained 177 strains. ST 25 was identified as a founding genotype, because it differed from the largest number of other STs of clonal complex 1 at only one single locus. This was supported by a bootstrap value of 67% (1,000 replicates) and suggests it may resemble the common ancestor of the other STs. All typeable strains from humans and dogs from Europe and all typeable strains from horses, hedgehogs, wild boars, cats and red foxes belonged to clonal complex 1 ( Table 4). The same was true for a proportion of the tick strains and for a minority of sheep strains. In contrast, all 11 strains from the USA (10 human, 1 canine) were part of clonal complex 5. Interestingly, A. phagocytophilum strains from clonal complexes 1 and 5 did not share any of the alleles, indicating a longer evolutionary distance between strains from Europe and North America. All typeable strains from voles and shrews belonged exclusively to clonal complex 2. Thus, the allele-based MLST analysis proved to be informative with respect to host origin and geographic provenance.
In a fully clonal population that reproduces asexually without recombination taking place, mutations will accumulate in nonrandom combinations. This situation is called linkage disequilibrium meaning that two or more alleles are observed together in a genome more frequently than by chance alone [52]. Calculating the standardized index of association [36] significant linkage disequilibrium was detected in our data set indicating a clonal population structure.
However, only 41 (46%) of the 90 STs defined so far were part of a clonal complex, whereas 49 (54%) were found to be unlinked. In contrast to European strains from dogs, humans and horses, which uniformly belonged to clonal complex 1, strains from sheep and roe deer showed a marked heterogeneity with a high proportion of unlinked STs ( Figure S1). The population snapshot shown in Figure S1 thus indicates that the population structure of A. phagocytophilum might be semiclonal with a high proportion of relatively rare unlinked STs superimposed by a limited number of rather frequent genotypes.

Sequence-based MLST Analysis
Allele and ST numbers themselves are uninformative concerning genetic relationship, because an allele change does not reflect the number of nucleotide exchanges. Thus, the eBURST analysis is rather independent of the sequences as it takes into account only the allelic profiles. Furthermore, in 99 (26%) of the 383 samples the allele-based analysis could not be applied because they contained ambiguous nucleotides. We therefore constructed a  Table 2. Allele frequency and number of polymorphic sites of the seven loci of the A. phagocytophilum MLST scheme 1 and of the partial 16S rRNA 2 and ankA 3 gene sequences.  Clonal complexes were defined by sharing identical alleles at five of the seven loci with at least one other member of the group. 2 41 of the 90 STs belonged to a clonal complex, whereas 49 STs were found to be unlinked. 3 Prevalence of the respective STs in the data set. 4 The number of different STs forming a clonal complex is given in parenthesis. doi:10.1371/journal.pone.0093725.t003 maximum-likelihood (ML) phylogenetic tree using the use all sites option. This sequence-based phylogenetic analysis revealed that A. phagocytophilum strains from humans, dogs, horses, wild boars and hedgehogs were very uniform, whereas strains from ruminants such as sheep, cattle, European bison, roe deer and red deer showed a much higher degree of heterogeneity (Figure S2 A). Three major clusters could be distinguished. Cluster 2 contained all 46 roe deer strains and 1 of the 17 red deer strains. All 24 vole and all 3 shrew strains were part of cluster 3, which is equivalent to clonal complex 2. All other A. phagocytophilum strains from humans, hedgehogs, domestic, farm and large wild animals belonged to cluster 1. Strains from the USA were found in cluster 1 but on a separate branch supported by a bootstrap value of 99%. With the exception of cluster 3, tick strains were scattered around the tree as expected. However, the bootstrap values supporting the separation of clusters 1 and 2 did not exceed 75%. Because ambiguous nucleotides were almost exclusively restricted to the most informative variable sites, a further ML tree was constructed from the 284 samples containing unambiguous bases only. It showed an increased statistical significance for the separation of the roe deer strains (Figure S3 A). The average identities and similarities between the different concatenated allele gene clusters are shown in Table 5.
While we could distinguish A. phagocytophilum strains from the USA from those from Europe by means of MLST, they were 99% identical both at the nucleotide and amino acid level to European strains. Further, roe deer as well as vole and shrew strains were found to cluster apart from human, domestic and farm animal strains. Thus, roe deer, voles and shrews are unlikely to be reservoir hosts for granulocytic anaplasmosis in these species.

Structure Analysis using the MLST Loci
The program STRUCTURE [38] was used to examine population clustering among the A. phagocytophilum strains investigated. A dataset of 518 variable sites was created consisting of all single-nucleotide polymorphisms found in the seven MLST loci. The structure analysis most strongly supported two A. phagocytophilum population clusters (Table 6). When the number of clusters was set to two (K = 2), STRUCTURE clustered strains from voles and shrews together, whereas all other strains comprised a second cluster ( Figure S4). Evanno's DK tends to be conservative [40]. Thus, smaller numbers of clusters representing the most highly differentiated sets of populations tend to be favored by this method. When the number of clusters was set to three (K = 3), A. phagocytophilum strains from roe deer formed a third distinct cluster, which is in agreement with the sequence-based MLST analysis. Subsequent increases in the number of clusters resulted in a further division with strains from cattle and European bison being separated from strains from humans, dogs, horses, wild boars and hedgehogs. A. phagocytophilum strains from sheep and red deer exhibited an admixture between these two clusters. Interestingly, when the number of clusters was set to four (K = 4), human strains from the USA clustered most closely with strains from cattle and European bison and not with the other European strains from humans. As seen in the sequence-based MLST analysis, A. phagocytophilum strains from ticks clustered with multiple groups.

16S rRNA Gene-based Typing
In the past, the 16S rRNA gene was used most often for the molecular characterization of A. phagocytophilum [22]. Therefore, the infecting 16S rRNA gene variant was determined in all 391 samples. In 22 (6%) of these samples ambiguous nucleotides indicating double infections were found. Host species and identities of the detected 16S rRNA gene variants to GenBank entries are shown in Table S4. The phylogenetic analysis using either all strains or those containing unambiguous nucleotides only did not reveal a clear clustering of the strains according to their host species origin (Figures S2 B and S3 B). The bootstrap support of the branches was generally low. In particular, roe deer as well as vole and shrew strains were not sufficiently separated from strains of other host origin. Further, in contrast to the MLST analysis, A. phagocytophilum strains from the USA and Europe could not be distinguished by means of 16S rRNA gene-based typing. This might be due to the fact that in our data set the 16S rRNA gene was found to be highly conserved showing only nine (2%) polymorphic sites in a fragment of 497 bp, whereas the loci chosen for MLST contained 10% to 20% polymorphic positions ( Table 2).
ankA Gene-based Typing The housekeeping genes used for MLST are prone to maintain their metabolic function and are therefore likely to be evolutionarily conserved. In contrast, the ankA gene is thought to be an effector protein of a type IV secretion system (T4SS) of A. phagocytophilum [53]. Thus, it might be a virulence factor and it has been hypothesized to be involved in host adaptation underlying diversifying selection [24,25]. Accordingly, the ankA gene was found to be more diverse, showing 330 (61%) polymorphic sites ( Table 2). 23 samples had to be excluded from the allele-based analysis, because the ankA sequences contained ambiguous nucleotides.
The phylogenetic analyses of ankA using either all strains or those containing unambiguous nucleotides only revealed similar results ( Figures S2 C and S3 C) separating the A. phagocytophilum strains into five ankA gene clusters as has been described earlier [24,25,34]. Major branches were supported by bootstrap values of up to 100%. As for the MLST analysis A. phagocytophilum strains from the USA were found to cluster with strains from Europe, but were found on a separate branch with strong bootstrap support. Strains from the USA were 97% identical at the nucleotide level and 96% similar at the amino acid level to those from Europe. The average identities and similarities between the different ankA gene clusters are shown in Table 7.
The distribution of the A. phagocytophilum strains into five ankA gene clusters according to their host species origin is shown in Table 8. In 6 of the 386 samples analyzed ankA sequences belonging to two different ankA gene clusters were found (2 roe deer: cluster 2 and cluster 3, 1 roe deer: cluster 2 and cluster 4, 1 red deer: cluster 3 and cluster 4, 1 cattle: cluster 1 and cluster 4, 1 tick: cluster 2 and cluster 4). With the exception of cluster 5, tick strains were as expected scattered around the tree. Strains from voles and shrews were exclusively found in cluster 5, whereas those from roe deer were part of cluster 2 (33/46), cluster 3 (9/46) and cluster 4 (4/46). On the other hand, cluster 5 was restricted to vole and shrew strains and clusters 2 and 3 with the exception of 2 red Table 5. Net average identities 1 and similarities 2 between the different concatenated allele gene clusters in percent (n = 383) 3 . At the nucleotide level (roman). 2 At the amino acid level (italics). 3 383 strains in which all seven loci could be amplified were included. doi:10.1371/journal.pone.0093725.t005 Table 6. Results of the STRUCTURE analysis using the sequences of the seven MLST loci of the A. phagocytophilum strains (n = 383

Analysis for Recombination in the MLST Loci and ankA
Recombination in bacteria facilitates their adaptation to different environments [54,55,56]. However, recombination significantly affects phylogenetic analyses, because a single recombi-nation event may introduce multiple nucleotide exchanges at once [54,55,56]. We used four separate analyses to test for recombination in each of the seven housekeeping genes and ankA. The tests applied were GENECONV [42], the pairwise homoplasy index (PHI) [43], maximum x 2 [44] and the neighborhood similarity score (NSS) [45].
Of the seven housekeeping genes, four (glyA, mdh, dnaN, atpA) showed no evidence of recombination in any of the four tests, whereas three (pheS, fumC, sucA) did produce a significant result, but only for the PHI test ( Table 9). The GENECONV, maximum x 2 and NSS tests were not significant for any of the seven MLST loci. By contrast, all four tests revealed significant evidence for recombination in the ankA gene. The GENECONV test indicated that a 318 bp fragment from position 133 to 450 in the examined part of the ankA gene exhibited strong evidence of being recombined.  Analysis for Selection in the MLST Loci and ankA As well as recombination, selection can impact phylogenetic and structure analyses. This would be especially true if divergent host adaptation has accelerated evolutionary changes in any of the genes we examined. Investigating the number of non-synonymous versus synonymous substitutions in our data set, we found the dN/ dS ratio [46] to be ,1 for all seven housekeeping genes ( Table 2), indicating that stabilizing selection is most likely maintaining their metabolic function. The dN/dS ratio for ankA was also found to be ,1, although the value was elevated compared to the MLST loci. In contrast to the result from the dN/dS ratio, the average similarities at the amino acid level between the different ankA gene clusters were considerably lower than the identities at the nucleotide level indicating that most polymorphic loci are nonsynonymous (Table 7).
Because ankA may be involved in host adaptation [24,25], we also looked for evidence of site-specific selection in this gene. To do so, the CODEML program in PAML 4 [47] was applied. The results comparing the nearly neutral model (M1a) to the model allowing positive selection (M2a), suggested that the ankA gene has evolved under adaptive evolution (Table 10). 31 sites showed support for positive selection (Table S5). Recombination has likely biased these patterns, but even when sites within the recombined region indicated by the GENCONV analysis (nucleotides 133 to 450) were excluded, 13 sites still appeared to have evolved under positive selection.

Comparison of MLST, 16S rRNA and ankA-gene based Typing
Membership in one of the three concatenated allele gene clusters (Figures S2 A), as well as to one of the five ankA gene clusters (Figures S2 C) was included as a partition in the comparison of the different typing methods. As shown in Table 11, the typability was highest for the concatenated allele and the ankA gene clusters because ambiguous nucleotides do not disturb this kind of analysis. The typability was lowest for the STbased MLST analysis. This is because this analysis cannot be applied if there are any ambiguous positions in any of the seven sequences. On the other hand, the discriminatory index used as an indication of the power of a typing method to distinguish between unrelated strains was highest for ST-based MLST and the ankA allele-based analyses. Further, as mentioned above, both of these typing methods were geographically informative separating A. phagocytophilum strains from the USA and from Europe.
To test for the concordance between the different typing methods, Wallace coefficients [50] were calculated. Country and host species origin of the strains were regarded as partitions. Because the A. phagocytophilum strains from ticks were scattered around the phylogenetic trees of the concatenated allele and ankA sequences, they were excluded from the analysis. Despite the geographical sampling bias in our data set, the concordance between the country of origin and all other partitions was low ( Table 12). The continent of origin was not included in the analysis, because only 11 samples from the USA (10 human, 1 canine) were available. The concordance between ST and 16S rRNA gene-, ankA allele-and ankA cluster-based typing, respectively, was high with Wallace coefficients .0.75. The strongest association between host origin and typing method was found for the clonal complex, the concatenated allele and the ankA gene cluster.
To make sure that this agreement was not due to chance alone, adjusted Wallace coefficients [51] were calculated. As shown in Table 13 similar results were obtained. Because only 223 of 383 strains were part of a clonal complex, the clonal complex was Table 9. Results of the four tests for recombination. excluded as a partition. Again, the host origin showed the strongest relationship with the concatenated allele and ankA gene clusters. This indicates that ankA might be involved in host adaptation. A graphical representation of the concordance between the different partitions is shown in Figure S5. When the results of the sequenced-based phylogenetic analyses were compared, MLST and the ankA-based typing separated strains from voles and shrews as well as from roe deer from all other strains. In the sequence-based MLST analysis, A. phagocytophilum strains from humans, hedgehogs, domestic, farm and large wild animals except roe deer belonged to the same cluster. However, the ankA-based typing revealed that strains from sheep, cattle, red deer, European bison and chamois belonged to two different ankA gene clusters (cluster 1 and cluster 4), only one of them (cluster 1) containing the A. phagocytophilum strains from humans, domestic animals and hedgehogs.

Discussion
MLST is a widely accepted method of bacterial typing that benefits from standardization, a universal nomenclature and free data exchange [23]. The development of an MLST typing scheme became possible after the whole genome sequence of A. phagocytophilum had been published [26]. Interpretation of earlier data on the molecular characterization of A. phagocytophilum is hindered by the fact that a variety of different target genes and a limited number of strains was investigated [22]. We here provide for the first time a standardized typing method that is easily transferable between different laboratories. We validated our approach on a large data set of almost 400 A. phagocytophilum strains and have made MLST profiles and isolate databases freely available (http://pubmlst.org/aphagocytophilum/). We showed that our MLST data were informative concerning host species and geographic origin, whereas the most widely used 16S rRNA genebased typing was not. Thus, 16S rRNA gene-based typing does not allow to reliably define A. phagocytophilum genotypes as it had been shown before [24,32,34,57,58].
The major drawback of the MLST performed here was that multiple infections with different A. phagocytophilum strains seem to be more the rule than the exception. This phenomenon has been described previously in dogs [59], sheep [60], cattle [27], roe deer [61] and red deer [61]. In our data set farm and large wild animals, small mammals and ticks were especially prone to carrying multiple genetic variants. In contrast, in humans and Table 10. Log-likelihood values and parameter estimates for the CODEML analysis of ankA.  Table 11. Tyability and discriminatory indices of MLST, 16S rRNA and ankA gene-based typing of A. phagocytophilum strains.  [34], simultaneous transmission might occur. However, detection limit of such variants in vivo is probably not identical as it has been shown experimentally in sheep that distinct A. phagocytophilum strains differ in their ability to cause peak bacteraemias [62,63]. On the other hand, superinfection with different genetic variants has been demonstrated in a sheep infection model [64]. Therefore, both scenarios may occur in vivo. Accordingly, we detected identical as well as different STs and ankA alleles at various time points in the six hedgehog individuals that were sampled more than once (data not shown).
The housekeeping genes, as well as the 16S rRNA and the ankA gene analyzed here, are single copy genes [26]. It is therefore unlikely that a certain A. phagocytophilum strain possesses more than one allele that could lead to the ambiguous nucleotides that we observed in the chromatograms. However, for the related species A. marginale it was found that the vaccine strain A. marginale subsp. centrale was genetically heterogeneous at multiple loci [65]. If a similar phenomenon occurs in A. phagocytophilum as well, this would represent an alternative explanation, but in turn would question the concept of a strain as a genetically homogenous bacterial population. Because A. phagocytophilum replicates obligate intracellularly, clones cannot be separated easily. Thus, cell culture isolates grown from multiple infected hosts will not readily solve the problem of typing these strains.
Although the MLST loci represent only a minor portion of the bacterial genome, strain relatedness inferred from MLST was reflected by whole genome data [52]. To date, whole genome sequencing usually requires the isolation of the respective strain. Isolation of A. phagocytophilum in cell culture is possible, but laborious [66]. Therefore, the MLST analysis we propose here offers the advantage that it can be done using small amounts of DNA isolated directly from the infected host.
In our study, A. phagocytophilum strains from humans, dogs, horses, wild boar, and hedgehogs were found to be homologous because, with the exception of the North American strains, all belonged to clonal complex 1 and clustered together in the ankA gene-based phylogenetic analysis. In Europe, clustering of human, canine and equine strains has been observed previously amongst others using groESL [67] and ankA [24] sequences. A. phagocytophilum strains from wild boars and hedgehogs were equally part of clonal complex 1. This suggests they may serve as reservoir hosts for granulocytic anaplasmosis in humans and domestic animals. For wild boars, this has been proposed previously because of groESL- [68] and ankA- [69] based typing. European hedgehogs are infested with both, the hedgehog specialist I. hexagonus and the generalist I. ricinus [70]. Therefore, it has been suggested earlier that I. ricinus ticks fed on European hedgehogs could transfer A. phagocytophilum to humans and animals [29,30].
The clustering of msp2 sequences from humans, dogs and horses from North America has been observed previously [71]. Further, whole genome sequencing revealed that two human strains and one canine strain from the USA were closely related [72]. In our study, the canine A. phagocytophilum strain from the USA belonged to the same clonal complex 5 as the 10 North American strains from humans. This is in line with previous findings, although only one dog sample was available to us. In contrast to the 16S rRNA gene, MLST and ankA-based typing differentiated between A. phagocytophilum strains from Europe and the USA. Strains from the European clonal complex 1 and the North American clonal complex 5 did not share any alleles indicating transcontinental diversification. Nevertheless, human strains from both continents were highly identical, especially at non-synonymous sites. Further, all strains from humans, dogs and horses from Europe belonged to the same clonal complex. As canine and equine granulocytic anaplasmosis occurs frequently in Europe [13,14], HGA is likely to be underdiagnosed in Europe.
Although three A. phagocytophilum strains from sheep belonged to the clonal complex 1, most of the ovine strains were assigned to unlinked STs. The phylogenetic analysis using the concatenated allele or ankA sequences revealed that the sheep strains were not as uniform as those from humans and domestic animals, but showed a higher diversity. This may suggest admixture in these strains as a similar result was seen in the STRUCTURE analysis. A higher degree of strain diversity was also found for A. phagocytophilum strains from cattle, red deer and European bison. Further, whereas strains from sheep, cattle, red deer and European bison were found in the same cluster in the sequence-based MLST analysis, they belonged to two different ankA gene clusters, cluster 1 and cluster 4. This heterogeneity of strains infecting ruminants could be the result of genetic exchange via recombination between A. phagocytophilum strains in multiply infected highly tick-exposed animals. Evidence that recombination might occur at least within the ankA gene is already available [25] and is further supported here. Another possibility is that the adaptation of A. phagocytophilum to humans and domestic animals occurred more recently than to sheep and cattle.
Using MLST and ankA-based typing, we found that red deer strains were closely related to those from humans and domestic animals. However, they did not belong to the clonal complex 1 that contained the human, canine and equine strains. This makes them less likely to serve as reservoir hosts for granuloytic anaplasmosis in humans and domestic animals, and questions earlier suggestions based on groESLand ankA-based typing [24,67]. The results we present here indicate that they might instead harbour A. phagocytophilum strains infectious for sheep and cattle. However, their role as reservoir hosts for granulocytic anaplasmosis in sheep has been critically debated recently using msp4-based typing [73].
As we found for sheep strains, roe deer strains were mostly assigned to unlinked STs. The phylogenetic analysis using the concatenated allele sequences revealed that all of them were part of the same cluster. With the exception of one A. phagocytophilum strain from a red deer, this cluster contained only strains from roe deer and ticks. A similar observation was made in the ankA genebased phylogenetic analysis, where roe deer strains were found almost exclusively in ankA gene clusters 2 and 3 indicating that roe deer are unlikely to be the reservoir for granulocytic anaplasmosis in humans, domestic and farm animals as it has been suggested previously [24,67].
In contrast to roe deer, all A. phagocytophilum strains from vole and shrews were part of the clonal complex 2. They were only distantly related to all other strains investigated showing a limited identity of 93% at the nucleotide level to the concatenated allele sequences of the rest of the strains. Voles and shrews are therefore unlikely to serve as reservoir hosts for granulocytic anaplasmosis in humans, domestic and farm animals. This has already been suggested before based on groESL- [74] and ankA-based typing [25] and contrasts with North America where the white-footed mouse (Peromyscus leucopus) is a well established reservoir species [18,19].
Instead, in Europe voles might be involved in a separate enzootic cycle probably with I. trianguliceps as the tick vector [32]. Although, there might be some sampling error in our data set, we did not find any of the A. phagocytophilum strains from I. ricinus ticks analyzed here clustering with the strains from voles and shrews supporting this hypothesis.
Analyzing the population structure of A. phagocytophilum we found it most likely to be semiclonal with a high proportion of relatively rare unlinked STs superimposed by a limited number of rather frequent genotypes. There was one major 'epidemic' clonal complex containing all strains from humans, dogs and horses that might have evolved more recently than the great variety of the other unlinked STs. Our results indicate that some recombination may have occurred in three of the seven MLST loci. This recombination may disturb a fully clonal population structure in A. phagocytophilum. However, the support for recombination was not strong, because only one of the four tests performed was significant. Further, the phylogenetic analyses were only minimally influenced when these three genes were removed from the dataset and none of the major clusters was affected (data not shown).
Comparing the concordance between the three different typing methods used here, we observed a clear relationship between all of them. Variation in the 16S rRNA gene might therefore be linked with other loci such as certain housekeeping genes or ankA, but used alone, it is not sufficient for reliable typing. Whereas the MLST loci represent the core genome [52], ankA has been hypothesized to be involved in host adaptation underlying diversifying selection [24,25]. This may limit its suitability for typing A. phagocytophilum strains, especially as recombination may facilitate host-specific adaptation and obscure phylogenetic signatures. Nevertheless, although we observed a huge diversity within this gene, the ankA-based typing provided similar results as the MLST analysis in separating strains from roe deer as well as from voles and shrews from all other strains. Whether these molecular data are reflected by in vivo differences should be investigated in cross-infection experiments. However, the A. phagocytophilum strains used should be carefully typed because the in vivo data available to date have been limited by incomplete molecular characterization of the isolates.
Although our MLST analysis was hindered by the occurrence of multiple infections, we found it a valuable new tool to investigate the genetic variation in A. phagocytophilum. As our analyses were not designed as an epidemiological study and because of a considerable collection bias in our data set, our results have to be confirmed by further investigations with a structured sampling plan. We hope that others might submit further profiles and isolates data to http://pubmlst.org/aphagocytophilum/in order to increase our knowledge concerning the population structure of this emerging pathogen.

Conclusions
Based on this study, we conclude that HGA is likely to be underdiagnosed in Europe. Further, we provide evidence that wild boars and hedgehogs may serve as reservoir hosts for granulocytic anaplasmosis in humans and domestic animals in Europe. Red deer strains were found to be less uniform, but red deer may harbour A. phagocytophilum strains infectious for sheep and cattle. In contrast, roe deer, voles and shrews are unlikely to be reservoir hosts for granulocytic anaplasmosis in humans, domestic or farm animals. Lastly, the ankA gene is presumably involved in hostspecific adaptation of A. phagocytophilum strains that probably occurred via both recombination and positive selection.