Insights to Genetic Characterization Tools for Epidemiological Tracking of Francisella tularensis in Sweden

Tularaemia, caused by the bacterium Francisella tularensis, is endemic in Sweden and is poorly understood. The aim of this study was to evaluate the effectiveness of three different genetic typing systems to link a genetic type to the source and place of tularemia infection in Sweden. Canonical single nucleotide polymorphisms (canSNPs), MLVA including five variable number of tandem repeat loci and PmeI-PFGE were tested on 127 F. tularensis positive specimens collected from Swedish case-patients. All three typing methods identified two major genetic groups with near-perfect agreement. Higher genetic resolution was obtained with canSNP and MLVA compared to PFGE; F. tularensis samples were first assigned into ten phylogroups based on canSNPs followed by 33 unique MLVA types. Phylogroups were geographically analysed to reveal complex phylogeographic patterns in Sweden. The extensive phylogenetic diversity found within individual counties posed a challenge to linking specific genetic types with specific geographic locations. Despite this, a single phylogroup (B.22), defined by a SNP marker specific to a lone Swedish sequenced strain, did link genetic type with a likely geographic place. This result suggests that SNP markers, highly specific to a particular reference genome, may be found most frequently among samples recovered from the same location where the reference genome originated. This insight compels us to consider whole-genome sequencing (WGS) as the appropriate tool for effectively linking specific genetic type to geography. Comparing the WGS of an unknown sample to WGS databases of archived Swedish strains maximizes the likelihood of revealing those rare geographically informative SNPs.


Introduction
Francisella tularensis is a highly virulent, facultative intracellular pathogen, which causes tularaemia in humans and animals [1]. Transmission occurs via arthropod bites (mosquitoes and ticks), inhalation of contaminated dust, or ingestion of contaminated food or water. Tularaemia is endemic in Sweden and has been reported since 1931 causing between 100-700 infections annually [2]. The most common clinical form is ulceroglandular tularaemia and all reported human cases of tularaemia in Sweden have been caused by F. tularensis subspecies holarctica (type B). The disease occurs with a seasonal pattern -being especially prevalent in summer and autumn. It is most common in Northern and Middle Sweden but during the last decade the infection has spread further south (SmiNet2, Swedish National Surveillance System).
The aim of present study was to evaluate canSNPs, MLVA, and PmeI-PFGE for practical uses in epidemiological investigation of F. tularensis infections in Sweden and in combination with geomapping to possibly predict sources of infection and reservoirs.

Molecular typing
To evaluate the effectiveness of SNP, MLVA and PFGE typing as methods for tracing source and place of infection within an endemic country, we analysed 127 clinical specimens (Table S1). We discovered that all three typing methods identified two major genetic groups with near-perfect agreement. Six samples were removed from canSNP analysis due to DNA quality issues. Analysis with canSNPs revealed that all F. tularensis samples were assigned to two major groups: B.7/8 (n = 24) and B.12 (n = 97) ( Figure 1A). The proportion of B.12 phylogroup to B.7/8 was in agreement with previous publication [8]. MLVA analysis resulted in two major clusters, Tul-I and Tul-II that corresponded with the B.12 and B.7/8 lineages, respectively (Table S1; Figure 1A). Two samples falling within the B.12 lineage formed a distinct MLVA cluster, Tul-III (Table S1). The two major MLVA clusters (Tul-I and Tul-II) were based on differences in the two less discriminating markers, Ft-M22 and Ft-M24, with only two allelic variants each. Tul-III MLVA cluster differed from all other MLVA types in the allele combination of VNTR markers Ft-M24 and Ft-M22 with 1 and 4 copies, respectively. Two other canSNP groups (B.4 and B.10) that were not found in this study have been previously reported in Sweden [8,11,12]. However, samples of these phylogroups are rather rare in Sweden so their lack of presence in this study may be due to our smaller sample size.
PFGE analysis of the 124 clinical samples revealed three different but rather similar PFGE profiles, types 1-3 ( Figure 2). Comparison of PFGE data to MLVA and canSNP data (Table S1) revealed that PFGE types 1 and 3 corresponded to the two major canSNP groups B.12 (Tul-I) and B.7/8 (Tul-II), respectively (Table S1). MLVA cluster Tul-III corresponded with PFGE type 2. Taken together, canSNP, MLVA, and PFGE were in general agreement when classifying samples into major groups.
At this higher resolution level, MLVA and canSNP assignment of samples were not in consistent agreement. Samples within identical MLVA genotype did not fall into a single canSNP group, but rather fell into multiple SNP phylogroups ( Figure 1A; Table  S1). Several canSNP groups contained multiple MLVA genotypes ( Figure 1A; Table S1). This lack of grouping agreement is due to the type of genetic markers (SNPs or VNTRs) targeted by each typing system. SNP mutations are highly stable and therefore are informative of the genetic relationships among groups of isolates [13]. MLVA typing is based on VNTR loci that can be prone to mutational instability and, therefore, often result in homoplasy, making inferences about relationships among groups of isolates unreliable. When SNP analysis and MLVA are used in a hierarchical scheme, accurate genetic relatedness among SNPdefined groups of F. tularensis samples can be confidently known ( Figure 1A) [12,13] and the MLVA data provided finer levels of discrimination among samples within the same phylogroup.

Geographic distribution of SNP groups
A highly complex geographic pattern among the phylogroups exists at a national and regional scale in Sweden ( Figure 1). Our data present a pattern of wide distribution of multiple phylogroups that are found in multiple relatively distant counties in Sweden. Samples from both major groups (B.7/8 and B.12) are represented in middle and northern Sweden, with the highest density found in middle Sweden. Southern Sweden is sparse and is represented by samples classified in closely related terminal phylogroups within B.12 lineage (B.33/34 and B.34/35). Nearly all phylogroups identified in this study are present in middle Sweden, making this region the center of diversity, as previously described [8].
As a consequence of wide distribution of multiple phylogroups, extensive phylogenetic diversity was found within individual counties in middle Sweden. The richest diversity is found in Stockholm area ( Figure 1B, county AB), representing seven diverse phylogroups, each with multiple MLVA subtypes. Ö rebro, a city in middle Sweden, was co-localized with 3 distinct phylogroups and MLVA subtypes (Table S1 and Figure 1, county T). Most samples appeared to cluster along major waterways, such as rivers and coastal areas of the Swedish east coast, the Baltic Sea. There was no obvious geographical correlation to age and gender of the patients (data not shown), but adults were significantly more commonly infected than children. Water bodies, like the Dalä lven river, appears to be linked to multiple phylogroups that are diverse ( Figure 1B, county W).
In contrast, Ljusnan river (county X in Figure 1B) is linked to seventeen case-patient samples that are highly genetically similar despite the 12 year span (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006). All samples, except one, belonged to subgroup B.22 MLVA D4. The remaining sample fell in the B.21/22 subgroup, which is the closest relative to B.22 samples. Despite this pattern, 3 B.22 samples were found in 3 other nearby counties in middle Sweden, including Stockholm ( Figure 1B), which may reflect recent dispersal from the founding source. Taken together, small regional sections or water bodies are co-localized with multiple phylogroups with great genetic diversity.

Discussion
The complex phylogeographic pattern of tularemia distribution in Sweden poses a great challenge to accurately identify a source and place of infection for any given F. tularensis specimen within Sweden (Figure 1). Our attempts to correlate geographic origin of the samples to specific genetic types did not generate a clear cut result ( Figure 1B) despite employing a canSNP typing scheme that provided higher genetic resolution [6] than in the study performed by Karlsson et al 2013 and colleagues. This SNP typing scheme allowed us to place 121 clinical specimens, collected throughout Sweden over a span of 16 years, on the existing global phylogenetic tree for F. tularensis subsp. holarctica ( Figure 1A) [6]. All phylogroups identified in Sweden have also been found in distant nations abroad, except B.22, which is restricted to Sweden. Given this broader dispersal range, it is no surprise that samples within these specific phylogroups are broadly dispersed in Sweden across multiple counties and not restricted to a small geographic region. The broad distribution of closely related phylogroups is central to the argument that F. tularensis is a rapidly dispersed organism. The basis for this rapid dispersal remains unclear, but recent reports suggest dispersal could be facilitated by the migration of birds [19,20].
Despite the phylogeographic complexity, patterns emerge that suggest the possibility of identifying SNP markers that could be meaningfully linked to geographic regions or at least narrow the geographic range of possible places. An example of this is the SNP for B.22 phylogroup, which is comprised of samples found only in Sweden [6,11,12]. It is interesting to note that unlike all other phylogroups in this study, B.22 SNP was identified from a Swedish genome (FSC200) linked to the Ljusnan river in county X (Figure 1). This B.22 SNP was highly specific to FSC200 genome. All other phylogroups are defined by SNPs discovered from genomes found in other nations (USA, Russia, and Hungary) [6,11,12]. Most case-patients (16/17) linked to the Ljusnan river, collected over a 12 year time frame (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006) (Figure 1B), typed as B.22 MLVA D4 (Table S1). The striking genetic similarity suggests that they all recently emerged from a common ancestor. The temporal pattern rules out a single outbreak season and provides insight into the stability of this genotype. Taken together, these data suggest that the B.22 D4 genotype originated from a common ecological niche that spanned the approximately 70 kilometres distance of the river. Intense localized sampling of beaver populations may verify this hypothesis. Beavers, which are common on this river, could be a viable source given that these semi-aquatic rodent species have shown evidence of seroconversion to F. tularensis [21]. Three B.22 samples were found in other neighboring counties, one B.22 D4 type was found in Stockholm and two B.22 samples with D25 & D32 subtypes were found in counties W and S ( Figure 1B). The B.22 samples located in multiple counties may reflect very recent dispersal events, which is supported by the MLVA subtype data, or errors in epidemiological records.
The B.22 SNP provides a striking line of evidence that supports the pattern that closely genetically related samples tend to have closer geographic proximity despite the complex phylogeographic landscape found in Sweden. Extrapolating from this model, SNPs that are highly specific to the sequenced genome may be found most frequently among samples recovered from the same location where the reference genome originated. Such SNPs would be both relationally and geographically informative and, therefore, useful in identifying a likely place of infection. That said, due to the rapid dispersal of F. tularensis, the geographical attribution of the B.22 SNP may be time limited.
The results of the present study indicate that SNP typing schemes, designed from geographically informative SNPs, combined with a MLVA typing scheme have the potential to be used as a standalone typing method in outbreak investigations. However, since our attempts to correlate geographic origin of the samples to specific genetic types did not generate a clear cut result despite employing a canSNP typing scheme that provided high genetic resolution we conclude that whole-genome sequencing (WGS) would be the most appropriate tool for effectively linking specific genetic type to geography. Comparing the WGS of an unknown sample to WGS databases of archived Swedish strains  maximizes the likelihood of revealing those rare geographically informative SNPs among genetic near matches. This insight may prove useful for future epidemiological investigation practices.

Clinical samples
The Public Health Agency of Sweden receives continuously clinical specimens for primary diagnostics of F. tularensis from physicians. All samples were received during 1994 to 2010 from patients of both genders, varying ages (1-89 years), and from diverse regions in Sweden (Table S1). Since tularaemia is a noticeable disease in Sweden, cases had been reported to the Swedish National Surveillance System by the treating physician. If no likely place of exposure had been included, the patients were contacted. We extracted F. tularensis subspecies holarctica DNA from patients suffering from ulceroglandular tularaemia: 3 complex clinical samples taken directly from the wound site and 124 cultured samples. Genomic DNA was prepared using two commercially available DNA extraction kits, the QIAamp tissue protocol (Qiagen, Stockholm, Sweden) for samples from 1994 to 2005 and the NucliSens magnetic extraction protocol (Biomérieux, Gothenburg, Sweden) for F. tularensis samples and wound specimens from 2006-2010.

PFGE
PFGE analysis was performed on F. tularensis samples [18]. Agarose plugs were sliced and incubated in 10 U of restriction enzyme PmeI (Biolabs, New England) for 3 hours at 37uC. Electrophoresis was performed in 1% agarose with a switch time of 1.79 to 10.71 s at 6 V/cm for 24 hours at 14uC. Salmonella enterica serotype Braenderup strain H9812 restricted with XbaI was used for gel normalization. Gels were stained with gel red and gel images captured by using a Gel Doc 1000 imager (Bio-Rad).
PFGE images were analysed using Bionumerics v 6.01 (Applied Maths, Sint-Martens-Latem, Belgium). Unique PFGE patterns were analysed and compared manually for band polymorphism.

Geomapping
To obtain phylogeographic patterns we mapped the phylogenetic groups on a geographic map of Sweden at the county level resolution ( Figure 1B). Of the 127 patients, 24 were excluded from this analysis due to unknown or uncertain location of exposure or lack of genotype information. The study protocol was approved by the Regional Ethical Review Board in Stockholm (# 2008/1020-31/2).