Comparison of French and Worldwide Bacillus anthracis Strains Favors a Recent, Post-Columbian Origin of the Predominant North-American Clade

Background Bacillus anthracis, the highly dangerous zoonotic bacterial pathogen species is currently composed of three genetic groups, called A, B and C. Group A is represented worldwide whereas group B is present essentially in Western Europe and Southern Africa. Only three strains from group C have been reported. This knowledge is derived from the genotyping of more than 2000 strains collected worldwide. Strains from both group A and group B are present in France. Previous investigations showed that the majority of sporadic French strains belong to the so-called A.Br.011/009 group A clade and define a very remarkable polytomy with six branches. Here we explore the significance of this polytomy by comparing the French B. anthracis lineages to worldwide lineages. We take advantage of whole genome sequence data previously determined for 122 French strains and 45 strains of various origins. Results A total of 6690 SNPs was identified among the available dataset and used to draw the phylogeny. The phylogeny of the French B group strains which belongs to B.Br.CNEVA indicates an expansion from the south-east part of France (the Alps) towards the south-west (Massif-Central and Pyrenees). The relatively small group A strains belonging to A.Br.001/002 results from at least two independent introductions. Strikingly, the data clearly demonstrates that the currently predominant B. anthracis lineage in North America, called WNA for Western North American, is derived from one branch of the A.Br.011/009 polytomy predominant in France. Conclusions/Significance The present work extends the range of observed substitution rate heterogeneity within B. anthracis, in agreement with its ecology and in contrast with some other pathogens. The population structure of the six branches A.Br.011/009 polytomy identified in France, diversity of branch length, and comparison with the WNA lineage, suggests that WNA is of post-Columbian and west European origin, with France as a likely source. Furthermore, it is tempting to speculate that the polytomy’s most recent common ancestor -MRCA- dates back to the Hundred Years' war between France and England started in the mid-fourteenth century. These events were associated in France with deadly epidemics and major economic and social changes.


Introduction
Bacillus anthracis, the causative agent of anthrax, is a spore-forming bacterium present worldwide. The bacterium mainly affects wild and domesticated herbivores, causing a serious, often fatal disease. Spores which are the infectious form of the bacteria can remain in soil for decades before being ingested by animals while browsing or grazing [1,2]. B. anthracis possesses, together with Yersinia pestis, Mycobacterium tuberculosis and a few other major human pathogens, a highly monomorphic and clonal population [3][4][5]. This may reflect a recent emergence or re-emergence of this pathogen combined with limited opportunities for genetic exchanges.
Since year 2000, major progress has been made in the understanding of the population structure and evolution of the B. anthracis species owing to the access to whole genome sequence data and the resulting possibility to systematically search for genetic polymorphisms. The first tool used to characterize the species at a large scale was tandem repeats polymorphisms [6,7]. Multiple Loci Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) has been applied to more than two thousand strains worldwide (reviewed in [8]). Most of the associated published data can be queried via databases accessible on the internet [8,9].
The resulting view of the population structure of B. anthracis was used to select representative strains for whole genome sequencing and search for single nucleotide polymorphisms (SNPs). A set of 14 single nucleotide polymorphisms (called canonical SNPs or canSNPs) able to subdivide all isolates into the three major lineages (A, B and C) and 13 sub-lineages or subgroups [10][11][12][13][14], consistent with the genetic diversity previously uncovered by applying MLVA, was proposed. Additional SNPs that define the A.Br.Ames and A.Br.WNA lineages or that lie on various branches of the B. anthracis SNP tree have been investigated [11,[14][15][16][17] to address more specific points.
Whole genome SNP surveys allowed proposing age estimates for key evolutionary steps in the evolutionary history of B. anthracis. The most recent common ancestor (MRCA) of the A, B and C lineages is estimated to be a few tens of thousands years old [11].
In spite of these major achievements in the past 15 years, information on the geographic origin of B. anthracis, and on time-scales regarding the more recent evolution events is still lacking. One exception is the proposed dating of the spread of the lineage which is largely dominant in North America (the Western North America or WNA lineage). The genetic diversity observed along this lineage suggested an origin via the Bering land bridge well before the arrival of European colonizers [15,18]. The analyses we present in this report challenge this interpretation.
Whole genome SNP analysis is an unbiased genotyping tool giving very high resolution with strong phylogenetic content. Owing to rapid progress in this field, the approach can now be applied at a reasonable cost especially since the number of B. anthracis isolates is very limited in comparison with pathogens of much higher prevalence. We previously reported the MLVA typing, canSNP characterization and whole genome SNP analysis of 122 strains of B. anthracis isolated in France in the past decades ( [8,17,19]). French strains (mostly veterinary isolates with robust geographic assignment) resolved into about thirty MLVA genotypes belonging to one of the three canSNP lineages, B.Br.CNEVA, A.Br.011/009 and A.Br.001/002. In particular we described that the A.Br.011/009 lineage is a remarkable polytomy with six branches radiating from its MRCA. In the present report, taking advantage of additional published draft or complete whole genome sequences we describe in detail and discuss the phylogenetic position of the French strains within the larger B. anthracis phylogeny. In particular, we demonstrate that the WNA lineage emerged from the A.Br.011/009 polytomy. Consequently, and taking into account historical events, we argue that the WNA lineage is a recent lineage resulting from a post-columbian introduction from Europe, followed by a strongly accelerated evolution rate.
Ames Ancestor (GenBank accession no. NC_007530.2) was used as the reference genome for assembly. SNPs identification was done as previously described [30] using BioNumerics version 7.5 (Applied Maths, Belgium). Sets of pseudo-reads of 100 bp length were created in silico when using contigs or whole genome data. A set of SNPs was deduced for each genome sequence data using the BioNumerics Chromosome Comparisons module. SNP positions at which one or more strains displayed an ambiguous residue call or missing data were discarded. Ribosomal operons and repeated sequences including tandem repeats were masked. The list of SNPs positions is provided in S2 Table. Whole Genome Phylogenetic Analysis A minimum spanning tree was drawn in BioNumerics by using the filtered whole genome sequencing SNP data as input. The tree was rooted by using B. cereus strain AH820 (accession number NC_011773) [31]. The B. cereus genome was interrogated for each SNP position identified within B. anthracis. The canSNPs positions in the global tree were identified from the list of SNPs (S2 Table).

Results and Discussion
Whole Genome Sequencing and SNP identification A total of 6690 chromosomal SNPs was identified by mapping sequence data from 166 strains (122 French strains and 44 additional strains of worldwide origin) against the Ames ancestor reference genome. Fig 1 illustrates the minimum spanning tree (MST) deduced from the SNP data. The color code reflects geographic origins and the strain Id of the 45 additional strains (including the Ames strain) is indicated. The position of the root, as determined by projecting the Bacillus cereus strain is indicated by the red star. In agreement with previous analyses [10], the root is located along the branch leading to the lineage C representative. This root represents the position of the MRCA of the currently known B. anthracis lineages A, B and C. The yellow star indicates the position of the MRCA of the A and B lineages. The distance from the yellow star to tips is comparable along lineages A and B. Along lineage A, the maximum distance corresponds to the west African strain Sen3 [26] located at a distance of 794 SNPs. The shortest branch corresponds to a French isolate located at a distance of 325 SNPs from the yellow star. Along lineage B, strain Kruger_B is located at a distance of 661 SNPs whereas some isolates from the French Alps are located at a distance of 340 SNPs. The most recent common ancestor (MRCA) of the A (green star) and the B (blue star) lineages occur at an almost identical distance from the split between the two lineages indicated by the yellow star (276 versus 264). From these two points, the shortest branches extend for a further 49 and 76 SNPs whereas the longest extend for 518 and 397 SNPs respectively. The two longest branches occur in Africa (Senegal and South Africa), whereas the two shortest are represented by French isolates. This ratio of up to ten between average substitution rate along different lineages is much higher than the previously observed ratio of three [10].
The seven genomes which were initially used to define canSNPs are indicated in red. The position of each of the 16 canSNPs (including A.Br.005, A.Br.010 and A.Br.011 which are not part of the standard and most used canSNPs assay) is indicated by numbers in colored circles, with a different color code for each of the three lineages, A, B and C [11,32]. In agreement with previous reports [8,17,19,33], the French isolates clustered into three canSNP types: A. Br.001/002 (branches located between canSNPs A.Br.001 and A.Br.002), A.Br.011/009 (branches located between canSNPs A.Br.009 and A.Br.011) and B.Br.CNEVA (branches beyond canSNP B.Br.004). The most basal node within group A is defined by the split of the A. Br.005/006 canSNP type represented by one strain from Cameroon [17] and seven from Zambia [34,35]. The Cameroon strain was previously shown to belong to MLVA cluster E described by Lista et al. [36] corresponding to the "?" cluster described by Le Flèche et al. [7] and to the basal Aß lineage described by Maho et al. [37]. This lineage is also widely represented in Chad as can be seen by querying MLVAbank . Fig 1 illustrates that   The A.Br.008/011 lineage is represented by seven strains, including two strains from Bulgaria [28], one representative of the historical Russian vaccine strain Tsiankovskii-I, two strains of unknown geographic origin recovered from drug users [13,24], one strain from Turkey, and one from Ethiopia. The two strains isolated from drug users, UR-1 and Ba4599 [13,24], are very closely related but clearly distinct. UR-1 is characterized by a relatively long branch (178 SNPs) as compared to the Ba4599 terminal branch (15 SNPs). Although not formally impossible in view of other substitution rate distortions, the long branch would need to be confirmed by using the raw Ion Torrent data rather than the resulting available WGS contig assemblies as done in the present investigation. The Ion Torrent technology has been shown to have a higher error rate [39] and random sequencing errors will increase the length of terminal branches. If the UR-1 strain is not taken into account, the length from the A.Br.008/011 group MRCA to the tips is similar, 42-44 SNPs towards the two Bulgarian strains, 84 towards the Tsiankovskii-I Russian vaccine strain, 63 towards the second drug user strain of unknown geographic origin, 65 and 69 towards the strains from Ethiopia and Turkey respectively. The slightly longer vaccine strain branch might reflect extensive in vitro cultivation. These seven A.Br.008/011 strains define two branches starting from the group MRCA. A third branch contains canSNP A.Br011 and leads to the A.Br.011/009 group comprising the majority of sporadic French strains.
The A.Br.011/009 strains define a polytomy with six branches radiating from a central node located at 15 SNPs from the purple star (Fig 1) [17]. The distances from this central node are very variable. They range from a minimum of 12, up to 471 SNPs, corresponding to a remarkable ratio of almost 40. The longest branches are produced by strains from Senegal and Gambia [26]. Even if terminal branches which may concentrate sequencing errors are not taken into account, the length ratio is still 12 to 221, i.e. 18. Very interestingly, the WNA lineage located exclusively in North America [11,15] and represented here by three strains, one from Canada and two from the United States represent the second longest branch, 120-124 SNPs, a ratio of 10 compared to the shortest branch. The WNA and the Senegal-Gambia lineages split very early after departing from the French lineages.
In order to maximize the resolution of the SNP analysis, the SNP screen was applied to the subset of A.Br.011/009 and A.Br.WNA strains. Less SNPs are excluded when less strains are taken into account, mostly due to deletions, partial genome assemblies in public WGS data and/or sequence quality issues in some data sets. Within this subset of strains, 1934 SNPs are identified as compared to 1451 in the corresponding portion of Fig 1, and used to draw Fig 2. The lineages are numbered from 1 to 6 as in [17]. The length of the different French lineages is highly variable, from 17 for lineage 5 up to 76 for lineage 4 not taking into account French strains obtained from third parties-collections potentially more extensively cultured. The red star indicates the position of the root. The WNA and Senegal-Gambia lineages are branching devoid of length value indication have a length of one SNP. The name of the published genomes included in this study is indicated next to their node in the tree. Five colored stars mark major splits in the B. anthracis phylogeny. CanSNPs are indicated in colored disks next to the branch where they occur in the tree.
doi:10.1371/journal.pone.0146216.g001 out from lineage 2 at a distance of 11 SNPs from the central node of the polytomy. Only four SNPs later, WNA and Senegal-Gambia split. Lineage WNA extends for a further 133 SNPs, and Senegal-Gambia for 624 SNPs (or 268 if not taking into account the terminal segments) whereas the rest of lineage 2 extends for only 16 up to 24 SNPs when not taking into account two collection strains. This demonstrates that within the same time span and among very closely related lineages, substitution rates can differ by a ratio of more than 15, i.e. much higher than previously reported. The WNA-Senegal-Gambia split occurs at 30% to 40% along the lineage 2 expansion.
Along this line, the presence in this polytomy of a very short branch (17 SNPs for the lineage 5) found in France has a great impact on previous propositions for the evolution of B. anthracis. The previously proposed pre-columbian Bering Strait hypothesis for the importation of the WNA branch in North America was prompted by the very long branch length of the WNA lineage [15]. The present analysis clearly demonstrates that there is no need to invoke an ancient origin for the WNA lineage. In the same time-span, lineage 5 reached a length of only 17 SNPs, compared to the 151 SNPs length for the WNA lineage. In other words, 151 SNPs are not indicative of a longer time-span than 17 SNPs. A more likely interpretation for the fast expansion of the WNA and Senegal-Gambia lineages is that they occurred in an ecosystem which was new to the infection by B. anthracis, was highly sensitive, and in which B. anthracis benefited from a much increased number of generations per year.
A.Br.001/002 and A.Br.Ames phylogenetic analysis Sequence analysis identified 231 chromosomal SNPs that differentiate the 24 A.Br.001/002 French strains. The length from the purple star to the tips is similar and varies from 131 for one Georgian strain to 198 (Fig 1). All 21 strains from the Doubs department formed a single clonal cluster differing by a maximum of two SNPs and they belong to the same branch as the Sterne vaccine strain. They will together be subsequently called the Sterne lineage. The three additional A.Br.001/002 strains are older samples (collected in 1953, 1954 and 1981) isolated in other departments of France. They are clearly unrelated to the recent Doubs outbreak [40] strains and contribute to a polytomy with four branches, one of which is leading to the A.Br. Ames sublineage (Fig 1). Fig 3 provides a focus on  Ames lineage appear to result from a single outbreak event. Simonson et al. [14] investigated 191 B. anthracis isolates from China, 74 of which were canSNP typed as A.Br.001/002 and 8 were A.Br.Ames. The authors estimate from historical records that the Ames lineage present in the Texas/Louisiana Gulf region split from its Chinese progenitor lineage in the early to middle 1800s. Since the USA-import, the USA specific Ames lineage expanded between 4 and 12 SNPs. The most part of the Ames lineage expansion occurred while in China, before the USAimport.
The tentative dating of the A.Br.011/009 polytomy, and of its most recent common ancestor Under the hypothesis of a French emergence of the A.Br.011/009 polytomy, as implied by the very short French branch (12 SNPs in Fig 1), B. anthracis would have been introduced to North America from Europe, perhaps France. The WNA lineage is believed to have been present in North America at least since 1770: anthrax was present in Haiti at that time according to historical records [41,42] and the WNA lineage is present in Haiti [11]. We hypothesize that the split leading to WNA corresponds to an introduction of anthrax in North America via French settlers to Quebec or other French colonies in the beginning of the seventeenth century. Interestingly, the WNA lineage and the Africa lineage are splitting early (4 SNPs) after the emerging from the lineage 2 identified in France. This short distance seems consistent with a split at the same period of time and with the hypothesis that French navigators could have introduced B. anthracis strains in others countries. French settlements were established in Eastern North America ("Nouvelle France") since the beginning of the seventeenth century (Quebec, 1608) and in Western Africa because of the slave trade almost at the same period (1600-1650). The trade of textiles could have been a source of introduction of the disease in North America and Africa. Later on, the better understanding of the disease, and the strong prophylactic and control measures applied in France since the end of the nineteenth century have probably slowed down the turn-around of B. anthracis during the twentieth century. We can consequently reasonably hypothesize that the French lineage kept expanding at an unaffected rate after the WNA split for approximately 300 years i.e. from year 1600 until 1900 corresponding to two-thirds of the French lineage length. 1300-1500 is consequently a likely estimate for the dating of the outbreak which gave birth to the A.Br.011/009 polytomy. In France, years 1350-1450 corresponds to the "One hundred years war" with England (1337-1453). It was characterized by major epidemics (including the Black Death plague due to Yersinia pestis), civil war events, and major economic changes. At that time, South-West France was occupied by England. Uncontrolled armed groups were travelling across large regions of France. The associated events would have favored the geographic spreading and subsequent fixation across most of France. This period is thus a good candidate for the dating of the root of the A.Br.011/ 009 polytomy observed in France.
Cluster B and B.Br.CNEVA phylogenetic analysis More than half of the B. anthracis isolates from France clustered within the B2 B.Br.CNEVA lineage. The MLVA sub-division of cluster B in sublineages B1 and B2 [6] including respectively B.Br.Kruger and B.Br.CNEVA [11] is clearly visible. The length from the MRCA of cluster B strains to the tips is highly variable. It varies from 77 up to 160 within the B2 cluster representatives, as compared to the 426 SNPs to South African "Kruger B". This corresponds to a ratio of more than 5, higher than the previous estimate of 3 [10] and this increase is due to the inclusion of the Alps lineage. Starting from the MRCA of the French Cluster B strains, the Pyrenees lineage which includes the A0465 strain shows the longest branch (range 71-92 SNPs) whereas the Alps lineage is the shortest (range 21-26 SNPs, ratio is approximately 1/3). The inclusion of the lineage leading to BF1 defines an older MRCA for lineage B2 (Fig 1). BF1 originates from Bavaria in Germany, which contains part of the Alps [25]. This tree topology suggests that the spreading of B. anthracis to the Massif-Central and the Pyrenees represents secondary events and spill-outs from the Alps. The faster rate of evolution observed in the Pyrenees is in agreement with the previous suggestion that B. anthracis evolutionary rate accelerates significantly as new territories with naïve populations are encountered [10,11]. Apart from the polytomies with very short branches -1 or 2 SNPs-associated with the typical outbreak observed in the Alps strains, no significant polytomy is observed in the whole B-group.

Mutations in vaccine strains
Collection strains extend lineage 2, lineage 3 and lineage 4 in the A.Br.011/009 polytomy by 26, 52 and 63 SNPs respectively. Lineage 3 is of special interest as it contains two well-known vaccine strains, the Pasteur II strain and one of its derivatives, Carbosap widely used in Italy (these vaccine strains are pXO1+, pXO2+). In agreement with these historical records, the Carbosap branch is the longest presumably as a result of more extensive cultivation. Three chromosomal deletions of 29, 24 and 3.5 kb relative to virulent strains have been described in Carbosap [23]. Interestingly, the 3.5 kb deletion, at position 1235796-1239220, including locus tags GBAA_1286 to GBAA_1290 in Ames ancestor NC_007530.2 and 24 kb deletion at position 1640915-1665378, GBAA_1751-GBAA_1781 including the virulence associated gene GBAA_1760 [31] are already present in the Pasteur II strain whereas the third region (position 267162-294257) is intact in Pasteur II. Pasteur II, Carbosap and Tsiankovskii I are vaccine strains possessing both virulence plasmids, so deletions in these strains may be associated with loss of virulence factors.

Conclusions
In this report, we position the genetic diversity found within the B. anthracis population in France in the broader context of the worldwide diversity of this species. France is one among a few countries where both the A and B branches are represented [11]. The A.Br.011/009 lineage is the most wide-spread in France and constitutes a remarkable polytomy with six branches. The present report shows that the WNA lineage predominent in North America belongs to this polytomy. The large length of the WNA branch prompted previous investigators to propose a pre-Columbian origin for this lineage. The analysis of the A.Br.011/009 polytomy demonstrates instead that this length is due to a remarkable acceleration of the evolution of the WNA branch after the split from the progenitor lineage. In the same time-span, the average pace of expansion of the progenitor lineage was slower by a factor of ten. The speed of expansion of the Senegal-Gambia lineage was even faster. Van Ert et al [11] previously proposed that the evolution rate of B. anthracis would be faster after introduction in a previously B. anthracis free environment. This would fit with the present observations, and, conversely, suggest that lineage expansion rates may provide indications on the origin of a lineage. In this view, short lineages (less successful, lower number of generations per year, rare) would tend to be closer to their birth place. Because of the recent introduction of A.Br.011 SNP typing, the distribution of the A.Br.011/009 lineage in the world is poorly known. In order to test the hypotheses proposed here, it will be necessary to more precisely investigate the presence and genome sequence of A.Br.011/009 in neighboring European countries and elsewhere.
Supporting Information S1