Application of High-Density DNA Resequencing Microarray for Detection and Characterization of Botulinum Neurotoxin-Producing Clostridia

Background Clostridium botulinum and related clostridia express extremely potent toxins known as botulinum neurotoxins (BoNTs) that cause severe, potentially lethal intoxications in humans. These BoNT-producing bacteria are categorized in seven major toxinotypes (A through G) and several subtypes. The high diversity in nucleotide sequence and genetic organization of the gene cluster encoding the BoNT components poses a great challenge for the screening and characterization of BoNT-producing strains. Methodology/Principal Findings In the present study, we designed and evaluated the performances of a resequencing microarray (RMA), the PathogenId v2.0, combined with an automated data approach for the simultaneous detection and characterization of BoNT-producing clostridia. The unique design of the PathogenID v2.0 array allows the simultaneous detection and characterization of 48 sequences targeting the BoNT gene cluster components. This approach allowed successful identification and typing of representative strains of the different toxinotypes and subtypes, as well as the neurotoxin-producing C. botulinum strain in a naturally contaminated food sample. Moreover, the method allowed fine characterization of the different neurotoxin gene cluster components of all studied strains, including genomic regions exhibiting up to 24.65% divergence with the sequences tiled on the arrays. Conclusions/Significance The severity of the disease demands rapid and accurate means for performing risk assessments of BoNT-producing clostridia and for tracing potentials sources of contamination in outbreak situations. The RMA approach constitutes an essential higher echelon component in a diagnostics and surveillance pipeline. In addition, it is an important asset to characterise potential outbreak related strains, but also environment isolates, in order to obtain a better picture of the molecular epidemiology of BoNT-producing clostridia.


Introduction
Botulism is a severe neuroparalytic disease caused by botulinum toxin, and characterized by acute descending flaccid paralysis. The disease is caused by consumption of food contaminated with preformed toxin (foodborne botulism), or by absorption of toxin produced in situ in wounds (wound botulism) or colonized intestinal tracts (infant/intestinal adult botulism) [1]. Botulinum neurotoxins (BoNT) are the most potent toxins known and are considered as one of the six highest risk threat agents of bioterrorism [2], [3].
BoNT are produced by six physiologically and genetically distinct bacteria, namely Clostridium botulinum Groups I to IV, and occasionally strains of C. butyricum and C. barati [4,5]. These neurotoxin-producing bacteria can be further categorized in seven major toxinotypes (A, B, C, D, E, F and G) based on the antigenic properties of the toxins they produce [6], [7]. Toxinotypes A, B, E and more rarely F are responsible for human botulism cases, while infections by C and D toxinotypes are observed mainly in animals.
Although most strains produce only one toxin, bivalent strains producing two different toxins (Ab, Af, Ba, and Bf) have also been reported [8][9][10][11]. In addition, C. botulinum strains producing a single toxin but carrying a silent gene for another (A(B)) [10], and strains producing a chimeric neurotoxin (C/D or D/C) have been described [12].
The genes encoding the neurotoxins (bont) can be found on the chromosome, a plasmid or a bacteriophage, and are located in a cluster with other genes encoding associated non-toxic proteins (ANTPs) forming the botulinum neurotoxin complexes [13]. The recent sequencing of the bont genes has revealed significant sequence variations within the toxinotypes and has led to the recognition of several subtypes, with differences ranging between 2-32% at the amino acid level [14]. In addition, the BoNT gene cluster varies in structure and organization [5], leading to the differentiation of two different conserved cluster types, termed the ''ha cluster'' and the ''orf-X cluster'' ( Figure 1). The ''ha cluster'' consists of a set of hemaglutinin (HA) genes while the ''orf-X cluster'' consists of a set of genes of unknown function called ''orf-Xs'' associated to the bont gene. The diversity in nucleotide sequence and genetic cluster organization can be explained by the occurrence of many recombination and insertion events [15], as well as bont gene transfer between some toxinotypes of C. botulinum, and with non-botulinum species [7].
This variability not only has implications for antibody neutralization strategies [14], but also poses a great challenge for screening and characterization of BoNT-producing strains. Sensitive laboratory detection techniques are necessary for establishing monitoring programs to track C. botulinum contamination in animals, food and environmental samples. In addition, characterization methods are essential for tracing potentials sources of contamination in outbreak situations or forensic investigations, and for performing molecular risk assessments of BoNT-producing clostridia. Moreover, characterization of genetic differences allows the development of improved diagnostics and therapeutic agents for the treatment of botulism [14]. In consequence, efforts have been centered on the development of molecular techniques allowing simultaneous detection and typing of C. botulinum toxinotypes, such as multiplex real time PCR and focused DNA microarray [16,17] [17][18][19]. These assays are highly sensitive but rely on hybridization of sequence-specific primers for sufficient DNA amplification and/or on a limited number of probes for detection of target genomic regions, and may therefore be suboptimal when bacterial sequence divergence is high. In addition, although some multiplex assays allow simultaneous detection and differentiation of toxinotype A subtypes and strains [16,20], they do not allow typing further than the toxin type or indepth genetic characterization of strains belonging to other known toxinotypes. Several genotyping methods such as pulse-field gel electrophoresis, randomly amplified polymorphic DNA, multilocus variable number tandem repeat analysis and comparative genomic hybridization microarray have been applied to further differentiate strains within a toxinotype. However, such methods are often technically demanding, difficult to standardize between laboratories and sometimes produce poorly interpretable results, as subtype and strain discrimination relies on differences in observed electrophoretic or hybridization patterns [18,19,[21][22][23][24]. In addition, these approaches often require high amounts of genomic DNA obtained from pure bacterial cultures. The use of high-yield random DNA amplification methods, combined to sequence-based technologies targeting multiple molecular markers, is indispensable to overcome these limitations and obtain a better overall resolution.
To overcome the problem of limiting amount of available DNA in biological samples, whole genome amplification (WGA) methods have been developed and successfully applied for a number of genotyping assays [25][26][27][28][29]. In addition, high-density resequencing microarrays (RMA) have emerged as a rapid detection and molecular characterization tool for a broad range of bacterial and viral agents [30], [31]. This technology has demonstrated reliable detection of sequences differing up to 10-15% from the prototype sequences on the array, enhancing the spectrum of detection [31], [32]. In order to determine a target nucleotide sequence, resequencing microarrays use closely overlapping (''tiled'') probe sets of 25 mers, which contain one perfectly matched and three mismatched probes per base for both strands of the target genes [33]. In the present study, we evaluated the PathogenID v2.0 resequencing microarray, containing probes for the detection and characterization of neurotoxin-producing Clostridium species. In the gene cluster encoding the botulinum neurotoxin complex, the ''ha cluster'' appears to be associated with type A1, A5, B, C, D and G bont genes, and the ''orf-X cluster'' with type A1, A2, A3, A4, F and E bont genes. 1 Although the type A1 bont gene is mostly found in a ha cluster, it has been found associated with an orf-X cluster in some single toxin strains and in all bivalent strains [18], [17]. 2 The botR gene is in front of the ha genes in all toxinotype C and D strains. 3 The Ha33 gene is absent in all toxinotype G strains. 4 The botR gene is absent in all toxinotype E strains doi:10.1371/journal.pone.0067510.g001

Design of PathogenID microarray for Clostridium detection and characterization
Detailed description of prototype sequences selected for Clostridium detection and characterization is shown in table 1. The array contains 4 tiled sequences targeting housekeeping genes of Clostridium species: the complete rrs gene of C. botulinum toxinotype A and partial rpoB genes of C. difficile, C. perfringens and C. botulinum toxinotype A. In addition, the RMA includes 48 tiled sequences ranging from 149 to 200 nt in length, targeting representative botulinum neurotoxin gene cluster components (bont, ntnh, botR, ha17, ha70, ha33, orf-X1, orf-X2, orf-X3 and p47) of toxinotypes A through G, as well as three toxinotype A subtypes (A1, A2 and A3). Finally, it contains 19 tiled regions of 200 nt in length for the detection and characterization of other specific toxins produced by C. tetani, C. difficile, C. perfringens, C. sordellii, C. septicum and C. oedematiens.

Resequencing of Clostridium with the PathogenID v2.0 microarray
The ability of the PathogenID v2.0 prototype sequences to detect and subtype neurotoxin-producing C. botulinum strains was assessed by performing a blind analysis of genomic DNA from eleven well-characterized strains representative of the main toxinotypes and subtypes. The observed call-rate for each tiled sequence is detailed in table 2. The call-rates marked in bold indicate sequences retained after filtering using the defined threshold. The average base call rate was 91.6% (range 52.5-100%) for strains for which genome sequence was identical to the tiled sequences. The analysis of more distant genomic sequences showed that these were successfully retrieved by RMA (Figure 2), and retained after filtering when they presented up to 24.7% divergence. The mean resequencing accuracy (correctly resequenced bases in comparison to reference sequencing results or database sequences) was 98.9% (range 87.1-100.0%). For all strains analysed by the RMA and the automated filtering approach, at least one sequence targeting a housekeeping gene and one sequence targeting a neurotoxin gene cluster component were successfully retrieved. For all bivalent strains analysed by RMA, two distinct bont gene prototype sequences could be retrieved: bont/B and/F sequences for strain 168.08 (toxinotype Bf2), and bont/A and/B sequences for strains NCTC 2916 [toxinotype A1(B)] and 1430-11 [toxinotype A5(B9)]. Additionally, sequences of two complete sets of ha and orf-X cluster components were retrieved for strains 168.08 and NCTC 2916, but as expected only sequences of ha cluster components were recovered for strain 1430-11. Finally, the RMA did not only allow the retrieval of sequences specifically targeting BoNT gene cluster components antp genes from the toxinotype of the strain analysed, but also sequences targeting antp genes from other toxinotypes belonging to the same taxonomic group. In particular, toxinotype B prototype sequences (e.g. botR/B, ha17/B) were successfully retrieved for strain Hall (toxinotype A), although these sequences show up to 11% divergence with the corresponding regions of the Hall strain genome. Similarly, prototype sequences targeting toxinotype A1 antp genes were retrieved for the BL6 strain (toxinotype B), and prototype sequences targeting toxinotypes A2 and A3 antp genes were retrieved for the NCIB10658 strain (toxinotype F). This was also observed for strains belonging to the C. botulinum taxonomic group III strains, as both toxinotype C and toxinotype D antp sequences were retrieved for strains 468 and 1873.

Automated filtering and taxonomic identification of resequencing results
Automated filtering by TaxFinder using the defined parameters retained between 7 and 27 C. botulinum sequences for each strain ( Table 2). Of the retained sequences, 98.8% (168/170) allowed retrieval of at least one BLAST hit, and 99,4 % (167/168) of the retrieved BLAST hits lead to correct taxonomic identification at least to the toxinotype level. Moreover, blasting of all remaining unfiltered C. botulinum prototype sequences (n = 358) did not return any BLAST hit, indicating that no useful sequence was excluded. For all reference strains the systematic BLAST strategy of retained sequences obtained by the RMA allowed successful identification of the clostridial species and taxonomic group using housekeeping genes prototype sequences ( Table 3). The strategy also permitted in-depth characterization of all tested strains using the BoNT gene cluster prototype sequences (Table 3). Moreover, BLAST analysis allowed simultaneous typing of the neurotoxin up to the subtype level, as for each strain a unique best hit or multiple best bacterial hits were retrieved and corresponded to the correct subtype. Finally, the turnaround time of this automated approach was inferior to 2 hours, extensively decreasing the time needed for filtering of sequences, blasting of retained sequences and result analysis.

Identification, typing and characterization of a foodborne botulism outbreak strain
To demonstrate the capacity of the RMA approach to detect C. botulinum target sequences in the context of a foodborne outbreak, DNA extracts from naturally contaminated food samples were analysed. Food specimens were collected during the investigation of a C. botulinum family outbreak in Corsica [34]. RMA was performed on the DNA extract from a contaminated salad sample, as well as the enrichment culture of the salad sample. For DNA obtained directly from the salad specimen, a total of 286 bacterial prototype sequences were retained, including 3 sequences targeting C. botulinum genes (rrs, rpoB and ha33). However, none led to correct taxonomic identification of the C. botulinum infecting strain. For the DNA extract obtained from the enrichment culture of the contaminated salad, a total of 15 C. botulinum prototype sequences were retained after filtering, with call-rates ranging between 46.7 to 97.7%. In addition, sequences targeting partial C. perfringens-specific housekeeping genes (rpoB, gyrA, and parC) and toxin genes (cpa, cpb2 and pfoR) were retrieved (call-rates ranging between 70.1 and 98.3%). All filtered sequences led to the retrieval of at least one BLAST hit and corresponding taxonomic identity. The highest scoring BLAST results obtained for C.perfringens prototype sequences retrieved by RMA all designated C. perfringens strains as taxonomic hit (data not shown). The results of the highest scoring BLAST alignment for each retained C. botulinum sequence (total score, coverage and taxonomic identity) are presented in figure 3. Strains belonging to the toxin subtype A2 were recognized as best hit for most of the retrieved sequences of the neurotoxin gene cluster components. In addition, a gene cluster type ha-/orfX+ was identified. Phylogenetic analysis of the retrieved neurotoxin sequence with the corresponding sequences of five strains representative of the different toxin A subtypes (A1 to A5), demonstrated an overall identity of 100% with the neurotoxin sequence of the toxin subtype A2 reference strain. The only mismatches in the multiple alignment were due to unidentified bases ( Figure 4).

Discussion
Although botulism is quite rare in developed countries, recent outbreaks have demonstrated the potential severity of this hazard [35][36][37][38][39][40]. In such a context, rapid identification and characterization of BoNT-producing clostridia in home-made or manufactured food is a priority to stop the spread of botulism infections and trace back the source of contamination [41][42][43]. While identification of BoNT-encoding genes, using rapid nucleic-acid detection tools, is usually satisfactory for molecular risk assessment of BoNT-producing clostridia in food and environmental samples, it is not sufficient to discriminate strains within a toxinotype for source-tracing and molecular epidemiology [16]. For this reason, recent recommendations have stated that new assays should be able to detect variants for all toxinotypes, should be type-specific to determine proper treatment, and should be sensitive to perform risk assessment (NIAID Expert Panel on Botulism Diagnostics, Bethesda Maryland, May 2003). The development of such universal genetic characterization tools is hampered by the great variability of the genomic background, the sequence of the bont gene and the arrangement of the BoNT gene cluster of C. botulinum strains, and therefore only available for a limited number of strains.
In the present study, we designed and evaluated the performances of a resequencing microarray combined with an automated data approach for the simultaneous detection and characterization of BoNT-producing clostridia. The described RMA, designated PathogenID v2.0, and similar arrays have been proven successful for the characterization of numerous bacteria including Neisseria meningitidis, Streptococcus pyogenes, Streptococcus pneumoniae, Bacillus anthracis, and several species of the Enterobacteriaceae family [30,[44][45][46]. For the purpose of this study, prototype sequences of housekeeping genes rrs and rpoB from clostridia pathogenic for humans and most frequently associated with outbreaks were incorporated in the RMA for identification of clostridial species and lineages [47][48][49]. In addition, sequences from the BoNT gene cluster were included for their involvement in toxicity and usefulness for studying genetic variation [18,50,51].
The method was validated with DNA extracts from a representative panel of reference C. botulinum strains. The obtained results demonstrated that this approach has the potential to detect neurotoxin-producing C. botulinum strains, but also phenotypically related strains, by providing high quality sequence data. The RMA approach reliably detected and characterized the neurotoxin complex genetic determinants for all tested toxino-(sub)types, including bivalent strains. Further analysis of resequencing results showed that targeted sequences were successfully retrieved by the RMA when they presented up to 24.7% divergence with the tiled probes. This data supports the broadness of detection of the RMA, as the divergence of the tiled probes compared to the respective bont gene regions from all known toxin A, B, C, D, E and G subtypes, and most toxin F subtypes, was calculated to be inferior to 21%. Although the tiled bont/F sequence demonstrated respectively 70.65% and 55.72% with the corresponding gene region of the rare toxin subtypes F5 and F7, the high homology of the additional housekeeping gene and antp gene sequences targeted by the RMA should allow detection and correct identification of these strains. In addition to sequence divergence, differences in DNA extract quality, genomic copy number variation, location of targeted genomic regions and possible secondary structures, as well as the G and C content of the tiled probes and the presence of nonspecific nucleotide stretches could have affected genome amplification and hybridization efficacy, possibly causing reduced base-call rates. Nevertheless, this did not hamper the correct identification and characterization of the analyzed strains.
Subsequently, the pathogenID v2.0 was used to identify C. botulinum strains in contaminated food specimens. The analysis of the DNA extract obtained directly from these specimens by the RMA did not permit quality resequencing of the infecting C. botulinum strain sequences. These results are due to the presence of numerous cultivable and non-cultivable environmental bacteria naturally present in food, competing with the C. botulinum strain during the whole genome amplification step, and subsequently increasing the noise level observed at the time of the analysis. Nevertheless, a 48 h enrichment culture of the contaminated food samples was sufficient to successfully detect and characterize the infecting C. botulinum strain as well as a co-cultured C. perfringens strain. The results obtained by BLASTing and phylogenetic analysis of the retrieved bont sequence, as well as the characterization of the BoNT gene cluster components, classified the strain as a C. botulinum toxinotype A2. Moreover, the examination of the taxonomic BLAST hits indicated a close relatedness to strain Mascarpone. As the analysis was performed with a partial sequence of the bont gene (200 nt), containing 4% unidentified bases, misclassification of the toxin subtype may have occurred. However, this was unlikely as the retained sequence contained specific determinants found only in bont/A2 genes, and unidentified residues were located in regions that are not discriminant for subtype differentiation. The C. botulinum strain detected by RMA could be isolated from the food sample, and confirmed as subtype A2 by sequencing the specifically PCR-amplified bont gene (genbank accession number JQ954970).
The many advantages of using DNA microarrays for characterization of bacteria have been discussed in the past and can be applied to this assay. First, the genetic characterization results can The spectrum of detection of BoNT-producing clostridia by the PathogenID v2.0 was assessed using 9 well-characterized strains representative of all toxinotypes and several toxin A subtypes. Each sequence obtained by the RMA for which the percentage of nucleotide divergence compared to the corresponding tiled sequence is known (n = 118), is indicated by a blue diamond, and presented according to the percentage of nucleotide bases determined by the RMA (call rate). The linear association between these two parameters is shown, and demonstrates a good correlation (correlation coefficient R 2 of 0.79). doi:10.1371/journal.pone.0067510.g002 be used to support epidemiological associations in outbreaks involving a large number of samples or those from multiple geographical locations [17]. Secondly, they can be used to trace or confirm the source of an outbreak or to perform environmental risk assessments [52,53]. In the specific context of botulism, the results provided by DNA microarray can further serve to design specific primers for rapid nucleic acid detection methods, and allow a preliminary differentiation of the strains before applying more laborious genotyping methods such as PFGE [17]. In addition to these advantages, the specific design and properties of high-density RMAs allow the detection of a broad range of strains, including potentially emerging variants [44,54]. Firstly, the use of closely overlapping 25 nt probe sets in the RMA approach instead of unique probes to cover a target region of 100-500 nt increases the chance for hybridization of divergent target DNA to the array. Indeed, studies using focused or comparative genomic hybridization microarrays have reported a maximum value of 15-18% nucleotide sequence divergence of probe regions for positive microarray detection of C. botulinum genomic DNA, as compared to 24.7% for the RMA described here [17][18][19]24]. In addition, the relatedness between strains can be quantified more precisely by using a sequence-based approach such as the RMA, rather than by analysis of differences in hybridization patterns provided by classic microarrays. This is particularly important in the case of botulism, as the frequent occurrence of gene recombination, insertion and transfer observed between Clostridium strains indicate that such events are likely to occur again. In such a case, the sequence information provided by the RMA will be most valuable for the characterization of the emerging variants and the development of improved rapid detection tools. This also emphasizes the need for bacterial culture isolates as well as well-  characterized microbial strain collections, in order to monitor the molecular evolution of the strains. The unique design of the PathogenID v2.0 array allows the simultaneous detection and characterization of 48 sequences targeting the BoNT gene cluster components. These sequences could easily be used in the design of specific RMAs with reduced probe densities by decreasing the number of sequences, thereby diminishing production costs and computing time. Using standard methods to obtain the same level of resolution would imply the use of specific PCRs followed by Sanger sequencing. Although these assays are cheaper as standalone tests, running enough of them to cover the same number of genes with a single sample quickly The orientation and arrangement of these components were reproduced after Franciosa et al [59]. For each component, the highest call-rate observed and the results of the best retrieved BLAST hit(s) (defined as the BLAST hit demonstrating the highest total score) are given i.e. the number of best hits retrieved, as well as the corresponding query sequence coverage, total score, and taxonomy. doi:10.1371/journal.pone.0067510.g003 surpasses the RMA in cost, time and logistical complexity of running the analysis. Moreover these methods rely on sequencespecific primer hybridization for target DNA amplification instead of random hexamers for unbiased whole genome amplification, and may therefore fail to amplify and detect emerging variants. Other technologies with universal coverage could be used, such as high-throughput sequencing, however they are still much more laborious and expensive. Finally, the use of an automated approach allows an unbiased, exhaustive and rapid retrieval of taxonomic results.
In summary, the RMA allowed successful detection and fine characterization of neurotoxin-producing clostridia in both pure and polymicrobial cultures. The RMA approach, combined with the automated filtering and retrieval of taxonomic identities, allowed efficient and accurate subtyping of the neurotoxin and detailed characterization of the BoNT gene cluster components. This assay will be used as a higher echelon component in a diagnostics and surveillance pipeline. Importantly, it will be used in further studies to characterise potential outbreak related strains, but also environment isolates, in order to obtain a better picture of the molecular epidemiology of BoNT-producing clostridia.

Materials and Methods
Design of the Pathogen ID v2.0 resequencing microarray (RMA) The second generation of a broad-spectrum resequencing microarray (RMA) was used in this study: the PathogenID v2.0 array, able to detect 124 bacteria, 126 viruses and 673 genes involved in toxin production, pathogenicity or antibiotic resistance [31]. This array included specific housekeeping genes, namely rrs and rpoB, for detection and characterization of clostridial species. In addition, it included prototype sequences of each component of the neurotoxin gene cluster from all known toxinotypes, as well as several subtypes. Tiled probes targeting these genes were selected using multiple sequence alignments of genome sequences from various toxin subtypes available in GenBank during the time of the RMA design, in order to select conserved regions within toxinotypes for optimal hybridization efficacy. Additional probes were added to improve the detection of specific subtypes, if strains demonstrated sequence variation within the selected probe region. In addition, the bont/C and bont/D sequences tiled on the array were specifically designed to maximize the probability of detection of mosaic C. botulinum toxinotype C/D or D/C strains, by targeting regions highly homologous to either mosaic bont/C/D or bont/D/C genes. Finally, 19 additional sequences were tiled on the array for detection of a large panel of clostridial toxins other than neurotoxins.  G). The characteristics and origin of most strains have been described elsewhere. The C. botulinum type A5(B) strain 1430-11 was isolated from contaminated commercial food (pasta). The C. botulinum strain 161.08 was isolated from a foodborne botulism case. Pure cultures of Clostridium spp. strains were performed as described previously [55]. Genome sequences from reference strains were retrieved from the genbank database. The bont sequences for strain 1430-11 (genbank accession numbers KC683799 and KC683800) and strain 161.08 (genbank accession numbers KC471328 and KC471329) were obtained by sanger sequencing of specifically PCR-amplified bont genes and deposited in genbank.

Bacterial strains and biological samples analysed
Biological samples included a naturally contaminated food specimen (salad) and a 48 h enrichment culture of this food sample in Fortified Cooked Meat Medium (FCMM) at 37uC in anaerobic conditions [56]. Presence of C.botulinum toxinotype A in the food samples was confirmed by SYBR green real-time PCR with primers P1646 (59-TCTTACGCGAAATGGTTATGG-39) and P1647 (59-TGCCTGCACCTAAAAGAGGA-39) for bont/A gene, P1648 (59-CCTGGGCCAGTTTTAAATGA-39) and P1649 (59-GCGCCTTTGTTTTCTTGAAC-39) for bont/B gene, P1650  and P1651 (59-TAATGCTGCTTGCACAGGTT-39) for bont/E gene, P2107 (59-TGCACAATGAATTTTCAAAACA-39) and P2108 (59-TCCAAAAGCATCCATTACTGC-39) for bont/F gene. Real time PCR was performed in a total volume of 25 ml containing 12.5 ml of 26concentration of iQ SYBR green Supermix (Biorad), 5 pmole of each primer, 5 ml of template DNA and 7.3 ml of DNase-RNase free ditilled water (Gibco). Amplifications were carried out on a CFX96 Real Time System (Biorad) using 96-well microwell plates and according to the following temperature profile: one cycle of 95uC for 10 minutes, 40 cycles of 95uC for 15 sec and 60uC for 30 sec with fluorescence signal capture at the end of each 60uC step, an extension phase of 1 cycle at 95uC for 60 sec, 60uC for 60 sec and 95uC for 60 sec, followed by a default melt (disassociation) stage.

Extraction and amplification of bacterial DNA
Total genomic DNA was isolated from C. botulinum cultures by lysozyme and proteinase K treatment as described previously [57]. DNA extraction from food samples was performed with Powerfood Microbial DNA isolation kit (MoBio Laboratories Inc) according to the manufacturer's recommendations. After extraction, DNA was amplified using the whole genome amplification (WGA) protocol (RepliG Midi kit, Qiagen) as described previously [54].

Resequencing microarray assay
The amplification products obtained from genomic DNA by WGA were quantified by Quantit BR (Invitrogen) according to the manufacturer's instructions. A recommended amount of target DNA was fragmented and labeled according to the GeneChip Resequencing Assay manual (Affymetrix). The obtained products were coded with unrelated numbers by a non-observer to ensure that the study was performed blindly. The microarray hybridization process was carried out according to the protocol recommended by the manufacturer (Affymetrix). The details and parameter settings for the data analysis (essentially conversion of raw image files obtained from scanning of the microarrays into FASTA files containing the sequences of called bases for each tiled region of the microarray) have been described previously [54]. The base call rate refers to the percentage of base calls generated from the full-length tiled sequence.

Automated data analysis
A Perl-based program, designated ''TaxFinder'', was used for the automated analysis of re-sequencing data provided by PathogenID v2.0. The program read the FASTA file generated for each sample, which contains all the sequences read by the GSEQ software (Affymetrix) from the hybridization results. A filtering process, based on the one described by Malanoski et al. [58] was applied to these sequences. For each sequence, TaxFinder considers the Nb first bases of the sequence, Nb defined by the user in the beginning of the experiment and corresponding to the minimal length accepted for the sequence to keep. If the percentage of No-call in this fragment is inferior to an elongation threshold defined by the user, a base is added to the fragment and the percentage of No-call is recalculated. As soon as it is higher that the elongation threshold, the No-call at the ends of the fragment are removed. If the length of the nucleotide fragment is still higher than the minimal threshold established, then it is conserved. If the percentage is higher than the threshold, the fragment is skipped, and the program considers the fragment of size Nb and starting position one base upstream than the precedent fragment. This process is reiterated until the end of the sequence is reached. In this study, sequences that did not contain subsequences with a minimum nucleotide length of 20 nt and a maximum undetermined nucleotide content of 10% were discarded. Filtered sequences individually underwent a blastn analysis to search for sequence homologues in the NCBI nucleotide collection (nr/nt database) with adjusted settings to restrict the search to bacteria entries. Blast + application distributed by NCBI is used for the automated blast research (http://www.ncbi.nlm.nih.gov/books/NBK1763/). The following algorithm parameters were modified: the expected value cut-off was fixed to 100, the minimum word size was set to 7 and the upper limit of displayed descriptions of database sequences per query was decreased to 50. The best BLAST hits were classified according to their total score (the sum of the high-scoring segment pairs) and their corresponding taxonomies were retrieved from the NCBI Taxonomy database. When several hits obtained the highest total score, the script automatically retrieved the taxonomies of the 10 first BLAST hits.

Phylogenetic analysis
The resulting bont sequence obtained from the enrichment culture of the C. botulinum strain in a contaminated food specimen was compared with the corresponding bont sequences of reference strains from the 5 known C. botulinum toxinotype A subtypes (GenBank accessed 21/02/2013). Multiple sequence alignment was performed using the CLC Bio software and checked for accuracy by eye. A neighbor-joining tree of these sequences was constructed using the Jukes-Cantor method with the SeaView v4.2.1 software. The level of support for each node is provided by 100 bootstrap replications.