Genetic Analysis of Vibrio parahaemolyticus O3:K6 Strains That Have Been Isolated in Mexico Since 1998

Vibrio parahaemolyticus is an important human pathogen that has been isolated worldwide from clinical cases, most of which have been associated with seafood consumption. Environmental and clinical toxigenic strains of V. parahaemolyticus that were isolated in Mexico from 1998 to 2012, including those from the only outbreak that has been reported in this country, were characterized genetically to assess the presence of the O3:K6 pandemic clone and their genetic relationship to strains that are related to the pandemic clonal complex (CC3). Pathogenic tdh+ and tdh+/trh+ strains were analyzed by pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST). In addition, the entire genome of a Mexican O3:K6 strain was sequenced. Most of the strains were tdh/ORF8-positive and corresponded to the O3:K6 serotype. By PFGE and MLST, the ORF8/O3:K6 strains showed a very close genetic relationship to one another and were highly divergent from non-pandemic strains. The O3:K6 strains that were isolated in Mexico were also very closely related to the CC3 strains whose sequences are available in the PubMLST database. The whole-genome sequence of strain CICESE-170 was highly similar to that of the reference strain RIMD 2210633 and harbored 7 pathogenicity islands, including the 4 that denote O3:K6 pandemic strains. These results indicate that the pandemic strains that have been isolated in Mexico are very closely related to each other and to those isolated worldwide.


Introduction
Vibrio parahaemolyticus is a bacterium that inhabits warm marine environments and can cause gastroenteritis, wound infections and septicemia that are associated with the

Bacterial isolates
Thirty-eight toxigenic V. parahaemolyticus strains were selected. Nineteen clinical strains were isolated by the InDRE from stool samples in hospitals from various Mexican states and years. Nine strains were obtained from the Collection of Aquatic Important Microorganisms (www.ciad.mx/caim), including strain CAIM 729 (TX2103 T), which was isolated during the 1998 Texas (USA) outbreak [23,24], and strain CAIM 728, which was isolated in Japan. Most CAIM strains were isolated in 2004 during the outbreak in Mexico [19]. Four strains were isolated from oysters (Crassostrea spp.) that were collected at the La Nueva Viga seafood market in Mexico City in 2011, and 6 strains were isolated from the biofouling of commercial ship hulls in 2012 [25]. Strains that were isolated from oysters were examined per the Bacteriological Analytical Manual [26]. Strains were identified as V. parahaemolyticus using the species-specific molecular markers tlh [27] and pR72H [28]; hemolysins were detected with tdh- and trh-specific primers [27]. Only those strains that were positive for the tdh and/or trh hemolysin genes were included (Table 1).

DNA extraction and PCR amplification
Genomic DNA from overnight Luria-Bertani (LB) broth cultures of presumptive V. parahaemolyticus strains was purified with the Wizard™ Genomic Kit (Promega) per the manufacturer's instructions. Purified DNA was diluted in RNAse/DNAse-free water (~50 ng/μl). The strains were tested by PCR for the presence of the pR72H, tlh, tdh, trh and ORF8 genes. The PCR was performed using the conditions and primers reported in the literature (

PFGE analysis
Thirty-eight V. parahaemolyticus and 2 V. cholerae strains (Table 1) were analyzed by PFGE with the standardized CDC protocol for V. cholerae [29], with minor modifications. Briefly, genomic DNA was digested with 20 U of NotI (New England BioLabs), and restriction fragments were resolved on a CHEF Mapper™ PFGE system (BioRad). The run time was divided

MLST analysis
Thirty-two strains that were classified as tdh+ and tdh+/trh+ were selected from the PFGE clusters (Table 1) for analysis by MLST. Seven loci from both chromosomes (Table 2) were chosen; 5 loci were amplified by PCR per González-Escalona et al. [14], using the primers in the PubMLST database for V. parahaemolyticus (http://pubmlst.org/vparahaemolyticus/). Each locus, observed as a single band after electrophoresis, was purified with the AxyPrep-PCR Kit (Axygen™) and sequenced by SeqXcel Inc. (CA, USA) using the M13 universal primers (Forward-TGTAAAACGACGGCCAGT and Reverse-CAGGAAACAGCTATGACC, 5'-3'). For loci recA and pntA, new primers were designed (Table 2) and used to amplify fragments as follows: 10 min at 96°C for initial denaturation, followed by 35 cycles (1 min at 96°C, 1 min at 59°C, and 1 min at 72°C), 10 min at 72°C for the final extension, and maintenance at 4°C. The amplicons were then sequenced using the designed primers for recA and pntA (Table 2). The sequences for each locus were queried against the V. parahaemolyticus PubMLST database to determine the allelic profile (AP) and sequence type (ST) (S1 Table). Novel alleles were submitted to PubMLST; the nucleotide sequence data that were reported for the MLST are available in the GenBank database under accession numbers KP455743-KP455966.
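The step from allelic profile to sequence type can be illustrated with a minimal Python sketch. It is not the PubMLST query itself: the lookup table is a hypothetical in-memory stand-in for the database, the function names are ours, and the locus order (dnaE, gyrB, recA, dtdS, pntA, pyrC, tnaA) is assumed to follow the PubMLST scheme for V. parahaemolyticus. Only the recA and pntA allele numbers (19 and 29) of the first profile come from this study; the remaining allele numbers are illustrative placeholders that should be checked against PubMLST.

```python
from typing import Dict, Optional, Tuple

# Locus order assumed to follow the PubMLST V. parahaemolyticus scheme.
LOCI = ("dnaE", "gyrB", "recA", "dtdS", "pntA", "pyrC", "tnaA")

Profile = Tuple[int, ...]

# Hypothetical lookup table standing in for the PubMLST profile database.
# Only recA=19 and pntA=29 in the first row are taken from the text;
# every other number is a placeholder for illustration.
ST_TABLE: Dict[Profile, int] = {
    (3, 4, 19, 4, 29, 4, 22): 3,   # profile labeled ST-3 in this toy table
    (1, 2, 3, 4, 5, 6, 7): 9001,   # made-up profile and ST
}


def allelic_profile(alleles: Dict[str, int]) -> Profile:
    """Order the per-locus allele numbers into a fixed 7-locus profile."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError("missing loci: " + ", ".join(missing))
    return tuple(alleles[locus] for locus in LOCI)


def sequence_type(alleles: Dict[str, int],
                  table: Dict[Profile, int] = ST_TABLE) -> Optional[int]:
    """Return the ST for a profile, or None if the profile is not in the table."""
    return table.get(allelic_profile(alleles))
```

In this sketch, a profile that is absent from the table returns None, which corresponds to a candidate novel ST that would be submitted to the database for assignment.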

Genomic analysis
Genomic DNA of an overnight LB culture of a single colony of the O3:K6 V. parahaemolyticus strain (CICESE-170), isolated in 1998, was extracted using the Wizard™ Genomic Kit (Promega). The quantity and quality of the DNA were determined on a NanoDrop™ 2000 (Thermo Scientific, Wilmington, DE), and the DNA was diluted to ~1000 ng/μl in DNAse/RNAse-free water. Whole-genome sequencing was performed on an Illumina MiSeq™ genome analyzer (Illumina Inc., USA) per the manufacturer's instructions.
The reads that were obtained from the CICESE-170 strain were assembled into contigs using CAP3 [34] and Vague 1.0.5 [35]. The contigs were ordered by synteny with Mauve 2.3.1 [36], using both chromosomes of the V. parahaemolyticus strain RIMD 2210633 (GenBank accession numbers BA000031 and BA000032) as the reference genome [37].
Comparative analyses were performed between the whole-genome sequence of CICESE-170 and that of RIMD 2210633 as the reference strain. Contigs that were obtained from CICESE-170 were submitted to and inspected with the Rapid Annotation using Subsystem Technology (RAST) server (http://rast.nmpdr.org), and comparisons were made based on the functions and sequences of both strains. The analyses were evaluated in the Seed Viewer, focusing on virulence, disease, defense, phages, prophages, transposable elements, plasmids, iron acquisition and stress response. Contigs from CICESE-170 were also inspected by alignment using Geneious 4.8 [38] and by BLAST analysis to compare them with the reference strain. The whole-genome sequence of CICESE-170 was submitted to GenBank under accession number MIEM01000000.

Species identification
In this study, the same V. parahaemolyticus strains were identified at the species level using either the tlh or the pR72H primers; thus, both sets of primers could be used to determine species. Of the InDRE V. parahaemolyticus collection, 19 isolates were tdh+/trh− and 18 were ORF8+. The 6 environmental strains that were isolated from biofouling were tdh+/trh−, 1 of which was confirmed to be ORF8+ (CICESE-273). The 4 isolates from the oyster samples were tdh+/trh+. From the CAIM collection, 8 strains were tdh+/trh−, 1 was tdh+/trh+ and 6 were ORF8+, as reported (Table 1).
Twenty-five ORF8+ strains were positive for the O3 and K6 antisera, including those that have been isolated by InDRE since 1998. The strains that were isolated during the 2004 outbreak in Mexico (CAIM collection) and 1 environmental strain from the biofouling of a ship hull from 2012 (Table 1) were also ORF8+ and O3:K6+.

PFGE analysis
Restriction fragments that were over 48.5 kb were examined, generating 2 branches (Fig 1A). Branch I contained all V. parahaemolyticus strains, and branch II contained the out-group of V. cholerae strains. V. parahaemolyticus strains had high diversity: 31 patterns were obtained from 38 strains. At 50% similarity, branch I split into 8 clusters (A to H), according to their genetic and serological characteristics (tdh+/trh+, tdh+ or tdh+/ORF8+/O3:K6). The lowest similarity among the 8 clusters was <24%.
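To make the percent-similarity values concrete, the following Python sketch compares two banding patterns with the Dice coefficient, which is commonly used for PFGE profiles. The coefficient, the band-matching tolerance and the function names are assumptions for illustration; the settings that were actually used for Fig 1A are not stated in this section.

```python
from typing import Sequence


def count_shared_bands(a: Sequence[float], b: Sequence[float],
                       tolerance: float = 0.015) -> int:
    """Greedily match bands one-to-one when their sizes (kb) differ by at
    most `tolerance`, expressed as a fraction of the larger fragment."""
    remaining = sorted(b)
    shared = 0
    for size in sorted(a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[i]  # each band can be matched only once
                break
    return shared


def dice_similarity(a: Sequence[float], b: Sequence[float],
                    tolerance: float = 0.015) -> float:
    """Dice similarity in percent: 100 * 2 * shared / (bands_a + bands_b)."""
    if not a and not b:
        return 100.0
    shared = count_shared_bands(a, b, tolerance)
    return 100.0 * 2 * shared / (len(a) + len(b))
```

Under these assumptions, two patterns fall in the same cluster at the 50% cutoff when the dendrogram joins them at a similarity of 50% or higher; the clustering step itself (for example, UPGMA) is not shown here.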

MLST analysis
The concatenated sequences of the 7 loci that were analyzed from 32 strains of V. parahaemolyticus, selected based on the PFGE analysis, separated them into 3 clusters according to their genetic and serological characteristics (Fig 1B). Cluster I comprised all the pandemic strains (tdh+/ORF8+/O3:K6), cluster II contained tdh+/trh+ strains, and cluster III was composed of tdh+ isolates. Cluster I included the strains that have been isolated by InDRE since 1998, those isolated during the Mexican outbreak, a strain from the biofouling of the hull of a ship that arrived from Fukuyama, Japan (CICESE-273), and the pandemic strain (CAIM 729 T), which was isolated during the Texas outbreak. With the exception of the CICESE-185 strain, all ORF8+/O3:K6 strains in cluster I had 100% similarity.
The haplotype diversity (Hd) was 0.391 for most of the alleles; gyrB had an Hd of 0.44. The highest value for nucleotide diversity was observed for recA (π = 0.01054), whereas pyrC had the lowest value (π = 0.00378). The recA locus had the highest genetic variability (θ = 0.01345); dnaE had the lowest (θ = 0.00535). The number of polymorphic sites varied per locus, ranging from 11 for dnaE to 34 for recA (S2 Table).
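The three statistics reported above can be computed from an alignment of one locus as in the following Python sketch. It uses the standard textbook estimators (Nei's haplotype diversity, mean pairwise differences per site for π, and Watterson's estimator per site for θ) and assumes gap-free sequences of equal length; whether the software used for S2 Table applies exactly these forms is an assumption, and the function names are ours.

```python
from collections import Counter
from itertools import combinations
from typing import List


def _check(seqs: List[str]) -> None:
    """Require at least two aligned sequences of identical length."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")


def haplotype_diversity(seqs: List[str]) -> float:
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    _check(seqs)
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def segregating_sites(seqs: List[str]) -> int:
    """Number of alignment columns with more than one nucleotide."""
    _check(seqs)
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs: List[str]) -> float:
    """pi = mean number of pairwise differences, divided by sequence length."""
    _check(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def watterson_theta(seqs: List[str]) -> float:
    """theta = S / (a_n * L), with a_n = sum of 1/i for i = 1..n-1."""
    _check(seqs)
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / (a_n * len(seqs[0]))
```

Because π weights each polymorphism by its frequency whereas θ depends only on the number of polymorphic sites, a locus with many rare variants can show a higher θ than π, as reported here for recA.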
The 5 novel STs that we obtained were submitted to the PubMLST database, assigned as  Table). The sequences that were obtained with the recA and pntA primers that we designed had the same alleles (19 and 29) for all the O3:K6 strains, including the reference strain CAIM 729 (TX2103 T), as previously reported with different primers [14].
The results for the concatenated sequences of the 7 loci from 32 CICESE strains and 25 strains from PubMLST are shown in Fig 2. Thirty-one strains from CICESE/CAIM had 100% similarity with the 9 strains from PubMLST that have been isolated worldwide and reported as O3:K6. Non-O3:K6 CICESE/CAIM and reference strains were grouped separately (Fig 2).

Discussion
Our results indicate that V. parahaemolyticus O3:K6 (tdh+/ORF8+) pandemic strains have been linked to clinical cases since 1998 in 6 Mexican states, particularly in the northwestern state of Sinaloa; most of these cases did not develop into an outbreak [20]. Watery diarrhea, which is the most common symptom of V. parahaemolyticus infection, is self-limiting, resolving in 3 days; thus, it is possible that most O3:K6 infections were not reported or detected by the Mexican health system. Further, these reports might have been masked by the Mexican cholera epidemic from 1991 to 2000 [22].
In this study, by PFGE analysis, all strains with the characteristics of pandemic strains (tdh+/ORF8+/O3:K6) were grouped into a single cluster (E), with 21 patterns (Fig 1A). Wong et al. [9] and Yeung et al. [39] also assembled the pandemic strains into a single cluster, with fewer PFGE patterns (8 and 7, respectively) among their O3:K6 isolates. Although PFGE analysis reveals relatively low similarity between pandemic strains, they are clearly separated from non-O3:K6 strains. The PFGE results contrast with the high similarity among O3:K6 strains that is observed with other molecular approaches, such as MLST [14] and toxRS sequencing [40]. PFGE has been used frequently to discriminate between strains from various regions [41], but no such differentiation was noted in our study. By MLST, using 7 housekeeping genes, Mexican V. parahaemolyticus isolates showed high genetic similarity among them. Most of the tdh+/ORF8+/O3:K6 strains isolated in Mexico belonged to ST-3, as did the reference strain CAIM 729 (TX2103 T) [14,33], and were thus associated with reference strains of the pandemic clone CC3 (Fig 2; S1 Table), which have been isolated worldwide since 1996.
Notably, in 1998, seafood-associated clinical cases of V. parahaemolyticus O3:K6 were reported in Peru, Chile, and the US [15-17], the same year in which most InDRE strains were isolated. Thus, it appears that the O3:K6 pandemic clone encountered favorable conditions for its dispersion in these countries, including Mexico, likely due to the strong El Niño event that was registered in 1997-1998, as suggested previously [42].
Whole-genome sequencing of the CICESE-170 strain indicated a high genetic similarity with the reference sequence of RIMD 2210633 (>99.3%). Both strains shared mobile genetic elements, such as VPaIs (Table 3). RAST server analysis showed that CICESE-170 contained 4513 CDSs, 319 fewer than RIMD 2210633 (4832 CDSs). These differences might be associated with adaptation to local environmental conditions, because similar disparities have been reported among O3:K6 strains from various countries [43].
All O3:K6 strains were positive by PCR for the ORF8 region, which is present in the phage f237; this region was also detected in the CICESE-170 genome, showing 100% similarity with that of RIMD 2210633 (Table 3). This phage has been used as a molecular marker due to its high specificity for the pandemic group [44], and according to Nasu et al. [45], it is involved in the epidemic potential of V. parahaemolyticus O3:K6.
The four VPaIs that characterize the post-1996 pandemic strains were also detected in CICESE-170, confirming that this strain, with its high similarity to other Mexican O3:K6 isolates by MLST, is associated with the pandemic V. parahaemolyticus O3:K6 that has been isolated worldwide [43]. VPaIs 1, 4, 5 and 6, which have been identified as a distinctive characteristic of O3:K6 pandemic strains and their serovariants, might have been acquired by horizontal gene transfer and might provide the pandemic strains with the ability to infect humans or to adapt to several environments [46]. VPaI-7, which harbors the main pathogenic elements (tdh and T3SS2) of V. parahaemolyticus, was also detected in CICESE-170, but it had the lowest genetic similarity (96.9%) to RIMD 2210633.

Conclusions
The V. parahaemolyticus O3:K6 strains that were isolated in Mexico were grouped into a single cluster by PFGE and MLST (Fig 1A and 1B). They were associated with O3:K6 sequences in the PubMLST database and were related to CC3 (Fig 2). Bioinformatics analysis of the CICESE-170 genome demonstrated high genetic similarity with the reference sequence of the RIMD 2210633 strain (Fig 3) and the presence of the characteristic elements of pandemic O3:K6 strains. These findings show that V. parahaemolyticus O3:K6 pandemic strains have been detected in Mexico since 1998 and confirm their persistence in this country from 1998 to 2012, probably without undergoing significant genetic changes. Whole-genome analysis of additional Mexican strains needs to be performed to confirm this statement.