Comparative Genomic Characterization of a Thailand–Myanmar Isolate, MS6, of Vibrio cholerae O1 El Tor, Which Is Phylogenetically Related to a “US Gulf Coast” Clone

Background
The cholera outbreaks in Thailand during 2007–2010 were exclusively caused by the Vibrio cholerae O1 El Tor variant carrying the cholera toxin gene of the classical biotype. We previously isolated a V. cholerae O1 El Tor strain from a patient with diarrhea and designated it MS6. Multilocus sequence-typing analysis revealed that MS6 is most closely related to the U. S. Gulf Coast clone with the exception of two novel housekeeping genes.

Methodology/Principal Findings
The nucleotide sequence of the genome of MS6 was determined and compared with those of 26 V. cholerae strains isolated from clinical and environmental sources worldwide. We show here that the MS6 isolate is distantly related to the ongoing seventh pandemic V. cholerae O1 El Tor strains. These strains differ with respect to polymorphisms in housekeeping genes, seventh pandemic group-specific markers, CTX phages, two genes encoding predicted transmembrane proteins, the presence of metY (MS6_A0927) or hchA/luxR in a highly conserved region of the V. cholerae O1 serogroup, and a superintegron (SI). We found that V. cholerae species carry either hchA/luxR or metY and that the V. cholerae O1 clade commonly possesses hchA/luxR, except for MS6 and U. S. Gulf Coast strains. These findings illuminate the evolutionary relationships among V. cholerae O1 strains. Moreover, the MS6 SI carries a quinolone-resistance gene cassette, which is closely related to those present in plasmid-borne integrons of other gram-negative bacteria.

Conclusions/Significance
Phylogenetic analysis reveals that MS6 is most closely related to a U. S. Gulf Coast clone, indicating their divergence before that of the El Tor biotype strains from a common V. cholerae O1 ancestor. We propose that MS6 serves as an environmental aquatic reservoir of V. cholerae O1.


Introduction
Vibrio cholerae, which is present in aquatic environments worldwide, is a facultatively anaerobic, asporogenous, motile, curved or straight gram-negative rod. There are more than 200 serogroups of V. cholerae, but only serogroups O1 and O139 cause epidemics and pandemics of cholera in human populations [1], and cholera toxin causes the major clinical signs of the disease. The O1 serogroup is classified into classical or El Tor biotypes. The sixth cholera pandemic was caused by the classical biotype, and the ongoing seventh cholera pandemic is caused by El Tor. Several other outbreaks of cholera occurred between the sixth and seventh pandemics, and some El Tor strains isolated during that period are designated pre-seventh pandemic El Tor.
During the past two decades, atypical V. cholerae O1 El Tor was isolated more frequently and was spread widely [2][3][4][5][6]. These isolates produce a cholera toxin that is distinct from that expressed by El Tor.
We isolated a strain from a clinical specimen and designated it MS6; this strain expresses the typical El Tor cholera toxin (genotype 3) [7]. Characterization of MS6 by ribotyping, pulsed-field gel electrophoresis, multiple-locus variable-number tandem-repeat analysis, and multilocus sequence typing revealed that the strain is not closely related to other strains isolated in Thailand or other countries. The sequences of MS6 housekeeping genes reveal that it is most closely related to V. cholerae O1 strains isolated in the U. S. Gulf Coast area. The U. S. Gulf Coast clone [8,9] is genetically distinct from several pathogenic clones of V. cholerae O1 [10], which caused only sporadic disease or small outbreaks, with no transmission spread along the Gulf Coast [11]. Nevertheless, two of 15 housekeeping genes of MS6 (malP and pepN) are minimally related to those of U. S. Gulf Coast strains and represent novel sequences according to nucleotide sequence comparisons using BLAST.
Here, we report the characterization of the entire genome of V. cholerae O1 El Tor strain MS6. The results of these analyses enhance our understanding of the evolution and genetic basis of the pathogenicity of V. cholerae.

Ethics Statement
The patient's consent was not required by the hospital, because the isolation of V. cholerae was performed as part of clinical management during hospitalization. To protect the privacy of the patient and the patient's family, all identifying information was excluded from this study.

Strains, Growth Conditions, and DNA Isolation
V. cholerae O1 El Tor serotype Ogawa strain MS6 was isolated from a Myanmanese inpatient suffering from diarrhea who was treated at a hospital located in a Thai-Myanmar border city [12]. MS6 was grown in Tryptic Soy Broth (Difco, Detroit, MI) at 37°C for 18 h with shaking. Cells were collected by centrifugation, and genomic DNA was extracted using proteinase K and phenol/chloroform, treated with RNase, and purified.

Genome Sequencing, Assembly, and Annotation
The genome of MS6 was sequenced using the Roche GS FLX Titanium system (8-kb-span paired-end library). Newbler (version 2.6; 454 Life Sciences/Roche, Branford, CT) was used to generate and assemble 395,285 reads into two scaffolds (2.95 Mb and 1.11 Mb) comprising 66 contigs and 53 stand-alone contigs ≥500 bp with an average read depth of 24.5. The gaps between contigs were closed using the unassembled mate-paired reads, sequencing of PCR amplicons generated using primers flanking the gaps, or both. Further, Illumina sequence data (14.5 Gbases, 100-bp paired-end reads) were used to improve low-quality regions using GenomeTraveler (In Silico Biology, Inc., Yokohama, Japan). The whole genome sequence of MS6 was deposited in the DDBJ (AP014524/AP014525).

Phylogenetic Analyses
Coding sequences (CDSs) present as a single copy in 27 genomes were analyzed using the pan-genomes analysis pipeline (PGAP) 1.02 [16] with the default parameters. CDSs of the same length (including gaps) were retained after alignment using MAFFT with the L-INS-i strategy [17]. Further, we chose CDSs with a low probability of recombination based on the PHI-test (cutoff value: p-value ≥0.05) in SplitsTree4 [18]. Subsets of predicted amino acid sequences of each strain were concatenated, and maximum likelihood analyses were conducted with 100 bootstrap replicates by using the Randomized Axelerated Maximum Likelihood (RAxML) program [19]. The results were visualized using Dendroscope 3 [20].
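The filtering and concatenation steps described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name, the input layout (one dictionary per gene plus a precomputed PHI-test p-value), and the toy sequences are assumptions made for illustration only, and the alignments and p-values are assumed to have been produced beforehand by external tools.

```python
def build_supermatrix(genes, strains, phi_cutoff=0.05):
    """Concatenate per-gene alignments that pass simple filters.

    genes   : list of (alignment, phi_p) pairs, where alignment maps
              strain name -> aligned sequence (hypothetical input layout)
    strains : strain names that must all be present in a retained gene
    """
    parts = {strain: [] for strain in strains}
    kept = 0
    for alignment, phi_p in genes:
        # Keep only genes found in every strain.
        if set(alignment) != set(strains):
            continue
        # Aligned sequences of one gene must share the same length.
        if len({len(seq) for seq in alignment.values()}) != 1:
            continue
        # Low probability of recombination: PHI-test p-value >= cutoff.
        if phi_p < phi_cutoff:
            continue
        for strain in strains:
            parts[strain].append(alignment[strain])
        kept += 1
    matrix = {strain: "".join(chunks) for strain, chunks in parts.items()}
    return matrix, kept


# Toy data (hypothetical sequences and p-values, not from the study).
toy_genes = [
    ({"A": "MK-L", "B": "MKAL"}, 0.40),  # retained
    ({"A": "GG", "B": "GA"}, 0.01),      # rejected: evidence of recombination
    ({"A": "TT"}, 0.90),                 # rejected: missing in strain B
    ({"A": "PQ", "B": "PR"}, 0.05),      # retained: p-value equals the cutoff
]
toy_matrix, toy_kept = build_supermatrix(toy_genes, ["A", "B"])
```

The concatenated sequences returned per strain would then be written to a single alignment file and passed to a tree-building program.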

Design of Primers and Conditions for Detection of the V. cholerae Molecular Markers metY and hchA/luxR
PCR primers were designed to amplify metY or hchA/luxR by using the sequences of the 27 V. cholerae genomes. The PCR reactions employed two forward primers (metY-F, 5′-GCGTGAAACCGGAGATGATCC-3′ and luxR-F, 5′-TAGCTCACCGCGAGCTCGTTG-3′) and one reverse primer (lys-R, 5′-AGCGCAGAAGGTGTTACGCCA-3′). The theoretical amplicon lengths for metY and hchA/luxR are 353 bp and 521 bp, respectively. All amplification mixtures consisted of template DNA, 0.2 µM of each primer, 200 µM of each deoxynucleoside triphosphate, and 0.025 U/µl of Ex Taq polymerase in the buffer supplied with the enzyme. After an initial denaturation step of 94°C for 5 min, the reaction was conducted using 30 cycles each of 94°C for 30 s, 59°C for 30 s, and 72°C for 30 s. PCR products were separated by electrophoresis on 1.5% agarose gels, and the amplicons were detected using ethidium bromide.
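As an illustration of how the two-marker assay can be read out, the following sketch stores the three primers and assigns an observed band to metY or hchA/luxR from its approximate size. The primer sequences and theoretical amplicon lengths are taken from the text (the luxR-F sequence is written without the hyphen, which is assumed to be a line-break artifact); the function names and the 25-bp size tolerance are hypothetical choices and are not part of the published protocol.

```python
# Primer sequences (5'->3') and theoretical amplicon sizes from the Methods.
PRIMERS = {
    "metY-F": "GCGTGAAACCGGAGATGATCC",
    "luxR-F": "TAGCTCACCGCGAGCTCGTTG",
    "lys-R": "AGCGCAGAAGGTGTTACGCCA",
}
EXPECTED_AMPLICONS = {"metY": 353, "hchA/luxR": 521}


def gc_fraction(seq):
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def call_marker(band_bp, tolerance_bp=25):
    """Assign a gel band to a marker by size.

    tolerance_bp is an assumed allowance for gel sizing error.
    Returns None if the band matches neither expected amplicon.
    """
    for marker, size in EXPECTED_AMPLICONS.items():
        if abs(band_bp - size) <= tolerance_bp:
            return marker
    return None


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}")
```

Because the two expected products differ by 168 bp, a band estimated at roughly 350 bp would be called metY and one near 520 bp would be called hchA/luxR under these assumptions.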

Results and Discussion
Comparison of the Genomes of V. cholerae O1 El Tor MS6 and Reference Strains
The V. cholerae MS6 genome consists of circular chromosomes 1 and 2, comprising 2,936,971 bp and 1,093,973 bp with average G+C contents of 47.7% and 46.8%, respectively. RAST annotation analysis of the MS6 genome predicted 3,746 open reading frames (ORFs). The nucleotide sequences of the genes encoding the components of the polysaccharide (wav cluster) and O antigen (wbe gene cluster) biosynthetic pathways [21] were highly similar to those of other O1 El Tor strains, indicating that other organisms were not likely their source.
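The per-chromosome figures given above can be combined into whole-genome values with a short calculation. The sketch below is only a convenience for the reader: the total genome size and the length-weighted G+C content are derived here from the reported numbers and are not values stated in the study, and the variable and function names are illustrative.

```python
# Chromosome length (bp) and average G+C content (%) as reported for MS6.
MS6_CHROMOSOMES = {
    "chromosome 1": (2_936_971, 47.7),
    "chromosome 2": (1_093_973, 46.8),
}


def genome_summary(chromosomes):
    """Return (total length in bp, length-weighted G+C percentage)."""
    total_bp = sum(length for length, _ in chromosomes.values())
    weighted_gc = sum(length * gc for length, gc in chromosomes.values()) / total_bp
    return total_bp, weighted_gc


total_bp, weighted_gc = genome_summary(MS6_CHROMOSOMES)
print(f"Total genome size: {total_bp:,} bp")
print(f"Length-weighted G+C content: {weighted_gc:.2f}%")
```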
We next compared the genome sequences of MS6 with those of the prototype seventh pandemic El Tor, N16961 [22], seventh pandemic atypical El Tor, 2010EL-1786 [23], and the pre-seventh pandemic El Tor, M66-2 [24] strains deposited in the EMBL/GenBank/DDBJ databases. We identified 3,420 core ORFs based on comprehensive orthologous gene detection using reciprocal comparison. ORFs of chromosome 1 were shared among the four strains more frequently than those of chromosome 2. The sequence of the superintegron (SI), which functions as a gene capture system, varied considerably among the strains (Fig. 1). The gene order of the common ORFs of each chromosome (i.e., synteny) was well conserved, except for a 184-kb inversion near the replication origin of chromosome 1 in strains MS6 and 2010EL-1786 compared with the other two strains. The sequences of the chromosomes 1 and 2 ORFs were compared with the genomes of three V. cholerae O1 reference strains. The shared ORFs were classified according to the percentage identity of the DNA sequences (Fig. 1, outer rings 3-5). Blocks of ORFs (green represents 95% and 99% identities) were recognized in the chromosomes of each strain. Further, some of the ORFs were highly conserved only in the pre-seventh pandemic strain M66-2 or seventh pandemic strains N16961 and 2010EL-1786. Therefore, the MS6 genome exhibits a mosaic structure, which was likely generated by homologous recombination with other V. cholerae chromosomes. Sixteen ORFs and 44 ORFs on the large and small chromosomes, respectively, of MS6 were not detected in the genomes of the three reference strains. Notably, 51 of these 60 ORFs are encoded by mobile genetic elements or the SI.
The MS6 ORFs included in the 18 COG categories were compared with the complete genome sequences of the three V. cholerae O1 strains. The number of ORFs in these three strains that were identical to those of MS6 was examined in proportion to the total number of ORFs in each category (results not shown). Across all categories, the average percentages of amino acid and nucleotide sequence matches for chromosome 1 ORFs were 79% and 69%, respectively; however, similarities of ORFs on chromosome 2 were approximately 10% lower than those of chromosome 1. We compared the COG-categorized ORFs of MS6 with those of 26 strains of V. cholerae (Fig. S1). Only the ORFs of V. cholerae O1 were highly related to those of MS6.
The relationships among the 27 strains were further investigated using genome-wide phylogenetic analysis (Fig. 2). All V. cholerae O1 strains except two comprised a clade (PG clade), and 16 strains of the PG clade were further divided into two subclades [25]. The PG-1 subclade comprises most of the V. cholerae O1 El Tor strains and MO10 (O139), whereas the PG-2 subclade includes two classical strains and VC52 (O37). MS6 is most closely related to the U. S. Gulf Coast strain 2740-80, indicating that these organisms diverged before the phylogenetic separation of the El Tor biotype strains from a common V. cholerae O1 ancestor.

Significant Features of the MS6 Genome
Two tandem copies of CTX prophages are present at the dimer resolution site (dif) of MS6 chromosome 1 (Fig. S2). The toxin-linked cryptic element is present within the dif1 region of MS6, indicating that these elements likely integrated into the host chromosome through XerC/XerD-mediated recombination [26][27][28]. No CTX prophage was detected at MS6 dif2. The 6.9-kb CTXΦ genome includes a DNA replication module designated repeat sequence (RS) 2, which comprises rstR, rstA, rstB and a core region comprising psh, cep, gIII (orfU), ace, zot, and ctxAB [29]. The sequences of these MS6 and O1 El Tor strain N16961 genes are identical.
However, the intergenic region-1 (ig-1) located near rstR and the toxboxes required for activating transcription from the cholera toxin promoter (PctxAB) [30,31] differed between these strains. Specifically, MS6 possesses three perfect direct repeats (TTTTGAT) within PctxAB, as do strains MAK757, MO10, V52, RC9, and CIRS101, whereas strain N16961 harbors four. The ig-1 region in MS6 is longer than that of N16961. Annotation using IMC-GE predicted that each ig-1 region of the CTX prophage encodes a protein (CTXUG-1) composed of 91 amino acid residues. Further, although MS6 lacks an RS1 region (consisting of rstR, rstA, rstB, and rstC), which is usually associated with CTXΦ in V. cholerae O1 El Tor and O139 isolates [32], MS6 possesses a genomic island designated MS6CTXAGI that encodes a similar rstC sequence (two amino acid residue differences compared with N16961), four ORFs of unknown function, a putative transcriptional regulator, and a CTXUG-1 homologue.
Toxin coregulated pilus, which acts as a receptor for phage CTX, is essential for colonizing infant mice as well as humans in model systems and is encoded by sequences within a 45-kb pathogenicity island (VPI-1) [33]. The Vibrio pathogenicity island-2 (VPI-2; VC1758-VC1809; 57.3 kb) [34] that encodes neuraminidase and proteins involved in sialic acid metabolism is present in MS6. VPI-1 and VPI-2 regions in MS6 are highly related to those of the other V. cholerae O1 strains. However, the Vibrio seventh pandemic island-1 (VSP-1; VC0175-0185; 14 kb) [32,35,36] and Vibrio seventh pandemic island-2 (VSP-2; VC0490-VC0516; 27 kb) differ between MS6 and seventh pandemic strains. The MS6 genome lacks VSP-2, and although the entire VSP-1 region is present, its flanking genes VC0174 and VC0186 are distantly related to those of seventh pandemic strains (Fig. 3). The dendrograms based on their sequences indicate that they were closely related to those of 2740-80.
We identified a novel 4.7-kb mobile genetic element designated M1 in MS6, which is integrated into the spacer region between rpmF and maf on the large chromosome (Fig. S3). MS6-M1 comprises six ORFs (MS6_1784 to MS6_1789), including a putative integrase. The outer membrane protein of MS6 is encoded by ompW, which is split by the transposon (Fig. S4). In contrast, 11 bp of ompW is deleted in strain 2740-80. This gene is conserved among V. cholerae strains and is utilized for identification of V. cholerae strains [38]. Although the biological role of ompW is unknown, its function may be linked to the adaptive response to stress [39]. Among the 27 strains studied here, the integral membrane protein MS6_A0359 with four transmembrane-spanning helices (motif HPP) [40] is present in strains MS6, 2740-80, and R385. Moreover, the putative RNA-binding protein MS6_A0295 is present only in strains MS6 and 2740-80 in the PG clade, whereas an MS6_A0295 homologue is present in 10 of 11 strains in distinct phyletic lineages.

The SI of MS6 is a Massive Gene Capture System
The SI of V. cholerae O1 strain N16961 is a 127-kb integron island that resides on the small chromosome (VCA0291-VCA0508) [22]. Most genes predicted to reside within the SI encode hypothetical proteins, and the SI may serve as a source of genetic variation. Strains 2010EL-1786 and M66-2 harbor approximately 100-kb SI regions compared with the 144-kb SI of MS6. The RAST server automatically annotated 46 ORFs in category R of the SI that represent the death-on-curing family of toxin proteins, which contain the well-conserved central motif HxFx[ND][AG]NKR [41]. We detected 39, four, and two ORFs of this family, respectively, in N16961, M66-2, and 2010EL-1786, and we therefore suggest that toxin-antitoxin modules play a role in maintaining the large SI in MS6. All predicted ORFs of the SI of MS6 were compared with those of 26 reference V. cholerae genomes (Fig. 4). ORFs identical or similar to those of MS6 were present in the genomes of strains 2740-80, BX330286, and NCTC8457. In contrast, there was little similarity between the MS6 ORFs and those of most of the non-O1/non-O139 strains.
The similar organization of ORFs within their SI domains suggests a close relationship between MS6 and strain 2740-80.
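The conserved central motif of the death-on-curing family cited in this section, HxFx[ND][AG]NKR, can be written as a regular expression in which x stands for any residue. The sketch below shows one way to scan predicted protein sequences for it. This is not how the ORFs were annotated in this study (RAST was used); the function name and the two short test peptides are hypothetical and serve only to show the pattern matching.

```python
import re

# HxFx[ND][AG]NKR: 'x' is any residue, brackets list the allowed residues.
DOC_MOTIF = re.compile(r"H.F.[ND][AG]NKR")


def find_doc_motif(protein_seq):
    """Return 0-based start positions of the motif in a protein sequence."""
    return [match.start() for match in DOC_MOTIF.finditer(protein_seq.upper())]


# Hypothetical peptides for illustration (not real ORF translations).
with_motif = "MKHAFSDGNKRTA"     # contains H-A-F-S-D-G-N-K-R
without_motif = "MKHAFSEGNKRTA"  # E is not allowed at the [ND] position

print(find_doc_motif(with_motif))
print(find_doc_motif(without_motif))
```

Counting the ORFs of an SI region with at least one hit would give a rough tally of candidate family members, although a motif match alone does not establish function.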

A qnr Cassette is Present in the SI of MS6
On the small chromosome of MS6, qnr is located approximately 28 kb from the SI integrase (IntI4). The qnr cassette was not detected in the chromosomes of other V. cholerae strains [42]. The MS6 qnr nucleotide sequence is 99% (650/657) identical to that of qnrVC4, which is carried by a novel complex class 1 integron harboring the ISCR1 element in an aquatic isolate of Aeromonas punctata [43]. Moreover, the gene cassette in the class 1 integron harbored attC and a 214-bp noncoding sequence, which were a nearly perfect match to the gene cassette of MS6 (Fig. S5). This finding provides evidence that this gene cassette was mobilized from the SI into a plasmid-borne integron through class 1 integrase activity [44], leading to the transmission of the resistance integron to several gram-negative bacteria.

Figure 5. Vibrio cholerae O1 genomes can be divided into two clusters that possess either hchA/luxR or metY in a conserved syntenic region of the small chromosome. Dendrograms were constructed based on hchA/luxR or metY using the method described in the legend to Fig. 3. Strains highlighted in blue belong to serogroup O1. However, four strains enclosed in the purple square may have undergone lateral gene exchange of O-antigen gene clusters; thus, strains V52 and MO10 were converted into O37 and O139 serogroups, respectively, while strains 12129(1) and TM11079-80 gained the O1-antigen gene cluster [25]. doi:10.1371/journal.pone.0098120.g005
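The identity reported for the MS6 qnr gene relative to qnrVC4 (650 matching positions out of 657) is a simple ratio of matching to compared positions. The sketch below shows that calculation for two pre-aligned sequences of equal length. The function name and the toy sequences are illustrative, and treating every alignment column equally (including gap columns) is an assumption, since the text does not state how gaps were handled.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions between two aligned sequences.

    Both sequences must already be aligned to the same length; every
    column counts toward the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(aligned_a.upper(), aligned_b.upper()))
    return 100.0 * matches / len(aligned_a)


# Toy example: 3 of 4 positions match.
toy_identity = percent_identity("ACGT", "ACGA")

# Ratio reported in the text for MS6 qnr versus qnrVC4.
reported_identity = 100.0 * 650 / 657
print(f"{reported_identity:.1f}%")
```

The printed value rounds to the 99% quoted in the text.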
The 122 isolates of V. cholerae O1 from a variety of sources harbor hchA/luxR but not metY. In contrast, hchA/luxR is present in 47 of 64 non-O1/non-O139 V. cholerae isolates from clinical and environmental sources, and metY is present in the remaining 17 strains. Further, the sequences of hchA/luxR of 40 selected isolates of V. cholerae O1 El Tor are identical, including V. cholerae O1 El Tor strains N16961 and MAK757 (Fig. 5). These findings suggest that strain 2740-80 and MS6 reverted to metY from hchA/luxR in the distant past (Fig. S6, Fig. 6). Sequence comparisons of strains of the closely related species V. mimicus detected metY but not hchA/luxR in the genomes of strains VM573, SX-4, VM603, MB-451, and VM223.

Conclusions
The analysis of the complete genome of MS6, which is distantly related to pathogenic O1 El Tor strains of V. cholerae, contributes insights into the evolution of the V. cholerae O1 serogroup as well as others. Our approach demonstrates that chromosomes 1 and 2 of MS6 were frequently modified by horizontal gene transfer from other Vibrio species after divergence from a common ancestor of the PG-1 subclade and MS6. The genomic features of MS6 are most similar to those of U. S. Gulf Coast strain 2740-80.

Supporting Information
nucleotide sequence identities among the integrase and the four ORFs are 73.2% and 89.5%, respectively. (EPS)
Figure S4 A transposable element disrupts MS6 ompW. An outer membrane protein of strain N16961 is encoded by ompW (VCA0867, 654 bp). The homologous gene of MS6 is interrupted by an insertion of 1.2 kb encoding a transposase, and 11 bp were deleted from the corresponding allele of strain 2740-80. (EPS)
Figure S5 The superintegron (SI) of MS6 harbors a quinolone-resistance gene cassette (qnrVC4) present in the class 1 integron of Aeromonas punctata 159. The qnrVC4 cassette of MS6 is located between V. cholerae repeats (VCRs are indicated in italics in the DNA sequence). VCR represents the attachment C site (attC) associated with the captured cassette in the SI of V. cholerae. VCR and a noncoding region corresponding to the gene cassette of the class 1 integron are linked to qnrVC4. The sequences of the latter two elements are 97% identical (988/1021). (EPS)
Figure S6 Evidence for the substitution of hchA/luxR or metY in highly conserved regions. The nucleotide sequences of an approximately 22-kb region containing hchA/luxR or metY in MS6, 2740-80 (U. S. Gulf Coast), M66-2 (pre-seventh pandemic), and O395 (classical) strains were aligned using BioEdit version 7.1.3.0 [45]. The locations of metY and hchA/luxR are highlighted in blue and red, respectively. Identical nucleotides are indicated by dots. Sequence differences (mismatches and gaps (-)) are most frequent near hchA/luxR and metY, whereas the regions upstream and downstream of these genes are highly conserved. (PDF)