Genomic analysis of pathogenic isolates of Vibrio cholerae from eastern Democratic Republic of the Congo (2014-2017)

Background Over the past recent years, Vibrio cholerae has been associated with outbreaks in sub-Saharan Africa, notably in Democratic Republic of the Congo (DRC). This study aimed to determine the genetic relatedness of isolates responsible for cholera outbreaks in eastern DRC between 2014 and 2017, and their potential spread to bordering countries. Methods/Principal findings Phenotypic analysis and whole genome sequencing (WGS) were carried out on 78 clinical isolates of V. cholerae associated with cholera in eastern provinces of DRC between 2014 and 2017. SNP-based phylogenomic data show that most isolates (73/78) were V. cholerae O1 biotype El Tor with CTX-3 type prophage. They fell within the third transmission wave of the current seventh pandemic El Tor (7PET) lineage and were contained in the introduction event (T)10 in East Africa. These isolates clustered in two sub-clades corresponding to Multiple Locus Sequence Types (MLST) profiles ST69 and the newly assigned ST515, the latter displaying a higher genetic diversity. Both sub-clades showed a distinct geographic clustering, with ST69 isolates mostly restricted to Lake Tanganyika basin and phylogenetically related to V. cholerae isolates associated with cholera outbreaks in western Tanzania, whereas ST515 isolates were disseminated along the Albertine Rift and closely related to isolates in South Sudan, Uganda, Tanzania and Zambia. Other V. cholerae isolates (5/78) were non-O1/non-O139 without any CTX prophage and no phylogenetic relationship with already characterized non-O1/non-O139 isolates. Conclusions/Significance Current data confirm the association of both DRC O1 7PET (T)10 sub-clades ST69 and ST515 with recurrent outbreaks in eastern DRC and at regional level over the past 10 years. Interestingly, while ST69 is predominantly a locally endemic sequence type, ST515 became adaptable enough to expand across DRC neighboring countries.


Methods/Principal findings
Phenotypic analysis and whole genome sequencing (WGS) were carried out on 78 clinical isolates of V. cholerae associated with cholera in eastern provinces of DRC between 2014 and 2017. SNP-based phylogenomic data show that most isolates (73/78) were V. cholerae O1 biotype El Tor with CTX-3 type prophage. They fell within the third transmission wave of the current seventh pandemic El Tor (7PET) lineage and were contained in the introduction event (T)10 in East Africa. These isolates clustered in two sub-clades corresponding to Multiple Locus Sequence Types (MLST) profiles ST69 and the newly assigned ST515, the latter displaying a higher genetic diversity. Both sub-clades showed a distinct geographic clustering, with ST69 isolates mostly restricted to Lake Tanganyika basin and phylogenetically related to V. cholerae isolates associated with cholera outbreaks in western Tanzania, whereas ST515 isolates were disseminated along the Albertine Rift and closely related to isolates in South Sudan, Uganda, Tanzania and Zambia. Other V. cholerae isolates (5/78) were non-O1/non-O139 without any CTX prophage and no phylogenetic relationship with already characterized non-O1/non-O139 isolates.

Conclusions/Significance
Current data confirm the association of both DRC O1 7PET (T)10 sub-clades ST69 and ST515 with recurrent outbreaks in eastern DRC and at regional level over the past 10 years.

Introduction
Cholera is a life-threatening diarrheal disease caused by a Gram-negative comma-shaped bacterium called V. cholerae [1,2]. Serogrouping based on the reactivity of antibodies with outer membrane lipopolysaccharide O-antigen has allowed defining more than 200 V. cholerae, among which only two (O1 and O139) are so far associated with epidemic or pandemic cholera [3]. Africa, a previously cholera-free continent [4], now bears the highest burden of the disease. Sub Saharan countries in particular have been the most affected and notably DRC, which now ranks in the world as one of countries most frequently reported to be affected by serious outbreaks [4][5][6]. Cholera has indeed become part of the DRC clinical landscape, with most cases reported in hot spots in the eastern provinces along the Albertine Rift [7,8]. In the hot spot healthcare zone of Goma (North-Kivu province), the cumulative incidence of cholera in 2017 was estimated as 1015 cases per 100,000 inhabitants [9]. However, these figures must be interpreted with caution as current estimation is largely affected by a lack of accurate and recently updated population records at national level. Moreover, this limitation is further amplified, in eastern DRC, by recurrent conflicts and political instability, which have triggered large and successive population displacements. The year 2017 has even experienced a dramatic expansion of the disease to new provinces in the center and west of the country [9], and of particular concern is the decreased susceptibility of V. cholerae to antimicrobial drugs in DRC [10]. During the current seventh cholera pandemic El Tor (7PET), at least three independent but temporally overlapping waves of global transmission have been identified by phylogenetic analyses in Africa [11][12][13][14], at least 13 re-introduction events (T1-13) have caused epidemics, each genetic lineage probably representing an independent introduction event to that location [4,15]. Recent phylogenetic analysis of isolates associated with cholera outbreaks in DRC between 2006 and 2014 showed that all of them belonged to the 7PET, wave 3, T10 east African sub-lineage [4].
Understanding the dynamics of V. cholerae associated with recent cholera outbreaks in DRC is paramount in order to get insight into the mechanisms associated with the endemicity of the disease in the country, the epidemicity at local and regional level and the trace-back of infection sources. This study provides genomic information of V. cholerae isolates associated with cholera outbreaks, which occurred in eastern DRC between 2014 and 2017.

Ethical considerations
Given the low level of literacy of the patients, rectal swabs were sampled with their oral informed consent. For children, the informed consent was obtained from their parent or guardian. This verbal consent was recorded, prior to sampling, by local first-line responders. Healthcare workers and physicians signed the following statement: "We have explained the study to the patient in the areas under investigation and are satisfied that he/she understands and consents to sampling". Ethical approval to conduct the study was obtained from the Provincial Ministers of healthcare of North and South Kivu provinces (DRC192/CAB/MP-SA-SAFPP/NK/2018). The use of oral consent was approved by the Institutional Review Board of Université catholique de Louvain/ Saint-Luc academic Hospital.

Study design
The study sample consisted of 78 non-repetitive V. cholerae isolates which come from a collection of 97 isolates shipped to Belgium for whole genome sequencing. Upon arrival in Belgium, 19/97 isolates could not be resuscitated. The 97 isolates of the collection were cultured at the AMI-LABO (Goma, North-Kivu) and at the Centre de Diagnostic et de Recherche en Maladies (Bukavu, South-Kivu). They were recovered from rectal swabs specimens from patients (n = 321) admitted in cholera treatment centers (CTC) of the provinces of Maniema, North-Kivu and South-Kivu, and meeting the clinical case definition of cholera, i.e. an acute watery diarrhea with or without vomiting in a patient with more than one year of age. Cases belonged to a cohort of approximatively 52.400 suspected cholera patients registered in these provinces between January 2014 and December 2017. Personal identifiers were removed so that analyses of stored isolates were not traceable to individual patients. Each sample was labeled using a code referring to the date and location of sample collection.

Phenotype of V. cholerae clinical isolates
Samples were incubated in saline and alkaline peptone water broth during 6 hours and subsequently streaked onto thiosulfate-citrate-bile salts (TCBS) agar at 37˚C for 16-24 hours. Large and flattened yellow colonies with opaque centers and translucent peripheries were sub-cultured on Luria-Bertani agar and subsequently characterized by phenotypic tests, i.e. microscopic examination, oxidase assay, and Kligler's iron agar for fermentation of carbon hydrates. Isolates were further characterized by additional phenotype testing including Voges Proskauer assay (VP), hemolysis of sheep erythrocytes (HSE), chicken red cells agglutination (CCA) and susceptibility to polymyxin B (PXB). Enterobacter aerogenes (ATCC13048) and V. cholerae O395 were used as positive and negative control for the VP assay respectively. Serotyping was carried out using the Polyvalent O1, Ogawa and Inaba antisera (Becton Dickinson, Erembodegem, Belgium) following the manufacturer's recommendations.

Antimicrobial susceptibility testing
The susceptibility to antimicrobial agents (i.e., ampicillin, doxycycline, erythromycin, nalidixic acid, chloramphenicol, ciprofloxacin, sulfamethoxazole-trimethoprim and tetracycline) was performed by the disk diffusion method. Susceptibility tests were interpreted using European Committee on Antimicrobial Susceptibility Testing (EUCAST) guidelines. Escherichia coli ATCC 35218 was used as a control for bacterial growth and susceptibility to antibiotic disks.

Next-generation sequencing
Isolates were shipped to Belgium for whole genome sequencing and subsequent genomic analysis. V. cholerae isolates were cultured overnight in 10 ml Luria-Bertani broth. DNA was isolated using the phenol chloroform protocol [16]. DNA was quantified using the Nanodrop and the Qubit fluorometric quantitation (Thermo Fisher Scientific, Asse, Belgium) and normalized to 0.2 ng/μl. Genomic DNA was simultaneously fragmented and tagged with sequencing adapters in a single step using Nextera transposome (Nextera XT DNA Library Preparation Kit, Illumina, San Diego, CA, USA). DNA was then amplified with a 12-cycle PCR, cleaned up with AMPure beads, and subsequently loaded on a MiSeq paired-end 2 x 150 (reagent kit V2 (300 cycles) or 2 x 300 bp (MiSeq reagent kit V3 (600 cycles) sequence run.

Genomic analysis: Genetic relatedness, toxin phage, drug resistance and virulence
Raw genomic data from each V. cholerae isolates were submitted to the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena), and are available under accession number (ERP114722). In order to assess the genetic relatedness of DRC isolates with those from other African countries (e.g. Cameroon, Central African Republic, Kenya, Tanzania, Uganda, Zambia), Asia and South America, a large set of genomes, including the O1 El Tor N16961 and the pre-7 th pandemic O1 M66 isolates was downloaded from the European Nucleotide Archive (ENA), Genbank and Ensembl databases. Paired-end reads from each V. cholerae isolate were assembled de novo to construct a draft genome using the SPADES v.3.11.1 software [17]. The quality of de novo assemblies was assessed using the Quast software (version 4.5) [18]. Each draft genome was analyzed to identify the V. cholerae species-specific ompW [19], the O1 rfbV and O139 wbfZ serogroup-specific [20] as well as classical and El Tor biotype-specific (ctxB, rstR and tcpA) genes [21]. In addition, genomes were screened for the presence of the 7PETspecific gene VC2346 [22]. A SNP-based phylogenomic analysis was conducted using kSNP 3.0 for SNP identification and parsimony tree construction based on the core genome. A first tree included all DRC O1 7PET isolates and representative of 7PET isolates from all regions of the world. The Dendroscope v.3.5.9 was used to root the tree with the N16961 strain [23] as an outgroup. The next tree included non-O1/non-O139 isolates form DRC (n = 5) and from other countries (n = 11), as well as O1 representatives from 6 th pandemic (n = 2), Gulf Coast (n = 4), pre-7 th pandemic (n = 4) and 7PET isolates (n = 2), along with the outgroup Vibrio metoecus (isolate 07 2435) used to root the tree. The MLST analysis was performed on each isolate by using the MLST scheme developed by Octavia et al [24]. The nucleotide sequences of a new allele of the metE gene, and new allelic combinations creating a novel sequence type (ST) were sent to the MLST database curator for allele and ST assignment. The CTX prophage harbored by O1 DRC isolates was compared to representatives of known CTX prophages [25][26].
Raw data from each V. cholerae DRC isolate were aligned to the complete genome of the O1 El Tor reference N16961. Each file was screened for the presence of large deletions. The Freebayes v1.0.2 software [27] was used to call variants from the reference genome. The complete list of mutations was filtered using vcffilter in order to select high quality (QUAL > 20) variants associated with a minimum depth of 20, and then annotated using the SNPeff v.4.3 software [28]. Only mutations with a high or moderate impact (i.e. frameshift deletion, nonsense point mutation, missense, and inframe deletion) were selected.
Each draft genome was then screened for the presence of virulence genes from the Virulence Factors Database (VFDB, http://www.mgc.ac.cn/VFs/), selecting those which were experimentally tested, and for the presence of pathogenic islands (PAI) previously associated with various sub-lineages within the 7 th pandemic, namely virulence factors including Vibrio pathogenicity islands (VPI-1, VPI-2, VSP-I, VSP-II, a novel variant of VSP-II (the VSP-II WASA (West African-South America) and WASA-I, as well as other virulence genes [7,29,30]. A gene was deemed present if it matched the reference sequence, i.e. minimal identity match of 95% with a minimal coverage of 80% of the gene sequence, as previously described [31]. Each draft genome was also screened for the presence of antimicrobial resistance (AMR) genes. The complete list of screened genes was drawn up from the MEGA-Res database (https://megares.meglab.org). In order to selectively identify AMR genes acquired through horizontal gene transfer, the list based on MEGARes data was restricted to genes that were also found in the ResFinder database (https://cge.cbs.dtu.dk/services/ ResFinder/), using BLASTn. In addition SNP-based AMR determinants were identified using ARIBA v.2.12.0 [32] with a home-made database including the parC, gyrA, gyrB, parE and qnr genes. A map of DRC was created using the Raster package [33], implemented in R statistical software version 3.6.1. The size of spots is somewhat correlated with the number of isolates from patients at the location.

Phenotypic results
Antimicrobial susceptibility patterns of V. cholerae isolates (n = 78) are shown in Table 1. Irrespective of their biotype, all V. cholerae isolates displayed resistance to co-trimoxazole and nalidixic acid, whilst retaining susceptibility to tetracycline and chloramphenicol. Nine V. cholerae O1 isolates displayed decreased susceptibility to ciprofloxacin, whereas 9 O1 and 2 non-O1 isolates were resistant to ampicillin.
Both sub-clades showed a distinct geographic pattern with ST69 sub-clade being found in the Lake Tanganyika basin (South-Kivu) and in Maniema provinces, and clustering together with 7PET V. cholerae isolates collected in Western Tanzania in 2015. While ST69 and ST515      were both identified in the Tanganyika basin, ST515 was the only sub-clade found in the Lakes Kivu and Edward basins and expanding northward (Fig 2), hence covering a large area including three lake basins (Tanganyika, Kivu and Edward) and five bordering countries (DRC, Central African Republic, South-Sudan, Tanzania, Uganda and Zambia).
Other major discriminating genetic changes between the two sub-clades included (i) a variation of the 9-nt repeat AATCCAGAT corresponding to a DNP amino acid motif in the VC_1650 (chromosome I) of V. cholerae O1 isolates, with 6 versus 7 repeats for ST69 and ST515 sub-clades; respectively, (ii) the insertion of the AAACGTACA motif corresponding to KRT amino acids in the VC_A0372 (chromosome II), and (iii) the 721G!T transversion in the VC_1798 (chromosome I) leading to the apparition of a premature stop codon in the gene.

PLOS NEGLECTED TROPICAL DISEASES
non-O139 isolates, which are the first to be reported in DRC, did not carry any prophage associated with V. cholerae. They were assigned to two novel sequence types, i.e. ST 612 and ST 613, by the curator of the V. cholerae MLST database (https://pubmlst.org/vcholerae/). Whereas ST612 isolates (n = 4) were closely related between them and to some extent to isolates characterized in Mozambique [34] and Haiti [35], the ST613 DRC isolate could not be related to any characterized V. cholerae isolate.
Regarding the identification of antimicrobial resistance genes, all DRC O1 isolates (n = 73) harbored the integrase gene of the SXT element (Int SXT ) and the SXT/R391 integrative conjugative element (ICE) ICEVchBan5 [40,41]. It is of note that ICEVchBan5 was lacking in non-O1/non-O139 isolates. In addition, all DRC O1 isolates harbored the APH3-DPRIME, APH6, drfA, dhfr, floR and SulI antimicrobial resistance genes. They also harbored the 248 G!A SNP in the quinolone-resistance determining region (QRDR) of the gyrA gene (VC_1258), resulting in the S83I substitution in the gyrA protein. All ST69 plus two DRC O1 ST515 (CTMA-1453 and CTMA-1454) isolates displayed the 254 G!A SNP in parC gene (VC_2430), resulting in the S85L substitution in that gene. It should be noted that both substitutions were reliably detected in genomic regions associated with a high sequencing depth ranging from 14 to 499 (average of 75) for S83I, and from 27 to 256 (average of 101) for S85L.No additional SNPs were found in the QRDRs of gyrA, gyrB, parC, and parE genes, nor were genetic determinants of beta-lactam resistance identified in these isolates. Among non-O1/non-O139 isolates, only ST612 harbored qnrVC and SulII genes. Conversely, ST613 was the only V. cholerae isolate to harbor the beta-lactamase carB gene.

Discussion
In line with the phenotypic features [42], the WGS-based analysis of 78 V. cholerae isolates from eastern DR Congo (2014-2017) confirmed that all O1 (n = 73) were 7PET variants (3 rd wave and T10 transmission event) genetically linked to an eastern African clade [5]. O1 isolates clustered closely in 2 distinct sub-clades consisting of ST69 and the newly assigned ST515. It is worth noting that the complete and/or draft genomes of ST515 were already available in public database but not assigned as ST515. The relatedness of O1 isolates within sublineage T10 was supported by SNP-based phylogeny and common genetic features, among which, the presence of CTX-3 prophage, the VSP-II with the characteristic deletion previously reported in several East African V. cholerae isolates [12], several AMR genes, and a lack of WASA-1 in line with previous characterization of V. cholerae isolates from DRC collected during the period 2006-2014 [4,43]. In all O1 eastern DRC isolates, a consistent low susceptibility to nalidixic acid without resistance to ciprofloxacin was correlated with the presence of the S83I substitution in gyrA. Moreover, a S85L substitution in parC was found in all ST69 isolates and two ST515. Interestingly, a recent study on V. cholerae isolates associated with cholera outbreaks in Yemen linked the presence of gyrA (S83I) and parC (S85L) substitutions with a decreased susceptibility to ciprofloxacin [44]. However, current V. cholerae isolates from eastern DRC differed from those from Yemen as only 6 out of 39 ST69 eastern DRC isolates carrying both gyrA (S83I) and ParC (S85L) substitutions actually showed a reduced susceptibility to ciprofloxacin, and this observation was in agreement with previous data [45].
Unlike other African countries where further introduction events (i.e. T11, T12 and T13) within the 3 rd wave of the 7PET have been reported [15], it is noteworthy that isolates from eastern DRC all belonged only to the T10 introduction event. These results suggest that these T10 isolates have firmly established themselves in the Congolese Albertine rift, becoming an autonomous source of endemic, sporadic and epidemic cholera in the eastern DRC sub-region.
Several genetic features differentiated V. cholerae O1 ST69 and ST515 sub-clades from eastern DRC (Table 2), highlighting the continuous local evolution and adaptation of O1 isolates and supposedly determining their particular geographical distribution pattern. This adaptive potential might indeed be triggered by changing environmental conditions, e.g. altitude, temperature, humidity and anthropogenic impacts, which, in turn, could potentially affect the interaction between the bacterium and its host. For instance, a variation in the number of ATAATCCAG motif repeat can affect V. cholerae growth depending on the range of incubation temperature [46]. Likewise, the serotype switch from Ogawa to Inaba in all ST515 isolates probably results from the webT gene inactivation consecutive to the 5-nt frameshift deletion as suggested earlier [47,48]. This serotype switch could affect patient's immune response to cholera in regions where O1 serotype Ogawa was predominant [47].
The V. cholerae O1 global phylogeny including data from Uganda [16], Tanzania [49], DRC, Central African Republic, and Zambia [4] confirms that the ST515 sub-clade has now spread to several regions of Central and Eastern Africa, including western provinces of DRC up to the Atlantic coast. Whereas the reason why only the ST515 expands so widely, and not the ST69 sub-clade, remains unknown, the hypothesis is that the higher genetic diversity among ST515 isolates results from a high mutation frequency, which could favor their adaptation to changing environmental conditions. Conversely, or synergistically, such increased genetic variation could also result from a rapid regional expansion of ST515 strains, a phenomenon known as a founder flush [50]. However, further work is needed to identify the respective contribution of lateral gene transfer and SNPs in this high genetic diversity. As recently suggested [51], extraction of Variable Number of Tandem Repeats (VNTRs) and Single Nucleotide Variants (SNVs) from WGS data would certainly help clarify the genetic relatedness within this sub-clade. These investigations, which are beyond the scope of this work, are currently ongoing.
As also reported in other countries [34][35]52,53], two V. cholerae non-O1/non-O139 lineages were identified and characterized from cholera-like diarrhea cases in eastern DRC, and were assigned to two novel sequence types, ST612 and ST613. Recent cholera outbreaks affecting the Kasai provinces highlight the urgent need to better understand the factors favoring the endemicity and epidemicity of cholera among the exposed populations. As illustrated with ST69 and ST515 in this study, phylogenetic changes may be associated with local adaptation to eastern DR Congo, clonal expansion of V. cholerae sub-lineages and consecutive spread in neighboring DRC provinces and bordering countries. However, it is noteworthy that, despite the fact that past and current records from healthcare structures keep highlighting the persistence of cholera in the eastern provinces, there are too few reliable and updated data confirming cholera cases in patients with watery diarrhea syndrome. Consequently, genetic data based on in-depth characterization of isolated V. cholerae strains are also too scarce, which significantly hampers our understanding of the local biological mechanisms underlying the association of cholera endemicity and cross-border epidemic outbreaks. While current genetic data fill part of this major gap, they now need to be strengthened by complementary data, especially those from new follow-up studies carried out through regional cross-border cooperation.