A genomic snapshot of Salmonella enterica serovar Typhi in Colombia

Little is known about the genetic diversity of Salmonella enterica serovar Typhi (S. Typhi) circulating in Latin America. Typhoid fever remains endemic in this part of the world; however, a lack of standardized blood culture surveillance across Latin America makes estimating the true disease burden problematic. The Colombian National Health Service established a surveillance system for tracking bacterial pathogens, including S. Typhi, in 2006. Here, we characterized 77 representative Colombian S. Typhi isolates collected between 1997 and 2018 using pulsed-field gel electrophoresis (PFGE; the accepted genotyping method in Latin America) and whole genome sequencing (WGS). We found that the main S. Typhi clades circulating in Colombia were clades 2.5 and 3.5. Notably, the sequenced S. Typhi isolates from Colombia were closely related to one another in a global phylogeny. Consequently, these data suggest that these are endemic clades circulating in Colombia. We found that AMR in S. Typhi in Colombia was uncommon, with a small subset of organisms exhibiting mutations associated with reduced susceptibility to fluoroquinolones. This is the first time that S. Typhi isolated from Colombia have been characterized by WGS, and after comparing these data with those generated using PFGE, we conclude that PFGE is unsuitable for tracking S. Typhi clones and mapping transmission. The genetic diversity of pathogens such as S. Typhi is limited in Latin America and should be targeted for future surveillance studies incorporating WGS.

Introduction
Salmonella enterica serovar Typhi (S. Typhi) is the bacterial agent of typhoid fever. With between 9 and 13 million cases and 116,800 associated deaths annually, typhoid is still a public health problem in many low- and middle-income countries (LMICs), particularly in South Asia and parts of sub-Saharan Africa [1,2]. Antimicrobial resistance (AMR) is a major issue, with multi-drug resistance (MDR; resistance to chloramphenicol, ampicillin, and trimethoprim-sulfamethoxazole) and fluoroquinolone resistance in genotype 4.3.1 (H58) organisms dominating the global genetic landscape [3,4]. The emergence of extensively drug-resistant (XDR; MDR and resistant to fluoroquinolones and third-generation cephalosporins) S. Typhi in Pakistan and more recent reports of resistance to azithromycin in South Asia compound the problem [5,6].
Several international studies have aimed to fill data gaps regarding the global distribution of typhoid [7][8][9][10]. However, no large multicenter population-based surveillance studies have been conducted in Latin America, as they have been in sub-Saharan Africa and South Asia, nor is there routine blood culture surveillance, so this region represents a major data gap in global disease burden estimations [11][12][13]. The modelled incidence of typhoid in Latin America varies enormously, with estimates ranging from 1.0 (0.2-3.9) to 169 (32-642) cases per 100,000 person-years [8,14]. A lack of systematic surveillance also means that there are limited contemporary data on the circulating bacterial population, AMR profiles, and potential transmission dynamics within South America. However, a recent study revealed a large number of S. Typhi isolates with a high prevalence of decreased fluoroquinolone susceptibility in Colombia and El Salvador [15].
Pulsed-field gel electrophoresis (PFGE) is the conventional method for studying the genetic relationship between S. Typhi isolates in Latin America [16]. Using this method, we recently found that some S. Typhi  indicative of the circulation of common "continental" genotypes [17,18]. However, PFGE has limited discriminatory power to support subtyping and cannot identify genotype 4.3.1, other emerging genotypes, or AMR genes. Whole genome sequencing (WGS) is the gold standard for investigating population structures, transmission dynamics, and molecular mechanisms of AMR in S. Typhi. In 2015, a landmark global S. Typhi genotyping scheme was published, but it comprised only 20 genome sequences originating from Latin America (Argentina, El Salvador, French Guiana, and Peru) [3]. Since then, there have been only three additional publications describing S. Typhi isolated in Latin America and characterized using WGS, generating a further 36 genome sequences [4,19,20].
In response to local concern regarding the increase of typhoid fever and the international threat of AMR, the National Public Health Surveillance System (SIVIGILA) of Colombia made typhoid fever a notifiable disease in 2006, requiring laboratory follow-up [21]. Here, we aimed to generate the first insights into the molecular epidemiology of typhoid in Colombia by performing AMR profiling and comparative genotyping using both PFGE and WGS on a cross-sectional collection of S. Typhi isolated in Colombia between 1997 and 2018.

Ethics statement
This study was conducted in accordance with the principles expressed in the Declaration of Helsinki. The clinical bacterial isolates were collected through the Colombian Laboratory National Surveillance System under the scientific, technical and administrative standard for health research established in Colombian resolution 8430 of 1993 of the Ministry of Health. Patient data were analysed anonymously; consequently, formal ethical approval for the study was not necessary.

Salmonella Typhi isolates
A total of 1,478 S. Typhi isolates were submitted to the surveillance program at the National Health Institute of Colombia between 1997 and 2018. These organisms were all associated with a reported typhoid and paratyphoid fever event and came from 22 of 32 Colombian departments and the Capital District of Colombia [21]. Of these isolates, 1,077 (72.9%) were successfully genotyped using the standard routine PFGE pipeline (S1 Fig) and 77 (5.2%) were selected cross-sectionally for WGS (Fig 1 and S1 Table). Our aim was to generate a broad overview of circulating genotypes in Colombia and to identify genotype H58. Therefore, we included isolates from all years, from the sampled departments, and from a broad range of PFGE patterns, including at least one isolate of each major PFGE pattern and the various AMR phenotypes. These isolates came both from outbreaks defined by the health authorities (n = 12) and from sporadic cases (n = 65); 61 isolates originated from blood, 10 from stool, and six from other sources (3 bone marrow, 1 splenic abscess, 1 gluteus abscess, and 1 skin swab).

Bacterial identification and antimicrobial susceptibility testing
All isolates were identified using standard biochemical tests (triple sugar iron (TSI) agar, citrate, urea, and motility), the automated MicroScan and VITEK II systems, and the Kauffmann-White-Le Minor scheme for organisms suspected to be S. Typhi (Difco, United States) [22]. Antimicrobial susceptibility testing was performed using the Kirby-Bauer disk diffusion method against amoxicillin-clavulanic acid (AMC), chloramphenicol (CHL), nalidixic acid (NAL), tetracycline (TET), ampicillin (AMP), cefotaxime (CTX), ceftazidime (CAZ), trimethoprim-sulfamethoxazole (SXT), and meropenem, in combination with MIC-based methods using the MicroScan systems according to the manufacturer's recommendations. Ciprofloxacin (CIP) susceptibility was determined by agar dilution assays according to the CLSI standards of 2019 [23]. Extended-spectrum beta-lactamase (ESBL) activity mediated by blaSHV, blaTEM, and blaCTX-M genes was confirmed by PCR amplification [24].

Molecular subtyping by PFGE
All organisms were subtyped by PFGE following standardized PulseNet protocols [16]. Briefly, genomic DNA was digested with XbaI (Promega, USA) and subjected to gel electrophoresis. PFGE patterns from the different runs were normalized by aligning the reference digestion pattern of S. Braenderup H9812. Bands were assessed visually and by a computerized program (Gelcompare 4.0 software; Applied Maths, Belgium). Parameters of tolerance and optimization were set to 1.5% and similarities were calculated according to the Dice coefficient. The clustering dendrogram was based on the unweighted pair-group method using arithmetic averages (UPGMA). The resulting XbaI patterns were compared with the local database; if a pattern was indistinguishable (within this 1.5% tolerance) from an existing pattern, the isolate was given the same PFGE code, and if a unique pattern was determined, a new PFGE code was assigned. All PFGE pattern codes were assigned following the PulseNet International guidelines for nomenclature, which include 2 letters for the country or region, 3 letters for the serovar, 3 characters for the enzyme and 4 digits for the profile number (e.g. COINJPPX01.0001 for Colombia) [17].

Genome sequencing and SNP analysis
DNA was extracted using a Qiacube in combination with the Qiagen QIAamp DNA Mini Kit (Qiagen) at the Colombian National Health Institute (INS), following the manufacturer's guidelines. DNA was quantified using a Qubit 2.0 fluorometer (Invitrogen) and 2 μg of genomic DNA was subjected to indexed WGS on the Illumina MiSeq platform to generate 100 bp paired-end reads and 30x genome coverage. Genomic libraries were prepared with the Nextera XT library prep Kit FC 121-1031. Raw Illumina reads were assembled using Velvet (v1.2) via an automated pipeline at the Wellcome Sanger Institute [25].
For preliminary analysis and global contextualization, and for the detection of non-synonymous mutations in the Quinolone Resistance Determining Region (QRDR) of the genes gyrA, gyrB, parC, and parE, the assembled genomes were uploaded to PathogenWatch v3.2.2 (https://pathogen.watch/). Genotypes were assigned using GenoTyphi (https://github.com/katholt/genotyphi). Sequenced reads and publicly available sequences were mapped, and SNPs were called, against the reference genome S. Typhi CT18 using the Wellcome Sanger Institute pipelines and following quality metrics as previously described [26]. Known recombinant regions such as prophages [4] were manually excluded, and any remaining recombinant regions were filtered using Gubbins (v1.4.10) [27]. The resultant core SNP alignment of 40,998 bp was used to infer Maximum Likelihood (ML) phylogenies using RAxML (v8.2.8) [28], specifying a generalized time-reversible model and a Gamma distribution to model site-specific rate variation (GTR+Γ substitution model; GTRGAMMA in RAxML), with 100 bootstrap pseudoreplicates used to assess branch support. SNP distances for the core genome alignment of all the novel genome sequences were calculated from this alignment using the snp-dists package (https://github.com/tseemann/snp-dists). SRST2 v0.2.0 [29] was used with the ARGannot [30] and PlasmidFinder [31] databases to detect the molecular determinants associated with AMR; standard cut-offs of >90% gene coverage and a minimum read depth of 5 were used. Maps were drawn in Inkscape v1.0.1, an open-source scalable graphics editor.
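To make the pairwise SNP distance step concrete, the following is a minimal Python sketch of the underlying calculation on a toy alignment. It is not the snp-dists implementation; the isolate names and sequences are invented for illustration, and the choice to skip positions where either sequence has a character other than A, C, G, or T is an assumption of this sketch.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are unambiguous and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        # Assumption: positions with N, gaps, or other ambiguity codes are ignored.
        if a in valid and b in valid and a != b
    )


def pairwise_snp_distances(alignment: dict) -> dict:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (not data from this study).
toy_alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAC",
    "isolate_C": "ACGAACGTTC",
    "isolate_D": "ACGNACGTTC",
}

distances = pairwise_snp_distances(toy_alignment)
for pair, dist in distances.items():
    print(pair, dist)
```

In this toy example, isolate_A and isolate_B are indistinguishable (0 SNPs), which is the pattern expected for isolates from a single outbreak, whereas isolate_A and isolate_C differ at two positions.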

PFGE genotyping and isolate selection
PFGE is performed routinely for S. Typhi in Latin America; results are consolidated into the PulseNet Latin America and Caribbean Network database [16,17]. Organisms are given a unique PFGE code according to their genomic digestion pattern; 1,478 Colombian isolates were present in the national surveillance database at the initiation of this project. We selected 77 S. Typhi isolated between 1997 and 2018 to represent the broadest possible diversity (by PFGE; S1 Fig) for WGS. This collection comprised 60 unique PFGE profiles (Fig 1 and S1 Table), including the most commonly circulating restriction patterns in Colombia (e.g., COINXX.JPPX01.0008-0083-0115) [18]. Twelve isolates also originated from eight outbreaks confirmed by the health authorities (A-H; 8, 4, 24, 9, 5, 2, 6, and 8 patients per outbreak, respectively) (Figs 1 and S1); more than one isolate was included from two of these outbreaks (D and G). The selection was skewed towards more recent years, reflecting the number of available isolates, and towards AMR isolates [32].
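The band-based comparison that underlies these PFGE codes (Dice coefficient with a 1.5% position tolerance, as described in the methods) can be illustrated with a short Python sketch. This is not the algorithm used by the gel analysis software. Band positions are assumed here to be expressed as fractions of the lane length, so the tolerance becomes an absolute difference of 0.015, and the greedy matching rule and all band values are hypothetical.

```python
def count_matching_bands(bands_a, bands_b, tolerance):
    """Greedily pair bands whose positions differ by no more than the tolerance."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def dice_similarity(bands_a, bands_b, tolerance=0.015):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    total = len(bands_a) + len(bands_b)
    if total == 0:
        raise ValueError("at least one pattern must contain bands")
    return 2 * count_matching_bands(bands_a, bands_b, tolerance) / total


# Hypothetical band positions as fractions of lane length (not study data).
pattern_x = [0.10, 0.25, 0.40, 0.62, 0.80]
pattern_y = [0.105, 0.25, 0.41, 0.62, 0.80]  # small shifts within tolerance
pattern_z = [0.10, 0.30, 0.40, 0.62]         # one shifted band, one missing band

print(dice_similarity(pattern_x, pattern_y))
print(dice_similarity(pattern_x, pattern_z))
```

Under these assumptions, pattern_x and pattern_y would be treated as indistinguishable and share a PFGE code, while pattern_z shares only three of its bands with pattern_x and would receive a different code.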

AMR and population structure of Colombian S. Typhi
The 77 Colombian S. Typhi isolates were subjected to WGS and a phylogenetic tree was constructed from core genome SNPs (Fig 2). We found that genotypic variation in the population of Colombian S. Typhi was generally limited, with the majority of isolates restricted to two groups: major clusters 2 and 3. These clusters could be further segregated into clades 2.5 (51/77; 66.2%), 3.5 (20/77; 26.0%), and 2 (4/77; 5.2%). In addition, we identified two isolates in major cluster 1; these organisms belonged to genotypes 0.1.3 and 1.1.
Notably, unlike a recent observation from Chile, we did not identify genotype 4.3.1 (H58) isolates in this set of Colombian sequences, despite the collection being specifically enriched for organisms that exhibited resistance to antimicrobials. However, we did identify 14 organisms in genotypes 1.1, 2.5 and 3.5 that contained a single SNP in the QRDR (Fig 2 and Table 1), resulting in reduced susceptibility to fluoroquinolones. Overall, and unlike contemporaneous S. Typhi collections from Africa and Asia, this collection contained a limited accumulation of acquired AMR genes. We identified one isolate carrying the sul2 and tetA genes, associated with resistance to sulphonamides and tetracycline, respectively. We additionally detected one organism from Bogota, isolated in 2012, which carried blaCTX-M-12, blaTEM-1, blaOXA-15, and sul1, rendering it resistant to ampicillin, cephalosporins, and sulphamethoxazole (Fig 2 and Table 1).

Associations between PFGE and WGS
We next aimed to compare the PFGE patterns of the 77 Colombian S. Typhi with the phylogenetic structure inferred from WGS. First, we found that the paired isolates from the outbreaks (D and G) were indistinguishable; these organisms had identical PFGE patterns and displayed no SNP differences in the WGS data (Figs 1 and 3). However, more generally, the PFGE restriction patterns and their position in the dendrogram showed minimal concordance with the corresponding phylogenetic location from the WGS data (Fig 3). For example, three isolates from an outbreak (G) shared an identical PFGE restriction pattern (COINXX.JPPX01.0235). This association was encouraging, but on further investigation, an additional three S. Typhi isolates exhibited this same restriction profile. These three further isolates had no apparent epidemiological association with the specific outbreak, were from different geographical locations across Colombia, and were isolated several years after the outbreak (Fig 3). These isolates were determined to be >40 SNPs away in the phylogenetic tree from the isolates causing the outbreak. Lastly, we found a number of occasions where isolates within differing major WGS clades shared an identical PFGE digestion pattern. For example, isolates exhibiting the 0006, 0083, and 0250 PFGE patterns could be found in both clade 2.5 and clade 3.5 of the WGS-based phylogeny (Fig 3). As predicted, these data show that PFGE has limited discriminatory power to identify organisms that may or may not be closely genetically related, further supporting the transition to WGS for routine surveillance.
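The kind of discordance described above, where one PFGE pattern appears in more than one WGS clade, can be screened for with a simple cross-tabulation. The Python sketch below is illustrative only; the record layout, isolate identifiers, and pattern names are hypothetical and do not correspond to the isolates in this study.

```python
from collections import defaultdict


def clades_per_pattern(isolates):
    """Map each PFGE pattern to the set of WGS clades in which it occurs."""
    clades = defaultdict(set)
    for isolate in isolates:
        clades[isolate["pfge"]].add(isolate["clade"])
    return dict(clades)


def discordant_patterns(isolates):
    """Return PFGE patterns that are shared by isolates from different clades."""
    return sorted(
        pattern
        for pattern, clades in clades_per_pattern(isolates).items()
        if len(clades) > 1
    )


# Hypothetical records (not study data): one PFGE pattern spans two clades.
toy_isolates = [
    {"id": "iso1", "pfge": "pattern_A", "clade": "2.5"},
    {"id": "iso2", "pfge": "pattern_A", "clade": "3.5"},
    {"id": "iso3", "pfge": "pattern_B", "clade": "2.5"},
    {"id": "iso4", "pfge": "pattern_B", "clade": "2.5"},
    {"id": "iso5", "pfge": "pattern_C", "clade": "3.5"},
]

print(discordant_patterns(toy_isolates))
```

A pattern flagged in this way cannot be used on its own to infer relatedness, because isolates that share it belong to distinct phylogenetic lineages.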

Colombian S. Typhi in a global context
To determine if the detected Colombian genotypes were more likely to be of Colombian origin or introduced from other continents, we placed these contemporaneous Colombian isolates into a global context with an international collection of S. Typhi genome sequences. We constructed a phylogenetic tree of 3,382 publicly available S. Typhi genome sequences with the 77 contemporaneous Colombian isolates; genotype 4.3.1 (H58) sequences were excluded as they were not identified in this collection (Fig 4). The Colombian organisms in clades 2.5 and 3.5 clustered alongside other Colombian organisms within their respective genotypes. The nearest neighbours to these organisms were isolated in India (10592_2_45, genotype 2.5) and Vietnam (10425_1_60, genotype 3.5) in 1997 and 1993, respectively. In the absence of further sampling, these data suggest that clade 2.5 and clade 3.5 are locally circulating genotypes in Colombia. Similarly, the presence of genotypes 1.1 and 0.1.3 in Colombia is indicative of limited circulation of overseas genotypes. Organisms belonging to genotypes 1.1 and 0.1.3 are considered ancient and presently uncommon on the international S. Typhi genotypic landscape and are historically associated with typhoid in Africa [3].

Discussion
Here, in this primary study of WGS data from S. Typhi   study identified three independent introductions of H58 into Chile [20]. In Chile, the spread of these isolates appears to have been contained; however, this observation highlights the need for sustained genomic surveillance to detect any additional introductions and potential increased circulation of genotype 4.3.1 [33].
A key observation is that the prevalence of AMR in Colombian S. Typhi appears to be markedly lower than that observed in South Asia or Africa. This study, despite being enriched for AMR isolates, indicates an exceptionally low background of AMR in S. Typhi, with only one isolate carrying a plasmid containing AMR genes (IncL/M; pOXA-48), and an additional cryptic non-AMR plasmid (IncFIB; pHCM2) also detected. The precise reasons for the lower prevalence of AMR in S. Typhi in Colombia are unknown and require additional investigation. We hypothesise that a lower prevalence of AMR S. Typhi, in comparison to Asia and Africa, may be related to antimicrobial access and global pathogen dynamics. Historically, antimicrobial use has not been better regulated in this region than in other locations with a high density of LMICs [34]. However, in the last decade many Latin American countries have developed their own National Action Plans to combat AMR under the guidance of PAHO [35]. AMR in S. Typhi is not static, and the global trajectory of AMR is increasing; consequently, there is a constant threat of the importation of AMR organisms, and sustained surveillance in Colombia remains crucial. These factors highlight the importance of global typhoid surveillance and not purely restricting observations to Africa and Southeast Asia.
We additionally aimed to assess the potential correlations and utility of PFGE for S. Typhi tracking across Latin America. We found that PFGE patterns and the SNP-based phylogeny do not correlate especially well: the same PFGE patterns occurred in completely distinct primary clusters of the SNP-based phylogeny. These observations again indicate that PFGE results in false clustering and is not appropriately sensitive for surveillance requiring high-resolution delineation of local/regional population structure and dynamics of S. Typhi, or for outbreak detection in Colombia. WGS is a more appropriate method and is therefore slowly being adopted as the gold standard for these purposes internationally. Lastly, we compared the Colombian isolates to publicly available non-H58 global isolates to determine whether Colombian organisms were imported. This global tree highlighted a lack of genomic information from Latin America. It was therefore impossible to determine whether observed cases are the result of introductions into Colombia from other Latin American countries or of local endemic transmission. However, we found that even though the Colombian isolates were collected over 20 years, they formed their own clusters and were not closely related to organisms from other locations. These observations suggest that the S. Typhi population structure in Colombia is likely driven by sustained endemic circulation of local genotypes.
This study has limitations, the main one being the small sample size of sequenced isolates. The need to select only a subset of samples means that we could have overlooked genotypes and that the proportions of the detected genotypes may not accurately reflect their true distribution. However, this study aimed to assess S. Typhi genetic diversity in Colombia, and we show that, in spite of our diverse selection of organisms, more than 90% of the isolates belonged to two predominant clades. More thorough sequencing strategies are required to more accurately determine the distribution of genotypes.
This study provides an enhanced insight into the molecular epidemiology of S. Typhi in Colombia, constructing the pathogen population structure and identifying the predominant circulating genotypes. Our work demonstrates that routine surveillance with the integration of WGS is necessary not only to improve disease burden estimates, but also to track the national and regional transmission dynamics of S. Typhi and determine AMR profiles. These data will be pivotal to better estimate the burden of typhoid in the region, improve antimicrobial treatment practices and help policymakers to assess the need for typhoid conjugate vaccine introduction. While the population of S. Typhi in Colombia appears isolated, the emergence and spread of AMR variants have been observed internationally [5,6,33]. Consequently, to improve control and prevention measures, it is critical that we establish routine WGS surveillance in Colombia and other Latin American countries to monitor the continental spread of S. Typhi.