First Indian report on genome-wide comparison of multidrug-resistant Escherichia coli from blood stream infections

Background Multidrug-resistant (MDR) E. coli with extended-spectrum β-lactamases (ESBLs) is becoming endemic in health care settings around the world. Baseline data on virulence and antimicrobial resistance (AMR) of specific lineages of E. coli circulating in developing countries like India are currently lacking. Methods Whole-genome sequencing was performed for 60 MDR E. coli isolates. The analysis was performed at single nucleotide polymorphism (SNP) level resolution to identify their virulence and AMR genes. Results Genome comparison revealed the presence of the global MDR clade ST131 and the emerging MDR clade ST410 of E. coli in India. The AMR gene profiles for cephalosporin and carbapenem resistance differed between the clades. Genotypes blaCTX-M-15 and blaNDM-5 were the most common among cephalosporinases and carbapenemases, respectively. For aminoglycoside resistance, rmtB was positive for 31.7% of the isolates, of which 95% were co-harboring carbapenemases. In addition, the FimH types and virulence gene profiles correlated positively with the SNP-based phylogeny, which also revealed the evolution of MDR clones among the study population with temporal accumulation of SNPs. The predominant clone was ST167 (blaNDM lineage), followed by ST405 (global clone ST131 equivalent) and ST410 (fast-spreading high-risk clone). Conclusions This is the first report on the whole-genome analysis of MDR E. coli lineages circulating in India. Data from this study will provide public health agencies with baseline information on AMR and virulence genes in pathogenic E. coli in the region.


Introduction
Escherichia coli is the leading cause of bloodstream infections (BSIs) [1] and of other common infections, including urinary tract infections (UTIs). As an important commensal component of the biosphere, E. coli colonizes the lower gut of animals and humans and is released into the environment.
Virulence of E. coli is driven by multiple factors including adhesins, toxins, siderophores, lipopolysaccharide (LPS), capsule, and invasins [2]. It has recently been reported that a large proportion of the multidrug-resistant (MDR) E. coli carried by people is acquired from food, especially from farm animals [3]. Although most MDR E. coli are reported to be community acquired, MDR E. coli that produce extended-spectrum β-lactamases (ESBLs) have recently been found to be endemic in health care settings [4,5].
Among MDR E. coli, ESBL-mediated antimicrobial resistance (AMR) is mainly due to the blaCTX-M family, particularly blaCTX-M-15 and blaCTX-M-14, rather than the less frequently observed blaSHV and blaOXA families [6][7][8]. According to the literature, carbapenem resistance in E. coli is mostly mediated by blaOXA-48 [9], blaNDM and blaVIM genes [10]. Resistance to fluoroquinolones and to third- and fourth-generation cephalosporins is also increasingly reported, and ST131 predominates globally among such MDR E. coli strains [11].
This study aimed to identify the predominant virulence and AMR genes in MDR E. coli circulating in India. A core genome phylogeny was constructed from high-quality SNP profiles to analyse the genome-wide factors associated with these genes in the sequenced E. coli isolates.

Isolates and identification
A total of 99257 specimens from BSIs were received at the Department of Clinical Microbiology, Christian Medical College, Vellore, India for routine screening between 2006 and 2016. Isolation and identification of the organism were carried out using a standard protocol as reported earlier [12]. Of the 1100 samples that were culture positive for E. coli, 10% were resistant to carbapenems, and 60 MDR isolates from among these were selected for further characterization.

Next generation sequencing and genome assembly
Genomic DNA was extracted using a QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany). Whole genome sequencing (WGS) was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) sequencer (Life Technologies, Carlsbad, CA) with 400-bp read chemistry according to the manufacturer's instructions. Data were assembled with the reference E. coli strain (NC_000913) using Assembler SPAdes v.5.0.0.0 embedded in Torrent Suite Server v.5.0.3.

Genome based MLST analysis
Sequence types (STs) were determined using the multi-locus sequence typing (MLST) 1.8 tool (https://cge.cbs.dtu.dk//services/MLST/). To visualize the possible evolutionary relationships between isolates, the STs of the study isolates and of globally reported strains were analysed using PHYLOViZ software v2.0 based on the goeBURST algorithm. The Warwick database was used for all sequence-based MLST analysis of E. coli.
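The relationship that eBURST-style methods build on, linking STs whose allelic profiles differ at a single locus (single-locus variants, SLVs), can be illustrated with a minimal sketch. This is not the full goeBURST algorithm (it omits founder selection and tie-break rules) and not the PHYLOViZ implementation. The seven loci are those of the Warwick E. coli scheme; the ST labels and allele numbers are hypothetical and are not study data.

```python
from itertools import combinations

# The seven housekeeping loci of the Warwick E. coli MLST scheme.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical allelic profiles, for illustration only (not study data).
profiles = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 2, 1),    # differs from ST-A at purA only
    "ST-C": (1, 1, 3, 1, 1, 2, 1),    # differs from ST-B at gyrB only
    "ST-D": (5, 6, 7, 8, 9, 10, 11),  # unrelated profile
}

def locus_differences(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))

def slv_links(profiles):
    """Return all pairs of STs that are single-locus variants of each other."""
    return sorted(
        (s, t)
        for s, t in combinations(sorted(profiles), 2)
        if locus_differences(profiles[s], profiles[t]) == 1
    )

def clonal_complexes(profiles):
    """Group STs that are connected through chains of SLV links."""
    groups = {st: {st} for st in profiles}
    for s, t in slv_links(profiles):
        if groups[s] is not groups[t]:
            merged = groups[s] | groups[t]
            for member in merged:
                groups[member] = merged
    unique = {frozenset(g) for g in groups.values()}
    return sorted(sorted(g) for g in unique)

print(slv_links(profiles))
print(clonal_complexes(profiles))
```

In this toy data, ST-A, ST-B and ST-C fall into one group through successive SLV links, while ST-D remains a singleton.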

Genome comparison analyses
Gview, an interactive genome viewer, was used to compare the annotated E. coli genome arrangements with the reference E. coli K-12 genome (NC_000913) [13]. Core genome analysis was performed using Roary, the pan genome pipeline, v3.11.2 from the Sanger Institute [14]. The phylogenetic tree was constructed from the core SNPs using FastTree v2.1.10. To evaluate the effect of recombination regions on the E. coli genomes, Snippy was used to retrieve core SNPs, followed by the Genealogies Unbiased By recomBinations In Nucleotide Sequences (Gubbins) algorithm [15]. The tree was constructed with midpoint rooting, and the tree file was visualised and analysed in iTOL v4 (https://itol.embl.de/). A dendrogram representing core vs pan genes was constructed by hierarchical cluster analysis with the hclust method in R.
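As a rough illustration of what a core-SNP comparison measures, the sketch below counts pairwise SNP differences in a small aligned set of sequences. It is not the Snippy, Gubbins or FastTree procedure; the isolate names and sequences are hypothetical toy data, and the choice to skip ambiguous bases such as N is an assumption made for this example.

```python
from itertools import combinations

# Hypothetical core-SNP alignment (toy sequences, not study data).
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACCTACGNAT",
    "isolate_4": "TCCTACGAAT",
}

VALID = set("ACGT")

def snp_distance(a, b):
    """Count aligned positions where both bases are unambiguous and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID and y in VALID and x != y
    )

def distance_matrix(alignment):
    """Symmetric pairwise SNP distances, keyed by (name, name) tuples."""
    names = sorted(alignment)
    dist = {(n, n): 0 for n in names}
    for s, t in combinations(names, 2):
        d = snp_distance(alignment[s], alignment[t])
        dist[(s, t)] = dist[(t, s)] = d
    return dist

dist = distance_matrix(alignment)
for (s, t), d in sorted(dist.items()):
    if s < t:
        print(s, t, d)
```

A distance matrix of this kind is one possible input for distance-based tree building or clustering; the maximum-likelihood and recombination-aware steps named above work on the alignment itself.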
This Whole Genome Shotgun project has been deposited at GenBank under the accession numbers listed in S1 Table. The version described in this manuscript is version 1.

Ethical clearance
The study was approved by the Institutional Review Board and Ethical committee, Christian Medical College, Vellore, India (IRB No.: 9540 dated 22-07-2015). All the samples were fully anonymized before processing, and since our study only utilised isolates received from routine blood cultures, we did not require informed written consent from the patients.

Antimicrobial susceptibility
All 60 E. coli isolates were resistant to carbapenems, quinolones, cephalosporins and beta-lactamase inhibitors (S1 Table). All the isolates were susceptible to colistin except B7532 and B9021, which exhibited an MIC of 32 μg/ml.

Whole genome sequence analysis
Phylogeny of MDR E. coli. MLSTFinder revealed the sequence types of the isolates. The study isolates belonged to 6 clonal complexes (CCs) comprising 14 different sequence types. A few of the sequence types shared the same founder type, reflecting the evolution of these strains. CC10 and CC405 were the two major CCs observed, with ST167, ST410 and ST405 as the common STs. Interestingly, nine isolates belonging to CC/ST131 were identified, all of which were of the H-30 clade except isolate BA9313 (H-24).
E. coli genome comparison. The whole genome composition of the 60 MDR E. coli was compared with the E. coli K-12 reference genome to show the regions of difference between these genomes (S1 Fig). A total of 2,518,792 SNPs were identified across all the analyzed genomes. Compared with the reference genome, individual study genomes carried a minimum of 5957 and a maximum of 74713 SNPs.

Core vs pan genome
Comparison between the core and pan genomes of the 60 MDR E. coli isolates revealed 2258 core genes present in all 60 isolates among the 17944 total gene clusters. The remainder comprised 600 soft core genes in 57 to 59 isolates, 3984 shell genes in at least 9 but fewer than 57 isolates, and 11102 cloud genes in fewer than 9 isolates (Fig 1).
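The pan-genome categories can be expressed as a simple rule on the number of isolates that carry each gene cluster. The sketch below is illustrative and not the pipeline's own code: the percentage cut-offs (99%, 95% and 15%) are an assumption, chosen because for 60 isolates they reproduce the isolate ranges given in the text. The final check confirms that the four reported counts add up to the reported total.

```python
def classify_gene(n_present, n_isolates):
    """Assign a gene cluster to a pan-genome category.

    Thresholds are assumed: core >= 99%, soft core >= 95%,
    shell >= 15%, and everything below that is cloud.
    Integer arithmetic avoids floating-point edge cases at the cut-offs.
    """
    if not 0 < n_present <= n_isolates:
        raise ValueError("n_present must be between 1 and n_isolates")
    if n_present * 100 >= 99 * n_isolates:
        return "core"
    if n_present * 100 >= 95 * n_isolates:
        return "soft core"
    if n_present * 100 >= 15 * n_isolates:
        return "shell"
    return "cloud"

N_ISOLATES = 60

# Boundary cases for a 60-isolate collection.
for n in (60, 59, 57, 56, 9, 8):
    print(n, classify_gene(n, N_ISOLATES))

# Counts reported in the text; they should sum to the total gene clusters.
reported = {"core": 2258, "soft core": 600, "shell": 3984, "cloud": 11102}
total_clusters = sum(reported.values())
print(total_clusters)
```

With these cut-offs, a gene found in 57 isolates is soft core and one found in 56 is shell, so each gene cluster falls into exactly one category.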
Comparison of virulence and clonal traits. The virulence gene profiles of the 60 isolates were compared with the FimH types, MLST sequence types and SNP phylogeny. Based on the virulence genes identified, the isolates clustered into two distinct groups, including ST131 (H-30 clade) (Fig 2). The sequence types were tightly linked to the virulence gene profile groups and FimH types.
Antimicrobial resistance genetic determinants. ResFinder revealed the presence of multiple AMR genes in each of the MDR E. coli isolates (Fig 3). Aminoglycoside and beta-lactam resistance genes were the most dominant. The most common aminoglycoside resistance genes were aadA5 and aac(6')-Ib-cr, followed by aadA2 and rmtB, while blaCTX-M-15, followed by blaNDM-5, blaOXA-1 and blaTEM-1B, was the most prevalent among beta-lactamases. Most of the isolates also harboured mph(A), catB4, sul1, tetB, dfrA17 and dfrA12. Interestingly, two isolates, B7532 and B9021, carried mcr-1.1, which is responsible for plasmid-mediated colistin resistance. Both isolates also showed phenotypic resistance to colistin with a high MIC (>32 μg/ml). In addition, phenotypic resistance to the other antimicrobials correlated well (>80%) with the presence of the respective AMR genes.
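One simple way to quantify agreement between phenotype and genotype is percent concordance: the share of isolates in which a resistant phenotype coincides with detection of a corresponding gene, or a susceptible phenotype with its absence. The source does not state how its >80% figure was calculated, so the sketch below is an assumption about one plausible approach. The isolate records are hypothetical.

```python
def concordance(records):
    """Percent of isolates whose phenotype agrees with gene presence.

    Each record is (isolate_id, phenotypically_resistant, gene_detected).
    """
    if not records:
        raise ValueError("no isolates to compare")
    agree = sum(1 for _, resistant, gene in records if resistant == gene)
    return 100.0 * agree / len(records)

# Hypothetical isolates for one antimicrobial class (not study data).
records = [
    ("iso1", True, True),
    ("iso2", True, True),
    ("iso3", False, False),
    ("iso4", True, False),  # resistant without a detected gene
    ("iso5", True, True),
]

print(f"{concordance(records):.1f}% concordance")
```

Discordant isolates such as the hypothetical iso4 are often the most informative, since they can point to mechanisms the gene database does not capture.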
On genotypic characterization of the MDR E. coli isolates, the increasing frequency of antimicrobial resistance in clinical E. coli isolates was found to be associated with blaCTX-M, blaNDM and mcr genes. In our study, multiple AMR genes conferring resistance to beta-lactams, carbapenems, fluoroquinolones, tetracycline, aminoglycosides and colistin were identified. The presence of AMR genes correlated well with phenotypic resistance to beta-lactams, carbapenems, fluoroquinolones and tetracycline. IncFII plasmids mainly carried the AMR genes blaCTX-M-15, blaNDM-5, aadA2, rmtB, sul1, dfrA12, erm(B) and tetA, while IncFI plasmids mostly carried aadA5, sul2, dfrA17, mph(A) and tetB. Results from the plasmid analysis of the study isolates have been published elsewhere [17].
The MDR E. coli isolates grouped phylogenetically into four major clades: ST167, ST410, ST405 and ST131. The blaNDM-5 variant, responsible for carbapenem resistance, was more common than other blaNDM variants. The blaNDM-positive isolates belonging to ST410 and ST405 harboured only the blaNDM-5 variant, whereas ST167 and ST131 isolates had blaNDM-4 and blaNDM-1, respectively, in addition to blaNDM-5. Interestingly, only two of the 60 MDR E. coli isolates had blaNDM-1, although it is still common among other species of MDR clinical pathogens in India [18]. A previous report from Hong Kong identified blaNDM-1 as common among E. coli, though the sample size was smaller [19], whereas in central China, blaNDM-1 and blaNDM-5 have been reported in E. coli in equal numbers [20].
Globally, blaOXA-48-type enzymes are the most commonly reported carbapenemases among E. coli [21], followed by blaNDM [22], blaIMP [23] and blaKPC [24]. Studies have reported the occurrence of blaOXA-48 at frequencies ranging from 3% to 22% [9,25]. In contrast, a previous report from India on carbapenem-resistant clinical E. coli isolates from 2013 and 2015 showed that blaNDM was the most common carbapenemase in E. coli (70%), followed by blaOXA-48 (24%) and blaVIM (17%). Co-occurrences of blaNDM with blaOXA (5%) and blaVIM (17%) have also been reported [26]. Similar results were seen in our study, where the only combination commonly observed was blaNDM + blaOXA-1, although blaOXA-181 was rarely found in combination with blaNDM (n = 1). These observations indicate that in India blaNDM is the most prevalent carbapenemase among E. coli, followed by blaOXA, which is the most prevalent elsewhere.
Aminoglycoside resistance in Gram-negative bacteria is a global concern. Acquired 16S rRNA methyltransferases (16S-RMTases) are known to confer extremely high-level aminoglycoside resistance, which renders key aminoglycosides including gentamicin, tobramycin and amikacin ineffective against carbapenem-resistant strains [27]. Likewise, plazomicin, a new aminoglycoside developed to combat carbapenem-resistant Enterobacteriaceae, was found to be inactive against isolates that co-produced 16S-RMTases [28]. In this study, ~95% of the RMTase-positive E. coli co-harboured carbapenemases, which worryingly adds to the already high burden of carbapenem resistance. Similarly, Taylor et al. [29] and Poirel et al. [30] reported 83.1% and 45.4% co-occurrence of carbapenemases in 16S-RMTase-producing Enterobacteriaceae, respectively.
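The co-occurrence figure can be reproduced from gene presence sets. The sketch below is illustrative only. The isolate identifiers are hypothetical, and the counts of 19 rmtB-positive isolates and 18 co-harbouring isolates are not stated in the text; they are the whole numbers inferred from the reported 31.7% of 60 isolates and the approximate 95% co-occurrence.

```python
def co_occurrence_percent(group_a, group_b):
    """Percent of the isolates in group_a that are also in group_b."""
    if not group_a:
        raise ValueError("group_a is empty")
    return 100.0 * len(group_a & group_b) / len(group_a)

TOTAL_ISOLATES = 60

# Hypothetical identifiers; counts inferred from the reported percentages.
rmtb_positive = {f"iso{i:02d}" for i in range(1, 20)}            # 19 isolates
carbapenemase_positive = (
    {f"iso{i:02d}" for i in range(1, 19)}                        # 18 shared with rmtB
    | {f"iso{i:02d}" for i in range(20, 41)}                     # others, hypothetical
)

rmtb_rate = 100.0 * len(rmtb_positive) / TOTAL_ISOLATES
co_rate = co_occurrence_percent(rmtb_positive, carbapenemase_positive)

print(f"rmtB positive: {rmtb_rate:.1f}% of isolates")
print(f"rmtB positive with carbapenemase: {co_rate:.1f}%")
```

Under these inferred counts, 19 of 60 isolates gives 31.7% and 18 of 19 gives 94.7%, which is consistent with the rounded figures reported.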
Our study shows that, for cephalosporin resistance, blaCTX-M-15 was carried by 100% and 92% of the isolates in the ST131 and ST405 clades, respectively, whereas ST167 and ST410 isolates carried blaCMY genes (18% and 43%, respectively) in addition to blaCTX-M-15 (63% and 100%, respectively). By comparison, blaCTX-M was previously reported in 54.34% of ESBL-positive isolates from India [31]. Among the study isolates, ST167 carried blaCTX-M-15 significantly (P<0.05) less often than the other clades. ST167 and ST410 additionally carried blaCMY for cephalosporin resistance, which was not seen in ST131 and ST405.
Plasmid-mediated colistin resistance is being increasingly reported in E. coli [32][33][34]. This study also observed two isolates (B7532 and B9021) with mcr-1.1 that had a high colistin MIC of >32 μg/ml; both isolates, from the same time period and ward, were closely related and shared the same sequence type (ST624). After this observation in strains from 2007, there have been no further reports of mcr.
The antimicrobial susceptibility of E. coli has been shown to vary geographically [35]. Among the different clonal groups observed elsewhere, E. coli ST131 was previously reported to be most commonly associated with community-acquired infection [36][37], but has recently been strongly associated with healthcare settings. ST131 was also reported earlier as the predominant lineage carrying blaCTX-M-15 and other ESBLs. Most of the MDR E. coli carrying blaCTX-M-15 from different countries in Europe and North America grouped homogeneously into E. coli O25:H4-ST131 [6,36,37]. In our study, 87% of the isolates carried blaCTX-M-15 across various STs, with only nine isolates belonging to ST131. Among the STs observed in this study, blaCTX-M-15 has previously been reported in association with ST617, ST405 and ST131 [37].
Virulence genes observed among the E. coli isolates varied between clades. Comparison of the virulence gene types with the SNP-based phylogeny revealed the acquisition and loss of virulence genes. Genes iss, capU and gad were observed in the ST167 clade. ST131 possessed iha, sat, cnf1 and senB in addition to iss and gad, but ST131 strains in our study had lost capU. The ST405 clade had also lost iss and had gained eilA and air, with FimH type 29. A few ST405 isolates, belonging to FimH type 27, retained iha and sat. In contrast, ST410 (FimH 24), which predominantly carried lpfA, had lost all other genes except gad, which was present in two isolates. Overall, gad served as the backbone for the ST167, ST131 and ST405 clades, while lpfA was consistent in ST410. To our knowledge, ours is the first study to compare the evolution of virulence patterns with phylogeny, which helps explain the emergence of a stable clinical virulent phenotype.
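Gene gain and loss between clades can be described with set differences between their virulence gene profiles. The sketch below uses only the ST167 and ST131 profiles as listed in the text, with the cytotoxic necrotizing factor gene written as cnf1. Treating ST167 as the baseline for the comparison is an assumption made for illustration, not a statement about the inferred ancestral state.

```python
# Clade virulence gene profiles as described in the text.
clade_profiles = {
    "ST167": {"iss", "capU", "gad"},
    "ST131": {"iha", "sat", "cnf1", "senB", "iss", "gad"},
}

def compare_profiles(baseline, other):
    """Return genes gained, lost and shared by `other` relative to `baseline`."""
    return {
        "gained": sorted(other - baseline),
        "lost": sorted(baseline - other),
        "shared": sorted(baseline & other),
    }

# Baseline choice (ST167) is an assumption for this example.
changes = compare_profiles(clade_profiles["ST167"], clade_profiles["ST131"])
for kind, genes in changes.items():
    print(kind, genes)
```

The shared set here, gad and iss, matches the description of gad as a common backbone, and the lost set contains only capU.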
FimH, which has been reported as a major candidate for the development of a vaccine against pathogenic E. coli, mediates mannose-sensitive bacterial adhesion [45]. Although fimH alleles show high nucleotide conservation (>98%), minor sequence differences have been reported to correlate with differential binding and adhesion phenotypes [46]. FimH types in our study correlated well with the STs.
Our study shows that an SNP-based phylogeny provides higher discrimination between the clinical MDR E. coli isolates. More such studies with an integrated approach to analysing pathogenic E. coli in India are therefore required to fully understand and follow the dynamic virulence and AMR landscape of this rapidly evolving group of pathogens.

Conclusions
To the best of our knowledge, this is the first report on SNP phylogeny in comparison with AMR and virulence traits in E. coli in India. The study revealed the prevalence of blaNDM-5 among the ST131, ST405 and ST410 clades. blaCTX-M-15 was responsible for cephalosporin resistance in the ST131 and ST405 clades, whereas ST167 and ST410 carried both blaCTX-M-15 and blaCMY genes. For aminoglycoside resistance, rmtB was positive for 31.7% of the isolates, of which 95% were co-harbouring carbapenemases. The FimH types and virulence gene profiles correlated positively with the SNP-based phylogeny. However, the globally predominant ST131 epidemic clone was less common in our study population, while ST167 and ST405 clones with multiple AMR genes were predominant. Larger studies are needed to rule out any possible bias. Isolates with the iss, capU and gad virulence genes were the major type. Moreover, the SNP-based phylogeny revealed the evolution of MDR clones among the study population, which suggests that continuous WGS-level molecular surveillance will be necessary to track the spread of MDR clones in India.
Supporting information S1