Extended-spectrum beta-lactamase (ESBL)-producing and non-ESBL-producing Escherichia coli isolates causing bacteremia in the Netherlands (2014 – 2016) differ in clonal distribution, antimicrobial resistance gene and virulence gene content

Background Knowledge on the molecular epidemiology of Escherichia coli causing E. coli bacteremia (ECB) in the Netherlands is mostly based on extended-spectrum beta-lactamase-producing E. coli (ESBL-Ec). We determined differences in clonality and resistance and virulence gene (VG) content between non-ESBL-producing E. coli (non-ESBL-Ec) and ESBL-Ec isolates from ECB episodes with different epidemiological characteristics. Methods A random selection of non-ESBL-Ec isolates as well as all available ESBL-Ec blood isolates was obtained from two Dutch hospitals between 2014 and 2016. Whole genome sequencing was performed to infer sequence types (STs), serotypes, acquired antibiotic resistance genes and VG scores, based on presence of 49 predefined putative pathogenic VG. Results ST73 was most prevalent among the 212 non-ESBL-Ec (N = 26, 12.3%) and ST131 among the 69 ESBL-Ec (N = 30, 43.5%). Prevalence of ST131 among non-ESBL-Ec was 10.4% (N = 22, P value < .001 compared to ESBL-Ec). O25:H4 was the most common serotype in both non-ESBL-Ec and ESBL-Ec. Median acquired resistance gene counts were 1 (IQR 1–6) and 7 (IQR 4–9) for non-ESBL-Ec and ESBL-Ec, respectively (P value < .001). Among non-ESBL-Ec, acquired resistance gene count was highest among blood isolates from a primary gastro-intestinal focus (median 4, IQR 1–8). Median VG scores were 13 (IQR 9–20) and 12 (IQR 8–14) for non-ESBL-Ec and ESBL-Ec, respectively (P value = .002). VG scores among non-ESBL-Ec from a primary urinary focus (median 15, IQR 11–21) were higher compared to non-ESBL-Ec from a primary gastro-intestinal (median 10, IQR 5–13) or hepatic-biliary focus (median 11, IQR 5–18) (P values = .007 and .04, respectively). VG content varied between different E. coli STs. Conclusions Non-ESBL-Ec and ESBL-Ec blood isolates from two Dutch hospitals differed in clonal distribution, resistance gene and VG content. 
Also, resistance gene and VG content differed between non-ESBL-Ec from different primary foci of ECB.


INTRODUCTION
Sciences V.25.0 (SPSS, Chicago, Illinois, USA) and R Version 3.4.1. Boxplots were made with R packages ggplot2 and ggpubr and bar charts were made with GraphPad Prism Version 8.0.1.

Serotyping
We assigned serotypes by using the web-tool SerotypeFinder 2.0 from the Center for Genomic [...] was calculated for non-ESBL-PEc and ESBL-PEc isolates. Serotype distribution among non-[...] excluding isolates in which no definitive serotype could be defined and the occurrence of [...] contigs for antimicrobial resistance genes using the ResFinder 3.1.0 database (acquired resistance genes), date of download 24 January 2019, and the Comprehensive Antibiotic Resistance Database (CARD) (all resistance genes), date of download 1 March 2019 [18,19]. The thresholds for coverage length and sequence identity were 80% and 95%, respectively. A resistance gene count using each of the databases was made per isolate, which was defined as the total number of resistance genes (using CARD) and the total number of acquired resistance genes (using ResFinder) identified, respectively. In case of double detection of identical resistance genes within a single isolate, they were only counted once. The resistance gene scores were compared between non-ESBL-PEc and ESBL-PEc with the non-parametric Wilcoxon rank sum test (for this comparison only, the scores of the ESBL-PEc isolates were corrected for presence of the ESBL gene). Resistance gene scores were then analysed for non-ESBL-PEc and ESBL-PEc separately and were compared between isolates with different epidemiological characteristics and different STs using Kruskal-Wallis one-way ANOVA. In case of an overall ANOVA P value <0.05, post-hoc pairwise comparisons were made and the Holm-Bonferroni P value correction was applied to account for multiple testing. For pairwise comparisons, the non-parametric Wilcoxon rank sum test was used.
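The per-isolate resistance gene count described above can be illustrated with a minimal sketch. It assumes that each database hit is available as a tuple of isolate identifier, gene name, coverage percentage and identity percentage, and that the 80% and 95% thresholds are inclusive; the function name, the input layout and the example hits are hypothetical and are not taken from the study pipeline.

```python
from collections import defaultdict

MIN_COVERAGE = 80.0  # percent of the reference gene length covered by the hit
MIN_IDENTITY = 95.0  # percent sequence identity of the hit


def resistance_gene_counts(hits, min_coverage=MIN_COVERAGE, min_identity=MIN_IDENTITY):
    """Count distinct resistance genes per isolate.

    hits: iterable of (isolate_id, gene, coverage_pct, identity_pct).
    Thresholds are treated as inclusive, which is an assumption.
    """
    genes_per_isolate = defaultdict(set)
    for isolate, gene, coverage, identity in hits:
        found = genes_per_isolate[isolate]  # isolates without passing hits still get a count of 0
        if coverage >= min_coverage and identity >= min_identity:
            found.add(gene)  # a set keeps each gene once, so double detections are not double counted
    return {isolate: len(found) for isolate, found in genes_per_isolate.items()}


# Hypothetical hits, for illustration only.
example_hits = [
    ("isolate_A", "blaTEM-1B", 100.0, 99.8),
    ("isolate_A", "blaTEM-1B", 100.0, 99.8),  # identical gene detected twice
    ("isolate_A", "sul2", 79.0, 100.0),       # below the coverage threshold
    ("isolate_B", "tet(A)", 95.0, 96.0),
    ("isolate_C", "aadA5", 100.0, 90.0),      # below the identity threshold
]
counts = resistance_gene_counts(example_hits)
```

Running the same function once on CARD hits and once on ResFinder hits would give the total and the acquired resistance gene count, respectively, under the assumptions stated above.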

Virulence genes
The presence of putative virulence factor genes (VG) was identified using abricate version [...] was only counted once. These virulence scores were then compared between isolates with different epidemiological characteristics and between different STs using Kruskal-Wallis one-way ANOVA. In case of an overall ANOVA P value <0.05, post-hoc pairwise comparisons were made with the non-parametric Wilcoxon rank sum test and the Holm-Bonferroni P value correction was applied to account for multiple testing.
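As an illustration of the Holm-Bonferroni correction applied to the post-hoc pairwise comparisons, the following minimal sketch computes adjusted P values from a list of raw P values. It follows the standard step-down procedure (sort the raw values, multiply the k-th smallest by m - k + 1, enforce monotonicity, cap at 1); the function name and the example P values are hypothetical and do not come from the study data.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the raw P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values may never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values from three pairwise Wilcoxon rank sum tests.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

With these example values only the first comparison remains below 0.05 after adjustment, which shows why a comparison that looks significant in isolation may not survive correction for multiple testing.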

Patient characteristics
The isolate collection consisted of 212 phenotypic non-ESBL-PEc and 69 ESBL-PEc blood isolates (Fig. 1). Distribution of age, sex, onset of infection and primary foci were comparable between non-ESBL-PEc and ESBL-PEc bacteremia episodes ( [...] ESBL-PEc, respectively. The occurrence of different STs did not differ between nosocomial and community onset ECB (S1 Figure and S2 Table). ST131 was the dominant ST among ESBL-positive ECB episodes with a primary urinary (63%) and hepatic-biliary focus (57%), which was higher as compared to other primary foci of ESBL-positive ECB (i.e. 21% among primary hepatic-biliary focus, see S3 Figure and S4 Table). The NJ-phylogenetic tree of all isolates can be found in the Supporting Information (S5 Figure). [...] (34.8%) ESBL-PEc isolates, which largely reflected the prevalence of ST131 in each group ( [...] Table).

Antimicrobial resistance genes
In total, 110 unique resistance genes were identified with CARD and 69 unique acquired resistance genes were identified with ResFinder 3.1.0 (see S7 Table). ESBL-genes were [...] isolates remained genotypically ESBL-negative. One of these isolates was positive for blaCMY-2 (AmpC gene).
[...] (IQR 43–50) for ESBL-PEc isolates (P value < 0.001). The median acquired resistance gene count for non-ESBL-PEc versus ESBL-PEc was 1 (IQR 1–6) versus 7 (IQR 4–9) (P value < 0.001). Among non-ESBL-PEc, total and acquired resistance gene counts were not different between community and hospital-onset ECB episodes (S8 Figure and S9 Table). Among non-ESBL-PEc, there were statistically significant differences in resistance gene count for different primary foci of ECB, but absolute differences were small (S8 Figure). The median acquired resistance gene count of non-ESBL-PEc isolates from ECB with a primary hepatic-biliary focus was 1 (IQR 1–1), whereas for a primary urinary focus this was 2 (IQR 1–6) (P value ≤0.001), primary gastro-intestinal focus this was 4 (IQR 1–8) (P value ≤0.01) and unknown primary focus this was 1 (IQR 1–6) (P value ≤0.0001) (S9 Table). Among ESBL-PEc isolates, there were no statistically significant differences in total and acquired resistance gene counts between community and hospital-onset ECB or different primary foci of ECB (S8 Figure and S9 Table). Among non-ESBL-PEc, there was heterogeneity in total resistance gene count between almost all dominant STs; this was not the case for acquired resistance gene count (Fig. 3 and S10 Table). No statistically significant differences were observed among ESBL-PEc isolates of different STs (S10 Table).

[...] that occurred >5% within ESBLs or non-ESBLs were grouped into main groups, the rest was categorized as "Other".

Results of the pairwise comparisons can be found in S10 Table.

Virulence genes
Of the 49 predefined ExPEC-associated VG, 44 (89.8%) were detected in at least one E. coli blood isolate (S11 Table). The median VG score was 13 (IQR 9–20) for non-ESBL-PEc and 12 (IQR 8–14) for ESBL-PEc blood isolates (P value = 0.002). In one non-ESBL-PEc isolate no predefined ExPEC-associated VG was detected, while a maximum VG score of 25 was found in two non-ESBL-PEc isolates.
For non-ESBL-PEc and ESBL-PEc isolates, there was no significant difference in the VG score between isolates that caused community or hospital onset ECB (S13 Table). Non-ESBL-PEc isolates that caused ECB with a primary gastro-intestinal focus (median 10, IQR 6–13) and hepatic-biliary focus (median 11, IQR 5–18) had lower VG scores as compared to isolates with a primary focus in the urinary tract (median 15, IQR 11–21) (P value = 0.007 and P value = 0.036, respectively, see S12 Figure and S13 Table). Among non-ESBL-PEc and ESBL-PEc, there were no statistically significant differences in VG scores between isolates of patients without versus with a urinary catheter, between patients alive or deceased after 30 days or between patients admitted to the intensive care unit (ICU) versus a non-ICU ward (data not shown).
There was heterogeneity in VG scores between non-ESBL-PEc of different STs; this was less pronounced for ESBL-PEc isolates (Fig. 4 and S14 Table). ESBL-negative ST38 had the lowest average VG score (median 7, IQR 6–7) and ESBL-positive ST12 had the highest VG score (median 23, IQR 23–23). Median VG score of both ESBL-negative and ESBL-positive ST131 isolates was 13. All pairwise comparisons between ESBL-negative STs yielded Holm-Bonferroni adjusted P values < 0.05, except for the comparison ST12 versus ST73 and all pairwise comparisons that included ST38.

Boxplots display median VG score and interquartile range (IQR); every dot represents a single isolate. Only STs that occurred >5% within ESBLs or non-ESBLs were grouped into main groups, the rest was categorized as "Other".

Results of pairwise comparisons can be found in S14 Table.

DISCUSSION
In this study, we found that ESBL-producing E. coli blood isolates were different from non-[...] has been described before [29,30]. In contrast, ST73, an ST that is known for its susceptible antibiotic profile [29], was only identified among non-ESBL-PEc blood isolates. The association between ESBL phenotype and STs in E. coli, which is repeatedly found, implies that the molecular backbone of strains can increase (or decrease) their propensity to acquire and subsequently maintain plasmids carrying ESBL genes. A recent large-scale study that compared the pan-genomes of invasive E. coli isolates, including ST131 and ST73, suggested that due to ongoing adaptation to long term human intestinal colonisation and consequent evolutionary gene selection, ST131 might have become able to reduce the fitness costs of long term plasmid maintenance [31,32]. Interestingly, in our study, isolates that belonged to ST73 had low resistance gene content but relatively high VG score as compared to other STs. Furthermore, the average VG score among non-ESBL-PEc was slightly higher in comparison to ESBL-PEc blood isolates, which demonstrates that ESBL-positivity in E. coli is not necessarily related to an increased VG content. If we assume that VG content is associated with virulent potential, i.e. the ability of a strain to cause invasive disease, then these findings do not support the theory that increased virulence of resistant strains causes the increased incidence of resistant ECB as compared to sensitive ECB. This theory has been suggested for other pathogens, such as MRSA [1,33,34]. Still, results of the current study show that molecular characteristics of ESBL-PEc cannot be merely generalized to non-ESBL-PEc blood isolates, highlighting the importance of not preselecting on ESBL-positivity when investigating the molecular epidemiology of ECB.
One of our hypotheses was that the distributions of STs, resistance gene and VG content would differ between ECB episodes of community and hospital onset and between different primary foci, as a possible result of different levels of sub specialization of intestinal E. [...] subgroups, but found that these differences in molecular content mostly depended on [...] performed in Scotland [35]. In that study, there were combinations of virulence genes as well as a particular accessory gene composition that differentiated between STs rather than between epidemiological factors. The association between ST69 and community onset ECB, as found in the Scottish study, was not identified in the current study. Other differences were the large proportion of E. coli isolates from ECB episodes that were deemed hospital-acquired (62%) as compared to our study (18.4% for ESBL-negative and 36.2% for ESBL-positive ECB) and the fact that, in that study, analyses were not stratified for ESBL-positivity. More studies that combine clinical characteristics with molecular characteristics of ECB are important, because these data help to further elucidate the role of host-specific factors versus strain-specific factors in the pathogenesis of ECB. Since different determinants of ECB might indicate different targets for surveillance or infection-prevention, a thorough understanding of the molecular epidemiology is needed to reduce the occurrence of this invasive infectious disease with potentially severe clinical consequences.
We identified serotype O25:H4 as the most prevalent serotype causing ESBL-negative as well as ESBL-positive ECB in the Netherlands, followed by O6:H1. The serotype distribution among non-ESBL-PEc was more heterogeneous as compared to ESBL-PEc, similar to the differences in clonal diversity between these two groups. A large recent European surveillance study that included 1,110 E. coli blood isolates from adults between 2011 and 2017 showed that there is heterogeneity in serotype distribution among different countries, which highlights the need for country specific data [17]. Furthermore, we showed that the coverage of the new potential 10-valent vaccine was higher as compared to the 4-valent vaccine and was actually doubled for non-ESBL-PEc bacteremia. The findings of the current study can be used for future studies and can help further evaluation and implementation of E. coli vaccines.
[...] epidemiological characteristics and highly discriminatory genetic data. There are also important limitations. Firstly, E. coli is a heterogeneous species, of which the seven MLST genes only constitute a small proportion of the entire gene content. Because we also only investigated presence of a small fraction of the genes that are commonly part of the accessory genome, such as virulence and acquired resistance genes, but did not assess the entire accessory gene pool, we could have missed genomic differences between isolates that are reflected in the accessory gene pool only. Secondly, we selected E. coli isolates from a tertiary care center and teaching hospital from the Netherlands from two different regions, which we considered to be representative of the Netherlands. The description of strains that were identified here might not be entirely generalizable to other countries since there could be differences between circulating E. coli strains, dependent on local population characteristics and antimicrobial resistance levels. Thirdly, many pairwise comparisons between subgroups were performed, which increases the risk of false-positive findings (i.e. type I errors). Even though we applied a strict P value correction for multiple testing, this naturally does not eliminate the risk of false-positive findings. The analyses on resistance gene and VG content should therefore be viewed as hypothesis generating.
In conclusion, there are molecular differences between non-ESBL-PEc and ESBL-PEc blood isolates that reach beyond their phenotypic ESBL positivity. Future genomic research of E. coli should preferably focus on E. coli without preselection on ESBL-positivity, to limit the risk of extrapolating characteristics of resistant E. coli to the E. coli population as a whole. Furthermore, more studies are needed to better understand repeatedly found associations between gene content and STs, which could aid the development of targeted preventive interventions.
[...] Environment (RIVM) and the University Medical Center Utrecht (UMCU) and the authors received no specific funding for this work.

ACKNOWLEDGMENTS
We would sincerely like to thank Kim van der Zwaluw, Carlo Verhulst and Judith Vlooswijk for their contributions in the laboratory execution of the study. We would also like to thank Janetta