Application of mini-MLST and whole genome sequencing in low diversity hospital extended-spectrum beta-lactamase producing Klebsiella pneumoniae population

Studying bacterial population diversity is important to understand healthcare associated infections’ epidemiology and has a significant impact on dealing with multidrug resistant bacterial outbreaks. We characterised the extended-spectrum beta-lactamase producing K. pneumoniae (ESBLp KPN) population in our hospital using mini-MLST. Then we used whole genome sequencing (WGS) to compare selected isolates belonging to the most prevalent melting types (MelTs) and the colonization/infection pair isolates collected from one patient to study the ESBLp KPN population’s genetic diversity. A total of 922 ESBLp KPN isolates collected between 7/2016 and 5/2018 were divided into 38 MelTs using mini-MLST with only 6 MelTs forming 82.8% of all isolates. For WGS, 14 isolates from the most prominent MelTs collected in the monitored period and 10 isolates belonging to the same MelTs collected in our hospital in 2014 were randomly selected. Resistome, virulome and ST were MelT specific and stable over time. A maximum of 23 SNV per core genome and 58 SNV per core and accessory genome were found. To determine the SNV relatedness cut-off values, 22 isolates representing colonization/infection pair samples obtained from 11 different patients were analysed by WGS with a maximum of 22 SNV in the core genome and 40 SNV in the core and accessory genome within pairs. The mini-MLST showed its potential for real-time epidemiology in clinical practice. However, for outbreak evaluation in a low diversity bacterial population, mini-MLST should be combined with more sensitive methods like WGS. Our findings showed there were only minimal differences within the core and accessory genome in the low diversity hospital population and gene based SNV analysis does not have enough discriminatory power to differentiate isolate relatedness. Thus, intergenic regions and mobile elements should be incorporated into the analysis scheme to increase discriminatory power.

Introduction Klebsiella pneumoniae (KPN) frequently causes community and hospital acquired infections including pneumonia, urinary tract infections and pyogenic liver abscesses [1,2]. The main KPN transmission reservoirs are the gastrointestinal tract and the hands of hospital personnel and patients. Nosocomial KPN isolates often display highly resistant phenotypes with an extended-spectrum beta-lactamases producing (ESBLp) KPN prevalence between 2% and 55% [3][4][5].
Typing methods to discriminate different bacterial isolates from the same species are essential epidemiological tools. In populations with a high prevalence of ESBL, the knowledge of bacterial population structure and dynamics is especially important in outbreak detection and intervention. To monitor the bacterial population, cheap, rapid and robust methods are needed. In our previous study, we proved that mini-MLST, a method derived from multi locus sequence typing (MLST) in which costly and time-consuming sequencing is replaced with high resolution melting analysis, is suitable for long term prospective KPN population screening. Currently, besides KPN [6], mini-MLST has been established for Staphyloccoccus aureus [7], Enterococcus faecium [8] and Streptococcus pyogenes [9]. However, its lower discriminatory power makes mini-MLST insufficient to identify outbreak strains and it needs to be combined with more sensitive methods.
Recently, whole genome sequencing (WGS) has revolutionized our ability to differentiate between bacterial strains at the entire genome's DNA sequence level. For bacterial typing, WGS has two major approaches-core genome multilocus sequence typing (cgMLST) and single nucleotide variant analysis (SNV) [10]. CgMLST is based on an allele numbering system of a pre-determined set of genes. An advantage of cgMLST's approach is its inter-laboratory portability and the existence of public databases e.g. https://www.cgmlst.org/ncs, http:// enterobase.warwick.ac.uk/ or https://pubmlst.org/. SNV analysis is based on mapping raw sequence reads against a reference genome and detecting nucleotides that vary within the dataset. This approach provides an even higher resolution power than cgMLST, but is far more computationally intensive than cgMLST analysis and interpreting the results is more complex [11,12].
As generating WGS data become more accessible, rapid and cheap, bottlenecks remain in proper pre-sequencing sample selection and post-sequencing data analysis [11]. The main bottlenecks include the need to critically evaluate the raw sequencing data, some knowledge and skills in programming and improvements in data analysis to translate the enormous amount of obtained data into understandable results for health professionals [13]. Knowledge of the local bacterial population's genetics characterisation is also crucial for the results to be correctly interpreted, as there are no general thresholds of relatedness [10].
The main objectives of this study were i) to characterise the ESBLp KPN population in our hospital using mini-MLST prospective typing ii) to evaluate core and accessory genome single nucleotide variant analysis contribution in possible outbreak detection within the low diversity ESBLp KPN hospital population.

Clinical isolates
The study was conducted at the University Hospital Brno (Brno, Czech Republic), a tertiary care hospital with more than 2,000 beds and 5,000 employees. There are more than 1,000,000 people treated in out-patient clinics and over 70,000 patients hospitalized every year. During the systematic strain collection between 7/2016 and 5/2018, we collected all ESBLp KPN  [14]. All isolates were collected during routine practice and were made completely anonymous.

DNA isolation
For mini-MLST, bacterial genomic DNA (gDNA) was isolated using Chelex 100 Resin (Bio-Rad, USA). The Bacterial culture was homogenised in 100 μ of 5% w/v Chelex 100 Resin with a vortex. The suspension was incubated for 10 min at 100˚C and then centrifuged for 2 min at

Mini-MLST
Mini-MLST was performed with primers described by Andersson, Tong [6] on a RotorGene 6000 platform (Corbett Research, Australia). The 20 μL reaction volume contained 10 μL 2× SensiFAST HRM mix (Bioline Reagents, UK), 0.4 μM of each primer, 1 μL of extracted genomic DNA (30 ng) and deionized water to a final volume of 20 μL. Thermo cycling parameters were: 95˚C for 3 min, 40 cycles of 95˚C for 5 s, 65˚C for 10 s and 72˚C for 20 s, then one cycle of 95˚C for 2 min and 50˚C for 20 s, followed by HRM ramping from 70 to 95˚C, increasing by 0.2˚C at each step. The results were interpreted using the current version of our conversion key, which is available for free download at http://www.cmbgt.cz/mini-mlst/t6353.
After reference mapping, all positions with less than 10× coverage and all ambiguous positions (less common base represented at least 10% of bases in the target position) were removed from further analysis. UGENE software was used to obtain consensus sequences [17]. Resistome and virulome analysis was carried out using public online databases (http://www. genomicepidemiology.org/, http://bigsdb.pasteur.fr/). The cgMLST was performed using Ridom SeqSphere+ (Ridom, DE) with an incorporated KPN cgMLST scheme. This included 2,358 target genes whose alleles were used to generate a cgMLST dendogram. For the core genome SNV minimum spanning tree 2,042,000 aligned nucleotide sites were analysed. In total, 4,179,387 aligned nucleotide sites were accounted for by the core and accessory genome SNV minimum spanning tree analysis. The WGS data were used to determine the MLST type of all sequenced strains. (2.6%, n = 24) and MelT20 (2.3%, n = 21) were the most predominant (Fig 1). The remaining 30 MelTs were present in less than 2% of all isolates.
The MLST, resistome and virulome analysis. First, in silico STs were determined and resistome and virulome were compared for all 46 samples (Fig 2). MLST analysis (based on seven house-keeping genes rpoB, gapA, mhd, pgi, phoE, infB and tonB) showed ST uniformity within individual MelT groups, except isolate S13 from MelT26 which, unlike the other isolates in this MelT group, was ST29. This isolate was the only MelT26 detected in 2014. All other sequenced MelT26 isolates from 2016 and 2017 were ST1271. ST1271 and ST29 only differ in the MLST scheme's phoE allele (allele phoE 4 in ST1271 against allele phoE 6 in ST26) (S1 Table).
Comparing resistome and virulome correlated with the MLST analysis results and confirmed a high level of similarity inside the MelTs (see detailed results in S1 Table). Isolates clustered together, as in the in silico MLST analysis (including separation of isolate S13 from other isolates) with one exception in MelT132, which was divided into two lineages, A and B, according to the different resistome and virulome composition. Lineage A included S19, S20, S21 and S44; lineage B included S22, S24, S25, S26, S35 and S36.
cgMLST and SNV analysis. CgMLST analysis provides more discriminatory power than traditional MLST. In this study, we used a cgMLST scheme consisting of 2,358 targets processed by Ridom SeqSphere+ software (Ridom, DE). A cgMLST dendrogram based on 2,251 targets present in all analysed strains was constructed (Fig 2). The other 107 targets from the original 2,358 targets were absent in some or all of the strains and were excluded from all further analyses. The cgMLST divided 46 isolates into 7 clearly distinguishable clusters corresponding to their MelTs groups, with two exceptions. The S13 isolate differed from the other MelT26 isolates in 584 targets, while the average number of different alleles within the other MelTs ranged from 0 to 19 differences between two samples of the same MelT. The second exception was MelT132, which was divided into two subpopulations with 127 different alleles between them. Both exceptions correlated with the previous in silico MLST and resistome and virulome analysis. The average number of different alleles between individual MelTs was 1827 (range from 1809 to 1844 differences).
SNV analysis was carried out for both a core genome (n = 2,251 genes present in all analysed strains) and core with accessory genome (n = 4,084 genes present in all analysed strains). In the core genome, 29,535 SNV positions were identified in total. In general, the core genome SNV number inside each MelT varies from 0 to 23 SNV (MelT145 0-23 SNV, n = 12; MelT139 0-21 SNV, n = 8; MelT26 0-20 SNV, n = 8; MelT266 0-14 SNV, n = 4; MelT132 cluster A 0-22 SNV, n = 5; MelT132 cluster B 0-4 SNV, n = 6; MelT281 0 SNV, n = 2 and MelT130 0 SNV, n = 2) (Fig 3A). The S13 isolate differed by 2,311 SNV from other MelT26 isolates, while between individual MelTs, the number of SNV ranged from 9,595 to 9,821. The MelT132 isolates were divided into two lineages, A and B, similar to the cgMLST analysis. Both MelT132 lineages differed with 137 SNV between them. The complete distance matrix for the core genome SNV analysis is showed in S2 Table. The core and accessory genome SNV analysis was done with 4,084 gene targets present in all 48 isolates (Fig 3B). The distance matrix for the core and accessory genome's SNV analysis is showed in S3 Table. The major topology difference between the core (A) and the core and accessory (B) genome panels was the position exchange between the MelT281 and the MelT132 clusters. Additionally, using the core and accessory genome panel (B), the MelT26 and the MelT132 clusters were linked with the S13 isolate that was only linked to MelT26 cluster when using the core genome panel (A).
Infection and colonisation pair isolates analysis. We analysed 11 pairs (each pair included a rectal swab and a blood culture) of ESBLp KPN isolates obtained from 11 patients to set the SNV cut-off, which determined the relatedness of isolates (Fig 2). From 22 ESBLp KPN isolates, 6 distinct STs were identified: ST405 (n = 6), ST433 (n = 6), ST323 (n = 4), ST1271 (n = 2), ST458 (n = 2) and ST23 (n = 2). The isolates from each patient share the same ST within the pair. The core genome's SNV number within each pair varies in range from 0 to 22 SNV, while together with the accessory genome's SNV number, varies in range from 0 to 40 SNV. Based on paired sample analysis, we established relatedness cut-off values for our ESBLp KPN population as the highest SNV number within pair isolates, 22 SNV for the core genome and 40 SNV for the core with the accessory genome. Both the highest SNV values were for pair S39/S40 that had time lapses between samples of 47 days. The next highest SNV values were only 3 SNVs for the core genome and 5 SNVs for the core and accessory genome, despite time lapses of up to 90 days."

Discussion
We are currently using the following protocol in routine practice. ESBLp KPN collected from high-risk departments are prospectively tested with mini-MLST to determine the MelT. When the strains MelT differ, transmission is unlikely. When we observe an increased incidence of one MelT, we investigate the potential epidemiological linkages and then we decide if there is a possible outbreak and need for WGS analysis. Meanwhile, early epidemiological measures can be implemented to prevent further spread of infection.
KPN mini-MLST is a cheap, rapid and robust method for epidemiological strain typing introduced by Andersson, Tong [6], with advantages in its robustness, reproducibility and portability between laboratories as it is based on a well-established MLST method. In studies with a large number of samples, mini-MLST also can be used for sample sorting and preselecting for more detailed analysis. We evaluated this method for prospective ESBLp KPN   Fig 2. WGS results of 46 selected ESBLp KPN isolates. The cgMLST dendogram was made using Ridom Seqsphere+ software and is based on sequence similarity in the 2,251 core genome targets shared by all 46 isolates. Mini-MLST was done experimentally and MLST was done in silico using WGS data. Mini-MLST, MLST, cgMLST and SNV analysis of pair samples (blood culture/rectal swab pairs are highlighted in the same colour). The time lapse indicates time between collecting the colonizing and infecting isolates. � Samples obtained from one patient, but with no rectal swab.
https://doi.org/10.1371/journal.pone.0221187.g002 population screening and to solve healthcare associated infection outbreaks in our previous study [14]. During the mini-MLST results evaluation, we were unable to determine the MelT of several isolates as their mini-MLST allele combination did not corresponded to any MelT in the published conversion key. On the date of the original article publication there were 863 STs, while in 7/2019 there were 4126 STs, from which it was not possible to determine some of the newly described STs' MelT according to the original conversion key. Following this, we constructed a new automatic algorithm (applicable to already existing and future-designed mini-MLST schemes) and created a new conversion key that will be regularly updated. The latest version of the key is available on http://www.cmbgt.cz/mini-mlst/t6353. Based on our previous study, we started prospective screening in certain departments of our hospital in 7/2016. Since then, we have observed six MelTs account for 82.8% of isolates, from a total of 38 MelTs identified between 7/2016 and 5/2018 (Fig 1). The most pronounced changes in proportional representation were observed in MelT145 (from 38.1% in 2016 to 29.5% in 2018), MelT139 (from 10.9% in 2016 to 3.6% in 2018), MelT132 (from 2.5% in 2016 to 4.7% in 2018) and MelT266 (from 1.7% in 2016 to 5.2% in 2018). We did not detect any emerging MelT which spread to more than 1% of the study population. This suggests a relatively stable low diversity bacterial population which cannot be divided more by using the mini-MLST method only and needs methods with more discriminatory power.
WGS has become a powerful method for most epidemiological studies and hospital outbreak investigations, as it provides multiple analyses from a single technique, including data on MLST, resistant genes, virulence genes, plasmids and genome comparison [18]. Currently, epidemiological investigations have mainly focused on an allele-based approach (cgMLST, wgMLST) and SNV analysis. While the allele-based approach is particularly suitable for multicentre studies and clustering large populations of bacteria [19], SNV analysis is especially useful when dealing with outbreak episodes where we are expecting the analysed strains to be similar. The number of differences in alleles or SNV that still identify isolates as similar or related and include them in outbreak episodes differs in various publications [11]. According to our best knowledge, there are no general standards or rules to set a sufficient threshold value. For this reason, it is not possible to simply compare the results of individual studies and freely use previously published cut-off values, the cut-off being typically determined based on the specific results from those publications and mostly varying between 0 and several dozen SNV [10]. Also, comparing outbreak strains and the local bacterial population is not commonly done in most studies, which may give important information to determine relatedness and assess outbreaks.
In order to evaluate the role of WGS in low diversity population outbreak analysis, we sequenced 46 ESBLp KPN isolates belonging to the four mostly spread MelTs from the monitored period from 7/2016 to 5/2018, isolates from the same MelTs originally collected for our mini-MLST pilot study in 2014 [14] and colonization/infection pair samples. First, we performed in silico MLST to correlate the STs with MelTs and characterise our population in the worldwide KPN population context. All predominant STs in our population have previously been described in literature. Serotype K1 strains belonging to ST23 (MelT281) were described as a frequent cause of invasive infections and liver abscesses [20]. ST321 (MelT139), ST323 (MelT266), ST1271 (MelT26) and ST29 (MelT26) were described as long-term, persistent MDR strains in hospital facilities [21][22][23]. ST405 (MelT132) was often described as an OXA-48 Molecular typing of low diversity ESBL-producing Klebsiella pneumoniae population producer [24] and ST433 (MelT145) as hyper virulent biofilm producing strains [25]. The only discrepancy between MLST and mini-MLST was in sample 13, which belonged to ST29, in contrast with the other MelT26 isolates, which belonged to ST1271. S13 was the only MelT26 isolate collected in 2014.
Second, we used public online databases to determine the profile of resistance genes, virulence factor genes and efflux pump genes (S1 Table). Compared to the mini-MLST results, resistome, virulome and efflux pump gene analysis only separated S13 from others belonging to MelT26 (as well as MLST) and divided MelT132 into two clearly separate clusters (Isolates from clusters A and B from MelT132 both belonged to ST405). In summary, these analyses do not have sufficient discriminatory power for the epidemiology of low diversity hospital bacterial populations. However, they provide information on the resistance and virulence of the examined strains, which can help manage the spread of hyper-resistant and hyper-virulent strains and may be useful in patients' treatment.
The cgMLST provides more discriminatory power than previously mentioned analyses, since it is based on hundreds or thousands KPN genes' allelic similarities depending on the study's design [26][27][28]. Despite analysing 2,251 genes, we did not find enough allelic differences between isolates belonging to the same MelT to draw a conclusion about isolate relatedness.
To analyse the SNV analysis results, we have to set the SNV cut-off values to determine ESBLp KPN isolate similarity. We used the general premise that the colonization strain is in most cases the cause of the subsequent infection [26,29,30]. Therefore, the number of SNV between colonization/infection pair isolates defines the value of the difference level at which the isolates can still be considered similar. Based on our results, we established two sets of cutoff values. The first set was based on all pairs and cut-off values were 22 SNV for the core genome and 40 SNV for the core and accessory genome (Fig 2). The stricter cut-offs were set with excluded SNV values for pair S39/S40 and values were 3 SNV for the core genome and 5 SNV for the core and accessory genome. Both cut-off values set reflected the low diversity in the highly selective ESBLp KPN hospital population. Using both of our cut-off values on our population, even isolates without evident epidemiological associations were clustered together, which makes evaluating outbreaks difficult and may lead to erroneous conclusions.
We are aware that our study is a single-centre study with a small number of isolates sequenced. Especially to determine cut-off, more paired isolates should be analysed. However, due to the specific hospital's environment, we can expect highly-selected bacterial populations to also be found in other hospitals.
To our best knowledge, this is the first study where prospective molecular typing is combined with WGS to define the epidemiological background and the genetic structure of the hospital's bacterial population. We proved that mini-MLST is a cost effective means of ruling out epidemiological linkage, but only complete genome analysis can provide strong evidence in favour of epidemiological linkage. Our findings showed there were only minimal differences within the core/accessory genome in the low diversity hospital population and gene based SNV analysis does not have enough discriminatory power to differentiate isolates' relatedness or evaluate whether it is an outbreak or not. Thus, intergenic regions and mobile elements should be incorporated to the analysis scheme to increase the discriminatory power. Therefore, when evaluating any molecular biological data, it is necessary to analyse them to concord with the epidemiological background.
Supporting information S1