Genomic and Evolutionary Analysis of Two Salmonella enterica Serovar Kentucky Sequence Types Isolated from Bovine and Poultry Sources in North America

Salmonella enterica subsp. enterica serovar Kentucky is frequently isolated from healthy poultry and dairy cows and is occasionally isolated from people with clinical disease. A genomic analysis of 119 isolates collected in the United States from dairy cows, ground beef, poultry and poultry products, and human clinical cases was conducted. Results of the analysis demonstrated that the majority of poultry- and bovine-associated S. Kentucky isolates were sequence type (ST) 152. Several bovine-associated (n = 3) and food product isolates (n = 3) collected from the United States and the majority of human clinical isolates were ST198, a sequence type that is frequently isolated from poultry and occasionally from human clinical cases in Northern Africa, Europe and Southeast Asia. A phylogenetic analysis indicated that both STs are more closely related to other Salmonella serovars than they are to each other. Additionally, there was strong evidence of an evolutionary divergence between the poultry-associated and bovine-associated ST152 isolates, marked by polymorphisms in four core genome genes. The ST198 isolates recovered from dairy farms in the United States were phylogenetically distinct from those collected from human clinical cases, with 66 core genome SNPs differentiating the two groups, but more isolates are needed to determine the significance of this distinction. Identification of S. Kentucky ST198 in dairy animals in the United States suggests that the presence of this pathogen should be monitored in food-producing animals.


Introduction
Salmonella enterica subsp. enterica is a major cause of human and animal salmonellosis worldwide. The majority of human salmonellosis in the United States is caused by several serovars, namely Enteritidis, Typhimurium, Newport, and Javiana [1]. Although it is currently assumed that all serovars are potentially pathogenic to humans, the association between S. enterica and illness in non-human animals, as well as asymptomatic carriage, is less clear, and animals shedding serovars such as Kentucky, Enteritidis, and Senftenberg are frequently asymptomatic [1][2] [3]. S. Kentucky is frequently isolated from asymptomatic cattle and poultry and from poultry products in the United States, but has also been isolated from other sources such as the environment and domesticated dogs [2][3] [4] [5][6]. Recently, the global spread of multidrug-resistant S. Kentucky ST198 has been described, indicating that this serovar is an emerging public health threat [7]. In North America, human clinical cases of S. Kentucky ST198 infection have been associated with travel to the Middle East, Southeast Asia, or Africa [7][8] [9], and clinical cases caused by S. Kentucky ST152 are relatively rare.
Salmonella Kentucky has been identified as the most frequently isolated serovar from non-human, non-clinical sources in the United States [1] and, until recently, has mostly been considered a concern in poultry due to its high prevalence in broilers as well as the established link between poultry products and human salmonellosis. Further, S. Kentucky isolated from poultry and poultry products in the United States has been reported to be resistant to multiple antibiotics [4][10] [11][12] [13]. This serovar is currently the most frequently isolated serovar from poultry in the United States, recently supplanting Enteritidis and Heidelberg in poultry flocks [2]. In recent years, research on dairy farms has demonstrated that S. Kentucky is frequently isolated from dairy cows and dairy production operations in the United States and that isolation rates of this serovar in dairy cows appear to be increasing [3][14] [15]. What remains unclear is how S. Kentucky is able to colonize these two distantly related hosts and whether there are genomic differences between poultry-associated and bovine-associated S. Kentucky.
Both S. Kentucky ST152 and ST198 have been isolated globally, but the degree of relatedness among S. Kentucky isolates recovered on a global scale is not yet known. Multi-Locus Sequence Typing (MLST) analyses have demonstrated that there are no common alleles between these two STs, indicating that they are distantly related [7]. However, a comprehensive description of the genomic differences between these two STs has not yet been produced, and a comparison of the two may identify genomic factors involved in host specificity and virulence potential. In a whole-genome phylogenetic analysis of S. enterica, Timme et al. [16] demonstrated that S. Kentucky was polyphyletic, a phenomenon identified in other serovars [17]. For S. enterica and other bacteria, polyphyly has been associated with lateral gene transfer (LGT) of the antigen-coding regions [17] [18]. The historical application of serology to strain naming and description has thus been misleading: distantly related strains that acquired an antigen-coding region from the same source through LGT have been presumed to be similar under this scheme, when in fact their genomes may be highly diverged.
The objective of this study was to infer the phylogeny of S. Kentucky, to identify the genomic features associated with the apparent specificity of S. Kentucky ST152 to bovine and poultry hosts and products, and to identify differences between S. Kentucky ST152 and ST198 genomes. To accomplish this, we sequenced the genomes of 49 dairy cow-associated isolates and two poultry-associated isolates collected from across the United States, and combined these data with the genomes of 68 previously collected isolates from poultry, poultry products, production environments, and clinical cases in the United States, Canada, and the Middle East.

Materials and Methods
Salmonella Kentucky isolates from dairy cows were recovered from previously collected National Animal Health Monitoring System (NAHMS) samples as well as from an eight-year dairy farm monitoring program conducted in south-central Pennsylvania [3] [19] (two poultry-associated isolates were supplied by S. Parveen). Isolates were serogrouped following the methods of Herrera-Leon et al. [20] and Karns et al. [21], and serotypes were identified by the National Veterinary Services Laboratories (NVSL; Ames, IA). Isolates were streaked onto tryptic soy agar, and a single colony was inoculated into tryptic soy broth and incubated overnight at 37°C. This culture was centrifuged and decanted, and the pellet was processed for DNA extraction using a Qiagen DNeasy Kit (Qiagen, Valencia, CA). Nextera XT libraries were prepared for each sample and pooled in equimolar concentrations following the manufacturer's instructions (Illumina, San Diego, CA). Paired-end sequencing (2 × 151 bp) was conducted on an Illumina NextSeq 500 sequencing platform with a High-Output flow cell. Data were demultiplexed and trimmed to remove adapter sequences using the BCL2FastQ program, and PhiX reads were removed using DeconSeq [22]. Reads were further cleaned using Trimmomatic [23] and assembled using SPAdes 3.6.2 [24]. ST56 complex genomes were downloaded from NCBI GenBank prior to February 2016 and ST15 complex genomes prior to December 2015 (PRJNA242614, PRJNA273513, PRJNA78335, PRJNA78339, PRJNA78337, PRJNA225734, PRJNA66693, PRJNA20069, PRJNA19457, PRJEB6491, PRJNA337914, and PRJNA186035) (S1 Table). Genomes from BioProject PRJEB6491 [25] were labeled as 915c and 917c in this analysis. Bovine-associated S. Kentucky isolates used in this study will be provided upon request.
Three methods were used to identify SNPs among the S. Kentucky genomes. To identify high-quality SNPs based on read coverage, Lyve-SET [26] [27] and the CFSAN SNP Pipeline [28] [29] were used. For the Lyve-SET analysis, SNP identification was conducted with a minimum 10X coverage requirement and the CGP read cleaner. When fastq files were not available, assembled genomes were included in the project/asm directory prior to SNP detection. Default settings were used for the CFSAN SNP Pipeline analysis. For this analysis, only genomes for which fastq or .sff files were available could be used for the SNP search; therefore, one ST152 genome (strain CDC 191) and three ST198 genomes (strains CVM 43824, CVM 43780, and CVM 43756) were excluded. To identify SNPs in assembled genomes, Parsnp from the Harvest package was used [30] [31]. Regions undergoing high levels of recombination were removed using the -x option (PhiPack), and all genomes were included using the -c option. The chromosome of S. Kentucky CVM29188 was selected as the reference genome for all analyses. Parsnp was also used to determine the core-genome SNP differences between S. Kentucky ST152 and ST198 genomes and other subclade A1 serovars identified by Timme et al. [16]. SNPs were annotated using snpEff [32].
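The coverage-based filtering step can be illustrated with a minimal Python sketch. It assumes that per-position read pileups against the reference are already available as strings of base calls (a hypothetical input format), applies the 10X minimum depth described above, and adds an assumed 75% majority-allele cutoff. It is not the actual implementation used by Lyve-SET or the CFSAN SNP Pipeline.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumed thresholds: the 10X depth matches the Lyve-SET setting described above;
# the 0.75 majority-allele fraction is a hypothetical cutoff for illustration.
MIN_DEPTH = 10
MIN_ALLELE_FRACTION = 0.75


def call_site(pileup_bases: str) -> Optional[str]:
    """Return the majority base at one position, or None if the site fails a filter."""
    depth = len(pileup_bases)
    if depth < MIN_DEPTH:
        return None  # insufficient coverage: treat the site as missing data
    base, count = Counter(pileup_bases.upper()).most_common(1)[0]
    if count / depth < MIN_ALLELE_FRACTION:
        return None  # mixed base calls: ambiguous site
    return base


def call_snps(reference: str, pileups: List[str]) -> Dict[int, str]:
    """Map 0-based reference positions to alternative bases that pass both filters."""
    snps = {}
    for pos, (ref_base, bases) in enumerate(zip(reference, pileups)):
        call = call_site(bases)
        if call is not None and call != ref_base.upper():
            snps[pos] = call
    return snps


# Position 2 has only 5 reads and is dropped; position 3 passes at exactly 75%.
print(call_snps("ACGT", ["A" * 12, "T" * 11, "G" * 5, "A" * 9 + "T" * 3]))  # {1: 'T', 3: 'A'}
```

Sites that fail either filter are treated as missing rather than as reference matches, which is why genomes with low coverage contribute fewer comparable positions.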
To infer the phylogeny of S. Kentucky ST152/318/2132 isolates, the multi-FASTA SNP matrices from each of the three SNP detection methods were imported into MEGA6 [33] and RAxML [34]. Maximum Parsimony (MP) trees were inferred using MEGA6 for all S. Kentucky isolates (ST198, ST152/318/2132, and unknown STs), as well as for S. Kentucky isolates together with isolates of other serovars. For phylogenetic inference of ST152/318/2132 isolates, a Maximum Likelihood (ML) phylogeny was inferred using RAxML version 8.2.8 with the General Time Reversible (GTR) model of nucleotide substitution and all other parameters at their default settings. MP and ML analyses were each conducted with 1000 bootstrap replicates. The bootstrapped MP tree inferred from the Parsnp SNP data was used to determine the genealogical sorting index (gsi) [35] for bovine and poultry isolates using the GSI webserver [36]. MLST data were retrieved from the University of Warwick Salmonella enterica MLST database [6] and through the Center for Genomic Epidemiology server [37].
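Bootstrap support values come from resampling alignment columns with replacement and re-inferring a tree from each pseudo-replicate. The sketch below is a hypothetical helper, not MEGA6 or RAxML code: it generates those pseudo-replicates from a SNP matrix held as a dict of equal-length strings, leaving tree inference on each replicate to the phylogenetics software. Isolate names are placeholders.

```python
import random
from typing import Dict, List


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw one nonparametric bootstrap pseudo-replicate: same length, columns sampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 1000, seed: int = 1) -> List[Dict[str, str]]:
    """Generate n_replicates pseudo-replicates (1000 matches the number used in the text)."""
    rng = random.Random(seed)  # fixed seed so the replicates are reproducible
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


snp_matrix = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TCGAA"}
replicates = bootstrap_replicates(snp_matrix, n_replicates=3)
```

A clade's bootstrap support is then the percentage of trees inferred from these replicates that contain that clade.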
The resulting SNP matrices were also analyzed using the Bayesian clustering program STRUCTURE v2.3.4 [38]. This program clusters individuals based on patterns of SNP differences without enforcing a bifurcating, tree-like structure and thus may reveal additional details about the genomic content, similarity, and differences among isolates. In particular, STRUCTURE is suitable for detecting evidence of admixture (i.e., individuals whose genomes appear to combine SNPs associated with distinct groups). STRUCTURE analyses were run with default settings for 60,000 generations, the first 10,000 of which were discarded as burn-in. STRUCTURE results were visualized using DISTRUCT v1.1 [39].
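To make admixture proportions concrete, the sketch below estimates the fraction of one haploid isolate's SNPs attributable to each of K clusters. It deliberately swaps in a simpler technique than STRUCTURE's MCMC sampler: an expectation-maximization (EM) maximum-likelihood estimate in which each cluster's allele frequencies are assumed known and held fixed. The 0/1 allele coding and the frequencies in the example are hypothetical.

```python
from typing import List

EPS = 1e-6  # keeps per-site likelihoods away from exactly 0 or 1


def estimate_admixture(genotype: List[int], cluster_freqs: List[List[float]], n_iter: int = 200) -> List[float]:
    """
    genotype: haploid 0/1 allele codes at L SNP sites for one isolate.
    cluster_freqs: K lists of length L, each giving the frequency of allele 1 in one cluster.
    Returns admixture proportions q (length K, summing to 1).
    """
    k = len(cluster_freqs)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * k
        for site, allele in enumerate(genotype):
            weights = []
            for j, freqs in enumerate(cluster_freqs):
                p = min(max(freqs[site], EPS), 1.0 - EPS)
                weights.append(q[j] * (p if allele == 1 else 1.0 - p))
            norm = sum(weights)
            # E-step: posterior probability that this site derives from each cluster
            for j in range(k):
                totals[j] += weights[j] / norm
        # M-step: new proportions are the mean posterior membership across sites
        q = [t / len(genotype) for t in totals]
    return q


# Hypothetical isolate: 8 of 10 sites carry the allele typical of cluster 0.
q = estimate_admixture([1] * 8 + [0] * 2, [[0.95] * 10, [0.05] * 10])
```

For this toy input the estimate converges to roughly 0.83 for cluster 0 and 0.17 for cluster 1, the kind of mixed membership that a STRUCTURE plot displays as a partially colored bar.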
Putative plasmid sequences were identified with PlasmidFinder [40] using the Enterobacteriaceae database with the detection threshold set to 95% sequence identity. The identified HSP fragments were compared to the NCBI database (BLASTN) to identify similar plasmids. The presence or absence of protein-coding genes was determined using BLASTP. Putative genomic islands in S. Kentucky CVM29188 (ST152) and S. Kentucky CVM N51290 (ST198) were identified using IslandViewer 3 [41].
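The identity and coverage figures reported for plasmid matches (for example, percent similarity across a percentage of a putative plasmid region) can be summarized from BLAST tabular output. This is a hedged sketch: it assumes the hits for one query/subject pair were written in BLAST tabular format 6, whose standard columns place percent identity, alignment length, query start, and query end in columns 3, 4, 7, and 8. The length-weighted identity, merged-interval coverage, and 60% coverage cutoff are illustrative choices, not PlasmidFinder's internal scoring.

```python
from typing import List, Tuple

Hsp = Tuple[float, int, int, int]  # (pident, length, qstart, qend) from tabular columns 3, 4, 7, 8


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged: List[List[int]] = []
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def summarize_hits(hsps: List[Hsp], query_length: int) -> Tuple[float, float]:
    """Return (length-weighted % identity, % of the query covered by at least one HSP)."""
    total = sum(length for _, length, _, _ in hsps)
    identity = sum(pident * length for pident, length, _, _ in hsps) / total
    covered = sum(end - start + 1 for start, end in merge_intervals([(qs, qe) for _, _, qs, qe in hsps]))
    return identity, 100.0 * covered / query_length


def passes(hsps: List[Hsp], query_length: int, min_identity: float = 95.0, min_coverage: float = 60.0) -> bool:
    """The 95% identity threshold follows the text; the 60% coverage cutoff is an assumed example."""
    identity, coverage = summarize_hits(hsps, query_length)
    return identity >= min_identity and coverage >= min_coverage
```

Merging overlapping HSPs before summing prevents a region hit twice from being counted twice toward coverage; the length-weighted identity does not make that correction and is only an approximation.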

Results and Discussion
Description of Isolates and Antibiotic Susceptibility Testing of Bovine-associated Isolates
In total, 119 S. Kentucky genomes were used in the in silico analyses of this study. Of these, 49 dairy cow-associated isolates were selected from a collection of isolates recovered during routine analysis of milk, milk filters, dairy cow feces, and the dairy farm environment in the United States over a 13-year period (Table 1). Two poultry isolates recovered from a broiler operation on the Eastern Shore of Maryland were used in this study, and the remaining poultry-associated genomes (n = 59) were obtained as assembled genomes from the NCBI GenBank database and as raw sequencing reads from the SRA database. All human clinical isolates (n = 6) were obtained from NCBI (two as raw sequencing reads and four as assembled genomes).
For in-house isolates (bovine-associated isolates, the two poultry isolates, and ATCC 9263), antibiotic susceptibility testing (AST) was conducted. All bovine-associated isolates and ATCC 9263 were susceptible to all tested antibiotics. Poultry isolate ARS-CC5795 was resistant to amoxicillin/clavulanic acid, ampicillin, cefoxitin, ceftiofur, and ceftriaxone, while poultry isolate ARS-CC5805 was resistant to streptomycin and tetracycline. The absence of antibiotic resistance in a diverse collection of S. Kentucky recovered from dairy cows is consistent with other studies and is notable given the reported prevalence of antibiotic resistance among poultry-associated S. Kentucky in the United States [4][10] [11][12] [13][42] [43]. Plasmids conferring antibiotic resistance are frequently identified in poultry-associated S. Kentucky ST152 isolates [12], whereas antibiotic resistance in human clinical S. Kentucky ST198 isolates is associated with acquisition of Salmonella Genomic Island 1 (SGI1), plasmids, and core genome polymorphisms [7] [9][44] [45].
In multi-locus sequence typing (MLST) [46], each ST differs from every other ST at one or more of seven loci, and some sequence types of the same serovar share no alleles. Based on the number of sequences deposited in the MLST database, ST152 and ST198 are among the most frequently isolated S. Kentucky sequence types. However, they share no common alleles, indicating that they are highly diverged from each other [7]. These two STs belong to two larger groups, or "ST complexes," which consist of closely related STs. ST152 is a member of the ST15 complex, which to date also includes ST151, 212, 318, and 723, among several others. ST198 is a member of the ST56 complex, which also includes ST727, 835, and 1680. Outside of these defined complexes, other closely related STs share one or more alleles with members of each complex.
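The allele-sharing logic above can be expressed in a few lines. The sketch compares seven-locus allele profiles (the seven housekeeping loci of the Warwick Salmonella scheme) and groups STs linked by chains of single-locus variants. The allele numbers below are hypothetical placeholders, not the real ST152 or ST198 profiles, and simple single-linkage grouping is an assumed stand-in for the eBURST-style definitions used by the MLST database.

```python
from typing import Dict, List, Set, Tuple

LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")
Profile = Tuple[int, ...]


def allele_differences(a: Profile, b: Profile) -> int:
    """Number of loci (out of seven) at which two STs carry different alleles."""
    if len(a) != len(LOCI) or len(b) != len(LOCI):
        raise ValueError("profiles must list one allele per locus")
    return sum(x != y for x, y in zip(a, b))


def group_single_locus_variants(profiles: Dict[str, Profile]) -> List[Set[str]]:
    """Single-linkage grouping: STs joined by chains of single-locus variants (SLVs)."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    names = list(profiles)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_differences(profiles[a], profiles[b]) == 1:
                parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical profiles: STa/STb and STb/STc are SLVs; STd shares no alleles with the others.
example = {
    "STa": (1, 1, 1, 1, 1, 1, 1),
    "STb": (1, 1, 1, 1, 1, 1, 2),
    "STc": (1, 1, 1, 1, 1, 2, 2),
    "STd": (5, 5, 5, 5, 5, 5, 5),
}
groups = group_single_locus_variants(example)
```

Note that STa and STc end up in the same group despite differing at two loci, because chaining through STb links them; that chaining is why an ST complex can contain members that share few alleles directly.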
A pairwise comparison of the seven MLST loci of ST152 and ST198 identified 42 SNPs, a 1.25% sequence divergence (data not shown). When the same MLST criteria were applied to the S. Kentucky genomes used in this study, three STs were identified (Table 1). All of the poultry-associated isolates were ST152, with one identified as ST2132 (one allele difference from ST152), while among the bovine-associated isolates two were ST318 (one allele difference from ST152 and ST2132), 42 were ST152, and three were ST198. Two strains could not be accurately typed because genes were truncated at the ends of contigs: S. Kentucky CVM N47729, isolated from a chicken breast, shared six of seven alleles with ST152, and S. Kentucky CVM N42453 shared six of seven alleles with ST198. Eight S. Kentucky genomes gathered from the NCBI database were identified as ST198: five human clinical isolates, two ground beef isolates, and the S. Kentucky type strain ATCC 9263. These results are consistent with those of the MLST database in that ST152 isolates from the United States are commonly recovered from poultry and cattle. A single ST318 isolate is included in the MLST database and, consistent with our study, was isolated from cattle in the mid-Atlantic region of the United States. Based on data entries in the MLST database, ST198 isolates have been recovered on an apparently broader global scale and more frequently from human infections, as well as from poultry, cattle, occasionally other organisms, and the environment.
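The divergence figure above is an uncorrected p-distance: the number of differing sites divided by the number of aligned sites compared. A minimal sketch follows, assuming the seven loci have already been concatenated and aligned; skipping gap and N positions is an assumption about how such sites would be handled.

```python
def p_distance(seq_a: str, seq_b: str, skip: str = "-N") -> float:
    """Proportion of compared aligned sites that differ (uncorrected p-distance)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # ignore gaps and ambiguous bases
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# One difference among five comparable sites (the gap/N column is skipped).
print(p_distance("ACGT-A", "ACTTNA"))  # 0.2
```

Multiplying the result by 100 gives the percent divergence.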
To investigate the relatedness among the S. Kentucky isolates and infer their evolutionary history on a genome-wide scale, SNPs across the genomes of all S. Kentucky and closely related serovars were identified. Using a core-genome SNP matrix determined from representative genomes of S. enterica subclade A1 as described by Timme et al. [16], two distantly related lineages of S. Kentucky were identified (Fig 1). These two S. Kentucky lineages correspond to the ST15 complex (ST152, 318, and 2132) and ST56 complex (ST198) of S. Kentucky sequence types described above (Table 1).
SNPs within the S. Kentucky genomes (excluding other subclade A1 serovars) were identified using three methods, and phylogenetic trees were inferred from these data. Phylogenies inferred from the data of the three SNP detection methods were largely similar in topology. For all SNP detection methods and both methods of tree inference, the ST15 complex was observed to have two major sublineages (labeled as Lineages 1.0 and 2.0). For five of the six analyses, Lineage 1.0 is composed of several clusters of bovine isolates (n = 39), labeled as clusters 1.1, 1.2.1, 1.2.2, and 1.2.3. Cluster 1.1 consists of five ST152 isolates, one recovered from Texas and four from Wisconsin in 2007; the Wisconsin isolates grouped together, separately from the single Texas isolate. Cluster 1.2.1 consisted of one isolate from Pennsylvania and one from Virginia. Cluster 1.2.2 consisted of three isolates: two typed as ST318 and collected from the same farm in Pennsylvania, and a highly diverged ST152 isolate collected from Ohio. Cluster 1.2.3 consisted of 29 isolates, mainly from Pennsylvania, with one isolate from New York and two from Ohio. These results, however, are not consistent across all tree inference methods: the ML analysis of the SNPs detected with Parsnp places all bovine isolates except ARS-CC621 in Lineage 1.0 (Fig 3), albeit with low bootstrap support, and, consistent with the other methods, places all poultry isolates in Lineage 2.0.
The topology of Lineage 2.0 is moderately to strongly supported in all analyses and is composed of three to four clusters of strains. For the Parsnp MP analysis and the Lyve-SET and CFSAN SNP Pipeline MP and ML analyses, the ancestral Lineage 2.0 node was moderately to strongly supported (80 to 94%), with much stronger support in all analyses for the node from which the bovine-associated isolate ARS-CC621 and all poultry-associated isolates descended.
For all analyses excluding the Parsnp ML analysis, the Lineage 2.0 clusters included one group of bovine/human-associated genomes (cluster 2.1), one bovine-associated genome (cluster 2.2), and two groups of poultry-associated genomes (clusters 2.3 and 2.4) [...] year period. These clusters are monophyletic (genealogical sorting index = 1, P < 0.0001) and do not include any bovine isolates, indicating complete lineage sorting from the most recent common ancestor that gave rise to the poultry-associated isolates. All poultry-associated clusters were moderately to strongly supported in all MP analyses, while ML analyses resulted in lower bootstrap support for clusters 2 [...]. Within the paraphyletic bovine-associated groups, the presence of different clusters in the same region (Wisconsin, Texas, and Pennsylvania) indicates that multiple evolutionary sublineages of S. Kentucky ST152 are circulating in dairy cow herds of the same region. For example, clusters [...] [47]. For example, migratory birds are known vehicles of enteric bacteria and have been suggested to be a transmission route to livestock [48], and some wild bird species such as European starlings (Sturnus vulgaris) are known to concentrate in areas of cattle operations, where their presence has been associated with increased S. enterica contamination of feed and water [49].
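A genealogical sorting index of 1 means that a group is exclusive, i.e., monophyletic on the tree. The sketch below checks only that endpoint; it does not compute intermediate gsi values or reproduce the webserver's significance test. It assumes a rooted tree encoded as nested Python tuples with isolate names as leaves, and the labels are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of child subtrees


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree: Tree, group: Iterable[str]) -> bool:
    """True if some node of the rooted tree has exactly the given leaves as its descendants."""
    target = frozenset(group)
    if leaf_set(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, target) for child in tree)


tree = ((("P1", "P2"), "P3"), (("B1", "B2"), "B3"))
print(is_monophyletic(tree, {"P1", "P2", "P3"}))  # True
print(is_monophyletic(tree, {"P3", "B3"}))        # False
```

Because the check requires an exact match of descendant leaves, adding a single bovine isolate inside a poultry clade makes the poultry group non-monophyletic.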
Within the poultry-associated clusters, a clear geographic pattern of distribution was not observed. For example, isolates from New Mexico, Tennessee, California, and British Columbia were identified in both clusters 2.3 and 2.4. In addition to isolates from the same state appearing in multiple clusters, geographically disparate isolates grouped together. For instance, an isolate from British Columbia (ABBSB1008-2) and one from Georgia (CVM N43820), two regions separated by over 4500 km, are sister taxa on the same phylogenetic sublineage. Similarly, isolates from California clustered with those from Tennessee, Massachusetts, New York, Colorado, and Washington State in cluster 2.4. This lack of geographic clustering among the poultry isolates is most likely because the majority of these isolates were collected not on-farm but after poultry processing. Sequencing of isolates collected from farms would help identify any geographic signatures present in the currently circulating ST152 populations.
Based on the genomes analyzed in this study, there is strong evidence to support the hypothesis that bovine-associated ST152 isolates are phylogenetically distinct from poultry-associated ST152 isolates. However, these data reflect only S. Kentucky ST152 isolates collected in North America in recent years. ST152 isolates are known to have been recovered from various sources worldwide, as far back as the 1950s, but the literature does not document well how frequently ST152 is recovered from cattle and poultry on other continents, or whether this ST152 bovine/poultry association occurs in other regions, i.e., whether ST152 strains have entered bovine and/or poultry populations outside of North America or are circulating through populations of other animals. At present, metadata for only five ST152 strains from outside of North America are deposited in the MLST database, and none of these were recovered from poultry or bovine sources; rather, they came from fish meal and human clinical cases. Addition of ST152 genomes from a variety of sources may reveal phylogenetic relationships among strains that differ somewhat from those presented here, as well as other evolutionary lineages not detected in this study.

Genomic Polymorphisms within the ST152/ST318/ST2132 Isolates (ST complex 15)
In total, 2662 SNPs were identified in the core genome by Parsnp, and 3353 and 3336 were identified by Lyve-SET and the CFSAN SNP Pipeline, respectively. We further identified lineage- and host-associated SNPs. In all trees except the Parsnp ML tree, the Lineage 1.0/2.0 divergence event is marked by eight SNPs: four in intergenic regions and four in protein-coding genes, resulting in three synonymous mutations and one non-synonymous mutation (SeKA_A1002, SeKA_A1027, SeKA_A1948, SeKA_A4700) (Table 2).
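Whether a coding SNP is synonymous depends on whether the altered codon still encodes the same amino acid. snpEff makes this call genome-wide from gene annotations; the sketch below shows only the underlying logic for a single codon, assuming the codon is given on the coding strand and using the standard genetic code (NCBI translation table 1). The example codons are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a substitution at 0-based position (0-2) within a coding-strand codon."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"non-synonymous ({ref_aa}->{alt_aa})"


print(classify_snp("CTG", 0, "T"))  # prints: synonymous (CTG and TTG both encode Leu)
print(classify_snp("TCT", 1, "T"))  # prints: non-synonymous (S->F)
```

Stop codons are represented by "*", so a change to a stop codon is reported as non-synonymous with "*" as the new residue.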
In a comparison of bovine-associated isolates to poultry-associated isolates, an average of 217, 247, and 281 SNP differences was identified between the two groups by the Parsnp, Lyve-SET, and CFSAN SNP Pipeline analyses, respectively (Parsnp range = 62 to 308 SNPs, Lyve-SET range = 72 to 377 SNPs, and CFSAN SNP Pipeline range = 72 to 489 SNPs). Within these SNP matrices, there were four SNPs in protein-coding genes, all resulting in non-synonymous mutations, that defined the bovine-poultry divergence for all strains (SeKA_A1094, SeKA_A2591, SeKA_A2812, SeKA_A4467) (Table 2). No evidence of a role for these genes in host specificity, colonization of the gut by S. enterica, or persistence of S. enterica in cows, poultry, poultry and bovine products, or the environment could be found in the literature. However, hemolysin-3 homologs (cf. SeKA_A2591) have been implicated in the infection of mammalian hosts by Gram-negative and Gram-positive organisms, particularly Vibrio vulnificus and Bacillus cereus [50] [51]. The S. Kentucky hemolysin-3 shares 71% and 45% amino acid identity with those of V. vulnificus and B. cereus, respectively. In vitro and in vivo assays are needed to elucidate the potential roles of these protein-coding genes in persistence in, or specificity to, the poultry and bovine hosts.
A Bayesian analysis of SNP characters using STRUCTURE v2.3.4 was consistent with the inferred ML and MP phylogenies (Fig 8) in showing two primary distinct groups. However, these results provided additional details that help explain the placement of certain isolates within the phylogenetic analysis. For example, STRUCTURE showed that isolates within Cluster 1.

Plasmid Detection
Using the assembled genomes, multiple plasmids were detected in the S. Kentucky ST152 isolates, while several isolates were identified as plasmid-free (Table 3 and S2 Table). Thirty-three poultry-associated isolates encoded sequences similar to the ColV plasmid pCVM29188_146 (IncFIB), which is broadly consistent with previous studies of poultry-associated S. Kentucky in North America showing that the majority of these strains (72.9%) harbored markers of ColV plasmids [52]. However, that study also demonstrated that S. Kentucky isolates collected on farms were more likely to harbor this plasmid than those collected from retail meats [52], whereas the majority of isolates analyzed here were recovered from the latter. This plasmid has been shown to be involved in enhanced survival of S. Kentucky in poultry and is highly similar in gene content and nucleotide sequence to the ColV plasmid of avian pathogenic Escherichia coli (APEC) [52]. The presence of this plasmid was restricted to poultry-associated isolates, providing further evidence that it is involved in interactions with the poultry host. However, Fricke et al. [12] identified six bovine-associated isolates with PCR markers indicative of APEC-like plasmids similar to pCVM29188_146, suggesting that more bovine-associated S. Kentucky isolates need to be evaluated for the presence of this or highly similar plasmids. Although Johnson et al. [52] demonstrated the role of this plasmid in enhanced extracellular survival in poultry, its absence from many poultry-associated S. Kentucky isolates suggests that other factors are involved in the specificity of these strains to the poultry host. Similar to Ladely et al. [13], the IncFIB plasmid type was not evenly distributed across poultry-associated strains: it was identified in 81% of cluster 2.4.1.1 to 2.4.2 isolates but in only 25% of cluster 2.3.1 to 2.3.3.2 isolates.
The most frequently detected plasmid sequences in poultry-associated isolates were those similar to the canonical pCVM29188_46 (IncX1) plasmid, which was also not detected in any of the bovine-associated isolates (Table 3 and S2 Table). The biological role of this plasmid has not been well elucidated, but its high prevalence among these strains suggests that it may play a significant role in the survival of S. Kentucky within the poultry host or poultry production environment, or in the transmission of S. Kentucky between animals; this would need to be evaluated in vivo. Combining the plasmid presence/absence data with the inferred phylogeny suggests that poultry-associated S. Kentucky acquired this plasmid, as well as the IncFIB plasmid, after diverging from the most recent common ancestor shared with the bovine-associated isolates.
The majority of isolates that harbored plasmids were from the mid-Atlantic region of the United States. The absence of plasmids in many bovine-associated isolates (25%) suggests that plasmids are not necessary for survival or persistence within the bovine gastrointestinal tract or the dairy farm environment. However, the high prevalence of IncI1 plasmids in cluster 1.2.3 isolates suggests that they may provide an advantage in these environments, and their restriction to isolates from the mid-Atlantic and Northeastern United States may reflect their importance to the ecology of these strains in this region.

Phylogeny and Genomic Polymorphisms within the ST198 Isolates (ST complex 56)
Within the S. Kentucky ST198 lineage, two major clusters were identified (labeled here as clusters 198.1 and 198.2) (Fig 9). Cluster 198.1 consists of isolates recovered from agricultural sources in the United States (dairy cow feces, dairy cow milk, ground beef, and ground turkey) and the S. Kentucky type strain ATCC 9263. Cluster 198.2 consists of five human clinical isolates: two collected from the same patient in Kuwait in 2012 (915c from a sacral wound and 917c from a stool sample) [25], and three collected from human clinical cases in the United States. It is not known whether these three infections were acquired abroad or within the United States, as metadata for these isolates are lacking in the public database; source attribution would need to be verified before the locations of infection can be confirmed. S. Kentucky ST198 infections have been reported in Africa, Eastern Europe, and Southeast Asia, and travel-acquired infections with these organisms have been reported in people traveling to these regions [7] [8]. Until a more comprehensive geophylogeny of S. Kentucky ST198 strains is conducted, assumptions about strain traceback remain speculative.
Using Parsnp with S. Kentucky CVM29188 as the reference genome, 532 SNPs were identified in the core genomes of the ST198 isolates, with an average of 209 SNP differences between cluster 198.1 and 198.2 genomes (range = 169 to 228 SNPs). Cluster 198.1 isolates collected from agricultural sources in the United States had, for the most part, fewer SNPs between each other (mean = 138, range = 10 to 210) than with the human clinical isolates (cluster 198.2) (Table 4). Similarly, the human clinical isolates (cluster 198.2) had fewer SNP differences between each other (mean = 32, range = 2 to 48) than with the cluster 198.1 isolates. Interestingly, in pairwise genome-to-genome comparisons, fewer SNPs were detected between ATCC 9263 and 198.2 genomes than between ATCC 9263 and 198.1 genomes. However, there were fewer SNPs between ATCC 9263 and the 198.1 cluster than between ATCC 9263 and the 198.2 cluster when the average base pair differences over all sequence pairs were calculated per cluster.
There were 66 conserved SNP differences identified between clusters 198.1 and 198.2 (Table 5). Of particular interest are the DNA gyrase subunit A (gyrA) (AEX15_13770) substitutions Ser83→Phe in 198.2; Asp87→Tyr in CVM 43780, CVM 43756, and CVM 43824; and Asp87→Asn in 915c and 917c. These substitutions confer resistance to nalidixic acid, an attribute presumed to have emerged in ST198 isolates in Egypt in the early 2000s [7]. The substitution Ser80→Ile in DNA topoisomerase IV subunit A (parC) (AEX15_04910), which confers resistance to ciprofloxacin, was also observed in 198.2 isolates. These substitutions are characteristic of ST198 strains circulating in North and East Africa and the Middle East [7], suggesting that the infections caused by these strains (S. Kentucky CVM 43780, CVM 43756, and CVM 43824) may have been acquired outside of the United States or through exposure to imported products.
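Counting SNPs that separate two clusters can be sketched as finding fixed differences: alignment columns where every member of one cluster shares one base, every member of the other cluster shares a different base, and no isolate has missing data. Interpreting "conserved SNP differences" this way is an assumption, and the toy alignment and isolate names are hypothetical.

```python
from typing import Dict, Iterable, List

MISSING = {"-", "N"}


def conserved_differences(alignment: Dict[str, str], group_a: Iterable[str], group_b: Iterable[str]) -> List[int]:
    """0-based alignment columns fixed for different bases in the two groups."""
    group_a, group_b = list(group_a), list(group_b)
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        bases_a = {alignment[name][i].upper() for name in group_a}
        bases_b = {alignment[name][i].upper() for name in group_b}
        if (bases_a | bases_b) & MISSING:
            continue  # any missing call disqualifies the column
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append(i)
    return sites


aln = {"a1": "ACGTA", "a2": "ACGTT", "b1": "TCGAA", "b2": "TCNAA"}
print(conserved_differences(aln, ["a1", "a2"], ["b1", "b2"]))  # [0, 3]
```

Under this strict definition, a substitution shared by only part of a cluster (such as a variant present in some isolates of one cluster) would not be counted as a conserved difference.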
A Bayesian analysis of SNP differences among isolates using the STRUCTURE program indicated that for the most part 198.1 and 198.2 isolates were distinct with the exception of ATCC 9263 and CVM N41913, which were basal within the cluster 198.1 lineage (Fig 10). For these two isolates a proportion of SNPs detected in their genomes (ATCC 9263 = 0.24 and CVM N41913 = 0.078) were characteristic of those identified in 198.2.
All cluster 198.2 genomes were identified as encoding ORFs homologous to the complete sequence of Salmonella Genomic Island 1 inserted at the trmE-yidY locus (SGI-K used as a reference; NCBI accession AY463797) (Fig 11). However, the structure of this island was difficult to discern because SGI1 ORFs were located on multiple contigs in the 198.2 cluster genomes. SGI1 is an integrative mobilizable element found in several S. enterica serovars and in other non-Salmonella bacteria, and is responsible for resistance to multiple antibiotics [53] [54]. The structure of this island is known to be highly variable [44] [55]. Four of the five SGI1-encoding genomes encoded regions associated with mercury resistance; however, this region is absent from CVM 43756 (Fig 11). In all cluster 198.2 genomes except 915c, strB (streptomycin phosphotransferase) and strA were not detected.
Based on the PlasmidFinder analysis, plasmids were not detected in any of the ST198 strains except S. Kentucky CVM 51290 (Table 3). However, strains harboring plasmids carrying antibiotic resistance genes have been collected from human clinical cases [9], suggesting that, as in ST152 and other S. enterica serovars, plasmid presence is variable in ST198. The IncI1 plasmid sequence identified in S. Kentucky CVM 51290 is similar to others currently in the NCBI plasmid database, showing 99% sequence similarity across 90% of the putative plasmid region with plasmid pSL476_91 from S. Heidelberg str. SL476, and 99% similarity across 88% of the plasmid region with S. Kentucky plasmid pCS0010A_95. Few conclusions can be drawn from genomic data about the ST198 group at this time because of the limited number of S. Kentucky ST198 genome sequences deposited in public databases. However, there is clearly considerable genomic diversity among the genomes sequenced to date. Sequencing of more ST198 genomes will allow a more comprehensive understanding of the geophylogeny, global diversity, and presence of variable regions, such as SGI1, in these isolates.
It is important to note that ST198 has been isolated from avian and bovine sources in the United States [6], indicating that it may have the potential to become established in poultry flocks, dairy herds, other livestock, and/or wildlife in the United States. However, more work is needed to understand its apparently limited distribution in these animals and to identify any potential wildlife reservoirs. Further work is also needed to estimate the virulence potential of the ST198 strains endemic in the United States and abroad.

Differences between S. Kentucky ST152 and ST198 Isolates
Although S. Kentucky ST198 and S. Kentucky ST152 isolates are frequently considered similar groups of strains because of their shared serological nomenclature, they differ considerably in their genetic backgrounds, and these differences have not been adequately described. Using    Fig 12). In this analysis, there were between 29,952 and 29,990 SNPs between CVM29188 and the ST198 genomes. Thirteen serovars had fewer SNP differences with S. Kentucky CVM29188 than did S. Kentucky ST198 (Table 6, Fig 12), while 11 serovars of subclade A1 had fewer SNP differences with ST198 S. Kentucky ATCC 9263 than did S. Kentucky CVM29188. These data indicate a substantial difference in nucleotide sequence between ST152 and ST198 genomes and, together with the core-genome phylogeny (Fig 1), indicate that S. Kentucky ST152 and ST198 are more similar to other serovars than they are to each other.
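The comparison above reduces to pairwise SNP distances across a core-genome alignment. The minimal Python sketch below illustrates the idea under stated assumptions: sequences are already aligned, gaps and ambiguous bases are ignored, and "fewer SNP differences than ST198" is read as fewer than the closest ST198 genome. The SNP-calling pipeline actually used is not described in this section, and every name except CVM29188, along with all of the sequences, is a hypothetical toy example.

```python
from itertools import combinations

# Unambiguous nucleotides; gaps, N and IUPAC ambiguity codes are ignored.
VALID_BASES = frozenset("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned columns where both bases are unambiguous and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(alignment):
    """Symmetric pairwise SNP distances for a {name: aligned_sequence} dict."""
    dist = {(name, name): 0 for name in alignment}
    for x, y in combinations(alignment, 2):
        dist[(x, y)] = dist[(y, x)] = snp_distance(alignment[x], alignment[y])
    return dist


def closer_than_group(dist, reference, group, candidates):
    """Candidates whose distance to `reference` is below that of every group member."""
    threshold = min(dist[(reference, member)] for member in group)
    return [c for c in candidates if dist[(reference, c)] < threshold]


# Toy 10-column alignment (hypothetical sequences).
alignment = {
    "CVM29188": "ACGTACGTAC",
    "ST198_x": "ACGAACGTTC",
    "ST198_y": "ACGAACCTTC",
    "serovar_A": "ACGTACGTTC",
    "serovar_B": "TCGAACCTTA",
}
dist = pairwise_distances(alignment)
print(closer_than_group(dist, "CVM29188", ["ST198_x", "ST198_y"],
                        ["serovar_A", "serovar_B"]))
```

In this toy alignment, serovar_A differs from CVM29188 at one column, while the closest hypothetical ST198 genome differs at two, so only serovar_A is reported as closer.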
In an analysis comparing only ST152 genomes with ST198 genomes, 37,769 SNP differences were detected (S3 Table). Annotation of these SNPs revealed a large number of polymorphisms in the protein-coding regions of the genome (S4 Table). Of a total of 4452 annotated protein-coding genes, at least one SNP was detected in 3713 (83.5%) genes, while no SNPs were detected in 737 (16.5%) genes. Three hundred seventeen of these genes had ≥ 20 conserved SNP differences with ST152. Of particular interest are several protein-coding genes involved in interactions with the animal host environment, specifically in colonization and infection by S. enterica. For example, 239 conserved SNP differences were identified in SeKA_A3913, a 16,683 bp ORF annotated as a conserved hypothetical protein that is located within SPI-4 and involved in colonization of the bovine gastrointestinal tract [56]. It should be noted that there was appreciable sequence divergence in this gene among ST152 isolates, particularly in isolate ABB1087-1; this most likely reflects the greater opportunity for synonymous substitutions to accumulate across such a large protein-coding gene. SeKA_A0255 of ST152, an ORF homologous to the secreted virulence protein SlrP (STM0800) of S. Typhimurium LT2, had 84 conserved SNP differences with ST198 genomes. A large repetitive protein (SeKA_A2250) homologous to BapA (STM2689), which is involved in host colonization, organ invasion, and biofilm formation [57], encoded 68 conserved SNP differences between the two STs. An invasion-like protein (SeKA_A1129) homologous to STM1669 (ZirT), which is involved in virulence modulation [58], encoded 51 conserved SNP differences between the two STs. ShdA (SeKA_A1896), an outer-membrane protein that is expressed in the intestine and involved in long-term fecal shedding [59] [60], had 48 conserved SNP differences between the two STs.
Multiple effector proteins also showed appreciable divergence between ST152 and ST198 (SeKA_A1084 = sseJ; SeKA_A1053 = sifB; SeKA_A1621 = sopA; SeKA_A0837 = sseC; SeKA_A0835 = sseB; SeKA_A2465 = sopD; SeKA_A2283 = pipB; SeKA_A2398 = sipD; SeKA_A0839 = sseE; SeKA_A2393 = sptP; SeKA_A0841 = sseF; SeKA_A2380 = avrA). These proteins interact with host cells and are intimately involved in the infection process [61][62] [63], and their sequence divergence may result in different interactions in human, bovine, and poultry hosts, with potentially different outcomes for each host. However, in vivo analyses targeting these regions would be needed to fully assess the biological consequences of these differences.
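The term "conserved SNP difference" is not defined in this section. One plausible reading, used here only as an assumption, is an alignment column at which every genome of one ST carries the same base and every genome of the other ST carries a single different base; columns that vary within either ST (as noted above for SeKA_A3913 among ST152 isolates) are not counted. The minimal Python sketch below tallies such columns per gene and applies a threshold like the ≥ 20 cut-off mentioned above. The gene names, coordinates, sequences, and helper functions are all hypothetical.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def conserved_differences(group_a, group_b):
    """0-based alignment columns fixed for one base in group_a and another in group_b."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must come from the same alignment")
    positions = []
    for i in range(length):
        bases_a = {seq[i].upper() for seq in group_a}
        bases_b = {seq[i].upper() for seq in group_b}
        # Each group must be monomorphic with an unambiguous base at this
        # column, and the two groups must carry different bases.
        if (len(bases_a) == 1 and len(bases_b) == 1
                and bases_a <= VALID_BASES and bases_b <= VALID_BASES
                and bases_a != bases_b):
            positions.append(i)
    return positions


def count_per_gene(positions, genes):
    """Tally positions per gene; genes maps name -> (start, end), 0-based half-open."""
    counts = Counter()
    for pos in positions:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


def genes_at_or_above(counts, threshold):
    """Sorted names of genes with at least `threshold` conserved differences."""
    return sorted(name for name, n in counts.items() if n >= threshold)


# Toy data: two hypothetical genomes per ST and two hypothetical genes.
st152 = ["ACGTACGTAC", "ACGTACGTAC"]
st198 = ["ACGAACGTTC", "ACGAACCTTC"]
genes = {"geneX": (0, 5), "geneY": (5, 10)}

positions = conserved_differences(st152, st198)
counts = count_per_gene(positions, genes)
print(positions, dict(counts), genes_at_or_above(counts, 1))
```

Column 6 is excluded because the two hypothetical ST198 genomes disagree there. Under this definition, within-group variation lowers the conserved count even when the groups differ on average.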
Regions identified in one ST but absent in the other, as well as regions present in both but flanked by unique ORFs, were identified by a whole-genome reciprocal BLASTP analysis (Table 7). Sixteen regions of contiguous protein-coding genes were identified that were either present in ST198 genomes but absent from ST152 genomes, or highly diverged and flanked by unique regions. Of these, six were completely or partially identified as genomic islands. Of particular interest in the ST198 genomes is an eight-ORF region (region 3 in Table 7) responsible for sialic acid transport, a key feature of mammalian pathogens that allows them to scavenge sialic acid and use it as a carbon source in the host intestine (AEX15_07505, AEX15_07550) [64][65] [66]. This region is flanked on one side by a tRNA-Ser locus, suggesting that it is a putative transmissible genomic island. A BLASTP analysis showed a unidirectional match at low to moderate amino acid similarity (23 to 70%) when these ORFs are compared […]. A second region of interest within the ST198 genomes consists of 15 ORFs involved in inositol catabolism (region 4) (AEX15_17525, AEX15_17625). This region was also identified in several other serovars and has been described as a myo-inositol utilization island by Kröger and Fuchs [67]. Chaudhuri et al. [68] showed that this region is associated with proliferation in the host gut. Further, in vivo studies have shown that salmonellae with an inactive reiD (the orphan regulator of the myo-inositol utilization island) have reduced fitness following oral infection of chickens, calves, and pigs [68]. S. Kentucky ST152 isolates lack this island but, based on their prevalence in this environment, appear well adapted for persistence in the bovine gut.
A third region of interest is a nine-ORF region within SPI-6 encoding genes involved in alpha-fimbriae expression, a transcriptional regulator, and a mobile genetic element (region 5) (AEX15_23130, AEX15_23170). This region is homologous to the Typhi colonization factor (tcf) operon (STY0345-STY0348) and should be investigated further as a possible mechanism by which ST198 isolates colonize and infect human and animal hosts. A similar reciprocal BLASTP analysis identified seventeen contiguous regions of ST152 genomes that are absent from ST198 genomes, eight of which were identified as putative genomic islands. Several of these regions were annotated as arrays of hypothetical proteins with no known function. Other islands encoded ORFs annotated as being involved in metabolic functions, such as region 2 (glycolysis and gluconeogenesis, inositol catabolism, benzoate degradation) (SeKA_A0372, SeKA_A0399), region 9 (glycolysis and gluconeogenesis, mannitol utilization) (SeKA_A2624, SeKA_A2631), and region 15 (sorbitol dehydrogenase, fructose-bisphosphate aldolase) (SeKA_A0714, SeKA_A0725). The presence of these operons in strains frequently isolated from dairy cows and poultry suggests that their roles in these environments should be investigated further.
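The reciprocal BLASTP screen described in the two preceding paragraphs can be pictured as two steps. First, keep ORF pairs that are each other's best hit. Then scan one genome's ORFs in chromosomal order and collect runs of consecutive ORFs that lack such a partner. The minimal Python sketch below assumes the best-hit tables have already been parsed from BLASTP output and treats the genome as a single linear, ordered list of ORFs, ignoring contig breaks and chromosome circularity. It also uses an arbitrary minimum run length of two ORFs, and all identifiers are hypothetical. An ORF whose best hit points back to a different ORF (a unidirectional match, like the one described for the sialic acid region) is treated as lacking a partner.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Map each ORF in genome A to its reciprocal best hit in genome B."""
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def absent_regions(ordered_orfs, matched, min_orfs=2):
    """Runs of consecutive ORFs (in genome order) without a reciprocal best hit."""
    regions, current = [], []
    for orf in ordered_orfs:
        if orf in matched:
            # A conserved ORF closes any open run.
            if len(current) >= min_orfs:
                regions.append(current)
            current = []
        else:
            current.append(orf)
    if len(current) >= min_orfs:  # run that reaches the end of the list
        regions.append(current)
    return regions


# Hypothetical genome A with eight ORFs and best-hit tables in both directions.
orfs_a = [f"orf{i}" for i in range(1, 9)]
best_a_to_b = {"orf1": "b1", "orf2": "b2", "orf3": "b9",
               "orf6": "b3", "orf7": "b4", "orf8": "b5"}
best_b_to_a = {"b1": "orf1", "b2": "orf2", "b9": "orf6",
               "b3": "orf6", "b4": "orf7", "b5": "orf8"}

rbh = reciprocal_best_hits(best_a_to_b, best_b_to_a)
print(absent_regions(orfs_a, set(rbh)))
```

Here orf4 and orf5 have no hits, and orf3 matches b9 only in one direction, so the three form a single candidate region. Running the same scan with the roles of the two genomes swapped gives the regions unique to the other ST.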
In both STs, ten homologous insertion loci flanking different protein-coding regions were identified, indicating that they may be hotspots for the integration of laterally transferred DNA in S. enterica. Five insertion loci encoded tRNA sequences and one encoded a tmRNA sequence; both are known sites of genomic island insertion [69]. For the most part, the ST-specific islands were conserved among all members of each ST, indicating that they may play an important role in the ecology of these strains and/or are stably anchored in the genomes of their respective STs.

Conclusions
Based on the analysis conducted herein, there is a phylogenetic difference between poultry-associated and bovine-associated North American S. Kentucky ST152 isolates that can be discerned by four core-genome SNPs. However, several clusters of bovine-associated S. Kentucky ST152 isolates are more closely related to some poultry-associated isolates than to other bovine-associated isolates. This divergence is also associated, in silico, with the presence of several plasmids, including ColV, which has been shown to enhance the colonization capacity of these strains in the chicken gastrointestinal tract, and a high-frequency IncX1 plasmid with no known biological role. The influence of the core-genome SNPs and this IncX1 plasmid on the ecology of S. Kentucky ST152 in bovine and poultry hosts, or on survival in food sources or the environment, should be further evaluated in vivo.
Substantial differences in gene content and core-genome nucleotide sequence were also observed between S. Kentucky ST152 and ST198 isolates. Based on the methods used in this study, both sequence types are phylogenetically more closely related to other serovars than they are to each other; their shared nomenclature stems from the high level of similarity between their antigen-coding regions, which may have been transferred laterally between sequence types. Several genomic elements in ST198, such as a sialic acid transport region, an inositol catabolism island, and a homolog of the Typhi colonization factor (tcf), together with differences in the amino acid sequences of virulence-associated proteins, may result in different interactions in the human, bovine, and poultry gastrointestinal tracts, an area that requires further research.
Although the predominant S. Kentucky strains recovered from both dairy cows and poultry in the United States do not appear to cause considerable disease in their animal hosts, their apparently high prevalence in these food animals represents a food safety risk to consumers of beef, dairy, and poultry products. Human clinical salmonellosis caused by S. Kentucky is infrequently reported in the United States, and few data are available on the sequence types of human clinical S. Kentucky isolates in this country. The recent establishment of ciprofloxacin-resistant (CIPR) S. Kentucky ST198 in poultry in France, Poland, and other regions, together with the detection of this ST in dairy cows, raw milk, ground beef, and ground turkey, suggests that the presence of this potential emerging pathogen should be monitored in food-producing animals.
Supporting Information S1