On the Evolutionary History, Population Genetics and Diversity among Isolates of Salmonella Enteritidis PFGE Pattern JEGX01.0004

Facile laboratory tools are needed to augment identification in contamination events and to trace contamination by Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis) back to its source (traceback). Understanding the evolution and diversity within and among outbreak strains is the first step towards this goal. To this end, we collected 106 new S. Enteritidis isolates within S. Enteritidis pulsed-field gel electrophoresis (PFGE) pattern JEGX01.0004 and close relatives, and determined their genome sequences. Sources for these isolates spanned food, clinical and environmental farm sources collected during the 2010 S. Enteritidis shell egg outbreak in the United States, along with the closely related serovars S. Dublin, S. Gallinarum biovar Pullorum and S. Gallinarum. Despite the highly homogeneous structure of this population, the S. Enteritidis isolates examined in this study revealed thousands of SNP differences and numerous variable genes (n = 366). Twenty-one of these genes, from the lineages leading to outbreak-associated samples, had nonsynonymous (amino acid-changing) changes, and five genes are putatively involved in known Salmonella virulence pathways. While chromosome synteny and genome organization appeared to be stable among these isolates, genome size differences were observed due to variation in the presence or absence of several phages and plasmids, including phage RE-2010, phage P125109, plasmid pSEEE3072_19 (similar to pSENV), plasmid pOU1114 and two newly observed mobile plasmid elements, pSEEE1729_15 and pSEEE0956_35. These differences produced assembled draft genome sizes ranging from approximately 4.6 to 4.8 Mbp for S. Enteritidis, with S. Dublin being larger (∼4.9 Mbp) and S. Gallinarum smaller (4.55 Mbp). Finally, we identified variable S. Enteritidis genes associated with virulence pathways that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future outbreaks involving S. Enteritidis PFGE pattern JEGX01.0004.

S. Enteritidis remains a significant pathogen and a substantial threat to the food supply. It also represents one of the most genetically homogeneous serotypes of Salmonella, and certain clonal lineages remain intractable to differentiation by commonly used conventional subtyping methods [38-46]. The unusual genetic homogeneity observed among certain lineages of S. Enteritidis strains remains intriguing. Recent population genetic studies suggest that most S. Enteritidis strains belong to a single multilocus genotype [4-6]. A subpopulation of this clone was shown to associate more frequently with egg-related salmonellosis and clinical illness [4]. Thus, specific requirements for colonization and survival in infected poultry may select for only a few genotypes of S. Enteritidis in the poultry environment. Random amplification of polymorphic DNA (RAPD), real-time polymerase chain reaction (RT-PCR), and phage typing (PT) methods [2,7,9,45,46] applied to diverse isolates within S. Enteritidis have revealed only a limited amount of genetic variation. More recently, finer discrimination of these salmonellae has been reported using rapidly evolving CRISPR elements [5,17]. In contrast, rather than targeting a subset or region of variation in the S. Enteritidis chromosome, whole genome sequencing (WGS) captures all of the genetic variation that exists among these highly clonal lineages. To date, only a few strains of S. Enteritidis are available as complete genomes [47,48], along with the close relatives S. Gallinarum [11] and S. Gallinarum biovar Pullorum [49]. These isolates have genome sizes around 4.7 Mbp. The basic pan genomes are described in these initial studies, but currently there are no published NCBI draft comparative genomes or associated manuscripts describing variation within S. Enteritidis. In this study, we describe the natural genetic variation within S. Enteritidis isolates associated with a widespread egg contamination event and retaining pulsed-field gel electrophoresis (PFGE) pattern JEGX01.0004, and we analyze the comparative evolutionary genetics within this important foodborne pathogen and several of its closest relatives.
In 2010, the Centers for Disease Control and Prevention (CDC) along with many state laboratories identified a nationwide increase in S. Enteritidis isolates submitted to PulseNet (http://www.cdc.gov/salmonella/enteritidis/). Epidemiological investigations suggested that shell eggs were the most likely source of this increase. FDA, CDC, and state partners conducted traceback investigations and found many of the restaurants involved received shell eggs from a single company (http://www.fda.gov/food/newsevents/whatsnewinfood/ucm222684.htm). As a result, on August 13, 2010, one egg producer initiated a nationwide voluntary recall of shell eggs that had been sold to distributors and wholesalers in 22 states and Mexico. A record 380 million shell eggs were recalled under many different brand names. On August 19, a second egg producer initiated an additional recall of eggs that went to grocery stores, distributors, and wholesalers in 14 states. The second producer shared a contaminated feed supply with the first and was geographically nearby. In all, more than 500 million eggs were involved during this nationwide recall.
The primary goal of this study was to examine the genetic variability of isolates collected during the 2010 S. Enteritidis shell egg outbreak within the PFGE pattern JEGX01.0004, a pattern comprising over 40% of all of the S. Enteritidis isolates submitted to the national database. We also included several other isolates with similar PFGE patterns to JEGX01.0004 found in the associated egg-farm environment. We went on to describe the genetic diversity and evolutionary history of 106 new draft genomes for this virulent pathogen within this narrow but important sampling of S. Enteritidis diversity. As a result, we were able to provide new genetic targets useful for distinguishing S. Enteritidis isolates otherwise indistinguishable by several current methodologies. Once validated, these new SNP targets can be interrogated using widely available DNA sequencing through capillary electrophoresis (CE), short-read pyrosequencing, realtime PCR, or mass spectrometry of PCR amplicons. Finally, this study evaluates the potential use of targeted genomic sequencing with next generation sequencing (NGS) for rapidly resolving future S. Enteritidis outbreaks in eggs.

Salmonella Enteritidis strains
A set of 67 food, environmental, and clinical S. Enteritidis isolates collected from farms and egg sources linked to the 2010 egg contamination event was included for whole genome sequencing. Specifically, 36 S. Enteritidis isolates, originating from environmental swabs, were collected directly from various farm sources implicated in the contamination event (e.g., egg wash water). Four S. Enteritidis were isolated directly from shell eggs, liquid eggs, or other egg-containing food sources known to be contaminated during this time period. Two S. Enteritidis isolates were obtained directly from chicken feed or components thereof at the implicated farms. An additional 25 clinical isolates, collected during the time of the egg contamination event (2010) and sharing PFGE patterns with the egg S. Enteritidis isolates, were kindly provided by the Centers for Disease Control and included for sequencing. In addition, 39 isolates, collected earlier in time and unrelated to the contamination event, were added as reference S. Enteritidis for the WGS analysis. These included 13 isolates with two-enzyme matching PFGE patterns, seven with single-enzyme matching patterns, indistinguishable in either the primary (XbaI, n = 3) or secondary (BlnI, n = 4) enzyme, and 19 isolates with no PFGE patterns in common with the contamination event. These isolates also were used to further investigate the phylogenetic utility of phage typing. Included in this group of 39 were 10 isolates of unknown PT and 14 historical PT8 isolates. The remaining 15 isolates of S. Enteritidis came from ten other diverged PTs: PT1, 21, 2, 4, 14b, 13, 13a, 23, 28 and 35. S. Enteritidis strains were phage-typed by previously described methods [2] at the National Microbiology Laboratory, Canadian Science Centre for Human and Animal Health, Winnipeg, Manitoba, Canada. Strains that reacted with phages but showed unrecognizable lytic patterns were designated atypical or RDNC (reacts but does not conform). Specific PFGE pattern names, PTs, and other metadata associated with the S. Enteritidis strains are listed in Table 1 (PTs are included in the tree label names).

Growth of bacterial strains, and genomic and plasmid DNA isolation
Genomic DNA was isolated from overnight cultures as follows: each initial pure culture sample was taken from frozen stock, plated on Trypticase Soy Agar, and incubated overnight at 37°C. After incubation, cells were taken from the plate and inoculated into Trypticase Soy Broth culture for DNA extraction. All samples were representative cultures from a full-plate inoculation and were not single colonies. Genomic DNA was extracted using Qiagen DNeasy kits.

Library construction and genome sequencing
For this study, all S. Enteritidis isolates were shotgun sequenced using the Roche 454 GS Titanium NGS technology [50]. This platform provided longer read lengths than other sequencing methods and requires a relatively short time to generate raw sequence information. Taxon sampling included one new isolate each of S. Gallinarum and S. Gallinarum biovar Pullorum, two isolates of S. Dublin and 106 new isolates of S. Enteritidis, including a few isolates differing by PFGE pattern, with the majority of isolates sharing the same PFGE pattern (Table 1). These Salmonella serotypes have traditionally been considered close relatives. Each isolate was run on a quarter of a Titanium plate, which produced roughly 250,000 reads per draft genome, resulting in an average genome coverage of about 20×.
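The reported depth can be sanity-checked with a simple back-of-the-envelope calculation: mean coverage is approximately the number of reads times the mean read length, divided by the genome size. The following is a minimal sketch only. The read count (250,000) and genome size (about 4.7 Mbp) come from the text, while the 400 bp mean read length is an assumption chosen as a typical value for this platform and is not stated in the study; the function name is hypothetical.

```python
def expected_coverage(n_reads, mean_read_len_bp, genome_size_bp):
    """Mean sequencing depth: total sequenced bases divided by genome length."""
    return n_reads * mean_read_len_bp / genome_size_bp


# 250,000 reads and a ~4.7 Mbp genome are taken from the text.
# The 400 bp mean read length is an ASSUMPTION, not a value from the study.
depth = expected_coverage(n_reads=250_000,
                          mean_read_len_bp=400,
                          genome_size_bp=4_700_000)
print(f"approximate mean coverage: {depth:.1f}x")
```

Under these assumptions the estimate lands close to 20-fold coverage; a shorter or longer assumed read length shifts the figure proportionally.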

Genome assembly and annotation
De novo assemblies were created for each S. Enteritidis isolate using the Roche Newbler runAssembly software (v. 2.6). All draft genomes were annotated using NCBI's Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP, [51]). Comparison of the de novo assemblies against the complete genome for S. Enteritidis strain 125109 (GenBank accession: AM933172) using Mauve [52] identified several large contigs that did not map to the reference genome: phage RE-2010 (Accession: HM7700079), plasmid pOU1114 (Accession: DQ115387, strain SL909), a plasmid from strain CDC_2010K_1729 (pSEEE1729_15), a plasmid from strain CDC_2010K-0956 (pSEEE0956_35), and a plasmid from strain 607307-2 (pSEEE3072_19). The reference sequence used for mapping reads comprised the complete S. Enteritidis genome (AM933172, which includes the P125109 phage) plus the five additional elements described above.

Comparative genomic analysis
SNPs were identified by mapping the 454 reads to the reference genome using Roche Newbler runMapping software (v. 2.6). SNPs were defined as positions where one or more isolates differed from the reference sequence with coverage ≥4× and with ≥95% of the reads containing the SNP, excluding insertions and deletions (indels). The alignments were then screened to find non-gap phylogenetically informative nucleotide positions (i.e., minor allele count ≥2). The mapped consensus bases for each isolate at the reference SNP positions were then concatenated in a multiple FASTA file for phylogenetic analysis. The maximum likelihood tree was constructed using GARLI [53] with 1000 bootstrap replicates. All GARLI analyses were performed with the default parameter settings and the GTR+Γ+I nucleotide substitution model. SNPs in single-copy protein-coding genes were identified using the same criteria by mapping the isolate reads to the annotated CDS regions in AM933172. Multiple alignments for genes with SNPs were created using the UCLUST [54] software package. There were 366 genes that met the SNP criteria and were present in 95% or more of the 106 isolates. These 366 genes represent a conservative estimate of the set of variable genes, as we eliminated indels and CDS regions that could not be reliably predicted and annotated. A phylogenetic tree also was built with TNT [55], and characters were optimized onto the tree to assess character evolution for several of the critical nodes associated with the outbreak-implicated farm isolates [56], as well as to identify SNPs specific to S. Enteritidis.
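The SNP definition above can be illustrated with a short sketch. It is not the Newbler runMapping implementation; it simply applies the two stated thresholds to the read bases covering one reference position, reading them as a minimum depth of four reads and at least 95% of reads supporting the variant. The function name and the representation of a pileup as a string of bases are assumptions made for illustration.

```python
from collections import Counter

MIN_DEPTH = 4        # minimum number of reads covering the position
MIN_FRACTION = 0.95  # minimum fraction of reads carrying the variant base


def call_snp(ref_base, read_bases):
    """Return the variant base if the reads support a SNP, otherwise None.

    `read_bases` is a string with one character per read at this position.
    Anything other than A, C, G or T (gaps, ambiguity codes) is dropped
    before counting, so indels never produce a call.
    """
    bases = [b for b in read_bases.upper() if b in "ACGT"]
    if len(bases) < MIN_DEPTH:
        return None
    base, count = Counter(bases).most_common(1)[0]
    if base != ref_base and count / len(bases) >= MIN_FRACTION:
        return base
    return None


print(call_snp("A", "G" * 20))            # well supported variant
print(call_snp("A", "GGG"))               # too shallow
print(call_snp("A", "G" * 18 + "A" * 2))  # only 90% support
```

In this sketch depth is counted after discarding gap characters, which is one reasonable reading of "excluding indels"; a production pipeline would also need to handle base quality and strand information.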
Phylogenetic analyses of the clonal S. Enteritidis data set including multiple outgroups were performed on the concatenated informative SNP matrix described above. Approximately 99% of the sites in the 5 Mb Salmonella genomes are phylogenetically uninformative (i.e., showing no differences that provide clustering information), and eliminating them dramatically reduces computation time and memory requirements. Additionally, phylogenetic analyses were performed on the set of 366 concatenated genes containing informative SNPs.
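As an illustration of the filtering step, the sketch below keeps only gap-free alignment columns in which the second most common base occurs in at least two isolates (minor allele count of two or more) and writes the reduced matrix as multi-FASTA text. The isolate names, toy sequences and function names are hypothetical, and the sketch makes no claim to reproduce the scripts used in the study.

```python
from collections import Counter


def informative_columns(alignment):
    """Indices of gap-free columns whose minor allele occurs in >= 2 isolates."""
    seqs = list(alignment.values())
    keep = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        if any(b not in "ACGT" for b in column):
            continue  # skip columns with gaps or ambiguity codes
        counts = Counter(column).most_common()
        # counts[1] is the second most common base, i.e. the minor allele
        if len(counts) >= 2 and counts[1][1] >= 2:
            keep.append(i)
    return keep


def to_fasta(alignment, columns):
    """Concatenate the selected columns per isolate as multi-FASTA text."""
    return "".join(
        f">{name}\n{''.join(seq[i] for i in columns)}\n"
        for name, seq in alignment.items()
    )


# Hypothetical toy alignment: four isolates, six reference positions.
toy = {"iso1": "ACGTAA", "iso2": "ACGTCA", "iso3": "ATGTCA", "iso4": "ATG-AG"}
cols = informative_columns(toy)
print(cols)
print(to_fasta(toy, cols), end="")
```

In the toy data, constant columns, the column containing a gap, and the column where only one isolate differs (a singleton) are all discarded, which mirrors why the great majority of genome positions can be dropped before tree building.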

Accessions
Whole genome shotgun (WGS) accessions and BioProject accession numbers are listed in Table 1.

Genome size, order and conservation
New draft genomes are provided for 110 Salmonella isolates, including 106 S. Enteritidis and four closely related outgroups: two S. Dublin and one each of S. Gallinarum and S. Gallinarum biovar Pullorum (Table 1). While synteny and genome organization were largely stable among these isolates, genome size differences were observed due to variation in the presence or absence of several phages and plasmids, including phage RE-2010 [57], phage P125109 [11], plasmid pOU1114 [58], and several newly observed plasmid mobile elements, pSEEE1729_15, pSEEE0956_35 and pSEEE3072_19 (Figs. 1 and 2, Table 1). One of these, pOU1114, is a newly finished complete plasmid known from partial data to reside within S. Enteritidis and its close relative S. Dublin. pSEEE3072_19 is closely related to the previously characterized S. Enteritidis plasmid pSENV [59]. Presence or absence of mobile elements in S. Enteritidis contributed to a genome size ranging from 4.6 to 4.9 Mbp, with S. Dublin being relatively larger (∼4.9 Mbp) and S. Gallinarum smaller (4.55 Mbp) when compared to the S. Enteritidis genomes collected here. A bimodal split centered on 4.7 Mbp was noted, which largely corresponds to mobile elements that partition predictably between phylogenetic lineages (Table 1, Figures 1, 3).
Most clinical isolates are phylogenetically close to isolates from two egg farms
A set of 106 ecologically diverse food, environmental, and clinical S. Enteritidis isolates, associated with the time period surrounding the 2010 egg contamination event, were included for whole genome sequencing. Strains with expanding diversity and representing three important levels for comparison were included in the analysis. The first group of 60 strains represented a highly homogeneous set of environmental, farm, food, and clinical S. Enteritidis isolates sharing a common PFGE pattern and temporally associated with the 2010 egg contamination event.
The second tier of 30 strains included a set of historical environmental, food, and clinical S. Enteritidis isolates that retained identical or highly similar PFGE patterns but were unassociated with the 2010 egg contamination event, unrelated in time, location or isolation source. Finally, the last group of 16 isolates was also unrelated to the 2010 egg event and included a series of S. Enteritidis strains with more diverged PFGE patterns and phage types away from the 2010 egg S. Enteritidis isolates. These strains served largely as genetic references, effectively allowing for a testing of the phylogenetic monophyly of the 2010 egg-associated S. Enteritidis isolates. As an example, these isolates include other phage types such as PT4, PT23, PT14b, and PT1 and date back over 50 years in time.
Phylogenetic analysis of these genomes revealed several interesting observations. First, the S. Enteritidis PFGE pattern JEGX01.0004 isolates, together with related strains and strains with similar PFGE patterns, formed a monophyletic group distinct from the neighboring serovars S. Dublin, S. Gallinarum, and S. Gallinarum biovar Pullorum. Previous comparative genomics studies [12,14-17] have shown that S. Enteritidis, S. Dublin, S. Gallinarum biovar Pullorum and S. Gallinarum form a natural group, a finding supported by our results. Second, within S. Enteritidis, nine lineages were defined from the tree (Figure 3). Genetic diversity between different serovars included thousands of differences, while variability between the nine lineages of S. Enteritidis, labeled C1-C9, was only on the order of 100 to 600 nucleotide changes. Within-lineage variation was usually less than 100 bp, with the exception of lineage C7, which had over 200 bp of intra-clade variability (Table 2).
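Within-lineage and between-lineage variation of the kind summarized in Table 2 can be computed from the concatenated SNP matrix by counting mismatches for every pair of isolates and summarizing them per clade pair. The sketch below is illustrative only: the toy sequences and function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from itertools import combinations, product
from statistics import mean, pstdev


def snp_distance(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def clade_distance(seqs_a, seqs_b=None):
    """Mean and SD of pairwise SNP distances.

    With one argument, all pairs within the clade are compared;
    with two, every isolate in clade A is compared to every isolate in B.
    """
    if seqs_b is None:
        pairs = combinations(seqs_a, 2)
    else:
        pairs = product(seqs_a, seqs_b)
    distances = [snp_distance(x, y) for x, y in pairs]
    return mean(distances), pstdev(distances)


# Hypothetical toy clades of aligned SNP strings.
clade_1 = ["AAAA", "AAAT"]
clade_2 = ["TTAA", "TTAT", "TTTA"]
within = clade_distance(clade_1)
between = clade_distance(clade_1, clade_2)
print("within clade 1:", within)
print("between clades:", between)
```

A between-clade mean that is clearly larger than either within-clade mean is the pattern expected when the clades are well separated lineages.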
Among the isolates compared, clinical isolates sorted into each of the major lineages (clades C1, C2, C3 and C5, Figure 3), with most falling into clades C1 and C2. It is noteworthy that no apparent increase in substitutions was observed for the isolates that passed through patients compared to their environmental clones. Had genetic diversity expanded among the clinical isolates relative to the other food and environmental S. Enteritidis collected in relation to the 2010 egg event, one would expect it to appear as longer terminal branches leading to the 2010 clinical isolates in the tree. In general, this was not observed. Several clinical isolates (e.g., SEEE9845 and SEEE4647, both from Ohio) do reflect the accumulation of a few additional SNPs, as their terminal branches project slightly from the base of the 2010 egg isolates in clade 1. However, comparably subtle genetic variation was also noted among environmental and egg isolates in the tree, indicating that no additional or overt pressure to change was applied in vivo to the clinical strains included here. For example, environmental isolates from Ohio (e.g., SEEE1117 and SEEE1618), also in clade 1, vary comparably in their branch lengths to the aforementioned clinical isolates.
Clades C7, C8 and C9 contained a diversity of isolates from unrelated and historical freezer stocks that were not connected to the large shell egg outbreak (Table 1). Additionally, environmental S. Enteritidis isolates taken from Farm 1 were found in clades C6 and C1, while environmental S. Enteritidis isolates from Farm 2 were observed in clades C4, C2 and one isolate in C1. It is important to note that in our S. Enteritidis strain tree presented here, the phylogenomic data sort in a largely hierarchical fashion. That is, isolates associated with the 2010 S. Enteritidis egg event do cluster most closely together, with additional SNP diversity providing higher resolution for related strains within the contamination event. Additionally, nearly all of the reference isolates retaining common PFGE patterns but unassociated with the egg event sort adjacent to but outside of the 2010 S. Enteritidis egg, clinical, and farm swarm of isolates. Surprisingly, however, several of these genetically similar S. Enteritidis reference strains lacking any temporal relatedness to the 2010 egg event do partition with other egg isolates. One S. Enteritidis isolate from 2004, for example, formed a sub-clade with two clinical isolates from Tennessee within the larger clade 2 in the genome tree (Figure 3). Also in clade 2, a historical S. Enteritidis isolate from California (1441) sorted closely with two S. Enteritidis clinical isolates from Minnesota collected in 2010 during the egg event. The substantial number of SNPs that partition strains within S. It is important to note that many S. Enteritidis strains with common phage types are polyphyletic (do not sort into a single group) in the whole-genome sequence tree. S. Enteritidis strains designated as PT8, for example, are phylogenetically distributed across clades 1, 2, 3, 5, 6, 7, and 8, suggesting that despite retaining this common phenotypic feature, strains sharing a phage type are phylogenetically distinct and diverged among their genome sequences. This observation is not unexpected [9] given the intrinsic horizontal movement of phage restriction across diverged strains of S. enterica.
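Whether isolates sharing a phage type sort into a single group can be checked mechanically on a rooted tree: a set of isolates is monophyletic only if some clade contains exactly those isolates and no others. The sketch below uses a toy tree written as nested tuples with hypothetical isolate names and phage-type labels; it tests for monophyly only and does not distinguish polyphyly from paraphyly.

```python
def leaves(node):
    """Set of tip names below a node; a tip is a string, a clade is a tuple."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_monophyletic(tree, members):
    """True if some clade of the rooted tree has exactly `members` as its tips."""
    members = set(members)
    if leaves(tree) == members:
        return True
    if isinstance(tree, str):
        return False
    # Only descend into a child that still contains every member.
    return any(
        is_monophyletic(child, members)
        for child in tree
        if members <= leaves(child)
    )


# Hypothetical rooted tree and phage-type assignments.
tree = ((("iso_a", "iso_b"), "iso_c"), ("iso_d", "iso_e"))
phage_type = {"iso_a": "PT8", "iso_b": "PT8", "iso_c": "PT13a",
              "iso_d": "PT8", "iso_e": "PT4"}

pt8 = {name for name, pt in phage_type.items() if pt == "PT8"}
print("PT8 monophyletic:", is_monophyletic(tree, pt8))
print("iso_a + iso_b monophyletic:", is_monophyletic(tree, {"iso_a", "iso_b"}))
```

In the toy example the PT8 isolates are scattered across two parts of the tree, so the check fails for the phage type as a whole even though two of its members form a clade.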

Genetic variation defining S. Enteritidis
More than 50 genes vary with SNPs that define S. Enteritidis separately from the outgroups compared in this study (Table 3). For example, the multicopper oxidase gene (cueO, locus tag SEN0173) represents one gene with numerous genetic signatures unique to S. Enteritidis strains. The gene and protein alignments show a dozen SNP differences and three amino acid differences that appear to be present in all S. Enteritidis examined. Serovar-defining signature amino acid differences include E to Q (position 132), P to L (position 337), and L to S (position 342) changes.
Figure 3. Phylogenetic tree based on the maximum-likelihood method implemented in GARLI. Numbers associated with branches represent the percent of 1000 bootstrap replicates supporting the major clades C1 through C9. Acquisition of ALFR00000000 putative plasmid pSEEE1729_15 is defined by a star at the base of C1. doi:10.1371/journal.pone.0055254.g003
Table 2. Pairwise SNP distances ± SD between major lineages identified in the phylogenetic tree (C = clade).

Genetic variation defining S. Enteritidis outbreak lineages
At least 366 genes varied among S. Enteritidis strains comprising the egg-associated foodborne isolates, the farm environmental samples, and temporally-associated clinical samples (Table S1). Of the 366 genes that varied, 21 had nonsynonymous changes that were optimized to one of the branches supporting egg-associated clades C1, C2 or the shared lineage leading to C1 and C2 collectively (Table 4). These variable genes represent micro-evolutionary changes that arose within this highly clonal lineage of Salmonella persisting in the food supply and chicken farm environment; thus they may play a role in the subsequent rapid subtyping of isolates in future food contamination events involving S. Enteritidis pattern JEGX01.0004.

Specific genes associated with implicated farm isolates
Nucleotide substitutions in 17 genes, 11 of which were nonsynonymous, were identified at the node uniting isolates from the two egg farms (Table 4). In addition, isolates obtained from Farm 1 shared nonsynonymous changes in two genes, SthB and YjjP. Farm 2 S. Enteritidis isolates shared substitutions in nine genes, eight of which were nonsynonymous.
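The distinction between synonymous and nonsynonymous substitutions can be made concrete with a small sketch that translates the affected codon before and after a substitution. It assumes an in-frame coding sequence and the standard codon assignments; the example sequence and function name are hypothetical and are not taken from any gene in Table 4.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    `pos` is the 0-based position in `cds`; `alt` is the substituted base.
    Returns (kind, reference amino acid, variant amino acid).
    """
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Hypothetical 9-bp coding sequence: ATG GAA CTG (Met, Glu, Leu).
cds = "ATGGAACTG"
print(classify_snp(cds, 3, "C"))  # GAA -> CAA changes Glu to Gln
print(classify_snp(cds, 8, "A"))  # CTG -> CTA leaves Leu unchanged
```

The first call reproduces the kind of E to Q change described for serovar-defining signatures, while the second shows a third-position change that alters the nucleotide but not the protein.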

Discussion
Like other molecular epidemiology studies of Salmonella employing genomic technologies [19][20][21][22][23], this work demonstrates that comparative NGS methods can be employed to clearly augment food contamination investigations by genetically linking the implicated sources of contamination with farm and clinical isolates. The genomic evidence herein corroborates epidemiological conclusions from outbreak investigations based on statistical analysis and source tracking leads. However, with NGS, one can gain additional detailed micro-evolutionary knowledge within the associated outbreak and reference isolates; thus providing additional evidence linking implicated farms to some of the clinical isolates but not others initially associated with this foodborne contamination. Moreover, the level of genetic resolution obtained using NGS methods permits a delimiting of the scope of an outbreak in the context of an investigation even for the most genetically homogeneous salmonellae (e.g., S. Enteritidis). In this study, NGS data retrospectively supported the decision to recall a half a billion shell eggs by revealing numerous nucleotide and amino acid changes (SNPs) found in both eggs and from hen houses; the changes were also shared with some food and clinical isolates. It is noteworthy that the comparative NGS results reported here provided additional resolution, with new genomic data, that some clinical isolates collected during the time of the egg contamination event and with the same PFGE Pattern JEGX01.0004 may not be linked to the implicated farm isolates studied. That is, while most of the strains collected during this time period and sharing a common PFGE pattern fall into clades 1 and 2 ( Figure 3) with the egg and farm isolates, several strains known to be unrelated to the outbreak, including historical isolates from 2004, interrupt these lineages, indicating additional potential sources of contamination. 
Data mining associated with these novel genomes should provide new genetic targets for tool development in public health laboratories and that will augment investigations during highly clonal outbreaks of Salmonella pathogens. Akin to earlier findings of NGS-based differentiation of S. Montevideo isolates associated with pepper and spiced meats [19][20][21], the signature genetic differences uncovered here will provide additional insight into what will likely remain a common pattern of S. Enteritidis associated with the food supply. This bolus of unique genetic identifiers yielded from whole-genome sequencing clearly earmark NGS as a valuable tool for augmenting future molecular epidemiology investigations both for rapidly distinguishing distinct serotypes and PFGE types as well as providing markers that can differentiate highly clonal outbreak lineages into insightful isolate sublineages.
By using a targeted comparative genomic approach that spanned nearly the entire genomic complement of the highly homogeneous S. Enteritidis variants included here (i.e., PFGE pattern JEGX01.0004), a robust genotyping SNP panel was compiled that not only discriminated this S. Enteritidis clone from other closely related strains but also fully resolved member isolates within this cluster. This is an important alternative to other methods that have been examined for surveying genomic diversity among foodborne pathogenic strains. One such approach uses NGS to examine diversity among a pooled isolate set instead of pure cultures but, as expected, this approach is far less robust. As an example, a recent genotyping panel for O157 STECs revealed lower diversity among the isolates using the selected NGS-based genotyping panel than a two-enzyme PFGE method [60]. Specifically, the authors reported finding over 16,000 variable SNPs, but by pooling STEC isolates and sequencing at low coverage, critical SNPs defining major lineages and sublineages went undetected in this analysis. This was likely due to the failure of the "pooling" approach to link signature SNPs back to a particular source genome. While strain "pooling" may be a faster way to collect SNP data, it may not be an optimal method when discriminating a specific lineage of strains or an isolate cluster of interest. In contrast, comparative genomics approaches rely on high-coverage draft genomes coupled with rigorous phylogenetic analyses and character optimization to resolve accurate evolutionary and genetic relatedness among closely related strains. With such information, individual SNPs can be evaluated in an evolutionary context (i.e., whether they define lineages or represent homoplasy due to convergent gains or character reversals).
Indeed, a targeted phylogenetic approach produces a robust genotyping panel because the resultant SNPs can be carefully chosen to represent diversity among targeted isolates while omitting uninformative SNPs [19][20][21]. Conversely, ''pooling'' strategies might work better within clonal outbreak lineages where hundreds not thousands of SNPs are present.
Mobile elements, such as phages and plasmids, are often the most promiscuous portions of bacterial genomes, including those of Salmonella [61]. The mobilome, as these elements are often collectively called, appears to be regularly rearranging among closely related clonal lineages of Salmonella [19,21]. As expected, S. Enteritidis shows a similar susceptibility to loss and gain of these elements [62], as do other members of the Enterobacteriaceae. In addition to seeing variability among these elements, several new plasmids were discovered, suggesting that additional mobile elements were previously undescribed across the Salmonella genome. Recent examples of new phages and plasmids are being published regularly [63,64]. It is becoming apparent that a renewed effort to describe and identify the complete mobilomes of newly sequenced isolates should be undertaken, especially for pathogenic strains that persist and emanate from the environment. From these data, it would appear that mobility of these elements is not restricted to close relatives. At least one of the newly discovered Salmonella plasmids (pSEEE1729_15) had its closest BLAST match to E. coli O157:H7 strain EC4115 [26], suggesting that parts of the mobilome may be transferred from other related enterobacterial species. Moreover, observations of this nature clearly broaden the possibility of new acquisitions into the S. Enteritidis pan genome [62].
Natural selection has been reported in other Salmonella isolates and appears to be a major component of the evolution of this pathogen [18,22]. Some of the genes that vary are found on the mobilome, such as the putative phage terminase gene, supporting the notion that there are actively evolving genes on some mobile elements. This strategy for evolution could provide a scenario whereby highly selected genes could be shaped by natural selection and then easily distributed among the various members of a serotype and other more distant lineages through mobile genetic elements.
Some investigators are beginning to search for genetic determinants for survival and virulence of S. Enteritidis in chickens, mice, and cell culture models. Through observing which genes varied in environmental farm and clinical isolates, such insight was sought in the hopes of identifying potential contributing factors to outbreaks. One study linked SNP variability in a stress response gene (rpoS) to isolates able to infect poultry [8]. We observed nonsynonymous variability in a gene (phoP) that has been demonstrated to be a regulator of rpoS [65,66] and that gene varied uniquely in the lineage defining Clades 1 and 2 ( Table 4). The phoP gene also is thought to be important to S. Enteritidis virulence based on evidence from a mouse model [67]. This change was observed in the SNPs listed in Table 4, which are a conservative subset of variable SNPs and genes, although these SNPs were chosen for potential diagnostic utility and not for a full description of comparative genomics purposes within these isolates.
Another recent hypothesis for the genes involved in salmonellosis focuses on the ABC transporter genes and the ability of pathogens to acquire nutrients for survival during host infection [68,69]. Our study shows variability in an ABC transporter for methionine specific to clades 1 and 2 (Table 4). The S. Enteritidis model that Osborne et al. [68] tested in vivo with an ABC transporter of alanine is similar to the natural variability for a similar gene in the implicated farm and associated clinical isolates. If this model, affirmed in cell culture studies, holds in chickens, then infections in chickens and eggs in 2010 may be related to the ability of S. Enteritidis to survive in a poultry host due to enhanced access to methionine. The ABC transporters have been hypothesized to be an important new acquisition for all of subspecies I Salmonella enterica [15]. Perhaps the ABC transporter gene gave Salmonella subspecies I an overall enhanced ability to survive in a warm-blooded vertebrate host, and later mutations of the gene allowed some serotypes to have special affinity for one host over another. It is common to see serotypes of Salmonella that are more common in one host, such as S. Kentucky in cattle and S. Enteritidis in poultry and eggs. Another nonsynonymous gene change was observed in the threonine/serine transporter gene tdcC (Table 4), demonstrating that several transporter genes are evolving within these critical isolates.
Salmonella's ability to gain access to another valuable resource, metals such as Fe, Mn, and Zn, may help give this foodborne pathogen a competitive edge in the vertebrate gut [70]. Variability in genes related to metal acquisition may help Salmonella bypass a process called nutritional immunity. We see another nonsynonymous change unique to the outbreak-associated isolates in a ferrochelatase gene (hemH), lending support to this hypothesis. Another hypothesis argues that diversification within the Salmonella fimbriae gene clusters is a source of virulence [71] through possible host-specific intestinal adhesion mechanisms. At least three genes from gene complexes (bcfC, safD, and stbE) show unique amino acid changes that may define S. Enteritidis (Table 3), and one fimbrial gene (fimD) shows a unique amino acid change leading to clades 1 and 2 (Table 4).
The nonsynonymous changes that we see among genes that vary for clades 1 and 2 suggest that there may not be a single cause for increased risk of infection and outbreak stemming from chickens and shell eggs. Rather, a combination of several of these genetic factors that raise the risks for Salmonella invasion may be causing contaminations in the food supply today. The fact that 5 of the 21 nonsynonymous changes varying among the outbreak isolates (Table 3) are putatively involved in virulence-based pathways strongly suggests that there may be multiple and potentially synergistic causes for the expanding rate of S. Enteritidis populations. This also suggests that the other genes (Tables 3 and 4) that vary in S. Enteritidis should be carefully examined and experimentally tested, as more of these are likely to be associated with an increase in virulence and infection [67,69,71].
Based on both PCR and sequencing evidence, numerous studies have found little genetic variation within S. Enteritidis [6][7][8][9]. Our genomic diversity estimates for the S. Enteritidis PFGE Pattern JEGX01.0004 examined in this study are consistent with other diversity comparisons described between two S. Enteritidis isolates of phage type 13 [7]. This variation was observed both as SNP variation among 366 genes as well as the presence and absence of numerous phages and plasmids among these close relatives. This genetic variability was used to define the most variable genes and to assess population and phylogenetic evolutionary patterns for these important foodborne pathogens. In this study, our comparative genomics approach allowed us to cluster clinical isolates within the context of their environmental source, farm isolates, many of which were associated with a large national shell egg recall. Numerous genetic changes clearly link some clinical and environmental isolates to the farms that were implicated in the recall of over a half a billion eggs. One known plasmid in S. Enteritidis was completely sequenced, and three plasmids were reported. Several of the genes that varied with nonsynonymous changes had previously been associated with virulence pathways in prior in vitro experiments.