Comparative genomics identifies distinct lineages of S. Enteritidis from Queensland, Australia

Salmonella enterica is a major cause of gastroenteritis and foodborne illness in Australia, where notification rates in the state of Queensland are the highest in the country. S. Enteritidis is among the five most common serotypes reported in Queensland and is a priority for epidemiological surveillance due to concerns regarding its emergence in Australia. Using whole genome sequencing, we analysed the genomic epidemiology of 217 S. Enteritidis isolates from Queensland and observed that they fall into three distinct clades, which we have designated Clades A, B and C. Phage types and MLST sequence types differed between the clades, and comparative genomic analysis showed that each has a unique profile of prophages and genomic islands. Several of the phage regions present in the S. Enteritidis reference strain P125109 were absent in Clades A and C. These clades also differed from the reference in their pathogenicity islands, containing complete SPI-6 and SPI-19 regions, which P125109 does not. Antimicrobial resistance markers were found in 39 isolates, all but one of which belonged to Clade B. Phylogenetic analysis of the Queensland isolates in the context of 170 international strains showed that Queensland Clade B isolates group together with the previously identified global clade, while the other two clades are distinct and appear largely restricted to Australia. Locally sourced environmental isolates included in this analysis all belonged to Clades A and C, which is consistent with the theory that these clades are a source of locally acquired infection, while Clade B isolates are mostly travel related.


Introduction
Salmonella infection is a leading cause of gastroenteritis, with non-typhoidal Salmonella enterica estimated to cause approximately 94 million cases globally per year [1]. In 2015, there were 17 012 notifications of salmonellosis in Australia, 4811 of which were from the state of Queensland (QLD), an increase of 67% on the 2887 QLD notifications received in 2010, and the highest level of Salmonella notifications in Australia [2,3]. More than 2500 serotypes of S. enterica have been identified; however, two of these serotypes, S. Enteritidis and S. Typhimurium, account for approximately 60% of cases globally [4]. In Australia, S. Enteritidis is among the five most common Salmonella serotypes reported [5,6], and although rates of notification are not as high as those for S. Typhimurium, concerns regarding its emergence in Australia make it a priority for epidemiological surveillance. The majority of S. Enteritidis cases in Australia are associated with travel, and nationally only approximately 10% are thought to be locally acquired [6]. However, QLD has the highest number of locally acquired infections in Australia, and at 20% the proportion of locally acquired relative to overseas acquired infections is greater than the national average and that of other states [6]. Prior to this study, molecular surveillance had not been performed on S. Enteritidis cases isolated in Queensland, and local sources of infection are unclear.
S. Enteritidis is reportedly one of the most genetically homogeneous serotypes of Salmonella [7], and while current typing methods such as phage typing and MLST are able to classify S. Enteritidis isolates into broad groups, they lack the resolution needed to delineate epidemiologically linked clusters. Whole genome sequencing (WGS) has been successfully used as a tool in epidemiological surveillance and outbreak investigations, and it provides a very high level of resolution by comparing differences across the entire genome. In order to take advantage of the improved resolution that WGS provides, the Public Health Microbiology laboratory at the QLD Department of Health routinely sequences all contemporary S. Enteritidis isolates received by the laboratory. WGS analysis has identified three distinct genomic clades among the QLD S. Enteritidis isolates sequenced, and we report on comparative genomics to determine how these clades fit within the global S. Enteritidis population.

Sequencing of bacterial strains
Salmonella enterica serovar Enteritidis strains included in this study were isolated from clinical (n = 206) or environmental samples (n = 11) as outlined in S1 Table. DNA was extracted from isolates grown overnight at 37˚C on horse blood agar, using the QiaSymphony DSP DNA Mini kit (Qiagen) according to the manufacturer's protocol. DNA was prepared for sequencing using the Nextera XT kit (Illumina) and sequenced on the NextSeq500 using the NextSeq 500 Mid Output v2 kit (300 cycles) (Illumina) according to the manufacturer's instructions. Raw sequence files and associated metadata have been submitted to the European Nucleotide Archive with project accession number PRJEB22598.

Genomic epidemiology
In order to elucidate the genetic relationship between S. Enteritidis isolates in the QLD community and how patient isolates relate to those from animals and foods, sequences from 206 clinical and 11 environmental isolates of S. Enteritidis (S1 Table), as well as the publicly available reads from one serovar Berta isolate, one S. Typhimurium isolate and one S. Gallinarum isolate (Genbank accession numbers SRR1060548, NC_003197 and AM933173 respectively), were mapped to the S. Enteritidis reference genome P125109 (Genbank accession number NC_011294). Regions identified by BratNextGen [16] as being involved in recombination were excluded from the analysis, and high quality core SNPs were identified using Snippy. In total 43 431 variable sites were identified, and an alignment of these SNPs was generated and used to construct a maximum likelihood phylogeny using RAxML with the serovar Berta isolate as an outgroup. The phylogeny showed that the S. Enteritidis isolates grouped into three distinct clades, which we have named Clades A, B and C (Fig 1). Clade A is approximately 12 000 SNPs distant from Clade B and 16 000 SNPs distant from Clade C, while the distance from Clade B to Clade C is approximately 17 000 SNPs. Within each clade there is a relatively low level of diversity, with fewer than 500 SNPs found between isolates from the same clade. Clade B was the largest with 121 isolates, while Clade A had 78 and Clade C was the smallest with 18 isolates. Among the isolates tested, 11 were from environmental sources such as food and poultry farms, as detailed in S1 Table. Eight of these clustered with Clade A and three with Clade C, while none of the environmental isolates belonged to Clade B. There was no association between the environmental source of an isolate and whether it fell into Clade A or C, with isolates from both of these clades being found in chickens and eggs, and isolates from both clades being found in food. Two isolates were obtained from domestic animals, both of which belonged to Clade A (S1 Table).
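To illustrate how distances of the kind quoted above can be derived from a core-SNP alignment, the following minimal Python sketch counts pairwise differences between aligned sequences. It is not the Snippy or RAxML workflow used in this study: the function names and the toy alignment are hypothetical, and the choice to skip gaps and ambiguous bases is an assumption made for the example.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where two sequences differ.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(alignment: dict) -> dict:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical isolate names (not data from this study).
toy_alignment = {
    "isolate_A": "ACGTAC",
    "isolate_B": "ACGTTC",
    "isolate_C": "TCGNTA",
}
distances = pairwise_distances(toy_alignment)
```

In this toy case `distances[("isolate_A", "isolate_B")]` is 1, because the two sequences differ only at the fifth position.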
Isolates from the three clades had distinct phage types (PT) and MLST types (ST). Among Clade A isolates, 56 (72%) were PT26, 17 (22%) were untypable and the remaining 5 isolates (6%) were PT23, PT35 or reactive but did not conform (RDNC). Clade B isolates showed more variation in phage type with 41 (34%) PT1, 35 (29%) PT35, 19 (16%) RDNC, 18 (15%) PT21b var, 9 (7%) PT6a, 5 (4%) PT21b, with 10 (8%) being untypable or with no phage type information available and the remaining 9 isolates (7%) were made up of PT1b, PT1d, PT6, PT47, PT14b, PT21a, and PT4a. Clade C isolates showed the least phage type variation with 12 (67%) PT14 var and 6 (33%) RDNC (Fig 1). The phage types seen in the QLD isolates are different from those commonly observed in other countries, where PT4, PT8 and PT13a are most common [5]. In particular, certain phage types are generally found to be predominant in different regions, with PT4 most common in Europe, PT8, PT13 and PT13a most common in North America, and PT1, PT6a and PT21 most common in Asia [17-21]. In the QLD isolates, different phage types were associated with the different clades. Clade B contained high numbers of PT1 and PT6a, PTs commonly seen in Asia, with the next most common PTs in Clade B being PT21b and PT21b var. These are relatively new phage types, with PT21b first described in 1995 in the UK [22]. The presence of globally common PTs in Clade B isolates is consistent with the observation that the majority of these cases were associated with a reported overseas travel history and that the highest number of overseas acquired S. Enteritidis infections in Australians is associated with travel to Asia [6]. Clade A largely consists of PT26, a PT not commonly seen in other countries but relatively common among Australian S. Enteritidis strains and associated with locally acquired infection [6]. Clade C also consists of an unusual PT, PT14 var, as well as a relatively high number of isolates that were reactive but did not conform to known phage types.
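The percentages reported for Clades A and C follow directly from the isolate counts and clade sizes given in the text. The short Python sketch below reproduces that arithmetic; the helper name `percentages` and the grouping labels are illustrative assumptions, while the counts are those stated above.

```python
def percentages(counts: dict) -> dict:
    """Convert per-category isolate counts to whole-number percentages."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts taken from the text: Clade A has 78 isolates, Clade C has 18.
clade_a = {"PT26": 56, "untypable": 17, "PT23/PT35/RDNC": 5}
clade_c = {"PT14 var": 12, "RDNC": 6}

clade_a_pct = percentages(clade_a)
clade_c_pct = percentages(clade_c)
```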
The STs seen in the QLD isolates were also associated with the different clades. Clade A isolates were either ST180 (24%) or ST3304 (76%). Clade B isolates were predominantly ST11 (86%), but 14% were ST1925, which is a single locus variant of ST11. Clade C isolates were all ST1972 (Fig 1). Among S. Enteritidis strains worldwide, ST11 is the most common ST, accounting for 89% of the 17 867 S. Enteritidis entries in the EnteroBase database (http://enterobase.warwick.ac.uk/, accessed 07/09/2017), and for 95% of S. Enteritidis isolated in England and Wales between April 2014 and March 2015 [23]. The single locus variant of ST11, ST1925, had 84 entries in EnteroBase, while ST180, ST3304 and ST1972 have previously been recorded for a relatively low number of S. Enteritidis strains, with 19, 2, and 11 entries respectively in the EnteroBase database.
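A single locus variant is a sequence type whose allelic profile differs from another at exactly one of the MLST loci. The following Python sketch shows that comparison for a seven-locus scheme. The locus names are those of the commonly used seven-gene Salmonella MLST scheme, which is assumed here because the text does not name the scheme; the allele numbers are invented placeholders and are not the real profiles of ST11 or ST1925.

```python
# Loci of the seven-gene Salmonella MLST scheme (assumed for illustration).
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")


def differing_loci(profile_a: dict, profile_b: dict) -> list:
    """Return the loci at which two allelic profiles carry different alleles."""
    return [locus for locus in LOCI if profile_a[locus] != profile_b[locus]]


def is_single_locus_variant(profile_a: dict, profile_b: dict) -> bool:
    """True if the two profiles differ at exactly one locus."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Hypothetical allele numbers, chosen only to demonstrate the check.
st_x = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1)))
st_y = dict(zip(LOCI, (1, 1, 1, 2, 1, 1, 1)))  # differs at hisD only
st_z = dict(zip(LOCI, (3, 1, 1, 2, 1, 1, 1)))  # differs at aroC and hisD

slv_xy = is_single_locus_variant(st_x, st_y)
slv_xz = is_single_locus_variant(st_x, st_z)
```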
In order to determine how the clades identified in Australian S. Enteritidis isolates related to international S. Enteritidis strains, the publicly available short read sequence files of 170 S. Enteritidis isolates from a range of geographic locations (S2 Table) were downloaded from the European Nucleotide Archive (ENA), and SNP analysis was performed as described above together with the QLD isolates. A maximum likelihood phylogeny was generated using RAxML with the serovar Berta isolate as an outgroup. As has been described in previous studies [24,25], we noted the presence of a 'global epidemic clade' that includes the reference strain P125109. This clade includes 124 isolates from Asia, Europe and the Americas as well as the 121 Australian Clade B isolates. Deng et al. further subdivided this global clade into 5 lineages, with Lineage 3 containing the reference strain P125109 and clinical strains from Thailand and the USA, and Lineage 5 containing clinical and environmental strains from the USA [26]. The Clade B strains included in the current study were also observed to divide into these lineages, with the majority of QLD Clade B isolates (n = 117) clustering together with Lineage 3 isolates, into what we have referred to as the P125109-like lineage because it contains the reference strain. The majority of the 63 international strains in this lineage are from Asia, with 33 of the 63 Asian strains included in this analysis clustering into this lineage (S1 Fig), while the remaining 17 strains were from other global locations. Three of the remaining four QLD Clade B isolates clustered into the lineage corresponding with Lineage 5 in Deng et al. This lineage contains the LK5 strain and so we have referred to it as LK5-like. This lineage contained strains from a more diverse range of global locations, with many from the USA, Canada and Europe and eight Asian strains. We also observed the presence of a third, smaller subclade consisting entirely of thirteen Asian isolates and one QLD Clade B isolate.
With the exception of a small number of international strains, Clades A and C were almost exclusively comprised of Australian isolates. Clade A also contained two isolates from Germany, one from the UK, one from Vietnam, one from the USA, one from Vanuatu and one from New Zealand (S2 Table). These were all ST180, one of the STs associated with the QLD Clade A isolates, and the New Zealand strain had previously been described as a divergent strain of S. Enteritidis by Deng et al. [26]. In addition to the Australian isolates, Clade C also contained two isolates from the UK and two from the USA (S1 Fig), all of which were also ST1972. The novel African lineages identified by Feasy et al. [24] showed a correlation between clade and invasive disease. Unfortunately, due to the limited number of invasive S. Enteritidis isolated in the time period covered by this study (n = 6), no correlation between invasive disease and clade could be observed for the QLD isolates (S1 Table).

Comparative genomic analysis
In order to identify similarities and differences between the three clades identified in QLD S. Enteritidis, the genomes of isolates from each of the clades were compared to each other and to that of P125109. De novo assembled contigs from each isolate were ordered against the P125109 sequence using Mauve and then concatenated. These were then aligned to the P125109 genome using the Mauve plugin in Geneious R10, and regions of difference (RODs) were identified. Specific regions of difference such as prophages, pathogenicity islands and gene sets that have been previously described in studies of the S. Enteritidis genome [17,25,27,28] were also investigated using Ridom SeqSphere+. Overall, the genomes of QLD Clade B isolates were highly similar to P125109, containing all of the gene sets highlighted as RODs by Thomson et al. [28], as well as all previously identified pathogenicity islands, and with a very similar prophage profile. Clades A and C, however, had a number of differences in their genomes when compared to Clade B and P125109, as described below and summarised in Fig 1.
Regions of difference. Thomson et al. identified 17 gene sets that differed between S. Enteritidis, S. Gallinarum and S. Typhimurium and labelled these as RODs [28]. These RODs contain genes involved in a variety of functions including metabolism, membrane transport and potential virulence factors [28,29]. Of the 17 RODs, five were absent in Clade A isolates (ROD4, ROD17, ROD25, ROD34 and ROD37), while in Clade C isolates the same regions were missing, with the exception of ROD25, as were ROD13 and ROD28 (Fig 1). A few Clade C isolates also lacked ROD21, but all the other Clade C isolates carried a different ROD21, which had a maximum identity of 96% and coverage of 43% with ROD21 in Clade A and B isolates. ROD21 in P125109 is a 26.5 kb genomic island which contains genes involved in virulence, and genes for the global transcriptional silencer H-NS and its antagonist [27]. The ROD21 in Clade C isolates is a 21.3 kb genomic island which is located just downstream of the integration site for P125109 ROD21 and integrates into the tRNA-Asn present in P125109 at SEN_t035. The Clade C ROD21 has an integrase that shares 97% amino acid identity with that from P125109 ROD21 and shares a nucleotide identity above 98% with 8 of the 25 ROD21 genes in P125109 (SEN_RS10280, SEN_RS10310, SEN_RS10315, SEN_RS10335, SEN_RS10400, SEN_RS10405, SEN_RS10410, SEN_RS10415). ROD21 was found to be absent in S. Enteritidis belonging to the East and West African clades reported in [24]. ROD9, which contains genes involved in T6SS, is present in a degenerate form in P125109 and in Clade B isolates, but was found to be complete in Clade A and Clade C isolates and is highly similar to the same region in S. Gallinarum 297/91, where it includes SPI-19 [30]. ROD42 was another region that was absent in P125109 and Clade B, but present in Clade A and Clade C isolates. ROD42 is present in S. Typhimurium LT2 and S. Gallinarum 297/91, where it encodes C4-dicarboxylate transporters [28]. Interestingly, with the exception of ROD42, all of the RODs that were absent in either Clade A or Clade C isolates were also either degenerate or completely absent in S. Typhimurium LT2 and are thought to have either been lost by this serovar when it diverged from the common ancestor it shares with S. Enteritidis, or to have been acquired independently by S. Enteritidis following divergence [28]. The international isolates that clustered into QLD Clades A and C shared the same ROD profiles as the QLD strains from the same clade, with the exception of ROD21, which was absent from 5 of the 7 international Clade A isolates, but present in the two isolates from Germany.
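The ROD absences described for Clades A and C can be compared with simple set operations. The Python sketch below encodes only the absences stated in the text and derives which are shared and which are clade-specific; the variable names are illustrative, and ROD21 is deliberately left out because it is variable within Clade C rather than uniformly absent.

```python
# RODs reported as absent, relative to P125109, in each clade.
absent_in_clade_a = {"ROD4", "ROD17", "ROD25", "ROD34", "ROD37"}
absent_in_clade_c = {"ROD4", "ROD13", "ROD17", "ROD28", "ROD34", "ROD37"}

# Regions missing from both clades.
shared_absent = absent_in_clade_a & absent_in_clade_c

# Regions missing from one clade only.
only_absent_in_a = absent_in_clade_a - absent_in_clade_c
only_absent_in_c = absent_in_clade_c - absent_in_clade_a


def rod_number(name: str) -> int:
    """Extract the numeric part of a ROD label for sorting."""
    return int(name[3:])


shared_sorted = sorted(shared_absent, key=rod_number)
```

Here `shared_sorted` lists the four regions missing from both clades, while ROD25 is specific to Clade A and ROD13 and ROD28 are specific to Clade C.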
Phage content. Prophages are important drivers of diversity in S. enterica, and there is a high degree of diversity in the prophage content of different serovars of Salmonella, and even within serovars. For example, while S. Enteritidis P125109 and S. Enteritidis LK5 both contain five prophage regions, the actual prophages present in these two strains differ [28,31].
Likewise, the three clades observed in QLD isolates also demonstrated diversity in their prophage content. Clade B isolates possessed two different phage profiles, with some having the same prophage profile as P125109 and some having the same profile as LK5. This difference correlated with the clustering of Clade B isolates with the P125109 lineage or the LK5 lineage in the phylogenetic analysis in S1 Fig. Those in the P125109 lineage contained φSE10, φSE12, φSE12A, φSE14 and φSE20, while those in the LK5 lineage lacked φSE20 but contained the RE-2010 sequence described by Zheng et al., located in the same region as in LK5 [25].
The prophage profiles of isolates belonging to Clade A and Clade C differed from that of the Clade B isolates. In both Clade A and Clade C isolates the integration site for φSE10 was empty. Prophage φSE10 contains the virulence-contributing genes sseI, gtgE, and gtgF and is also absent in serovar Gallinarum [17,28,31]. The integration site for φSE14 was also empty for isolates from both Clade A and Clade C. When the de novo assembled contigs for each isolate were submitted to SeqSero for molecular serotyping, a region that is commonly used as a marker for S. Enteritidis, sdfI [26,32], was found to be absent in isolates from Clade A and Clade C, meaning that they were not identified as S. Enteritidis [15]. Investigation of the location of this region indicated that it is found on φSE14. It has previously been reported that φSE14 is unstable and can be spontaneously excised from the chromosome [33], so it is possible that it has been lost from Clade A and Clade C strains. Alternatively, it may have been acquired later by Clade B strains, as Porwollik et al. found that 2 out of 8 strains isolated prior to 1950 were lacking φSE14 [17]. Virulence studies found that the absence of this prophage had no effect on the ability of S. Enteritidis to colonise a mouse model compared to strains that carry φSE14 [33].
In Clade C isolates φSE12 had been replaced by a Gifsy-2-like prophage about 35.3 kb in size located at the same site. The integrase is the same as in φSE12, and there is a high level of identity for five sections of the φSE12 sequence, with insertions in between. The five sections are in the same order in both φSE12 and the Gifsy-2. It therefore appears that the φSE12 sequence in P125109 and other Clade B genotypes is a degenerate form of the Gifsy-2 in Clade C isolates. The φSE12A just downstream of the Gifsy-2 prophage has high identity to the same phage remnant in P125109 except for a missing 1.8 kb sequence corresponding to genes 9-11 in the P125109 prophage. The same sequence is also missing in the corresponding sequence in S. Typhimurium LT2 (GenBank NC_003197 nt 1962481-1967919), so that Clade C isolates have virtually the same sequence as LT2. One of the Gifsy-2 genes present in Clades B and C, SEN1140, is reported to be important in early colonisation and inflammation [29].
The situation is somewhat different in Clade A isolates. All Clade A isolates lack all or most of the φSE12 prophage. Some are missing the entire sequence including the integrase; others have a Gifsy-1 prophage of about 51 kb, with almost the same integrase as φSE12, located in the φSE12 site and containing a number of φSE12 genes, including the virulence factors sopE and sodC [28,31]. Some Clade A isolates have a different Gifsy-1 prophage in the same location as Gifsy-3 in S. Typhimurium 14028S (GenBank NC_016856 nt 1284474-1284687). In these strains the φSE12 site is occupied by the integrase and subsequent excisionase from φSE12, followed by the next three genes from the Gifsy-1 found in Clade A strains that carry the Gifsy-1, suggesting an earlier loss of the Gifsy-1 prophage. In all three types of Clade A isolates there is a full φSE12A sequence just downstream of the φSE12 site with high identity to the sequence in P125109.
Isolates from both Clade A and Clade C lack φSE20, which is a 41 kb phage related to φST64B from S. Typhimurium [27,28]. φSE20 is intact in the P125109-like Clade B isolates and in P125109, where it is thought to be a recent acquisition and has been implicated in S. Enteritidis virulence in mice and the invasion of chicken ova [34]. Betancor et al. have suggested that the presence of φSE20 in S. Enteritidis is associated with the emergence of particular isolates as epidemic strains in Uruguay, as they found that 3 out of 6 pre-epidemic isolates lacked φSE20 while only 5 out of 108 epidemic and post-epidemic isolates lacked φSE20 [27]. Similarly, φSE20 was absent in isolates from the 1940s and 1950s tested by Porwollik et al., but was present in the more recent isolates [17]. As mentioned above, Clade B LK5-like isolates do not contain φSE20 but do possess a Fels-2-like prophage called RE-2010.
When compared to the P125109 genome, Clade C isolates also appear to have a ~42 kb region inserted downstream of SPI-9, between SEN2612 and SEN2613. This sequence is similar to the RE-2010/Fels-2 prophage located in the same region in the LK5 strain and in S. Typhimurium LT2 (NC_003197, STM2694), although the prophage does not have all of the Fels-2 genes present. Clade A isolates have a ~12 kb region inserted in the same location, also showing similarity to the RE-2010/Fels-2 prophage; however, many of the genes are absent.
The international isolates that clustered into QLD Clades A and C shared the same phage profiles as the QLD strains from the same clade.
SPI-6 encodes a T6SS and contains a fimbrial gene cluster and the invasin gene pagN [24,42]. In P125109 SPI-6 is degenerate, with several of the Type VI secretion system genes that are present in S. Typhimurium LT2 missing. As with P125109, Clade B isolates also have a degenerate SPI-6; however, SPI-6 in Clade A and Clade C isolates is complete and similar to that of S. Typhimurium LT2, although with some differences in genes 25-30 of the island in Clade A and genes 27-30 in Clade C.
SPI-17 is a SPI with an undefined function. It is a prophage remnant that contains genes with high homology to P22 phage conversion genes, as well as genes known to be involved in O-antigen conversion in other bacteria [28,42]. This SPI was found to be present in all Clade B and Clade C isolates, but is absent from isolates belonging to Clade A.
SPI-19 contains genes that encode for a Type VI secretion system. This SPI is largely absent from P125109 which has a 24 kb internal deletion in SPI-19, corresponding with ROD9 described above, with only 16 of the 30 SPI-19 ORFs present [28,30]. SPI-19 is also degenerate in Clade B isolates, but like ROD9 is present and complete in the Clade A and C isolates.
Antimicrobial resistance genes. De novo assembled contigs were screened for the presence of acquired antimicrobial resistance (AMR) genes using Abricate and for defined chromosomal point mutations using ResFinder. As has been previously reported for S. Enteritidis, all isolates contained the cryptic aminoglycoside acetyltransferase gene aac(6')-Iy [24,43]. Additional acquired AMR genes were detected in 39 (18%) isolates, all of which belonged to Clade B, with the exception of one Clade A isolate that had blaTEM-1B present (Fig 2). One isolate contained a marker for quinolone resistance, qnrS1, which has also been seen in Malaysian S. Enteritidis isolates [44]. The tetracycline resistance marker tet(A) was the most common antimicrobial resistance gene present, found in 24 isolates, and 21 isolates had a blaTEM gene, a gene commonly associated with resistance to β-lactam antibiotics in Salmonella [45,46]. One isolate had a marker for trimethoprim resistance, dfrA1. This isolate had multiple resistance markers, also carrying sul1, blaTEM and tet(A) genes. Six isolates had the AMR genes strA and strB as well as sul2, blaTEM and tet(A). Point mutations in the gyrA gene, associated with reduced susceptibility to quinolones, were found in 38 isolates, all of which belonged to Clade B. Nine isolates had more than one acquired resistance marker, and seven isolates had four or more acquired resistance genes (Fig 2), a finding that is consistent with the incidence of multi-drug resistant S. enterica reported in other studies [24,45-48]. This incidence of AMR genes in S. Enteritidis is similar to that seen in the global epidemic clade described by Feasy et al. [24]; however, the almost complete lack of AMR genes in the Clade A and Clade C isolates is somewhat unusual.
Plasmid content. Plasmids are also known to play an important role in Salmonella pathogenicity. De novo assembled contigs were searched for the presence of plasmid sequences using PlasmidFinder, and results were confirmed by BLAST [49]. Nearly all of the plasmid sequences detected were in Clade B isolates. Out of 121 Clade B isolates, 109 had an IncF1B plasmid almost identical to the plasmid from S. Enteritidis str. CDC_2010K_0968 (Genbank accession number CP007529). This is a truncated relative of the pSLT plasmid in S. Typhimurium str. LT2 (Genbank accession number NC_003277). None of these plasmids contained genes associated with antimicrobial resistance. Forty-six Clade B isolates contained an IncX1 plasmid, and nearly all of the acquired antimicrobial resistance genes found in Clade B isolates were associated with these IncX1 plasmids. There were six related but different IncX1 plasmids found, with five different antimicrobial resistance gene profiles. The IncX1 plasmid found in 19 isolates had only tetracycline resistance genes. There were two variants of an IncX1 plasmid in seven isolates with resistance genes blaTEM, streptomycin A and B, sulphonamide 2 and tetracycline A and R. Another nine isolates had an IncX1 with blaTEM only, and there were three other IncX1 plasmids in singleton isolates, one with blaTEM only, one with blaTEM, trimethoprim, sulphonamide 1 and tetracycline genes and one with blaTEM and tetracycline genes. Eight other isolates had IncX1 plasmids with no antimicrobial resistance genes. Two Clade B isolates had related but different IncI1 plasmids.
Among Clade A isolates there were two isolates with the same IncF1 plasmid, which is only distantly related to pSLT and another two isolates had two related but different IncI1 plasmids, one in combination with a colpVC-related plasmid. None of the Clade C isolates had any plasmid or antibiotic resistance genes.

Conclusions
Many studies of S. Enteritidis have commented on the homogeneity of this serovar [7,26,50,51]; however, it has recently been demonstrated that there are diverse lineages of S. Enteritidis that exist globally [24,41]. We have identified the existence of two additional lineages of S. Enteritidis circulating in the QLD population in Australia. These appear to be largely restricted to Australia, with strains belonging to these novel clades demonstrating PTs and STs that are uncommon among reported global S. Enteritidis strains, including one ST, ST3304, which has only been reported in Australian strains. Previous studies have compared the genome of S. Enteritidis with that of different serovars and different phage types and identified differences in prophage content, pathogenicity islands and the presence or absence of genes [17,25,27,28]. By conducting comparative genomic analysis of the QLD isolates against S. Enteritidis P125109 we have shown that each clade has a unique collection of prophages and genomic islands. Feasy et al. proposed that the differences seen in the novel East and West African lineages could indicate different ecological niches outside the human host [24]. This could also be true of the novel Australian lineages, and further investigation of environmental niches of these clades is warranted.
When the QLD isolates were analysed in the context of international strains, it was clear that Clades A and C were highly diverged from the majority of the international strains tested, and that these clades were comprised mostly of Australian isolates. Strains belonging to Clade A and Clade C do not appear to be exclusive to Australia, however, having been isolated and sequenced in other countries (S2 Table). Of the four divergent S. Enteritidis described by Deng et al. in their study, one from New Zealand (77-0915) clustered with the QLD Clade A strains and another, SARB19 from Switzerland, clustered close to Clade C [26]. Despite the existence of strains similar to Clade A and Clade C isolates outside of Australia, it appears that strains belonging to these clades are more prevalent in Australia than they are in other countries and make up a higher proportion of S. Enteritidis isolated from cases of salmonellosis than is seen elsewhere in the world.
The number of environmental isolates in this study is small, and so the conclusions that can be drawn on environmental sources of QLD S. Enteritidis are limited. However, the fact that environmental isolates sourced from QLD were from Clades A and C aligns with the theory that these clades are a source of locally acquired infection, while Clade B isolates are mostly overseas acquired, although we cannot exclude the possibility that Clade B isolates are present in the QLD environment. The isolation of S. Enteritidis in swabs from a chicken farm and in egg pulp is not surprising given that poultry and poultry products are a known source of S. Enteritidis; however, it appears that the strains of S. Enteritidis associated with chicken in QLD are substantially different from those found in flocks from other geographic locations. In particular, the lack of φSE20, proposed to be important for virulence and invasion of the oviduct in chickens [34], in Clade A and C isolates may explain in part the observation that S. Enteritidis has not caused the same problems in the Australian poultry industry as has been seen elsewhere in the world. The isolation of two Clade A isolates from domestic animals may also indicate pets as a potential reservoir for infection. The low presence of AMR genes in the Clade A and Clade C isolates is notable. Australia has strong regulations in place regarding the use of antimicrobials in agriculture and animal processing, and Salmonella enterica multi-drug resistance rates from Australian livestock have been reported as low [52]. It is possible that this absence of resistance markers in locally acquired Clade A and C cases may reflect the absence of significant antimicrobial pressure in as yet unknown local environmental sources.
The bulk of QLD Clade B isolates were found to group into a cluster corresponding with Lineage 3 described by Deng et al. [26]. In our analysis, this group consists primarily of Asian strains, and also contains the majority of Asian strains included in this study. The remaining Clade B isolates either clustered into a group corresponding with Lineage 5 described by Deng et al., or into a smaller clade consisting entirely of Asian strains. As mentioned above, the PT and ST profiles seen in the QLD Clade B isolates are consistent with them being related to travel, and the clustering of these strains primarily with strains from Asia is consistent with the popularity of travel to Asia among Australians.
Several studies have speculated on the evolution of S. Enteritidis and similar serovars [25,28,31,41]. Thomson et al. proposed that S. Enteritidis PT4 gained φSE12 after diverging from the common ancestor it shares with S. Typhimurium. Following this, S. Enteritidis then diverged from S. Gallinarum and gained φSE14 and φSE20. QLD Clade A and Clade C isolates lack φSE10, φSE14 and φSE20, so it is tempting to speculate that these clades are older than Clade B and that Clade B diverged from these clades before gaining φSE10, φSE14 and φSE20. Clades A and C also possess complete forms of SPI-6 and SPI-19, while in Clade B these genomic islands are degenerate, so it is possible that they degenerated in Clade B following divergence. Other studies have also shown that older S. Enteritidis isolates lack φSE14 and φSE20 [17,27]. Therefore it is possible that Clades A and C represent an older population of S. Enteritidis that has not undergone the same changes seen in Clade B and the global epidemic strains of S. Enteritidis. Further study of the evolution of these clades is required before conclusions can be drawn; however, such studies would prove beneficial in gaining a greater understanding of the S. Enteritidis population in QLD.
Supporting information S1 Table.