Tracing the environmental footprint of the Burkholderia pseudomallei lipopolysaccharide genotypes in the tropical “Top End” of the Northern Territory, Australia

The Tier 1 select agent Burkholderia pseudomallei is an environmental bacterium that causes melioidosis, a high mortality disease. Variably present genetic markers used to elucidate strain origin, relatedness and virulence in B. pseudomallei include the Burkholderia intracellular motility factor A (bimA) and filamentous hemagglutinin 3 (fhaB3) gene variants. Three lipopolysaccharide (LPS) O-antigen types in B. pseudomallei have been described, which vary in proportion between Australian and Asian isolates. However, it remains unknown if these LPS types can be used as genetic markers for geospatial analysis within a contiguous melioidosis-endemic region. Using a combination of whole-genome sequencing (WGS), statistical analysis and geographical mapping, we examined if the LPS types can be used as geographical markers in the Northern Territory, Australia. The clinical isolates revealed that LPS A prevalence was highest in the Darwin and surrounds (n = 660; 96% being LPS A and 4% LPS B) and LPS B in the Katherine and Katherine remote and East Arnhem regions (n = 79; 60% being LPS A and 40% LPS B). Bivariate logistics regression of 999 clinical B. pseudomallei isolates revealed that the odds of getting a clinical isolate with LPS B was highest in East Arnhem in comparison to Darwin and surrounds (OR 19.5, 95% CI 9.1–42.0; p<0.001). This geospatial correlation was subsequently confirmed by geographically mapping the LPS type from 340 environmental Top End strains. We also found that in the Top End, the minority bimA genotype bimABm has a similar remote region geographical footprint to that of LPS B. In addition, correlation of LPS type with multi-locus sequence typing (MLST) was strong, and where multiple LPS types were identified within a single sequence type, WGS confirmed homoplasy of the MLST loci. The clinical, sero-diagnostic and vaccine implications of geographically-based B. pseudomallei LPS types, and their relationships to regional and global dispersal of melioidosis, require global collaborations with further analysis of larger clinically and geospatially-linked datasets.

Introduction Burkholderia pseudomallei is the causative agent of melioidosis, an often fatal disease that is endemic in tropical regions globally, including the "Top End" of the Northern Territory, Australia [1]. Clinical presentations of melioidosis are highly varied [2][3][4][5] with pneumonia being the most common presentation [6]. Melioidosis mortality rates range from <10% in the NT to >40% in regions of Asia and Southeast Asia where prompt diagnosis and accessibility to drugs and hospital care can be limited [2,6,7]. Global melioidosis mortality has been predicted to be considerably higher than mortality from dengue and leptospirosis combined and similar to that from measles [8]. In addition, the bio-threat status of B. pseudomallei and the lack of a current vaccine makes this organism of high importance to public health in many regions globally.
Variable genetic markers in B. pseudomallei with known geographical associations include the mutually exclusive Burkholderia intracellular motility factor A (BimA) variants bimA Bm and bimA Bp , the filamentous hemagglutinin 3 (fhaB3) gene and the Burkholderia thailandensis-like flagellum and chemotaxis (BTFC/YLF) gene clusters [9,10]. These genetic markers have been used to elucidate strain relatedness, origin and virulence potential, with bimA Bm being strongly linked to neurological melioidosis in Australian studies [9]. The lipopolysaccharide (LPS) of B. pseudomallei, a type II O-polysaccharide [11], consists of A, B and B2 variants [12]. As with bimA Bm /bimA Bp , YLF/BTFC, and fhaB3, the distribution of LPS genotypes varies between Australia and Southeast Asia [9,12] and it is unknown what LPS types predominant in other endemic regions. In Thailand and Australia, LPS type A is the most abundant, followed by LPS B, whereas LPS B2 has not been detected in Thailand but has been detected in Australia [12]. Despite regional differences in LPS genotype prevalence, the distribution of these types across different geographical regions within a single melioidosis-endemic region has not been thoroughly investigated. Elucidating the geospatial patterns of LPS genotypes is potentially important for B. pseudomallei virulence studies, the development of LPS-based sero-diagnostics, and for consideration of vaccines that have an LPS component.
In this study, we used a combination of WGS, bivariate logistic regression (with the cluster feature) and geographical mapping to examine the geographical distribution of LPS types in the Top End region. We first investigated LPS geographic correlations using a large number of clinical isolates (n = 1005) from our 28-year Darwin Prospective Melioidosis study (DPMS). We then investigated LPS type distributions for 340 environmental strains collected from known Top End locations to determine whether LPS genotypes for clinical isolates, which have usually only presumptive infection locale data, matched that of the environmental isolates, where location is definite. Lastly, we determined bimA Bm /bimA Bp diversity in the 340 environmental strains to determine whether the bimA variants and LPS types share similar geographical footprints in the NT.

Ethics statement
This study was approved by the Human Research Ethics Committee of the NT Department of Health and the Menzies School of Health Research (HREC 02/38).

B. pseudomallei isolates used in this study
A total of 1,345 B. pseudomallei isolates from the Top End region were used in this study for bivariate analysis and geographical mapping. 1,005 clinical isolates were included in this study and 999 (excluding the 6 B2 strains) were used for bivariate statistical analysis, and environmental isolates (n = 340) were used for geographical mapping. The 1,005 clinical isolates were primary B. pseudomallei isolates corresponding to 1,005 patients enrolled in the 28-year DPMS. The clinical isolates were assigned to one of four Top End geographic regions based on the presumptive location of infection, which is assigned based on the individual patient's history of illness and activities at their residential location or elsewhere if considered relevant. These regions have been segregated and assigned as outlined by the Australian Bureau of Statistics (https://www.abs.gov.au/; statistical area level 3): Darwin and surrounds; Darwin remote (includes West Arnhem), Katherine and Katherine remote (representing one region, Fig 1), and East Arnhem (Fig 1). Melioidosis is very uncommon in the southern half of the NT and this region was not included in our study.
The 340 Top End environmental isolates were processed and subsequently B. pseudomallei was confirmed as previously described [13][14][15][16][17]. The location for each of the 340 environmental isolates was determined using Global Positioning System coordinates, with mapping of each isolate coordinate using ArcGIS (https://www.arcgis.com/index.html) and maps were exported as JPEG and manipulated using INKSCAPE 0.92 (https://inkscape.org/release/ inkscape-0.92/). For comparison, we included an additional 61 isolates of Queensland (n = 38), Western Australian (n = 12) and Papua New Guinean (n = 11) origin.

LPS and bimA genotypes
The assignment of LPS types for the 1,005 clinical genomes has previously been determined [18] using the Basic Local Alignment Search Tool (BLAST). The clinical genomes were not assigned a bimA variant as this has previously been performed on our clinical strains [10]. LPS and bimA genotypes were determined for each of the 340 Top End environmental genomes, with an LPS type being assigned to those additional 61 isolates from Western Australia, Papua New Guinea and Queensland, Australia. In brief LPS A (wbil to apaH in K96243

Statistical analysis of LPS type, locale, and multilocus sequence type (MLST) using the clinical B. pseudomallei dataset (n = 999)
For each clinical strain, the presumptive geographical location of infection and its sequence type (ST) (https://pubmlst.org/bpseudomallei/) were analysed using Stata version 14.2 (Stata-Corp LP, College Station, TX, USA). Bivariate logistics regression using the cluster feature was performed for LPS type with locale using the Darwin and surrounding region as the reference level, and bivariate Pearson's χ 2 (frequency >5 in all cells) was performed for LPS type with ST or BimA variants: A P < 0.05 was considered significant. Environmental footprint of the Burkholderia pseudomallei lipopolysaccharide genotypes

Phylogenomic analysis of 175 B. pseudomallei genomes
We performed comparative analyses of 175 B. pseudomallei genomes that represent both the global diversity of this bacterium and within-Northern Territory population diversity (S1 Table). Of these 175 genomes, 144 were publicly available and 31 were sequenced as part of this study. Genomic DNA for 31 isolates lacking WGS data was extracted as previously described [19]. These isolates were sequenced at Macrogen, Inc. (Gasan-dong, Seoul, Republic of Korea) or Australian Genome Research Facility Ltd. (Melbourne, Australia) using the Illumina HiSeq2000 and Illumina HiSeq2500 platforms (Illumina, Inc., San Diego, CA) and these data are available on NCBI (S1 Table) Comparative analysis of the 175 genomes was carried out to investigate geographical clustering of the LPS types, to determine the genetic relatedness of suspected ST homoplasy cases [20,21], and to ascertain if suspected ST homoplasy cases were intercontinental or intracontinental. Orthologous biallelic single-nucleotide polymorphisms (SNPs) were identified from the WGS data using the default settings of SPANDx v3.2 [22]. The closed Australian B. pseudomallei genome MSHR1153 [23] was used as the reference for read mapping. A maximum-parsimony (MP) phylogenetic tree was reconstructed based on 219,075 SNPs identified among the 175 genomes using PAUP � 4.0.b5 [24]. Recombinogenic SNPs were identified using Gubbins v2.2.0 (default parameters) [25]. STs were determined using the BIGSdb tool [26].  Table 1). Using the clinical B. pseudomallei dataset, we performed bivariate logistics regression (with the cluster feature) of LPS type A and B correlations and presumptive infecting locale, using Darwin urban as the reference level. Low LPS B2 numbers (n = 6) precluded statistical analysis of this genotype, leaving 999 DPMS isolates for the analysis. Of these six B2 isolates, five were from presumptive infections occurring in Katherine and Katherine remote communities, with the remaining LPS B2 strain linked to infection in the Darwin and surrounding region. Bivariate logistics regression adjusted with the cluster feature demonstrated that the odds of getting a clinical isolate with LPS B in the Darwin remote, Katherine and Katherine remote and East Arnhem regions, respectively is 4.7 (95% CI 2.4-9.6; p<0.001), 14.6 (95% CI 6.1-34.7; p<0.001) and 19.5 (95% CI 9.1-42.0; p<0.001) times higher compared to the odds in the Darwin and surrounding region. LPS B prevalence was highest in patients infected in the Katherine and Katherine

Geographic clustering of LPS types was also observed in NT environmental strains
Geographical mapping of LPS types for the environmental isolates also supported the link between LPS and geography. LPS type A was predominant in Darwin and surrounds (Table 1 and Fig 2B), whilst LPS B was found in all four geographical regions but with numbers highest in the Katherine and Katherine remote region (Fig 2A and 2B, number of LPS B strains at each site is indicated). LPS B2 from environmental strains was only observed in the Katherine remote region (n = 1). In non-Top End comparator strains, LPS A was also the dominant genotype (range = 63-91%), followed by B (range = 9-24%) and B2 (range = 0-18%). Numbers of the LPS type A, B, and B2, respectively, were as follows: Western Australia (10, 2, 0), Queensland (24,9,5) and Papua New Guinea (8, 1, 2). Environmental footprint of the Burkholderia pseudomallei lipopolysaccharide genotypes  Table).

The geographical footprint of bimA Bm in environmental isolates is similar to LPS B
BLAST analysis demonstrated that bimA Bp was the dominant BimA variant in Top End environmental samples (94.8%), similar to the bimA Bp prevalence in clinical isolates from a previous study, whereby 95.7% of clinical isolates from Darwin and surrounds possessed bimA Bp [10]. Bivariate statistical analysis demonstrated that the proportion of LPS B strains carrying bimA Bm (n = 8, 21%) is higher compared to LPS A (n = 8, 2.6%) and this was significant (P<0.001). We next determined if bimA Bp or bimA Bm has a similar geographical footprint to LPS A or LPS B in the Top End. [10]. Geographical mapping of the bimA variants in the environmental strains demonstrated that bimA Bp was linked with Darwin and surrounds (Fig 3B), having a similar geographical footprint as LPS A. In contrast, bimA Bm was associated with the East Arnhem and the Katherine and Katherine remote region (13 of 17 (76%) environmental bimA strains), similar to LPS B. We also noted that the majority of the LPS B strains in the Katherine and East Arnhem region also carried bimA Bm (n = 8/13; 62%), and these strains were concentrated in two geographical regions (Fig 3, star [n = 6/9] and triangle [n = 2/2]).

Phylogenomics revealed that LPS B2 is geographically restricted within the Top End
A phylogenomic analysis was performed to investigate the distribution of LPS types and strain relatedness on the whole-genome level (Fig 4). Overall, good clade structure based on geographical origin was observed, with strains belonging to the same ST clustering together and within the same LPS type, but with LPS A and B being dispersed throughout the phylogeny (Fig 4). However, this analysis revealed that 5/6 LPS B2 genomes of Top End origin clustered together, suggesting that LPS B2 is geographically restricted in the Northern Territory (Fig 4). Similarly, the LPS B2 strains with a Queensland, Australian origin grouped closely together.

The B. pseudomallei LPS discrepancy in five MLST STs can be explained by ST homoplasy (isolates with the same ST that are genetically distinct)
The strong link between LPS type, geography and ST suggested that homoplasy may be present in the 18 clinical strains and three environmental strains that belonged to the five STs with more than one LPS type ( Table 2). Based on comparative analysis of 175 genomes, isolates with the same ST but mixed LPS type did not group together (Fig 4: "ST homoplasy cases") and were separated by a large number of SNPs. This is comparable to previously observed MLST homoplasy in B. pseudomallei [20,21]. To assess the effect of recombination on phylogenetic inference, SNP density filtering was applied to remove recombinogenic SNPs with Gubbins (v.2.3.1), which did not considerably alter the topology of the phylogenetic tree. We also noted that the isolates belonging to ST-807 and ST-468 were from three Top End geographical regions and varied by a large number of SNPs with the largest SNP difference being 36,000 and~29,000 respectively (Table 2). This phenomenon represents further examples of intracontinental homoplasy but includes homoplasy occurring in more closely related geographical regions than previously described [20].
The LPS O-antigen moiety of B. pseudomallei is highly varied with three serotypes described [12]. Nevertheless we recently found that previously published in vitro differences in virulence between LPS A and LPS B did not translate to differences in mortality in DPMS patients, supporting the dominant role in human melioidosis of host risk factors in determining disease severity and outcomes [18]. Regional differences in LPS serotype prevalence have been noted in Southeast Asia and Australia with LPS A being the most prevalent, followed by LPS B and with LPS B2 being the rarest in Australian strains. As yet LPS B2 has only been detected in B. pseudomallei strains from Australia and Papua New Guinea [12]. Despite the reporting of these regional differences in LPS prevalence, no studies have investigated the geographical footprint of the LPS types in a contiguous melioidosis-endemic region. We used a combination of statistical analysis, WGS and geographical mapping to examine the geographical footprint of the LPS genotypes in the Top End of the Northern Territory.
We first examined the prevalence of the LPS types across the environmental and clinical samples, and our data confirm that the prevalence of the LPS types is consistent between the Northern Territory environmental (LPS A = 89%; LPS B = 10% and LPS B2 = 1%) and clinical isolate collection (LPS A = 87%; LPS B = 12% and LPS B2 = 1%) [18], with LPS A being the most prevalent. High prevalence of LPS A is reflected in it also being present in many nearneighbour species [31], with the possibility of horizontal gene transfer of LPS A between these related species. LPS A was also distributed throughout the B. pseudomallei phylogeny, also supporting horizontal transfer of the entire LPS A loci. Based on our large clinical dataset LPS provides protective immunity to melioidosis in in vitro models of disease making it a potential vaccine candidate. It has also been demonstrated that antibodies from the two dominant LPS types, LPS A and B are not cross-reactive due to structural differences of the O-antigen, with consequent implications for an LPS based vaccine [12]. The sole presence of LPS A in the Darwin city region where the majority of melioidosis cases occur in the Northern Territory [6], suggests that an LPS A based vaccine strategy would be appropriate in this region. However, in regions where LPS A and LPS B strains are both present such as the Darwin remote region, an LPS A alone vaccine strategy is unlikely to provide adequate protection. For other melioidosis endemic regions globally, determination of the LPS genotype footprints will be critical if LPS vaccines are to be used.
Despite the strong link noted between ST and LPS type, we detected an LPS discrepancy in 21 isolates belonging to five rare STs: ST-118 (n = 3), ST-734 (n = 10), ST-468 (n = 5), ST-807 (n = 3) and ST-1485 (n = 3). Phylogenomic analysis of 175 genomes confirmed that the isolates belonging to the five STs represented five new ST homoplasy occurrences, explaining the LPS discrepancy noted in the five STs. Isolates of each ST were separated by a large number of SNPs (~29,000-~39,000), which is characteristic of Australian isolates that belong to different STs [20,21]. Despite the large number of SNPs separating isolates within each ST, they all still resided within the Australian clade, representing five novel intracontinental ST homoplasy cases. These findings further highlight the low resolution of MLST and the need for high resolution techniques such as WGS to resolve unexpected genotyping results, particularly for B. pseudomallei which is a highly recombinogenic pathogen [32].
In addition to the phylogeographical role of the different B. pseudomallei LPS genotypes, the potential for differential virulence amongst LPS genotypes and potential interactions with other variable genotypes requires consideration. In vitro models of LPS types have been associated with differential virulence [12,33,34]. However, as noted above this did not translate to clinical significance in our large clinical cohort (>1000 melioidosis patients) where mortality, rates of bacteraemia and septic shock were the same for patients with LPS A and LPS B [18]. Nevertheless, it is known that virulence mechanisms work in concert for B. pseudomallei to invade and cause disease and further genomic studies are required to assess whether there are regional combinations of virulence genes including LPS variants that will have predictive capability for variable disease presentations and severity. For example, an association between bimA Bm and neurological melioidosis has been previously described [10], and a new finding from this study was that eight environmental strains from the Katherine and Katherine remote and East Arnhem region had a combination of bimA Bm and LPS B. Whether this specific bimA Bm −LPS B combination has implications for neurotropism of B. pseudomallei now requires elucidation.
The present study encompassed a number of limitations. One limitation of this study is that the assigned presumptive location of infection for cases of melioidosis is dependent on good epidemiological history obtained from patients and will sometimes not be the true location of infection. Nevertheless, the close correlation of geographical LPS footprints between the clinical case isolates and the environmental isolates (which are 100% location accurate) is reassuring that the prospective nature of the Darwin melioidosis study is providing accurate epidemiological data. A second limitation is that the four regions ( 3,360,659hectares] https://www.abs.gov.au/) vary substantially in both geographic size and population density, which results in a greater number of isolates obtained from the Darwin and surrounding region and ultimately clustered data. We accounted for the clustered nature of our data by using bivariate logistics regression with the cluster feature.
In conclusion, by combining genomic data with corresponding strain geographical information we have found that in the tropical north of Australia the LPS types have distinct geographical footprints as well as ST associations, adding to the already known variably present genetic markers fhaB3 and the bimA variants. A novel and interesting finding was that the geographical footprint of LPS B and bimA Bm are similar and in remote Top End locations. The clinical, sero-diagnostic and vaccine implications of geographically-based B. pseudomallei LPS and other gene differentials and their relationships to regional and global dispersal of melioidosis require global collaborations with further analysis of larger clinically and geospatiallylinked datasets.
Supporting information S1