Abstract
Streptococcus pneumoniae (pneumococcus) is a genetically diverse opportunistic bacterial pathogen that expresses two phase-variable loci encoding restriction-modification systems. Comparisons of two genetically-distinct pairs of epigenetically-distinct variants, each distinguished by a stabilised arrangement of one of these phase-variable loci, found the consequent changes in genome-wide DNA methylation patterns were associated with differential expression of mobile genetic elements (MGEs). This relationship was hypothesised to be mediated through changes in xenogeneic silencing (XS) or nucleoid organisation. Therefore the chromosomal conformation of both variants of each pair was characterised using Illumina Hi-C, and Nanopore Pore-C, sequencing. Both methods concurred that the organisation of the pneumococcal chromosome was dominated by small-scale structures, with most pairwise interactions between loci <25 kb apart. Neither found substantial evidence for higher-order structure or XS in the pneumococcal genome, with more complex contact patterns only evident around the replication origin. Comparisons between the variants identified phage-related chromosomal islands (PRCIs) as the foci of differential contact densities between the variants. This was driven by copy number variation, resulting from variable excision and replication of the episomal PRCIs. However, the methods were discordant in their identification of the variant in which the PRCI was more actively replicating in both pairs. Validatory experiments demonstrated that the prevalence of circular PRCIs was not determined by DNA modification, but instead varied stochastically between colonies in both backgrounds, and was metastable during vegetative growth. PRCI excision was inducible by mitomycin C, but independent of the presence of a prophage. Yet transcriptional activation of these elements was affected by both signals, indicating transcription and replication are separately regulated.
Therefore pneumococcal MGEs do not appear to be subject to XS, resulting in heterogeneity being generated within these bacterial populations through the frequent local disruption of chromosome conformation resulting from the stochastic excision and reintegration of episomal elements.
Author summary
The pneumococcus is a bacterium with a circular chromosome that is organised by DNA-binding proteins and often contains mobile genetic elements (MGEs), genes able to transmit between bacteria. All pneumococci encode defences against MGEs, some of which create epigenetic modifications (typically methylation) genome-wide at particular sequence motifs. Changes in these epigenetic patterns are associated with altered MGE gene expression and replication in otherwise genetically-identical bacteria. To test whether this was the result of methylation remodelling the organisation of the chromosome, we compared the contact patterns across the chromosome using two sequencing technologies. Both methods concurred that the pneumococcal genome is generally folded into small structures, with the biggest differences between the variants caused by the replication of MGEs. However, the methods disagreed on the variant in which the MGEs replicated fastest. Further experiments showed that MGE replication was stable over the course of culturing over hours, but would randomly change level between days, explaining the inconsistent observations. MGE replication was found to rise in response to DNA damage, whereas gene expression also depended on the presence of other signals, explaining the discrepancies in these activities between variants. Hence MGEs significantly contribute to the heterogeneity that rapidly accumulates within pneumococcal populations.
Citation: Lim TY, Horsfield ST, Troman CM, Bentley SD, Kwun MJ, Croucher NJ (2025) Pore-C sequencing identifies episome-driven chromosome conformation perturbations differentiating pneumococcal epigenetic variants. PLoS Pathog 21(8): e1013392. https://doi.org/10.1371/journal.ppat.1013392
Editor: Gavin Paterson, University of Edinburgh, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: March 23, 2025; Accepted: July 18, 2025; Published: August 14, 2025
Copyright: © 2025 Lim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All raw genetic data used in this study are available from the ENA with the accession codes listed in S1 and S2 Tables. Processed files generated by bioinformatic pipelines are available from FigShare (https://figshare.com/projects/Hi-C_and_Pore-C_data_for_pneumococcal_epigenetic_variants/227616). Experimental results and sample information are available from Github (https://github.com/nickjcroucher/RMV_PoreC).
Funding: This work was supported by a Sir Henry Dale fellowship jointly funded by Wellcome and the Royal Society (grant 104169/Z/14/A to NJC); Wellcome (grant 206194 supporting SDB); the European Bioinformatics Institute (supporting STH), and the UK Medical Research Council and Department for International Development (grant MR/R015600/1 supporting TYL, STH, KT, MJK and NJC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: NJC has consulted for Antigen Discovery Inc., Merck and Pfizer, and has received an investigator-initiated award from GlaxoSmithKline; SDB has consulted for Pfizer and Merck.
Introduction
Streptococcus pneumoniae (the pneumococcus) is a gram-positive commensal bacterium and opportunistic pathogen. Pneumococci are a common cause of diseases such as conjunctivitis, pneumonia, sepsis and meningitis [1]. The species has an extensive pangenome [2–4] as a consequence of genetic exchange through transformation [5] and the frequent integration of mobile genetic elements (MGEs) into the pneumococcal chromosome [4]. However, the acquisition of genes can be inhibited by restriction modification systems (RMSs) [6,7]. Almost all pneumococci share two phase-variable Type I RMSs: the SpnIII RMS encoded by the inverting variable restriction (ivr) locus [8,9], and the SpnIV RMS encoded by the translocating variable restriction (tvr) locus [4,9]. The rapid variation of these loci can inhibit the spread of prophage within clonally-related populations [10,11]. Yet the changes also cause global alterations to the methylation of the genome. Hence the sensitivity of pneumococcal transcription to epigenomic changes means these RMSs also cause phenotypic heterogeneity between isolates [9,12].
Variation in the SpnIII RMS has been associated with changes in capsule expression [9,13,14], human carriage [15], adhesion to human cells [16], and virulence in a mouse model [9,17]. Variation at the tvr locus is instead associated with variation in transformation efficiency [12,18]. However, it has not been possible to identify methylation sites that can explain the transcriptional changes through proximal effects at gene regulatory loci [9,12]. Yet methylation has broader effects on the biophysical properties of DNA, which means that a global change in methylation patterns could affect the conformation of the bacterial nucleoid, and alter the interactions between regulatory proteins and DNA loci [19]. Hence it is plausible that the epigenetic effects are instead mediated through longer-distance, larger-scale changes in chromosomal conformation.
This is supported by analysis of two pneumococcal isolates, RMV7 and RMV8. Mutations disrupting the recombinase that catalyses rearrangements at the tvr locus enabled “locked” phase variants of each to be isolated: one representing the common, dominant arrangement (RMV7domi and RMV8domi), and one expressing a rare arrangement (RMV7rare and RMV8rare) [7]. These variants differ in the methylation motif targeted by the SpnIV RMS, but are otherwise isogenic outside the tvr loci. Both rare variants were more transformable than their dominant variant counterparts, and in both cases this was at least partly attributable to the altered activity of MGEs integrated into the chromosome [12,18]. RMV7 contains a cryptic integrative and conjugative element next to rpsI (ICErpsI) and the antibiotic-resistance transposon Tn916, as well as two phage-related chromosomal islands (PRCIs; alternatively known as phage-inducible chromosomal islands, or PICIs) integrated next to dnaN (PRCIdnaN) and uvrA (PRCIuvrA). PRCIdnaN was more transcriptionally active in RMV7domi, suppressing the induction of competence for transformation through inducing stress responses [12]. RMV8 encodes a prophage (ϕRMV8), a PRCI integrated within the mal operon (PRCImalA), and a CIPhR prophage remnant (previously annotated as PRCItadA) [18,20]. In RMV8domi, reduced excision of ϕRMV8 resulted in greater expression of a prophage-modified RNA that suppressed competence induction [18]. However, no specific methylation sites have been identified that could explain this difference in MGE activity [12].
Hence host DNA conformation is a plausible mechanism by which epigenetic modification could modulate the activity of MGEs through xenogeneic silencing (XS) [21]. This process depends on nucleoid-associated proteins (NAPs) that preferentially bind to the AT-rich DNA that is characteristic of MGEs [22,23], and are capable of silencing their gene expression. As well as affecting MGE gene transcription [23–25], these NAPs can modulate the excision and re-integration of MGEs [26]. Such differences in XS activity could explain the inter-variant differences observed in RMV7 and RMV8. At least four families of NAPs have been identified [21,27–30]. Two are common in gram-negative Proteobacteria: H-NS, identified in Escherichia coli [31], and MvaT, characterised in Pseudomonas [30]. By contrast, Lsr2 has only been found in gram-positive Actinobacteria [29], and the Rok protein has only been characterised in Bacillus subtilis [32]. Hence no XS system has yet been characterised in S. pneumoniae. Nevertheless, the species encodes orthologues of Rok, as well as other proteins that preferentially bind AT-rich DNA [33]. Therefore the differential activity of MGEs in RMV7 and RMV8 may be attributable to changes in the patterns of NAPs associated with the genome.
The conformation of the genome, and therefore the distribution of NAPs and regions subject to XS, can be characterised using chromosome conformation capture (3C) [34]. This approach typically involves crosslinking DNA to proteins with formaldehyde, digesting the DNA with a frequently-cutting restriction enzyme, then facilitating proximity ligation to link DNA fragments that were spatially close within the cell (Fig 1A). The subsequent use of paired-end Illumina data enables such contact patterns to be assayed at a genome-wide scale using high-throughput 3C (Hi-C), which infers instances of digestion and re-ligation within the DNA separating the two reads based on their mapping positions and orientations relative to a reference genome [35]. More recently, long-read data generated by Nanopore sequencing has enabled 3C analyses to use information from concatemers consisting of multiple DNA fragments ligated together [36,37]. Hence this Pore-C method makes it possible to infer higher-order multiway chromatin interactions, and represents an efficient method for generating data on the scale of a bacterial genome.
(A) Diagram describing the methodology underpinning the use of high-throughput sequencing to capture chromatin conformation (3C). The final sequencing step distinguishes the Illumina (Hi-C) and Oxford Nanopore Technology (Pore-C) methods. (B) Relationship between the frequency of contacts between loci and their separation in the genome. Each panel represents a unique combination of sequencing technology and epigenetic variant. Each point corresponds to a detected pairwise interaction between loci in a contact matrix calculated at a 5 kb resolution for both technologies. The red dashed lines represent the thresholds used to define the closely-spaced loci between which particularly high contact densities were inferred. (C) Genome-wide Hi-C and Pore-C contact matrices shown at a 10 kb resolution. Each panel displays the cumulative number of contacts identified in three biological replicates for each unique combination of sequencing technology and epigenetic variant. The frequency of detected interactions between loci across the genome is represented by the colour of the corresponding cell in these symmetrical matrices. The black dashed lines indicate the position of the origin of replication. (D) Matrix detailing the cumulative number of contacts identified in loci surrounding the origin of replication, the position of which is marked by black dashed lines. The coloured dashed lines indicate the position of the parS site downstream of ori; the coloured dotted lines show the position of the parS sites upstream of ori. Each cell in the matrix represents the interactions between a pair of 10 kb loci.
Therefore this study compared Hi-C and Pore-C approaches to inferring the chromosomal conformation of the dominant and rare variants of S. pneumoniae RMV7 and RMV8. These assays were employed to determine whether it is feasible that the phenotypic heterogeneity associated with epigenetic variation can be attributed to changes in XS, or other alterations in the distribution of NAPs across the genome.
Results
Short-range interactions dominate pneumococcal Hi-C and Pore-C data
Three independent 3C preparations were generated for each of S. pneumoniae RMV7domi, RMV7rare, RMV8domi and RMV8rare (S1 and S2 Tables). These were digested with NlaIII, which can cut at sites found in the pneumococcal chromosome at intervals of ~300 bp (S3 Table), and used to generate a 12-plex library for Nanopore sequencing. However, this yielded few reads, as the activity of the pores declined rapidly during the sequencing run (S1 Fig). Residual DNA from these samples was instead processed for Illumina sequencing, and successfully generated Hi-C data (S2 Table). A further three independent 3C preparations were generated for each of the four variants using MluCI, which can cut at sites found in the pneumococcal chromosome at intervals of ~125 bp (S3 Table). Previous applications of the Pore-C approach to human cells found an additional protease treatment with pronase increased the efficacy of de-cross-linking, thereby slowing the rate of nanopore blocking by proteins attached to DNA [38]. Correspondingly, including this step in the sample preparation protocol enabled more efficient data generation from individual flow cells (S1 Fig and S2 Table).
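The typical spacing of restriction sites quoted above can be approximated directly from a genome sequence. The following is a minimal sketch rather than the procedure used to generate S3 Table: it assumes the recognition motifs CATG for NlaIII and AATT for MluCI, treats the sequence as linear rather than circular, and uses hypothetical function names and a toy sequence for illustration.

```python
import re

# Recognition motifs (both palindromic, so one strand is sufficient).
MOTIFS = {"NlaIII": "CATG", "MluCI": "AATT"}


def site_positions(sequence, motif):
    """Return 0-based start positions of every occurrence of motif."""
    # A lookahead is used so that overlapping occurrences would also be found.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def mean_site_interval(sequence, motif):
    """Mean distance (bp) between consecutive sites on a linear sequence.

    Returns None if fewer than two sites are present.
    """
    positions = site_positions(sequence, motif)
    if len(positions) < 2:
        return None
    return (positions[-1] - positions[0]) / (len(positions) - 1)


# Toy sequence for illustration only; a real analysis would read the
# reference genome from a FASTA file.
toy = "CATGAAAATTCCCATGGGAATTAATTCATG"
for enzyme, motif in MOTIFS.items():
    print(enzyme, site_positions(toy, motif), mean_site_interval(toy, motif))
```

Applied to a complete chromosome, the mean interval from such a count would be expected to be shorter for MluCI than for NlaIII, in line with the ~125 bp and ~300 bp spacings reported here.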
The three biological replicates for both methods for each of the variants were mapped to the appropriate dominant variant reference genome (S2 and S3 Figs and S1 Table). The necessity of mapping short read fragments meant the Pore-C data, which contains a higher rate of sequencing errors, mapped less efficiently than the Hi-C data. Nevertheless, a high proportion of the Nanopore data could be uniquely mapped to locations within the pneumococcal genome (S3 Fig). To establish the highest resolution at which analyses could be reliably conducted, the proportions of inferred relative mapping orientations of sequence fragments were plotted against the separation between their mapped locations if they came from the same read pair, for Illumina Hi-C data, or from the same read, for Pore-C data. This found that sequence fragments separated by less than 5 kb were typically mapped to opposite strands of the genome for Illumina Hi-C data (S4 Fig), or to the same strand of the genome, for Pore-C data (S5 Fig). These are the orientations expected for undigested DNA sequenced with these technologies, consistent with only a subset of restriction sites being cleaved during sample preparation. This is typical of 3C preparations, requiring a convergence distance to be defined as the separation at which sufficient digestion and re-ligation occurred to ensure all sequence fragment mapping orientations become equally frequent [39]. This was estimated to be between 5 and 10 kb for both technologies (S4 and S5 Figs).
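The convergence distance estimate described above can be illustrated with a short sketch. This is an assumption-laden simplification, not the pipeline used in this study: fragment pairs are supplied as hypothetical (position, strand, position, strand) tuples, separations are measured linearly rather than around a circular chromosome, and the tolerance around the expected fraction of 0.25 is an arbitrary illustrative choice.

```python
from collections import Counter, defaultdict

ORIENTATIONS = ("++", "+-", "-+", "--")


def orientation_fractions(pairs, bin_size=1000):
    """Fraction of each strand orientation per separation bin.

    pairs: iterable of (pos_a, strand_a, pos_b, strand_b).
    Returns {bin_start: {orientation: fraction}}, ordered by bin_start.
    """
    counts = defaultdict(Counter)
    for pos_a, strand_a, pos_b, strand_b in pairs:
        # Order the fragments by coordinate so labels are consistent.
        if pos_a > pos_b:
            pos_a, strand_a, pos_b, strand_b = pos_b, strand_b, pos_a, strand_a
        bin_start = ((pos_b - pos_a) // bin_size) * bin_size
        counts[bin_start][strand_a + strand_b] += 1
    fractions = {}
    for bin_start, counter in sorted(counts.items()):
        total = sum(counter.values())
        fractions[bin_start] = {o: counter[o] / total for o in ORIENTATIONS}
    return fractions


def convergence_distance(fractions, tolerance=0.05):
    """Smallest bin start at which all orientations are within
    tolerance of 0.25; None if no bin satisfies this."""
    for bin_start, fracs in fractions.items():
        if all(abs(f - 0.25) <= tolerance for f in fracs.values()):
            return bin_start
    return None


# Toy data: short-range pairs share one orientation (undigested DNA),
# whereas pairs ~6.5 kb apart are evenly split across orientations.
pairs = [(100, "+", 400, "-")] * 4 + [
    (0, "+", 6500, "+"), (0, "+", 6500, "-"),
    (0, "-", 6500, "+"), (0, "-", 6500, "-"),
]
fractions = orientation_fractions(pairs)
print(convergence_distance(fractions))
```

A more careful estimate would require the orientations to remain balanced at all greater separations, rather than taking the first bin that passes the threshold.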
Plotting the inferred contact frequency between loci against their separation highlighted particularly strong interactions at an interval of ~5 kb (Fig 1B), which may indicate a particular set of atypical contacts. However, plotting the locations of these high-level short-range interactions found they were distributed across the chromosome, and therefore may represent false positive contact inferences resulting from incompletely-digested DNA (S6 Fig). Therefore data were subsequently analysed at a resolution of 10 kb, to mitigate any effects of DNA fragments that had not undergone digestion and religation.
This locus size was used to analyse the contact frequencies between all loci across the chromosome for each replicate (S7 and S8 Figs), and combined across each set of replicates (Fig 1C). A strong diagonal was evident across all of the symmetrical contact frequency matrices. This suggested most interactions occurred over distances <25 kb, consistent with a simple genome organisation, dominated by short-range structures. There were relatively few signals of higher-level chromosomal organisation. Faint secondary diagonals, intersecting the orthogonal main diagonal around the terminus of replication, were evident in the Pore-C data. Both methods identified an elevated density of off-diagonal interactions around the origin of replication (ori; Fig 1D and S8 Fig). This was positioned ~1.6 Mb from the beginning of the RMV7 reference, resulting in vertical and horizontal stripes across the matrix in the Hi-C data, whereas the Pore-C data showed an enrichment of contacts in the upper right quarter of the matrix (Fig 1C). The RMV8 ori was at the start of the reference genome, resulting instead in a framing effect in the Hi-C data, with elevated intensities in each corner of the matrix.
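The construction of a symmetrical contact matrix at a fixed locus size, and the proportion of contacts falling within a given separation, can be sketched as follows. This is a minimal illustration using hypothetical function names and toy coordinates, not the software used in this study; the circular separation calculation assumes a single circular chromosome of known length.

```python
def contact_matrix(contacts, genome_length, bin_size=10_000):
    """Build a symmetrical matrix of contact counts between loci.

    contacts: iterable of (pos_a, pos_b) 0-based coordinates.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Mirror off-diagonal contacts to keep the matrix symmetrical.
            matrix[j][i] += 1
    return matrix


def fraction_within(contacts, genome_length, max_separation=25_000):
    """Fraction of contacts between positions closer than max_separation,
    measuring separation the short way around a circular chromosome."""
    near = 0
    total = 0
    for pos_a, pos_b in contacts:
        linear = abs(pos_a - pos_b)
        separation = min(linear, genome_length - linear)
        near += separation < max_separation
        total += 1
    return near / total if total else 0.0


# Toy contacts on a hypothetical 100 kb circular genome.
contacts = [(1_000, 12_000), (5_000, 6_000), (2_000, 95_000), (10_000, 60_000)]
matrix = contact_matrix(contacts, genome_length=100_000)
print(fraction_within(contacts, genome_length=100_000))
```

In this toy example the contact between positions 2,000 and 95,000 is counted as short-range, because the loci are only 7 kb apart across the start of the reference; the same wrap-around explains why an origin located at the start of a reference produces signal in the corners of a matrix.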
Focussing on the origin of replication suggested the presence of short-range boundaries to interactions that were consistent across technologies at ~40 kb or ~20 kb downstream of ori in RMV7 and RMV8, respectively (Fig 1D). These coincided with the location of the only conserved parS site in this downstream replichore [40], which is more distant from ori in RMV7 as a consequence of the integration of PRCIdnaN in the intervening sequence [12]. The parS site is recognised by the ParB chromosome segregation machinery, which is integral to organising the origin domain of the chromosomes of the Bacillota species B. subtilis [41] and S. pneumoniae [40]. Hence this motif is likely to underlie the observed patterns of chromosomal contacts in this region. Although two parS sites are proximal to ori in the upstream replichore, these do not appear to affect contact patterns in the same way, consistent with previous observations of parS sites having distinct effects depending on their location [40,41]. Hence both Hi-C and Pore-C data produced a consistent view of the contacts that dominate the pneumococcal genome’s conformation.
Differential interactions between variants span PRCIs
A genome-wide search for differences in chromosome conformation between the variants was undertaken using contact matrices that summarised contact counts within non-overlapping 10 kb loci. The diffHiC (S9–S12 Figs) and multiHiCcompare (S13–S16 Figs) methods were used, both of which fit generalised linear models to independent biological replicate datasets to identify differences in contact frequencies [42,43]. However, they differ in their normalisation of the data [44]. Clustering of the biological replicates by the diffHiC pipeline suggested the variants could be resolved from one another by the genome-wide patterns of contacts in each comparison, except in the Hi-C analysis of the RMV7 variants (S17 Fig). However, these separations were not always associated with high bootstrap values, suggesting the differences were attributable to a small subset of loci, rather than broad changes in genome-wide contact frequencies. Correspondingly, specific sites exhibiting significantly different contact frequencies between the genotypes, following Benjamini-Hochberg corrections for multiple testing, could be identified using Manhattan plots (Fig 2A).
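The Benjamini-Hochberg correction applied to the per-locus p values can be illustrated with a short, generic sketch. This is the standard step-up procedure written in plain Python for clarity; it is not taken from either of the packages named above, which implement their own testing and correction steps, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, so that each
    # q value never exceeds that of a less significant test.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Toy p values for four hypothetical locus pairs.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q)
```

The negative logarithm of each q value is the quantity that would then be plotted against genomic position in a Manhattan plot.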
(A) Manhattan plots comparing the contact densities between epigenetic variants. For each genotype, two different statistical methods (diffHiC and multiHiCcompare) were used to compare the three biological replicates generated using each technology (Hi-C and Pore-C) for both epigenetic variants. The negative logarithm of the q values, corresponding to false discovery rates following a Benjamini-Hochberg correction for multiple testing, is shown for each locus across the genome. The threshold for genome-wide significance is shown by the red dashed horizontal line. The vertical black dashed lines show the boundaries of the variable PRCIs (PRCIdnaN in RMV7, and PRCImalA in RMV8) within the chromosomes. (B) Sites of differential contact frequencies identified by all four statistical comparisons within PRCIdnaN in RMV7, and PRCImalA in RMV8. Each point represents an interaction that was found to be significantly different between the corresponding dominant and rare variants following a correction for multiple testing. Its position on the horizontal axis represents the site of the interacting locus with the lower numerical position. The dashed line connects the point to the interacting locus with the higher numerical position. The vertical position of the point represents the base two logarithmic fold difference in the rare variant relative to the dominant variant, such that values greater than zero represent a higher contact frequency in the rare variant. The shapes of the points represent the method used to generate the data.
For RMV8, the only loci that were consistently identified as significantly differing between the two variants by all four comparisons were within, or adjacent to, PRCImalA [18]. The interacting loci, and the fold differences in contact frequency, were plotted against the annotation of this MGE (Fig 2B). The hits were either between the flanking region and the element, or between the two flanks. However, while the spatial distribution and magnitude of the effects were similar across all comparisons, the Hi-C data suggested higher contact frequencies in RMV8rare, whereas the Pore-C data suggested higher contact frequencies in RMV8domi. To rule out sample misidentification as a cause of this discrepancy, it was verified that the patterns of contact frequencies at the tvr locus matched the arrangements characteristic of the corresponding variants, and that a simple analysis of contact frequency ratios within PRCImalA supported the genome-wide analysis results (S18 Fig). Detailed analysis of the Nanopore sequencing read lengths mapping to this region did not suggest this was an artefact of the Pore-C methodology preferentially sequencing shorter DNA fragments originating from the MGE (S19 Fig). Therefore the Hi-C and Pore-C analyses concurred that there was variation in the contact frequency within PRCImalA, but disagreed on the variant in which the contact density was highest.
The comparison of the RMV7 variants provided an opportunity to test whether this observation could be independently replicated, as the analyses of Hi-C data were consistent in identifying PRCIdnaN [12] as the region differing most significantly between RMV7domi and RMV7rare (Fig 2A). This MGE was also the location of the most significant difference in contact frequencies between the variants in the diffHiC comparison of Pore-C datasets, in addition to being the site of multiple significant differences in the multiHiCcompare analysis of the same data (Fig 2A). The detailed plot of these points relative to the PRCIdnaN annotation demonstrated these hits were either within the element, or between the element and its flanking regions (Fig 2B). While the differences were again of similar magnitude and distribution across the two methods, the Hi-C and Pore-C data were once more discordant in identifying the variant in which the contact frequency was highest. In this comparison, the Hi-C data suggested the contact densities were higher in RMV7domi, whereas the Pore-C data implied they were higher in RMV7rare. Therefore multiple significant differences in contact frequencies between both pairs of variants were within or flanking PRCIs, although the methods did not reach a consensus on the nature of the altered pattern of interactions.
Changes in PRCI contact frequency do not reflect changes in XS
These discrepancies between the datasets for each variant could represent either a biological difference between the samples, or an artefact of the differing methodologies used to generate the sequence reads. Consequently, a generalised linear mixed-effects model was constructed that could resolve the correlations with different genetic and biological variables. This model was fitted to the four combinations of sequencing methodology and genotype at a resolution of 10 kb using a zero-inflated Poisson distribution with a logarithmic link function (see Methods; S20–S24 Figs). The fitted models were able to explain a substantial fraction of the variation in contact frequencies across loci and datasets (S25 Fig). The effect of distance between loci was estimated to be consistent between the genotypes, but contact frequencies declined more rapidly with distance in the Hi-C data, suggesting the Pore-C data were more efficient at detecting long-range interactions (Fig 3A).
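The general form of a zero-inflated Poisson likelihood with a logarithmic link can be sketched as follows. This is a generic illustration only: the covariates, random effects and fitting procedure of the model used here are described in the Methods and are not reproduced, and the function names, coefficients and covariate values below are hypothetical.

```python
import math


def log_linear_predictor(intercept, coefficients, covariates):
    """Logarithm of the expected count: intercept + sum(beta * x)."""
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))


def zip_log_pmf(count, log_mean, zero_prob):
    """Log probability of count under a zero-inflated Poisson.

    log_mean: linear predictor on the log scale (logarithmic link).
    zero_prob: probability of a structural zero (0 <= zero_prob < 1).
    """
    mean = math.exp(log_mean)
    poisson_log = count * log_mean - mean - math.lgamma(count + 1)
    if count == 0:
        # A zero can be structural, or sampled from the Poisson component.
        return math.log(zero_prob + (1.0 - zero_prob) * math.exp(poisson_log))
    return math.log(1.0 - zero_prob) + poisson_log


# Hypothetical example: one covariate for separation between loci and
# one for restriction site density, with made-up coefficients.
log_mean = log_linear_predictor(1.0, [0.5, -2.0], [2.0, 0.25])
print(math.exp(zip_log_pmf(0, log_mean, 0.3)))
```

Because the link is logarithmic, exponentiated coefficients act multiplicatively on the expected contact frequency, which is why a value of one corresponds to no effect in the coefficient plots.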
A zero-inflated Poisson generalised linear mixed effects model was fitted to the four combinations of genotype and sequencing technology using contact matrices calculated at a resolution of 10 kb. The two genotypes are distinguished by colours, and the two technologies are distinguished by line and point styles. (A) Coefficients estimating the effect of separation between loci on contact frequencies, normalised to the expected contact frequency between neighbouring loci, shown by the horizontal line at a value of one. (B) Coefficients estimating the effect of MluCI and NlaIII site densities on the detection of contacts; these coefficients reflect the impact of the number of sites per 1 kb. The horizontal line at one represents the coefficient value if the density of sites had no effect on the number of inferred contacts. (C) Coefficients estimating the effect of the fraction of bases (scaled upwards 10-fold to aid model fitting) that are guanine or cytosine (GC) on contact frequencies. (D) Coefficients estimating the contact frequencies within mobile genetic elements. The top panel shows the coefficients quantifying the relative contact frequencies in the MGEs, normalised to the frequencies observed across the rest of the chromosome, corresponding to the horizontal line at a value of one. The bottom panel shows the coefficients estimating the contact frequencies at the same loci in the rare variant relative to the dominant variant. (E) Locus-specific variation in the contact frequencies across the loci encompassing the two PRCIs highlighted in Fig 2: PRCIdnaN within RMV7, and PRCImalA within RMV8. Data are plotted at 10 kb intervals. The black vertical dashed lines indicate the boundaries of the PRCIs. (F) Differences between the variants at the loci shown in (E).
To test whether any of the differential contact frequency results were artefacts of the restriction enzymes used to generate the samples, the model also estimated the sensitivity of the inferred contact frequencies to the density of sites targeted by these endonucleases (Fig 3B). With the Pore-C data, there was no evidence of any strong relationship between restriction sites and inferred contact density. With Hi-C data, a higher density of NlaIII sites was associated with an increased sensitivity for detecting contacts, which is consistent with the experimental protocol. However, there was an inverse relationship between the inferred contact density and the density of MluCI sites, despite this enzyme not being involved in sample preparation. Given the AT-rich nature of the MluCI sequence motif (AATT), such an observation may be attributable to Illumina sequencing’s bias towards DNA with a neutral GC content, which is not shared with Nanopore data [45]. However, contact frequencies did not exhibit an overall inverse relationship with GC content. Rather, contact frequencies increased with rising GC content in RMV7, but showed the opposite relationship in RMV8 (Fig 3C). Both trends were more extreme for the Hi-C data than the Pore-C data. This suggested the Illumina data may be more affected by localised regions of extremely AT-rich sequence, correlating with the presence of MluCI sites, rather than the mean GC content averaged over a 10 kb locus. Nevertheless, no consistent evidence could be found of a pneumococcal XS system that preferentially condensed AT-rich sequences [46].
A more specific test for XS was quantifying the contact density across the MGEs within each of the genomes, relative to the rest of the chromosome. For RMV7, neither the Hi-C nor the Pore-C data found any evidence of elevated contact densities across the identified MGEs, with the confidence intervals of the level of contact density relative to the rest of the genome always spanning one (Fig 3D). When comparing the variants, differences in the density of contacts were only significant for PRCIdnaN. In further agreement with the differential contact analyses (Fig 2B), the contact frequencies were higher for PRCIdnaN in RMV7rare for the Pore-C data, but lower in the Hi-C data. For RMV8, the Illumina Hi-C data again found no evidence of the ϕRMV8 or PRCImalA MGEs being associated with elevated contact densities, although there was some evidence of higher contact frequencies within the latter in the Pore-C data. However, the Pore-C data also indicated that contact frequencies within PRCImalA were reduced in the rare variant, suggesting there was no consistent evidence for increased contact densities across both variants. By contrast, the Illumina Hi-C data found that PRCImalA was associated with higher contact densities in RMV8rare. Therefore the model outputs were consistent with the differential contact analysis in both genotypes, while also suggesting there was no evidence of elevated contact densities within the pneumococcal MGEs that would have been indicative of XS.
To test whether there was evidence of XS at any location within MGEs, locus-specific effects on contact frequencies were estimated for the sites within and flanking PRCIdnaN and PRCImalA (Fig 3E). There was no evidence of heterogeneity within the elements, although the size of PRCImalA meant it contained only a single locus, making the elevated contact density inferred for this element in the Pore-C data highly susceptible to confounding by locus-level effects (Fig 3D). There was also little variation between the variants at these sites (Fig 3F), which is consistent with the differences being captured by the MGE-level variables (Fig 3D). However, variation was evident in the flanking regions (Fig 3F), the direction of which mirrored the alterations in the elements themselves (Fig 3D). Therefore the differential contact densities at the PRCIs were not caused by changing levels of XS-driven condensation, but instead reflected altered interactions between the MGEs and the proximal regions of the host chromosome.
Contact frequency changes are associated with altered PRCI activity
Such an observation suggested the differences between the variants may reflect changes in excision, replication and activation of the PRCIs. Given the conflicting data between the Hi-C and Pore-C methods, previously-collected RNA-seq data were used to identify the variants in which the MGEs were more active. PRCIdnaN had previously been identified as being transcribed at a significantly higher level in RMV7domi than RMV7rare during early exponential phase using both RNA-seq (Fig 4A) and quantitative reverse transcriptase PCR (qRT-PCR) data [12]. Analogously, the PRCImalA genes were generally more highly expressed in RMV8rare than RMV8domi (Fig 4B), although these differences were not consistent enough to reach genome-wide significance [18]. However, no difference was observed when the tvr locus of each of the variants was removed (Fig 4C), indicating the differences might be associated with epigenetic modifications [18].
(A) Quantification of PRCIdnaN expression in RMV7domi and RMV7rare using previously-published RNA-seq data [12]. The annotated protein coding sequences (CDSs) within the PRCIs are shown by the arrows at the bottom of the panel, with their orientation indicating the direction in which they are transcribed. The lines in the main panel show the level of expression estimated for each CDS for each of the three independent biological replicates as normalised transcripts per million reads. The solid line indicates the median value for each variant, and the shaded area indicates the range between the maximum and minimum values. (B) Quantification of PRCImalA expression in RMV8domi and RMV8rare using previously-published RNA-seq data [18], as displayed in panel (A). (C) Quantification of PRCImalA expression in RMV8domi tvr::cat and RMV8rare tvr::cat using previously-published RNA-seq data [18], as displayed in panel (A). These two mutants do not have functional SpnIV RMSs. (D) Quantification of PRCImalA expression in RMV8domi and RMV8rare using qRT-PCR assays of transcription of an integrase (int) and replication (dnaC) CDS. Three technical replicates are shown for each of three biological replicates, as indicated by the shapes of the points. The violin plot summarises their distribution, with the median indicated by the horizontal line. Measurements were made at both early and mid-exponential timepoints. (E) Diagram showing the primers used to quantify the integration, excision and replication of the PRCIs. (F) Differences in integration, excision and replication of the PRCIs between variants. Data are shown as in panel (D). For both PRCIdnaN, in RMV7, and PRCImalA, in RMV8, the level of excision was quantified as the ratio of attB sites to attL sites. A higher ratio indicates a greater level of excision. Similarly, the levels of circularised PRCIs were quantified by measuring the concentration of attP.
To validate whether transcription of PRCImalA substantially differed between the variants, qRT-PCR was used to assay the expression of int, encoding the element’s integrase, and dnaC, encoding a DNA replication protein, during early exponential (OD600 = 0.2) and mid-exponential (OD600 = 0.5) phase growth (Fig 4D). Although no substantial difference was evident in early exponential phase, the transcription of the replication gene rose to be ~4-fold higher in RMV8rare than RMV8domi during mid-exponential phase, consistent with the higher expression of these genes in the same variant in the RNA-seq data (Fig 4B). This suggested that PRCIdnaN and PRCImalA were less frequently transcribed in RMV7rare and RMV8domi, respectively. Hence transcription, as well as contact densities, differed between epigenetic variants at these PRCIs.
The readily detectable levels of gene expression, which changed in response to growth, were also consistent with an absence of XS. Furthermore, they were indicative of active excision and replication of these PRCIs, which would explain the differences in contacts within these elements, and between them and their flanking regions [47]. Therefore qPCR assays were used to measure the ratios of excised to integrated elements, and the copy number of circularised elements within the cell (Fig 4E, F). In RMV7, the ratio of excised to integrated PRCIdnaN was significantly higher in RMV7rare than RMV7domi, although there was no substantial difference in the number of circularised elements within the cell. In RMV8, both the ratio of excised to integrated elements, and the frequency of circularised PRCImalA genomes, were significantly higher in RMV8domi than RMV8rare. Therefore the altered contact frequency patterns between variants likely represented changes in the topological configuration of the elements.
For such a topological change to fully explain the significant differences of opposing direction in the Hi-C and Pore-C comparisons, differences in PRCI circularisation would need to be metastable [48]. Under such circumstances, a variant may exhibit levels of PRCI circularisation and replication that are stable over enough bacterial generations that separate biological cultures grown in parallel have consistent properties. Yet the conflicting results between analyses demand that re-isolation of the variant can also produce a genotype with a different level of PRCI activity that is similarly persistent.
To test this, three independent colonies were cultured for each variant of RMV7 and RMV8. Each of these cultures was frozen, used to inoculate overnight growth, then diluted into two parallel tubes of fresh media until they reached exponential phase growth. Measurements of the level of excision, and extrachromosomal circularisation, were generally consistent across the parallel duplicate cultures inoculated from the same isolate (Fig 5A). However, there was substantial variation between cultures derived from different colonies for RMV8. These results also contrasted with the previous qPCR assays, which were performed on the same isolate used for the Pore-C analysis (Fig 4F). For RMV7, the results were more consistent across the colonies representing each variant (Fig 5A). However, RMV7domi was generally associated with the higher median copy number across the replicates in this experiment, whereas this value had been higher for RMV7rare in the previous assay (Fig 4F). Therefore PRCI copy number levels varied stochastically between even isogenic isolates, but were metastable during culturing in liquid media.
(A) Violin plot showing the variation in PRCI topology between cultures inoculated with independently-isolated colonies of the four variants. Two biological replicate cultures were grown from each of three independently-isolated colonies for each variant. Each violin plot represents a single colony, and describes the distribution of three technical replicate measurements for each of the paired biological replicates, with the shape of each point indicating the biological replicate to which they relate. The horizontal line across each violin represents the median value. (B) Violin plot showing the variation in PRCIdnaN excision during a six-day passage of RMV7 genotypes. Four independent mutants, in which the RMV7domi or RMV7rare tvr locus had been introduced into a tvr::Janus RMV7 parental genotype, were passaged in parallel [12]. DNA was sampled from a single biological replicate of each mutant at the start of the passage, and after two, four and six days. Each violin summarises three technical replicate measurements of a single biological replicate. (C) Violin plot showing the effect of inducing stimuli on the topology of PRCImalA. The excision and replication of PRCImalA following exposure to competence stimulating peptide and mitomycin C were assayed two hours after exposure in early exponential growth. These experiments used both RMV8rare, and a mutant derivative lacking the ϕRMV8 prophage. Data are shown as in panel (A). (D) Violin plot quantifying the effect of mitomycin C on PRCI gene expression. The expression of the int and dnaC genes of PRCImalA was assayed after two hours of exposure to mitomycin C, in both RMV8rare and a mutant derivative lacking the ϕRMV8 prophage. Data are shown as in Fig 4.
PRCI excision and expression are separately regulated
The variation in contact frequency and copy number between isolates of the same variant contrasted with the consistent differences in PRCI expression between RNA-seq and qRT-PCR experiments (Fig 4B, D) [12]. Correspondingly, previous passages of RMV7 genotypes, in which tvr loci with differing arrangements were independently introduced into a common parental genotype, found phenotypic and expression differences were stable over multiple days [12]. To test whether levels of excision were more variable, the topological arrangement of PRCIdnaN was assayed in the same samples (Fig 5B). This found substantial differences in the integration-excision dynamics of this element across isolates of both variants. As these samples were directly transferred between liquid cultures, this demonstrated the changes in PRCI excision did not depend on culturing on solid media. Hence differences in PRCI expression may be stable, despite stochastic variation in the topology of the element.
This implied the excision and transcription of PRCIs were differentially regulated, consistent with PRCI gene expression in at least some samples being highest in the variants in which these elements were most frequently integrated into the chromosome (Fig 4D, F). This was tested using RMV8, which carries both the ɸRMV8 prophage and PRCImalA [18]. RMV8rare cells were exposed to competence stimulating peptide (CSP) and mitomycin C (MMC), both of which activate the ɸRMV8 prophage [20]. These assays were conducted with RMV8rare and a mutant in which ɸRMV8 had been replaced with a chloramphenicol acetyltransferase (cat) resistance marker, to ascertain whether the PRCI’s activity was affected by the prophage (Fig 5C). Both the level of excision, and the concentration of circularised PRCI elements, rose in response to MMC, but not CSP. This behaviour was independent of the presence of a prophage within the same cell. This suggests the excision of PRCImalA is controlled by the MGE itself in response to similar stimuli that activate prophage.
A qRT-PCR assay was then used to test whether MMC also activated PRCImalA int and dnaC gene expression in both RMV8rare and RMV8rare ɸRMV8::cat (Fig 5D). An inconsistent rise in the transcription of both genes was recorded in MMC-supplemented cultures, correlating with the elevated copy number of the element (Fig 5C). However, independently of treatment with MMC, transcription levels were consistently lower in the absence of the prophage (Fig 5D). This effect was greatest for the dnaC replication gene, the median expression of which was 4.8-fold lower in the absence of ɸRMV8 in unsupplemented media, and 3.4-fold lower in the absence of ɸRMV8 in MMC-supplemented cultures. This is consistent with the PRCI’s dependence on the intact prophage for its ability to transmit between host cells [49]. Therefore expression of the PRCI is regulated by additional factors, associated with the presence of prophage, that do not affect excision.
PRCIs do not cause a substantial fitness cost to their host cells
To test whether this regulation of PRCI activity limited their cost to their host cell, the growth of mutants in which PRCIs had been replaced with a Janus cassette [50] was compared to the corresponding unmodified genotypes in unsupplemented media (Fig 6A). Neither RMV7rare PRCIdnaN::Janus, nor RMV8rare PRCImalA::Janus, exhibited detectable changes in growth in unsupplemented media relative to the variant from which they were derived. By contrast, removal of the prophage ϕRMV8 substantially increased the growth of RMV8rare (Fig 6B), consistent with the behaviour of a similar RMV8 prophage null mutant [20]. The deletion of PRCImalA from this non-lysogenic mutant caused no additional change. Therefore PRCIs cause a substantially smaller impairment of host pneumococcus growth than prophage in the absence of an activating stimulus.
Each panel shows the 20 h growth curve of a set of pneumococcal genotypes, measured through assaying the optical density (OD600) at 30 min intervals. In each plot, the solid line shows the median, and the shaded region shows the full range of values observed across three biological replicates. (A) A comparison of the effect of removing PRCIdnaN from RMV7rare and removing PRCImalA from RMV8rare. (B) A comparison of the effect of removing ϕRMV8 from RMV8rare and RMV8rare PRCImalA::Janus. (C) A comparison of the effect of exogenous stimuli on the growth of RMV7rare and RMV7rare PRCIdnaN::Janus. (D) The effect of increasing concentrations of mitomycin C on RMV8rare and mutant derivatives lacking ϕRMV8, or PRCImalA, or both.
Growth assays were then used to test whether this changed when PRCIs were induced to excise from the chromosome by MMC during early exponential phase. Both RMV7rare and the corresponding PRCIdnaN::Janus mutant grew similarly in the presence of MMC, consistent with the MGE not reducing host cell fitness upon excision (Fig 6C). Although MMC did cause a dose-dependent reduction in the density of RMV8rare cultures, this sensitivity was substantially reduced in a non-lysogenic ϕRMV8::cat mutant (Fig 6D). By contrast, removal of PRCImalA did not affect MMC sensitivity in either the wild type or non-lysogenic RMV8 cells. Therefore stimulating the excision and circularisation of PRCIs neither adversely affected pneumococcal host cell growth, nor interfered with the lysis of cells following prophage induction.
Discussion
This application of both Hi-C and Pore-C methods to epigenetic variants of two S. pneumoniae genotypes provides an overview of the pneumococcal chromosome’s organisation. Both methods concurred that the chromosomal contacts were dominated by short-range interactions, consistent with a simple, locally-condensed structure. However, the resolution at which these could be studied was limited by the density of digestion and proximity re-ligation events. Future studies, using refined sample preparation protocols, may enable higher-resolution analyses of bacterial chromosomes.
At larger scales, some similarities were observed with data from B. subtilis, another species within the Bacillota phylum [41], and a recently-published study of S. pneumoniae R6 [51]. Both methods found evidence of an origin domain, indicative of this region having a globular conformation, which included constrained contact patterns (Fig 1D) that likely reflected the positioning of parS sites, as seen in B. subtilis [41] and S. pneumoniae [51]. The Pore-C data also identified a secondary diagonal signal of long-range interactions that was orthogonal to that formed by the contacts between neighbouring loci, previously determined as being indicative of a close association of the chromosome arms (Fig 1C) [41]. However, there was no evidence of the chromosome interaction domains that have been noted not only in B. subtilis [41], but also E. coli [52,53], Caulobacter crescentus [54] and Mycoplasma pneumoniae [55]. This absence of discrete domains concurs with the recent study of S. pneumoniae R6 [51]. It may be that the relatively small pneumococcal chromosome, with strong coding biases on each replichore [56], is capable of orchestrating concurrent transcription and replication without substantial higher-order chromosomal folding.
This comparatively simple structure may reflect the apparent paucity of NAPs in pneumococci relative to E. coli and B. subtilis, with only the histone-like protein HU (or HlpA) characterised in S. pneumoniae [57–60], and the relatively small number of MGEs in a typical pneumococcal genome [4]. The lack of characterised orthologues of many of the proteins involved in XS is consistent with the absence of condensed regions that form boundaries between large domains in other species [41], and the extensive disruption to pneumococcal chromosome conformation associated with the depletion of HlpA [51]. Nevertheless, the epigenetic variants of RMV7 and RMV8 were each distinguished by variable contact density at the sites of PRCI integration (Fig 2A). However, the approximately two-fold increases in contact density at these elements (Fig 2B) were consistent with the attP locus of the circularised form having a similar copy number to the chromosome itself (Figs 4E and 5A), effectively doubling the presence of the PRCI sequences. Hence the alternative explanation of condensation by XS was unlikely. While it is possible that the high density of short-range contacts identified chromosome-wide in our analyses prevents the identification of greater condensation of individual loci, the absence of XS also explains the absence of any detectable increase in contact density in AT-rich regions, or within other MGEs (Fig 3C, D).
Furthermore, a lack of XS is consistent with the continued expression, excision and activation of these PRCIs (Figs 4 and 5), and the larger ϕRMV8 prophage (Fig 6), even in the absence of inducing stimuli [18,20]. Such a difference in genome biology from B. subtilis and E. coli suggests a fundamentally distinct interaction between MGEs and host cells in these species. One possible explanation is that XS proteins have an important role in controlling the activity of plasmids [61]. Such elements are common in E. coli [62], found in ~10% of B. subtilis isolates [63], but rarer in S. pneumoniae [4]. Hence pneumococci may have adopted a different strategy for reducing the fitness cost or transmissibility of their primarily chromosomally-integrated MGEs.
That the only consistent significant difference in contact frequencies between the variants of RMV8 was caused by PRCImalA (Fig 2A) contrasted with the previous RNA-seq analysis, which only identified comC and ϕRMV8 as sites of significantly-differing transcriptional variation between RMV8domi and RMV8rare [18]. However, the excision of ϕRMV8 is linked to its increased expression, typically resulting in cell lysis [20]. By contrast the excision of PRCImalA is reversible, without substantial fitness cost to the host (Fig 6), meaning these episomes are always likely to be hotspots of variation in copy number even in live cells in the same culture, thereby resulting in differential contact densities. Additional significant differences in contact density were also identified across the genome in the Pore-C data, with greater variation in RMV7 than RMV8. This is consistent with the more widespread differences in transcription between RMV7domi and RMV7rare than between the RMV8 variants [12,18], although it may also reflect the higher and more even coverage across the RMV7 Pore-C datasets (S8 Fig) providing greater power for detecting such variation. Hence alterations in chromosomal conformation remain a feasible mechanism to explain how distinct patterns of epigenetic DNA modification can affect transcription. However, this observed correlation cannot establish causation without higher-resolution 3C data, and further experimental work that can change both the DNA methylation patterns, and the organisation of the genome.
However, the DNA modifications could not account for the differences in PRCI excision dynamics, which were variable between different isolates of the same variant (Fig 5A). This diversity likely represents the PRCIs engaging in “bet hedging” behaviour [12,64], as integrated elements are stably inherited following chromosomal replication, but are at risk of deletion by homologous recombination [65]. Hence the vertical inheritance of PRCIs is likely to be maximised by existing as both chromosomally-integrated and episomal forms. That the attB/attL ratios were below one suggested most attB sites contained a chromosomally-integrated form (Fig 4F), ensuring they will be passed to daughter cells in clonally-replicating populations. However, the attP levels were in the range 0.3–5, implying that circular forms were also widespread, meaning the PRCI can survive even if the chromosomally-integrated copy is deleted by recombination. Therefore these episomal dynamics can help explain the stable inheritance of PRCIs by pneumococcal strains over decades, in contrast to the transient associations between prophage and host genotypes [4].
This accounts for why it is adaptive for the PRCIs to increase their excision in response to the detection of RecA-coated nucleoprotein filaments (Fig 5C), which are generated by MMC, and mediate homologous recombination [20]. Yet PRCI excision does not cause the same fitness cost to the cell as prophage activation, because it cannot lyse the host in the absence of an intact “helper” phage (Fig 5D). Therefore it is also adaptive for expression to be regulated in response to prophage activity, as was apparent in RMV8. This observed interdependence between the MGEs explains the rise in PRCImalA transcription in mid-exponential phase (Fig 4D), as this is when ϕRMV8 is most active [66]. This distinction between the regulation of excision and activation means the former can occur stochastically, even in the absence of a prophage, as in RMV7. The observed inheritance of a metastable state suggests that the levels of excision are determined by the concentration of regulatory molecules being passed on from mother to daughter cells [67], although no mechanistic details were analysed in this study. Hence the discrepancy between the PRCI copy numbers detected in the Hi-C and Pore-C data is indicative of these genomes being dynamic systems, the behaviour of which is determined by metastable regulatory switches, which cannot be fully described by a single sequence alone.
Hence Pore-C will reveal additional heterogeneity within species, strains, and even clonal bacterial populations, with identical genomes being differentiated by their topology and copy number of individual elements. Although our analysis used data from two flow cells, one was sufficient to identify the key differences between the studied genotypes, meaning this method can be implemented by any laboratory with the capacity to use sequencing platforms on the scale of a MinION. Hence future applications of this method to species with a greater diversity of replicons, NAPs and XS systems promises to aid our understanding of the diversity of bacterial genome biology across species.
Methods
Culturing of S. pneumoniae
All S. pneumoniae strains (S1 Table) were cultured at 35°C in a 5% CO2 atmosphere. Culturing on solid media used Todd-Hewitt broth (Sigma-Aldrich) supplemented with 0.5% yeast extract (Sigma-Aldrich), 1.5% agar (Sigma-Aldrich) and 200 U ml−1 catalase (Sigma-Aldrich). To select mutant genotypes, media were additionally supplemented with antibiotics: streptomycin (Sigma-Aldrich) or kanamycin (Sigma-Aldrich) at 400 μg ml-1, or chloramphenicol (Sigma-Aldrich) at 4 μg ml-1.
Unless stated otherwise, liquid culturing utilised a ‘mixed’ media consisting of Todd-Hewitt broth supplemented with 0.5% yeast extract and Brain Heart Infusion broth (Sigma-Aldrich) mixed at a 2:3 ratio [12]. For comparing PRCI dynamics between variants, samples were collected during mid-exponential growth, at an optical density at 600 nm (OD600) of ~0.6. For measuring the PRCI responses to stimuli, either unsupplemented media, MMC (850 ng), or competence stimulating peptide (CSP; 8.75 µg) was added to 7 ml of bacteria grown to an OD600 of ~0.2. The added CSP was appropriate for inducing competence in the cultured genotype: CSP1 for RMV7, and CSP2 for RMV8. The cultures were then incubated at 35 ºC with 5% CO2 for 2 h. Subsequently, 5 ml of culture was used for RNA extraction, and 2 ml was centrifuged at 3220 g for 10 min to produce a cell pellet for DNA extraction.
To measure growth curves, 2 × 10^4 cells from titrated frozen glycerol stocks were grown in mixed liquid media in 96-well microtiter plates at 35 °C with 5% CO2 for 20 h. For all experiments measuring bacterial sensitivity to inducing stimuli, 200 µl of bacterial cultures with an OD600 between 0.15 and 0.25 were mixed with MMC (10 ng), CSP (75 ng), or H2O2 (2 ng), then grown in microtiter plates at 35 °C with 5% CO2 for 20 h. The OD600 was measured at 30 min intervals using a FLUOstar Omega microplate reader (BMG LABTECH). Three or four replicate wells were assayed for each tested genotype and condition in each experiment.
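The stimuli above are reported as masses added to a stated culture volume. As a worked illustration, the short Python sketch below converts these to approximate final concentrations. It assumes that the volume of the added stock is negligible relative to the culture volume, which is not stated in the text; the function and dictionary names are hypothetical.

```python
def final_concentration_ng_per_ml(mass_ng, culture_volume_ml):
    """Approximate final concentration, ignoring the volume of the added stock."""
    return mass_ng / culture_volume_ml


# (mass in ng, culture volume in ml), taken from the culturing methods.
doses = {
    "MMC, PRCI induction": (850.0, 7.0),
    "CSP, PRCI induction": (8750.0, 7.0),  # 8.75 ug expressed in ng
    "MMC, growth assay": (10.0, 0.2),
    "CSP, growth assay": (75.0, 0.2),
    "H2O2, growth assay": (2.0, 0.2),
}

concentrations = {
    name: final_concentration_ng_per_ml(mass, volume)
    for name, (mass, volume) in doses.items()
}

for name, value in concentrations.items():
    print(f"{name}: {value:.1f} ng/ml")
```

Under this assumption, the induction cultures received roughly 121 ng ml-1 MMC or 1.25 µg ml-1 CSP, and the microtiter growth assays roughly 50 ng ml-1 MMC.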
DNA purification and PCR amplification
Following overnight incubation, bacteria were pelleted by centrifugation at 3220 g for 10 min. Supernatants were discarded, and bacterial pellets were then resuspended in 480 μl of lysis buffer (Promega) and 120 μl of 30 mg ml-1 lysozyme (Sigma-Aldrich). Samples were incubated at 35 °C for 45 min and centrifuged for 2 min at 8000 g. Bacterial genomic DNA was then extracted using the Wizard Genomic DNA Purification Kit (Promega), following the manufacturer’s instructions. For experiments requiring subsequent PCR product purification, 1000 ng of genomic DNA, 25 μl of 2x DreamTaq Master Mix (Thermo Fisher), 2 μl each of the 10 µM forward and reverse primers listed in S4 Table, and nuclease-free water (Thermo Fisher) were used for PCR amplification in a total reaction volume of 50 µl. Otherwise, 500 ng of genomic DNA, 7.5 μl of 2x DreamTaq Master Mix (Thermo Fisher), 1 μl each of the 10 μM forward and reverse primers (S2 Table), and nuclease-free water were combined in a total reaction volume of 15 μl.
Construction of the S. pneumoniae RMV8 PRCImalA mutant
To generate the PRCImalA::Janus construct, the ~1 kb regions flanking PRCImalA upstream (UP) and downstream (DOWN) were amplified by PCR using primers that inserted restriction enzyme sites on the internal sides of each region. The UP region was generated with primers malA_KO_UP_For and malA_KO_UP_Apal, whereas the DOWN region was generated with primers malA_KO_DOWN_For_BamHI and malA_KO_DOWN_Rev. The Janus cassette was amplified with primers that added corresponding restriction sites (S4 Table). Purified PCR amplicons were then obtained via gel electrophoresis with a 1% agarose gel dyed with SYBR Safe (Thermo Fisher) in Tris/Borate/EDTA buffer, followed by gel extraction with a GenElute Gel Extraction Kit (Sigma-Aldrich).
After extraction, purified PCR amplicons were digested at 37 ºC for 2 h with the appropriate restriction enzyme: ApaI (Promega) for UP, and BamHI (Promega) for DOWN. Digested UP and DOWN products were then mixed with the digested Janus cassette at a 2:2:1 ratio and incubated with T4 DNA ligase (New England Biolabs) at room temperature (RT) overnight. The ligation mix was then used as the template for a touchdown PCR, and the amplified PRCImalA::Janus construct was purified by separation through electrophoresis on a 1% agarose gel, and extraction using a QIAquick Gel & PCR Cleanup kit (Qiagen). To transform recipient bacteria (RMV8rare or RMV8rare ϕRMV8::cat) with the PRCImalA::Janus construct, a 1 ml sample of a bacterial culture at an OD600 of ~0.2 was incubated with 5 µl of 500 mM CaCl2 (Sigma-Aldrich), 1250 ng of CSP2, and 1 μg of the PRCImalA::Janus construct at 35°C for 2 h. Samples were then spread onto solid media supplemented with antibiotics (400 μg ml-1 of either kanamycin or streptomycin) to select for resistant transformants that grew after incubation at 35 ºC with 5% CO2 for at least 16 h.
RNA purification and reverse transcription
A 5 ml sample of a bacterial culture was mixed with an equal volume of RNAprotect (Qiagen) and incubated at room temperature for 5 min. RNA samples were extracted from this mixture using the SV Total RNA Isolation System (Promega) according to the manufacturer’s instructions, with the exception that the initial extraction of RNA required treating samples with 480 μL 30 mg mL−1 lysozyme (Promega) for 30 min at 35 °C. The purified RNA was treated with amplification-grade DNase I (Invitrogen), then used in reverse transcription reactions using the First-Strand III cDNA synthesis kit (Invitrogen). Each reaction used 0.2 μg RNA, 100 units of the SuperScript III reverse transcriptase, 1 μL of 100 μM random hexamer primers (Thermo Fisher Scientific) and 1 μL of 10 mM dNTP mix (Bioline). After a 5 min annealing period at 25 °C, reverse transcription was conducted at 50 °C for 30 min, then 55 °C for 30 min. The reverse transcriptase was then heat inactivated at 70 °C for 15 min.
Quantitative PCR
The oligonucleotides used for qPCR were designed to generate amplicons that were 150–200 bp in length, and are listed in S4 Table. Template cDNA was diluted in a 1:25 ratio with nuclease-free water (Qiagen). Each 15 μL reaction used 3.75 μL of DNA template (either genomic DNA or cDNA), 0.75 μL of 10 μM forward and reverse primer solutions (Invitrogen), 7.5 μL of PowerUp SYBR Green Master Mix (Thermo Fisher Scientific) and 2.25 μL of DNase-free and RNase-free water. Reactions were run using MicroAmp Optical 96-well reaction plates (Thermo Fisher Scientific) and the QuantStudio 7 Flex Real-Time PCR System (Applied Biosystems). The mixtures were initially heated to 50 °C for 2 min. Subsequently, 40 amplification cycles were run, each consisting of denaturation at 95 °C for 15 s, followed by annealing and elongation at 60 °C for 1 min. The purity of the amplicon at the end of the reaction was assessed through a melt curve analysis. Three technical replicate measurements were made for each of three biological replicates, unless otherwise specified.
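The per-reaction volumes above sum to the stated 15 μL. As a minimal sketch of how such a recipe might be scaled to a plate, the Python snippet below checks the total and computes a template-free master mix. The 10% pipetting overage and all names are assumptions introduced for illustration; the text does not describe how the reactions were assembled.

```python
# Per-reaction volumes (uL) from the qPCR methods, excluding template.
PER_REACTION_UL = {
    "PowerUp SYBR Green Master Mix": 7.5,
    "forward primer (10 uM)": 0.75,
    "reverse primer (10 uM)": 0.75,
    "DNase-free and RNase-free water": 2.25,
}
TEMPLATE_UL = 3.75


def master_mix(n_reactions, overage=0.10):
    """Scale the template-free components; `overage` is a hypothetical allowance."""
    scale = n_reactions * (1.0 + overage)
    return {reagent: volume * scale for reagent, volume in PER_REACTION_UL.items()}


total_per_reaction = sum(PER_REACTION_UL.values()) + TEMPLATE_UL

# Nine wells: three technical replicates for each of three biological replicates.
mix = master_mix(9)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume:.2f} uL")
```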
The expression of PRCImalA genes was quantified using the ∆∆Ct method. The rpoA gene was used as the reference gene, against which target gene expression levels were normalised [12]. The ∆Ct values were calculated as the difference between the mean Ct of the target gene and rpoA across technical replicates within the same biological replicate. The ∆∆Ct values were then calculated relative to the reference ∆Ct, which was that of RMV8rare at an OD600 of 0.2. The fold gene expression difference between genotypes was then quantified as 2^(−∆∆Ct).
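The calculation described above can be sketched in a few lines of Python. This is an illustrative outline rather than the analysis code used in the study: the function names are hypothetical, and the Ct values are invented solely to show the arithmetic.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Delta-Ct for one biological replicate: mean target Ct minus mean rpoA Ct."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_dct, calibrator_dct):
    """Relative expression, 2^(-ddCt), where ddCt = sample dCt - calibrator dCt."""
    ddct = sample_dct - calibrator_dct
    return 2.0 ** (-ddct)


# Invented Ct values for three technical replicates (illustration only).
# Calibrator: the reference condition, e.g. RMV8rare at an OD600 of 0.2.
calibrator_dct = delta_ct([24.0, 24.1, 23.9], [18.0, 18.1, 17.9])
sample_dct = delta_ct([22.0, 22.1, 21.9], [18.0, 18.1, 17.9])

relative_expression = fold_change(sample_dct, calibrator_dct)
print(f"Fold change relative to calibrator: {relative_expression:.2f}")
```

In this invented example the target gene reaches threshold two cycles earlier in the sample than in the calibrator after normalisation to rpoA, corresponding to an approximately four-fold higher expression level.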
For the quantification of PRCI dynamics, the amplicons produced across the integration site, and the rpoA reference gene, were generated from genomic DNA (S2 Table). The concentration of each purified PCR product was measured using a Qubit Broad Range Kit and Qubit 4 Fluorometer (Thermo Fisher). The Ct values for copy numbers of 3 × 10^3, 3 × 10^4, 3 × 10^5 and 3 × 10^6 of these amplicons were used to generate a standard curve to convert the Ct values from experiments into absolute copy numbers of DNA molecules.
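A standard curve of this kind is conventionally a linear fit of Ct against log10 copy number, which is then inverted for experimental samples. The Python sketch below illustrates that approach together with the attB/attL excision ratio defined in Fig 4. The Ct values are invented, the function names are hypothetical, and the normalisation of attP to rpoA copies is an assumption about how relative attP levels might be expressed, not a description of the study's analysis.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(cts) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def ct_to_copies(ct, slope, intercept):
    """Invert the standard curve to estimate absolute copy number."""
    return 10 ** ((ct - intercept) / slope)


def excision_ratio(attB_copies, attL_copies):
    """Empty (attB) to occupied (attL) sites; higher means more excision."""
    return attB_copies / attL_copies


def attP_per_chromosome(attP_copies, rpoA_copies):
    """Circular-form copies per rpoA copy (normalisation is an assumption)."""
    return attP_copies / rpoA_copies


# Dilution series from the methods, with invented Ct values.
standards = [3e3, 3e4, 3e5, 3e6]
standard_cts = [28.0, 24.7, 21.4, 18.1]
slope, intercept = fit_standard_curve(standards, standard_cts)

estimated = ct_to_copies(24.7, slope, intercept)
print(f"slope = {slope:.2f}, estimated copies at Ct 24.7 = {estimated:.3g}")
```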
Construction of Hi-C and Pore-C sequencing libraries
Three independent cultures of each of the four variants (S. pneumoniae RMV7domi, RMV7rare, RMV8domi and RMV8rare) were grown to mid-exponential phase, corresponding to an OD600 of ~0.60. The samples processed for Illumina Hi-C and Pore-C analyses were cultured independently. To crosslink DNA and proteins, cells were mixed with 1% v/v formaldehyde (Sigma-Aldrich) for 30 min at RT, followed by a 30 min incubation at 4 ºC. The formaldehyde was then quenched through the addition of 1 ml 2.5 M glycine (Sigma-Aldrich), which was incubated at RT for 5 min, then at 4 ºC for 15 min. Fixed cells were then collected by centrifugation at 3220 g for 10 min at 4 ºC, flash frozen on dry ice, and stored at -80 ºC, to permit consistent processing of all samples in parallel.
Frozen pellets were thawed on ice, washed with 5 ml water, and centrifuged at 3220 g for 10 min. Pellets were then resuspended in 50 μl 30 mg ml-1 lysozyme, followed by incubation at 37 ºC for 30 min. Both 10% v/v sodium dodecyl sulphate (SDS; Fisher Bioreagents) and 10% v/v Triton-X (Sigma-Aldrich) were added, followed by 10 min RT shaking incubation, then 10 min incubation at 55 ºC. A 400 µl sample of the lysed cells was then combined with 130 µl of digestion mix, comprising NEB CutSmart buffer (NEB), 10% v/v Triton-X, and 1 U μl-1 of either NlaIII (NEB; Illumina Hi-C library preparation), or MluCI (NEB; Pore-C library preparation). The chromatin was digested at 37 ºC for 3 h. The NlaIII enzyme was then heat inactivated through a 65 ºC incubation for 20 min. The less stable MluCI enzyme did not require inactivation.
Digested chromatin was then ligated through the addition of 460 μL ligation mix, comprising ligation buffer (NEB), 0.1 µg μl-1 recombinant albumin (Promega), and 20 U μl-1 T4 DNA ligase (NEB). This mixture was incubated at RT for 1 h, followed by a 4 ºC incubation for 48 h. After ligation, the DNA-protein crosslinks were reversed through incubation of the mixture with 500 μl 20% Tween-20 (Sigma-Aldrich), 100 µl 10% v/v SDS, and 100 µl 20 μg μl-1 proteinase K (Sigma-Aldrich) at 65 ºC overnight. For the Pore-C library preparation, another round of protein digestion was performed by adding 118 μl 20 mg ml-1 pronase (Sigma-Aldrich) and 235 μl 10% v/v SDS, followed by a 63 ºC incubation for 1 h.
Following de-cross-linking, DNA was purified with 2.5 ml of a chilled 25:24:1 ratio mixture of phenol:chloroform:isoamyl alcohol (Sigma-Aldrich), followed by separation by centrifugation at 3220 g for 30 min at 4 ºC. DNA was then precipitated from the aqueous phase of each sample using 20 μl 5 M NaCl (Sigma-Aldrich), 200 μl 3 M sodium acetate (EMD Millipore Corporation) and 6 ml isopropanol (Sigma-Aldrich). These mixtures were incubated at -80 ºC overnight, then centrifuged at 3220 g for 30 min at 4 ºC. The resulting pellets were washed with 1 ml of chilled 70% ethanol (VWR Chemicals BDH), followed by a second centrifugation (3220 g, 15 min, 4 ºC). All remaining ethanol was removed through pipetting and drying at 65 ºC in a heat block for 15 min. The DNA was then resuspended in 100 µl nuclease-free water.
To enrich for longer molecules likely to contain multiple ligated DNA fragments, a 100 μl sample of each preparation was mixed with 50 μl of KAPA Pure Beads (Roche) through vortexing, followed by incubation together at RT for 10 min. The beads were then magnetically captured, and the supernatant discarded. The beads were washed twice using 200 μl 80% ethanol and dried at RT for 4 min. DNA was then eluted from the beads through a 10 min incubation in 100 µl of pre-heated nuclease-free water at 37 ºC. The concentration and size distribution of the DNA molecules were checked using an Agilent 2200 TapeStation (Agilent Technologies) and NanoDrop 1000 (Thermo Fisher).
DNA sequencing
Illumina Hi-C data were generated at the Wellcome Sanger Institute. Libraries were prepared using standard approaches [68], enriching for a fragment size of approximately 450 bp. Libraries corresponding to the three biological replicates for each of the four variants were then combined into a 12-plex library, which was sequenced on a single lane of the Illumina NovaSeq 6000 platform. This generated 152 nt paired-end reads.
Pore-C data were generated at Imperial College London using the native 12-plex barcoding library preparation kit (Oxford Nanopore Technologies, ONT; product SQK-NBD114.24) to process ~400 ng of DNA from each of the three biological replicates for each of the four variants. Each DNA sample was individually treated to ensure fragments were blunt ended with the NEBNext FFPE DNA Repair Mix (NEB) and the NEBNext Ultra II End Repair/dA-tailing Module (NEB). Samples then underwent native barcode ligation with NEB Blunt/TA Ligase Master Mix, before being pooled together. The NEBNext Quick Ligation Module (NEB) was then used to attach sequencing adapters. The completed libraries were then purified from the mixture using the short fragment buffer to obtain DNA fragments of all sizes. The concentration of the final DNA library was measured on a Qubit 4 fluorometer (Thermo Fisher). Two flow cells (ONT product FLO-MIN114) were each used to sequence 15 fmol of the final library using the MinION platform, recording data with a minimum quality score of 9 using high-accuracy basecalling with Guppy version 6.4.6 (ONT).
Bioinformatic processing
The Illumina Hi-C data were processed using version 2.1.0 of the hic workflow [69] implemented using Nextflow version 24.02.0 [70]. This workflow assessed the quality of the Illumina reads using FastQC [71], then mapped reads to the appropriate reference genome (S1 Table) using bowtie2 [72], and generated raw contact maps using iced, as implemented within the HiC-Pro pipeline [73]. Genome-wide contact maps between loci were then generated at resolutions of 100, 500, 1000, 5000, 10000 and 50000 bp using cooler [74]. A quality control report was then generated with MultiQC [75], with additional visualisation files produced with juicer [76].
The Pore-C data were processed using version 1.1.0 of the wf-pore-c workflow [77] implemented using Nextflow version 23.10.1 [70]. This workflow mapped reads to the appropriate reference genome (S1 Table) using minimap2 [78] with the non-default settings “-x map-ont -k 7 -w 5”. These alignments were then processed with Pore-c-py [77] and pairtools [39] to generate raw contact maps, which were processed with cooler [74] and juicer [76], as for the Hi-C data.
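To illustrate how mapped pairs are summarised into a contact map at a given resolution, the following minimal Python sketch counts contacts between fixed-width loci. It is not the code used by HiC-Pro, pairtools or cooler; the function name `bin_contacts` and the toy coordinates are hypothetical and are used only for illustration.

```python
from collections import Counter


def bin_contacts(pairs, resolution):
    """Count contacts between fixed-width bins.

    pairs: iterable of (pos1, pos2) mapping coordinates (0-based, in bp).
    resolution: bin width in bp (e.g. 10000 for a 10 kb matrix).
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j,
    i.e. the upper triangle of a symmetric contact matrix.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        i, j = pos1 // resolution, pos2 // resolution
        counts[(min(i, j), max(i, j))] += 1
    return counts


# Hypothetical mapping coordinates for four ligated fragment pairs.
toy_pairs = [(1200, 15400), (16000, 900), (250, 9800), (52000, 3000)]
matrix_10kb = bin_contacts(toy_pairs, 10_000)
```

Re-running the same function with a different `resolution` gives the coarser or finer matrices, at the cost of sparser counts per cell as the bins become smaller.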
Differential analysis of contact matrices
Analyses of differential contacts used matrices calculated at a resolution of 10 kb, with all self-interactions along the matrix diagonals removed, and the ivr and tvr loci excluded, as they contained known rearrangements.
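The filtering described above can be sketched as follows. This is an illustrative Python example rather than the code used in the study: the helper names are hypothetical, and the coordinates of the excluded region are invented for the example, not the true positions of the ivr or tvr loci.

```python
def bins_overlapping(start, end, resolution):
    """Indices of the bins overlapped by the half-open interval [start, end)."""
    return set(range(start // resolution, (end - 1) // resolution + 1))


def filter_contacts(counts, excluded_bins):
    """Drop self-interactions and any contact involving an excluded bin."""
    return {
        (i, j): n
        for (i, j), n in counts.items()
        if i != j and i not in excluded_bins and j not in excluded_bins
    }


# Hypothetical rearranged region spanning 48,000-53,000 bp.
excluded = bins_overlapping(48_000, 53_000, 10_000)

# Toy upper-triangle contact counts keyed by (bin_i, bin_j).
toy_counts = {(0, 0): 7, (0, 1): 3, (1, 4): 2, (2, 3): 5, (5, 9): 1}
filtered = filter_contacts(toy_counts, excluded)
```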
The diffHiC package [42] was applied using the recommended workflow [79], with modifications to parameter settings as appropriate for these datasets. Loci with fewer than five contacts across all replicates in each comparison were first removed, as there was low power to identify differential contacts at such positions. The counts across the filtered dataset were then normalised across replicates within each comparison between variants using a cyclic LOESS process, run for ten iterations. The biological coefficient of variation was then calculated across different contact densities, which informed the dispersion of the negative binomial generalised linear model that was fitted to the data. This used a tagwise, rather than trended, dispersion calculation, as there was not a strong relationship between variance and contact densities in some datasets. Differential contact densities between loci in the two variants were then identified from this model using quasi-likelihood F-tests. The resulting p values were subjected to a Benjamini-Hochberg correction for multiple testing to generate q values.
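The Benjamini-Hochberg correction that converts p values to q values can be sketched in a few lines of Python. This is a generic illustration of the procedure, not the implementation used by diffHiC or edgeR; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda k: p_values[k])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, so that each q value
    # is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p values for four interactions.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The running minimum enforces monotonicity, so an interaction with a smaller p value never receives a larger q value than one with a larger p value.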
Similarly, analysis with the multiHiCcompare package [43] used a modified version of the recommended workflow [80]. This started by removing any loci for which the minimum frequency across replicates was zero. The remaining data were normalised using a cyclic LOESS process run for ten iterations. Significant differences were inferred using a negative binomial exact test, as implemented in edgeR [81], with a Benjamini-Hochberg correction for multiple testing.
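The two low-count filters differ in how they treat an interaction that is well supported in some replicates but absent from others. The Python sketch below contrasts them under one explicit assumption: that "fewer than five contacts across all replicates" refers to the summed count across replicates. The function names are hypothetical and do not correspond to functions in either package.

```python
def keep_for_diffhic(replicate_counts, min_total=5):
    """Keep an interaction if its summed count across replicates is >= min_total.

    Assumption: the threshold applies to the sum over replicates.
    """
    return sum(replicate_counts) >= min_total


def keep_for_multihiccompare(replicate_counts):
    """Keep an interaction only if every replicate has a non-zero count."""
    return min(replicate_counts) > 0


# Hypothetical counts for one interaction across six replicates.
uneven = [9, 4, 0, 6, 3, 2]   # well supported overall, but absent in one replicate
sparse = [1, 1, 0, 1, 0, 1]   # weakly supported everywhere
```

Under these assumptions, `uneven` would pass the first filter but fail the second, whereas `sparse` would fail both.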
Statistical modelling of contact densities
The 10 kb resolution contact matrices were analysed using a generalised linear mixed effects model with the structure:
The terms all represent fixed effects, except those with the ‘|’ notation, which denotes a grouping variable for a random effect. Asterisks indicate an interaction between fixed effects. A zero-inflated Poisson distribution, with a logarithmic link function, was used to fit the model with the glmmTMB package [82]. Parameter estimates reproducibly converged using the default settings. However, attempts to analyse the data using an equivalent model with a negative binomial distribution, which would account for any overdispersion in the data, failed to converge on robust parameter estimates. Analysis of parameter estimates used the sjPlot package [83], model fits were assessed with the performance package [84], and data processing used the tidyverse packages [85].
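For readers unfamiliar with the zero-inflated Poisson distribution, the Python sketch below shows how a linear predictor on the log scale and a zero-inflation probability combine to give the probability of an observed contact count. It is a generic illustration, not the glmmTMB implementation: the function names are hypothetical, and treating the zero-inflation probability as a single constant is an assumption made for simplicity.

```python
import math


def zip_pmf(k, log_mu, pi_zero):
    """Zero-inflated Poisson probability of observing k contacts.

    log_mu: linear predictor on the log scale (log link), so mu = exp(log_mu).
    pi_zero: probability that the observation is an excess (structural) zero.
    """
    mu = math.exp(log_mu)
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either from the excess-zero process or from the Poisson.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_mean(log_mu, pi_zero):
    """Expected count under the zero-inflated Poisson."""
    return (1.0 - pi_zero) * math.exp(log_mu)


# Hypothetical parameters: a Poisson mean of 3 contacts and 20% excess zeros.
probs = [zip_pmf(k, math.log(3.0), 0.2) for k in range(60)]
total_probability = sum(probs)
empirical_mean = sum(k * p for k, p in enumerate(probs))
```

Because the link is logarithmic, exponentiated fixed-effect coefficients act multiplicatively on the expected contact frequency, so a coefficient of one on this scale corresponds to no effect.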
Each interaction was categorised by the distance between the loci: <12.5 kb (i.e., neighbouring 10 kb loci); 12.5–25 kb; 25–50 kb; and >50 kb. Only loci within 100 kb of each other were considered, to ensure parameter estimation was feasible, and avoid the model fitting primarily to the noisy, sparse interactions between distant loci.
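The distance categories can be expressed as a simple function of the bin indices. The Python sketch below is illustrative only and makes several assumptions not stated in the text: distances are measured between bin start coordinates, upper boundaries are inclusive (so a 50 kb separation falls in the 25–50 kb category), pairs exactly 100 kb apart are retained, and wrap-around across the circular chromosome is ignored. The function name is hypothetical.

```python
def distance_category(bin_i, bin_j, resolution=10_000, max_distance=100_000):
    """Categorise a pair of loci by the distance between their bin starts.

    Returns None if the loci are further apart than max_distance.
    Boundary handling is an assumption (upper bounds inclusive).
    """
    distance = abs(bin_i - bin_j) * resolution
    if distance > max_distance:
        return None
    if distance < 12_500:
        return "<12.5 kb"
    if distance <= 25_000:
        return "12.5-25 kb"
    if distance <= 50_000:
        return "25-50 kb"
    return ">50 kb"


# Categories for a few hypothetical pairs of 10 kb loci.
examples = {pair: distance_category(*pair) for pair in [(3, 4), (3, 5), (0, 5), (0, 6), (0, 11)]}
```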
Loci were also classified based on the annotation of the corresponding reference genomes. In both genomes, the four rRNA operons were annotated, as the difficulty of mapping to these relatively GC-rich repeats reduces the detection of contacts involving these loci. In RMV7, four MGEs were annotated: PRCIdnaN, PRCIuvrA, ICErpsI, and Tn916 [12]. In RMV8, three MGEs were annotated: the prophage remnant CIPhR, prophage ɸRMV8 and PRCImalA [18]. These classifications were fitted separately for each variant, to enable intervariant differences to be inferred at these loci.
The additional fixed effects analysed the properties of each DNA locus, which were calculated using biopython [86]. The effect of the distribution of NlaIII and MluCI sites was accounted for through calculating the mean number of restriction sites across the two interacting loci, and dividing this value by ten. The GC content was calculated as the mean proportion of G and C bases in the two interacting loci, which was scaled upwards by a factor of ten. Similarly, the distance to the origin was calculated as the proportion of the genome between the modelled site and the origin of replication. This was included because the DNA-protein crosslinks were fixed in mid-exponential phase, and therefore the detected contact frequencies were higher near the origin of replication, as some cells would have initiated, but not completed, chromosomal replication. These values were scaled upwards by a factor of ten when included in the model. These adjustments were necessary for numerical stability of the model fitting process.
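The locus-level covariates can be sketched in plain Python as shown below. This is not the biopython-based code used in the study. The recognition sequences (CATG for NlaIII and AATT for MluCI) are the known sites of these enzymes, but the function names, the toy sequences, and the use of the shorter arc around the circular chromosome for the distance to the origin are assumptions made for illustration.

```python
RECOGNITION_SITES = {"NlaIII": "CATG", "MluCI": "AATT"}


def count_sites(sequence, enzyme):
    """Count recognition sites for the named enzyme in a locus sequence."""
    return sequence.upper().count(RECOGNITION_SITES[enzyme])


def gc_fraction(sequence):
    """Proportion of G and C bases in a locus sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def origin_distance_fraction(position, origin, genome_length):
    """Distance to the origin as a fraction of the genome.

    Assumption: the shorter arc around the circular chromosome is used.
    """
    d = abs(position - origin) % genome_length
    return min(d, genome_length - d) / genome_length


def pair_covariates(seq_a, seq_b, enzyme):
    """Scaled covariates for a pair of interacting loci."""
    mean_sites = (count_sites(seq_a, enzyme) + count_sites(seq_b, enzyme)) / 2
    mean_gc = (gc_fraction(seq_a) + gc_fraction(seq_b)) / 2
    return {"sites_scaled": mean_sites / 10, "gc_scaled": mean_gc * 10}


# Toy 12 bp "loci"; real loci would be 10 kb of reference sequence.
covariates = pair_covariates("CATGAATTCATG", "GGCCAATTAATT", "NlaIII")
origin_scaled = 10 * origin_distance_fraction(1_900_000, 100_000, 2_000_000)
```

Dividing the site counts by ten and multiplying the proportions by ten brings the covariates onto broadly similar scales, which is the kind of rescaling that tends to help numerical optimisers converge.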
The identity of the biological replicate was included as a random effect, to correct for differences in the amount of sequence data generated for each dataset. A second random effect estimated the level of contact frequency for each locus in the dominant variant, and the difference in contact frequencies at each of these loci in the rare variant.
Supporting information
S1 Fig. A comparison of the data generated from Pore-C libraries produced using different protocols.
The line plots show the number of nanopores on a flow cell generating sequence data over the course of three sequencing runs. This declined over time as pores became blocked by DNA that remained cross-linked to proteins. The MinION device attempts to clear such blockages through regular transient voltage reversals, resulting in the jagged appearance of the curve. The lines are coloured according to whether the sequencing library was prepared using de-cross-linking with proteinase K only (one replicate), or proteinase K and pronase (two replicates). This demonstrated the additional pronase treatment substantially reduced the rate of pore blocking, resulting in an increased yield of sequence data.
https://doi.org/10.1371/journal.ppat.1013392.s001
(TIF)
S2 Fig. Statistical description of the Illumina Hi-C data by the HiC-Pro pipeline.
(A) Analysis of paired read alignment to the reference genome. This shows the majority of processed pairs were successfully mapped to the genome as unique, paired alignments. (B) Analysis of the aligned pairs. Most reads were split between three categories. The “dangling end pairs” correspond to reads from the ends of the same restriction fragment, which represents a failure to ligate a DNA molecule into a longer concatemer. The “religation pairs” correspond either to DNA molecules that were not digested, or to neighbouring fragments that were ligated back together into their original form. The “valid interaction pairs” correspond to pairs that allow for a digestion and religation to be inferred. A high proportion of these read pairs have the forward-reverse relative orientation expected of undigested DNA.
https://doi.org/10.1371/journal.ppat.1013392.s002
(TIF)
S3 Fig. Analysis of the Pore-C mapping data generated by pairtools.
(A) Processing of sequence reads, split into pairs of restriction fragments. The modal categories comprised pairs in which only one of the two sequences could be mapped to the reference genome. Nevertheless, there were more pairs of sequences that could both be mapped to the reference than pairs of sequences in which neither fragment mapped to the reference. (B) Details of read mapping. All pairs were classified by the mapping status of the constituent reads. The unmapped reads from panel (A) were divided between categories in which at least one read was not able to be mapped to the reference genome (N), or mapped to multiple sites (M). The single-sided mapped reads were split into the four categories in which only one read could be uniquely mapped (U). Yet many read pairs consisted of paired fragments that were both uniquely mapped (UU), enabling inference of contacts across the genome.
https://doi.org/10.1371/journal.ppat.1013392.s003
(TIF)
S4 Fig. Line plots showing the mapping orientations of read pairs from the Illumina Hi-C datasets at different separations.
The read pairs were categorised into 100 bins, based on the distribution of logarithmically-scaled distances between the mapping locations of the pair members. For each bin, the proportions of read pairs mapping in the four possible orientations (both to the positive strand; both to the negative strand; the forward read mapping to the positive strand and the reverse read mapping to the negative strand; or the forward read mapping to the negative strand and the reverse read mapping to the positive strand) were calculated. The different orientations are represented by the colour of the lines. As Illumina read pairs are generated by sequencing initiated from each end of a DNA molecule, reads generated from undigested templates are expected to map to different strands of the genome (i.e., have a +/- or -/+ orientation). These orientations dominate over short distances, suggesting read pairs mapping a short distance from one another were generated from uncut genomic DNA. The dashed lines represent estimates of the convergence distance, at which digestion and religation is sufficiently frequent for the four orientations to each reach an approximately equal proportion of ~0.25. Each panel shows the data from an individual replicate.
https://doi.org/10.1371/journal.ppat.1013392.s004
(TIF)
S5 Fig. Identification of the convergence distance in the Pore-C data.
Data are shown as in S4 Fig. As Nanopore reads are generated as continuous sequence following initiation at one end of a molecule, data generated from an undigested template are expected to have a +/+ or -/- orientation. Therefore these orientations dominate over short distances, in contrast to the analyses shown in S4 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s005
(TIF)
S6 Fig. Spatial distribution of the high-level short-range interactions between loci highlighted in Fig 1B.
The absence of any spatial clustering of the short-range highly-interacting loci suggested they could not be explained by a process localised to a specific region, and therefore were most parsimoniously explained as artefactual products resulting from incomplete digestion of DNA.
https://doi.org/10.1371/journal.ppat.1013392.s006
(TIF)
S7 Fig. Contact matrices for individual Illumina Hi-C replicates, calculated at a resolution of 10 kb.
Insufficient long-range interactions were inferred from the first sample extracted from S. pneumoniae RMV7domi to generate a matrix. Each other matrix is symmetrical, with both the horizontal and vertical axes representing the length of the genome. Each cell is coloured to represent the frequency of interactions between the corresponding loci. Keys are provided for each replicate individually, to adjust the visualisation to the amount of data generated.
https://doi.org/10.1371/journal.ppat.1013392.s007
(TIF)
S8 Fig. Contact matrices for individual Pore-C replicates, calculated at a resolution of 10 kb.
Data are shown as in S7 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s008
(TIF)
S9 Fig. Processing of Hi-C data from S. pneumoniae RMV7 for analysis with diffHiC.
All analyses were conducted using a locus size of 10 kb. (A) Histogram showing the distribution of contact frequencies across all loci and replicates as mean log2 counts per million reads. (B) MA plot comparing the mean log2 counts per million reads (denoted A) against the ratio of log2 counts per million reads in the rare variant relative to the dominant variant (denoted M) for a representative pair of biological replicates. The density of points is represented by the blue shading. The LOESS regression line is shown in red. (C) MA plot of the same data after normalisation. (D) Relationship between the biological coefficient of variation, relating to the dispersion parameter of the negative binomial distribution used to fit the generalised linear model, and the mean log2 counts per million reads. (E) Relationship between the quasi-likelihood dispersion and the mean log2 counts per million reads estimated by fitting the generalised linear model to the contact frequency data. (F) MA plot of normalised data, highlighting loci found to have a significantly differing density of contacts between the two variants.
https://doi.org/10.1371/journal.ppat.1013392.s009
(TIF)
S10 Fig. Processing of Pore-C data from S. pneumoniae RMV7 for analysis with diffHiC.
All analyses were conducted using a locus size of 10 kb. Data are shown as in S9 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s010
(TIF)
S11 Fig. Processing of Hi-C data from S. pneumoniae RMV8 for analysis with diffHiC.
All analyses were conducted using a locus size of 10 kb. Data are shown as in S9 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s011
(TIF)
S12 Fig. Processing of Pore-C data from S. pneumoniae RMV8 for analysis with diffHiC.
All analyses were conducted using a locus size of 10 kb. Data are shown as in S9 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s012
(TIF)
S13 Fig. Processing of Hi-C data from S. pneumoniae RMV7 for analysis with multiHiCcompare.
All analyses were conducted using a locus size of 10 kb. (A) MD plots for each pairwise comparison between replicates after joint normalisation by a cyclic LOESS process. Each plot shows the ratio of log2 counts per million reads in one replicate relative to another (denoted M), relative to the distance between the interacting loci (denoted D), enumerated as the difference between the indices of the loci. (B) Composite MD plot summarising the output of the test for differential contact intensities across all replicates. The position of points on the vertical axis represents the log2 fold difference between the variants, with points higher on the axis when the contact density is higher in the rare variant relative to the dominant variant. Points are coloured according to the categorisation of their false discovery rate (i.e., q value), calculated after a Benjamini-Hochberg correction for multiple testing.
https://doi.org/10.1371/journal.ppat.1013392.s013
(TIF)
S14 Fig. Processing of Pore-C data from S. pneumoniae RMV7 for analysis with multiHiCcompare.
All analyses were conducted using a locus size of 10 kb. Data are shown as in S13 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s014
(TIF)
S15 Fig. Processing of Hi-C data from S. pneumoniae RMV8 for analysis with multiHiCcompare.
All analyses were conducted using a locus size of 10 kb. Data are shown as in S13 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s015
(TIF)
S16 Fig. Processing of Pore-C data from S. pneumoniae RMV8 for analysis with multiHiCcompare.
All analyses were conducted using a locus size of 10 kb. Data are shown as in S13 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s016
(TIF)
S17 Fig. Heatmaps comparing the contact frequency patterns between variants.
In each heatmap, each column is a different replicate dataset from either the dominant (domi) or rare variant. Each row represents a locus. The colour of each cell represents the intensity of contacts at a locus in a replicate following normalisation as part of the diffHiC analyses. The variants are clustered based on their similarity, as shown by the dendrograms. The nodes of the dendrogram are labelled with the corresponding bootstrap support values, expressed as percentages. The panels show (A) Illumina Hi-C data for RMV7; (B) Pore-C data for RMV7; (C) Illumina Hi-C data for RMV8; and (D) Pore-C data for RMV8.
https://doi.org/10.1371/journal.ppat.1013392.s017
(TIF)
S18 Fig. Validating the differences in contact frequencies in PRCIs between variants.
(A) Contact frequency matrices at the tvr loci of RMV7 and RMV8 at a resolution of 1 kb. Data are shown as in Fig 1D. Each rare variant dataset shows distinctive off-diagonal contacts that are diagnostic of rearrangements within the tvr loci, relative to the dominant variant reference genome. This confirms that the dominant and rare variant datasets have been correctly identified. (B) Contact frequency matrices within PRCIdnaN, for the RMV7 datasets, or within PRCImalA, for the RMV8 datasets, at a resolution of 1 kb. Data are shown as in Fig 1D. (C) Scatterplots comparing the contact frequencies within PRCIdnaN and PRCImalA between the epigenetically-distinct variants. The cumulative contact frequencies between each 1 kb locus were calculated across the three biological replicates for each combination of methodology and variant. These values were normalised by dividing them by the total number of contacts inferred in the corresponding datasets. These normalised contact frequencies were compared between the dominant and rare variants for the loci within PRCIdnaN, in RMV7, and PRCImalA, in RMV8. The comparisons were segregated by the distance between the interacting loci. The dashed lines are the lines of identity. These simple comparisons concur with the diffHiC and multiHiCcompare analyses that found the Hi-C and Pore-C data implied opposing changes in the frequencies of contacts within these PRCIs between the pairs of variants.
https://doi.org/10.1371/journal.ppat.1013392.s018
(TIF)
S19 Fig. Violin plots showing the distribution of Pore-C read lengths, categorised by whether or not a fragment mapped to a differentially-active PRCI: either PRCIdnaN (in RMV7) or PRCImalA (in RMV8).
https://doi.org/10.1371/journal.ppat.1013392.s019
(TIF)
S20 Fig. Plots summarising the fixed effect coefficient estimates from fitting the generalised linear mixed effects models to contact matrices calculated at a resolution of 10 kb.
Points represent the maximum likelihood estimate of parameter values, and are coloured by whether they increase (blue) or decrease (red) contact frequencies. The error bars represent the 95% confidence intervals of the estimates. The asterisks indicate statistically significant deviations from one: * denotes p < 0.05; ** denotes p < 0.01, and *** denotes p < 0.001. (A) Model fit to the RMV7 Hi-C data. (B) Model fit to the RMV7 Pore-C data. (C) Model fit to the RMV8 Hi-C data. (D) Model fit to the RMV8 Pore-C data.
https://doi.org/10.1371/journal.ppat.1013392.s020
(TIF)
S21 Fig. Evaluation of the quality of the generalised linear mixed effects model fits to the RMV7 Hi-C data.
These plots, generated by the R package performance, test whether the model performs poorly in reproducing observed values, whether it is mis-specified, or whether there are difficulties in estimating individual parameter values.
https://doi.org/10.1371/journal.ppat.1013392.s021
(TIF)
S22 Fig. Evaluation of the quality of the generalised linear mixed effects model fits to the RMV7 Pore-C data.
Data are shown as in S21 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s022
(TIF)
S23 Fig. Evaluation of the quality of the generalised linear mixed effects model fits to the RMV8 Hi-C data.
Data are shown as in S21 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s023
(TIF)
S24 Fig. Evaluation of the quality of the generalised linear mixed effects model fits to the RMV8 Pore-C data.
Data are shown as in S21 Fig.
https://doi.org/10.1371/journal.ppat.1013392.s024
(TIF)
S25 Fig. Comparison of observed data with model predictions.
For each model, the six panels show the comparisons between the observed and predicted contact frequencies for each locus across the biological replicates. Points are coloured by the distance between the interacting loci. (A) Model predictions using the RMV7 Hi-C data. Models were not fitted to the first S. pneumoniae RMV7domi Hi-C replicate, which generated relatively little data. (B) Model predictions using the RMV7 Pore-C data. (C) Model predictions using the RMV8 Hi-C data. (D) Model predictions using the RMV8 Pore-C data.
https://doi.org/10.1371/journal.ppat.1013392.s025
(TIF)
S2 Table. Accession codes for Hi-C and Pore-C sequence data.
https://doi.org/10.1371/journal.ppat.1013392.s027
(DOCX)
S3 Table. Number of restriction sites in the genomes of S. pneumoniae RMV7domi and RMV8domi, and PRCIdnaN (within the RMV7domi genome) and PRCImalA (within the RMV8domi genome).
https://doi.org/10.1371/journal.ppat.1013392.s028
(DOCX)
S4 Table. Oligonucleotides used in this study.
https://doi.org/10.1371/journal.ppat.1013392.s029
(DOCX)
Acknowledgments
We thank Prof. Juanma Vaquerizas for helpful discussions about Hi-C data generation and analysis, and the Bespoke team at the Wellcome Sanger Institute for generating the Hi-C sequencing libraries.
References
- 1. Croucher NJ, Løchen A, Bentley SD. Pneumococcal vaccines: host interactions, population dynamics, and design principles. Annu Rev Microbiol. 2018;72:521–49. pmid:30200849
- 2. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, et al. Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J Bacteriol. 2007;189(22):8186–95. pmid:17675389
- 3. Donati C, Hiller NL, Tettelin H, Muzzi A, Croucher NJ, Angiuoli SV, et al. Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species. Genome Biol. 2010;11(10):R107. pmid:21034474
- 4. Croucher NJ, Coupland PG, Stevenson AE, Callendrello A, Bentley SD, Hanage WP. Diversification of bacterial genome content through distinct mechanisms over different timescales. Nat Commun. 2014;5:5471. pmid:25407023
- 5. D’Aeth JC, van der Linden MP, McGee L, de Lencastre H, Turner P, Song J-H, et al. The role of interspecies recombination in the evolution of antibiotic-resistant pneumococci. Elife. 2021;10:e67113. pmid:34259624
- 6. Johnston C, Martin B, Granadel C, Polard P, Claverys J-P. Programmed protection of foreign DNA from restriction allows pathogenicity island exchange during pneumococcal transformation. PLoS Pathog. 2013;9(2):e1003178. pmid:23459610
- 7. Kwun MJ, Oggioni MR, De Ste Croix M, Bentley SD, Croucher NJ. Excision-reintegration at a pneumococcal phase-variable restriction-modification locus drives within- and between-strain epigenetic differentiation and inhibits gene acquisition. Nucleic Acids Res. 2018;46(21):11438–53. pmid:30321375
- 8. Tettelin H, Nelson KE, Paulsen IT, Eisen JA, Read TD, Peterson S, et al. Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science. 2001;293(5529):498–506. pmid:11463916
- 9. Manso AS, Chai MH, Atack JM, Furi L, De Ste Croix M, Haigh R, et al. A random six-phase switch regulates pneumococcal virulence via global epigenetic changes. Nat Commun. 2014;5:5055. pmid:25268848
- 10. Kwun MJ, Oggioni MR, Bentley SD, Fraser C, Croucher NJ. Synergistic activity of mobile genetic element defences in Streptococcus pneumoniae. Genes (Basel). 2019;10(9):707. pmid:31540216
- 11. Furi L, Crawford LA, Rangel-Pineros G, Manso AS, De Ste Croix M, Haigh RD, et al. Methylation warfare: interaction of pneumococcal bacteriophages with their host. J Bacteriol. 2019;201(19):e00370-19. pmid:31285240
- 12. Kwun MJ, Ion AV, Oggioni MR, Bentley SD, Croucher NJ. Diverse regulatory pathways modulate bet hedging of competence induction in epigenetically-differentiated phase variants of Streptococcus pneumoniae. Nucleic Acids Res. 2023;51(19):10375–94. pmid:37757859
- 13. Li J, Li J-W, Feng Z, Wang J, An H, Liu Y, et al. Epigenetic switch driven by DNA inversions dictates phase variation in Streptococcus pneumoniae. PLoS Pathog. 2016;12(7):e1005762. pmid:27427949
- 14. Durmort C, Ercoli G, Ramos-Sevillano E, Chimalapati S, Haigh RD, De Ste Croix M, et al. Deletion of the zinc transporter lipoprotein AdcAII causes hyperencapsulation of Streptococcus pneumoniae associated with distinct alleles of the type I restriction-modification system. mBio. 2020;11(2):e00445-20. pmid:32234814
- 15. De Ste Croix M, Mitsi E, Morozov A, Glenn S, Andrew PW, Ferreira DM, et al. Phase variation in pneumococcal populations during carriage in the human nasopharynx. Sci Rep. 2020;10(1):1803. pmid:32019989
- 16. Phillips ZN, Trappetti C, Van Den Bergh A, Martin G, Calcutt A, Ozberk V, et al. Pneumococcal phasevarions control multiple virulence traits, including vaccine candidate expression. Microbiol Spectr. 2022;10(3):e0091622. pmid:35536022
- 17. Agnew HN, Atack JM, Fernando ARD, Waters SN, van der Linden M, Smith E, et al. Uncovering the link between the SpnIII restriction modification system and LuxS in Streptococcus pneumoniae meningitis isolates. Front Cell Infect Microbiol. 2023;13:1177857. pmid:37197203
- 18. Kwun MJ, Ion AV, Cheng H-C, D’Aeth JC, Dougan S, Oggioni MR, et al. Post-vaccine epidemiology of serotype 3 pneumococci identifies transformation inhibition through prophage-driven alteration of a non-coding RNA. Genome Med. 2022;14:144.
- 19. Buitrago D, Labrador M, Arcon JP, Lema R, Flores O, Esteve-Codina A, et al. Impact of DNA methylation on 3D genome structure. Nat Commun. 2021;12(1):3243. pmid:34050148
- 20. Kwun MJ, Ion AV, Apagyi KJ, Croucher NJ. Chromosomal curing drives an arms race between bacterial transformation and prophage. bioRxiv. 2024.
- 21. Dorman CJ. H-NS, the genome sentinel. Nat Rev Microbiol. 2007;5(2):157–61. pmid:17191074
- 22. Navarre WW. The impact of gene silencing on horizontal gene transfer and bacterial evolution. Adv Microb Physiol. 2016;69:157–86. pmid:27720010
- 23. Wiechert J, Filipchyk A, Hünnefeld M, Gätgens C, Brehm J, Heermann R, et al. Deciphering the rules underlying xenogeneic silencing and counter-silencing of Lsr2-like proteins using CgpS of Corynebacterium glutamicum as a model. mBio. 2020;11(1):e02273-19. pmid:32019787
- 24. Navarre WW, McClelland M, Libby SJ, Fang FC. Silencing of xenogeneic DNA by H-NS-facilitation of lateral gene transfer in bacteria by a defense system that recognizes foreign DNA. Genes Dev. 2007;21(12):1456–71. pmid:17575047
- 25. Seah NE, Warren D, Tong W, Laxmikanthan G, Van Duyne GD, Landy A. Nucleoprotein architectures regulating the directionality of viral integration and excision. Proc Natl Acad Sci U S A. 2014;111(34):12372–7. pmid:25114241
- 26. Perez-Rueda E, Ibarra JA. Distribution of putative xenogeneic silencers in prokaryote genomes. Comput Biol Chem. 2015;58:167–72. pmid:26247404
- 27. Winardhi RS, Fu W, Castang S, Li Y, Dove SL, Yan J. Higher order oligomerization is required for H-NS family member MvaT to form gene-silencing nucleoprotein filament. Nucleic Acids Res. 2012;40(18):8942–52. pmid:22798496
- 28. Qu Y, Lim CJ, Whang YR, Liu J, Yan J. Mechanism of DNA organization by Mycobacterium tuberculosis protein Lsr2. Nucleic Acids Res. 2013;41(10):5263–72. pmid:23580555
- 29. Tendeng C, Soutourina OA, Danchin A, Bertin PN. MvaT proteins in Pseudomonas spp.: a novel class of H-NS-like proteins. Microbiology (Reading). 2003;149(Pt 11):3047–50. pmid:14600217
- 30. Wang W, Li G-W, Chen C, Xie XS, Zhuang X. Chromosome organization by a nucleoid-associated protein in live bacteria. Science. 2011;333(6048):1445–9. pmid:21903814
- 31. Seid CA, Smith JL, Grossman AD. Genetic and biochemical interactions between the bacterial replication initiator DnaA and the nucleoid-associated protein Rok in Bacillus subtilis. Mol Microbiol. 2017;103(5):798–817. pmid:27902860
- 32. Solano-Collado V, Hüttener M, Espinosa M, Juárez A, Bravo A. MgaSpn and H-NS: two unrelated global regulators with similar DNA-binding properties. Front Mol Biosci. 2016;3:60. pmid:27747214
- 33. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295(5558):1306–11. pmid:11847345
- 34. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–93. pmid:19815776
- 35. Ulahannan N, Pendleton M, Deshpande A, Schwenk S, Behr JM, Dai X, et al. Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure. 2019. https://doi.org/10.1101/833590
- 36. Li Z, Long Y, Yu Y, Zhang F, Zhang H, Liu Z, et al. Pore-C simultaneously captures genome-wide multi-way chromatin interaction and associated DNA methylation status in Arabidopsis. Plant Biotechnol J. 2022;20(6):1009–11. pmid:35313066
- 37. Zhong J-Y, Niu L, Lin Z-B, Bai X, Chen Y, Luo F, et al. High-throughput Pore-C reveals the single-allele topology and cell type-specificity of 3D genome folding. Nat Commun. 2023;14(1):1250. pmid:36878904
- 38. Open2C, Abdennur N, Fudenberg G, Flyamer IM, Galitsyna AA, Goloborodko A, et al. Pairtools: From sequencing data to chromosome contacts. PLoS Comput Biol. 2024;20(5):e1012164. pmid:38809952
- 39. Attaiech L, Minnen A, Kjos M, Gruber S, Veening J-W. The ParB-parS chromosome segregation system modulates competence development in Streptococcus pneumoniae. mBio. 2015;6(4):e00662. pmid:26126852
- 40. Marbouty M, Le Gall A, Cattoni DI, Cournac A, Koh A, Fiche J-B, et al. Condensin- and replication-mediated bacterial chromosome folding and origin condensation revealed by Hi-C and super-resolution imaging. Mol Cell. 2015;59(4):588–602. pmid:26295962
- 41. Lun ATL, Smyth GK. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics. 2015;16:258. pmid:26283514
- 42. Stansfield JC, Cresswell KG, Dozmorov MG. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics. 2019;35(17):2916–23. pmid:30668639
- 43. Stansfield JC, Tran D, Nguyen T, Dozmorov MG. R tutorial: detection of differentially interacting chromatin regions from multiple Hi-C datasets. Curr Protoc Bioinform. 2019;66(1):e76. pmid:31125519
- 44. Browne PD, Nielsen TK, Kot W, Aggerholm A, Gilbert MTP, Puetz L, et al. GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms. Gigascience. 2020;9(2):giaa008. pmid:32052832
- 45. Ali SS, Xia B, Liu J, Navarre WW. Silencing of foreign DNA in bacteria. Curr Opin Microbiol. 2012;15(2):175–81. pmid:22265250
- 46. Wang S, Lee S, Chu C, Jain D, Kerpedjiev P, Nelson GM, et al. HiNT: a computational method for detecting copy number variations and translocations from Hi-C data. Genome Biol. 2020;21(1):73. pmid:32293513
- 47. Newby J, Chapman J. Metastable behavior in Markov processes with internal states. J Math Biol. 2014;69(4):941–76. pmid:23995843
- 48. Martínez-Rubio R, Quiles-Puchalt N, Martí M, Humphrey S, Ram G, Smyth D, et al. Phage-inducible islands in the Gram-positive cocci. ISME J. 2017;11(4):1029–42. pmid:27959343
- 49. Sung CK, Li H, Claverys JP, Morrison DA. An rpsL cassette, janus, for gene replacement through negative selection in Streptococcus pneumoniae. Appl Environ Microbiol. 2001;67(11):5190–6. pmid:11679344
- 50. Mazzuoli M-V, van Raaphorst R, Martin LS, Bock FP, Thierry A, Marbouty M, et al. HU promotes higher order chromosome organization and influences DNA replication rates in Streptococcus pneumoniae. Nucleic Acids Res. 2025;53(8):gkaf312. pmid:40263708
- 51. Espeli O, Mercier R, Boccard F. DNA dynamics vary according to macrodomain topography in the E. coli chromosome. Mol Microbiol. 2008;68(6):1418–27. pmid:18410497
- 52. Lioy VS, Cournac A, Marbouty M, Duigou S, Mozziconacci J, Espéli O, et al. Multiscale structuring of the E. coli chromosome by nucleoid-associated and condensin proteins. Cell. 2018;172:771–83.e18.
- 53. Le TBK, Imakaev MV, Mirny LA, Laub MT. High-resolution mapping of the spatial organization of a bacterial chromosome. Science. 2013;342(6159):731–4. pmid:24158908
- 54. Trussart M, Yus E, Martinez S, Baù D, Tahara YO, Pengo T, et al. Defined chromosome structure in the genome-reduced bacterium Mycoplasma pneumoniae. Nat Commun. 2017;8:14665. pmid:28272414
- 55. Croucher NJ, Walker D, Romero P, Lennard N, Paterson GK, Bason NC, et al. Role of conjugative elements in the evolution of the multidrug-resistant pandemic clone Streptococcus pneumoniae Spain23F ST81. J Bacteriol. 2009;191(5):1480–9. pmid:19114491
- 56. Rode CK, Melkerson-Watson LJ, Johnson AT, Bloch CA. Type-specific contributions to chromosome size differences in Escherichia coli. Infect Immun. 1999;67(1):230–6. pmid:9864220
- 57. Wasim A, Gupta A, Bera P, Mondal J. Interpretation of organizational role of proteins on E. coli nucleoid via Hi-C integrated model. Biophys J. 2023;122(1):63–81. pmid:36435970
- 58. Swetha RG, Sekar DKK, Devi ED, Ahmed ZZ, Ramaiah S, Anbarasu A, et al. Streptococcus pneumoniae Genome Database (SPGDB): a database for strain specific comparative analysis of Streptococcus pneumoniae genes and proteins. Genomics. 2014;104(6 Pt B):582–6. pmid:25269378
- 59. Ferrándiz M-J, Carreño D, Ayora S, de la Campa AG. HU of Streptococcus pneumoniae is essential for the preservation of DNA supercoiling. Front Microbiol. 2018;9:493. pmid:29662473
- 60. Dorman CJ. H-NS-like nucleoid-associated proteins, mobile genetic elements and horizontal gene transfer in bacteria. Plasmid. 2014;75:1–11. pmid:24998344
- 61. Denamur E, Clermont O, Bonacorsi S, Gordon D. The population genetics of pathogenic Escherichia coli. Nat Rev Microbiol. 2021;19(1):37–54. pmid:32826992
- 62. Zawadzki P, Riley MA, Cohan FM. Homology among nearly all plasmids infecting three Bacillus species. J Bacteriol. 1996;178(1):191–8. pmid:8550416
- 63. Veening J-W, Smits WK, Kuipers OP. Bistability, epigenetics, and bet-hedging in bacteria. Annu Rev Microbiol. 2008;62:193–210. pmid:18537474
- 64. Croucher NJ, Mostowy R, Wymant C, Turner P, Bentley SD, Fraser C. Horizontal DNA transfer mechanisms of bacteria as weapons of intragenomic conflict. PLoS Biol. 2016;14(3):e1002394. pmid:26934590
- 65. Kwun MJ, Ion AV, Cheng H-C, D’Aeth JC, Dougan S, Oggioni MR, et al. Post-vaccine epidemiology of serotype 3 pneumococci identifies transformation inhibition through prophage-driven alteration of a non-coding RNA. Genome Med. 2022;14:144.
- 66. Martins BMC, Locke JCW. Microbial individuality: how single-cell heterogeneity enables population level strategies. Curr Opin Microbiol. 2015;24:104–12. pmid:25662921
- 67. Bronner IF, Quail MA. Best practices for Illumina library preparation. Curr Protoc Hum Genet. 2019;102(1):e86. pmid:31216112
- 68. Servant N, Ewels P, Garcia MU, Talbot A, Peltzer A, Miller E. nf-core/hic: nf-core/hic v2.1.0. Zenodo; 2023. https://doi.org/10.5281/zenodo.7994878
- 69. Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35(4):316–9. pmid:28398311
- 70. Andrews S. FastQC. 2023.
- 71. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. pmid:22388286
- 72. Servant N, Varoquaux N, Lajoie BR, Viara E, Chen C-J, Vert J-P, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. pmid:26619908
- 73. Abdennur N, Mirny LA. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36(1):311–6. pmid:31290943
- 74. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–8. pmid:27312411
- 75. Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, Lander ES, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3(1):95–8. pmid:27467249
- 76. EPI2ME. wf-pore-c. 2024.
- 77. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. pmid:29750242
- 78. Lun ATL. diffHic: differential analysis of Hi-C data user’s guide. 2017.
- 79. Stansfield JC, Dozmorov MG. MultiHiCcompare vignette. 2024.
- 80. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40. pmid:19910308
- 81. Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. The R Journal. 2017;9(2):378.
- 82. Lüdecke D. sjPlot: data visualization for statistics in social science. 2017.
- 83. Lüdecke D, Ben-Shachar M, Patil I, Waggoner P, Makowski D. performance: An R package for assessment, comparison and testing of statistical models. JOSS. 2021;6(60):3139.
- 84. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. JOSS. 2019;4(43):1686.
- 85. Duan B, Ding P, Navarre WW, Liu J, Xia B. Xenogeneic silencing and bacterial genome evolution: mechanisms for DNA recognition imply multifaceted roles of xenogeneic silencers. Mol Biol Evol. 2021;38(10):4135–48. pmid:34003286
- 86. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–3.