Hopping into a hot seat: Role of DNA structural features on IS5-mediated gene activation and inactivation under stress

Insertion sequence elements (IS elements) are proposed to play major roles in shaping the genetic and phenotypic landscapes of prokaryotic cells. Recent evidence has raised the possibility that environmental stress conditions increase IS hopping into new sites, and often such hopping has the phenotypic effect of relieving the stress. Although stress-induced targeted mutations have been reported for a number of E. coli genes, the glpFK (glycerol utilization) and the cryptic bglGFB (β-glucoside utilization) systems are among those in which IS insertion-mediated gene activation is best characterized at the molecular level. In the glpFK system, starvation of cells incapable of utilizing glycerol leads to an IS5 insertion event that activates the glpFK operon and enables glycerol utilization. In the case of the cryptic bglGFB operon, insertion of IS5 (and other IS elements) into a specific region of the bglG upstream sequence activates the operon in both growing and starving cells. However, a major unanswered question in the glpFK system, the bgl system, and other examples has been why insertion events are promoted at specific locations, and how a specific stress condition (glycerol starvation, for example) can be mechanistically linked to enhanced insertion at a specific locus. In this paper, we show that a specific DNA structural feature, superhelical stress-induced duplex destabilization (SIDD), is associated with “stress-induced” IS5 insertion in the glpFK, bglGFB, flhDC, fucAO and nfsB systems. We propose a speculative mechanistic model that links specific environmental conditions to the unmasking of an insertional hotspot in the glpFK system. We also demonstrate that experimentally altering the predicted stability of a SIDD element in the nfsB gene significantly affects IS5 insertion at its hotspot.


Introduction
Transposable genetic elements, once considered to be exotic "DNA level parasites" that enabled non-Mendelian transmission of genetic traits, are now recognized as members of a large collection of genetic elements, collectively termed the "mobilome", that constitute important but variable parts of most examined genomes [1]. In this view, genomes consist of a fixed "core genome" and a collection of variable genetic elements that include plasmids, viruses, transposons, insertion sequences and integrative conjugative elements (ICE), as well as a large number of related sequences that are not always easily recognized. The mobilome, important for all organisms, has played an especially critical role in shaping the prokaryotic world. A testament to its significance is the conclusion that genes encoding transposases, the enzymes that confer mobility on transposable elements, are the most ubiquitous genes in nature [2].
Insertion sequences (IS elements) are the smallest autonomous transposable genetic elements found in bacteria. They often consist of little more than one or two reading frames encoding a transposase, the enzyme required for transposition, together with distinctive terminal sequences that set the IS element apart from the flanking host DNA [1,3]. Most bacteria, including Escherichia coli, not only harbor more than one type of IS element but also often carry multiple copies of many of them. The ISfinder database (http://www-is.biotoul.fr/) lists more than 4,000 different IS elements, and even this number represents only a fraction of the total number of IS elements found in publicly available sequence databases.
Given their ubiquity and known roles in gene activation and inactivation, it is hardly surprising that a great many important roles have been ascribed to IS elements in shaping the bacterial genome [1]. For example, the presence of multiple copies of many IS elements furnishes regions of "portable homology" that are acted upon by host recombination functions leading to deletions and rearrangements in the bacterial chromosome as well as integration of exogenous DNA into the chromosome. IS element expansion can occur in nutrient-rich niches in which many genes become non-essential, and can therefore be inactivated, followed by genome contraction by deletions mediated by recombination at repetitive IS element copies. These events have been proposed to be important in the evolution of symbionts as well as intracellular pathogens.
While deduced expansion-contraction cycles offer dramatic examples of the role of IS elements in shaping the bacterial genome, there are many examples in which IS-mediated gene activation and inactivation confer an adaptive advantage in response to transient changes in the environment. This view is strongly supported by such well-known adaptive responses as phase and antigenic variation by invertible sequences (related to IS elements) in Salmonella, E. coli and other pathogenic bacteria (reviewed in [4]). More interesting still are examples where phase variation occurs by the precise insertion and excision of "normal" IS elements, as in the insertion of IS492 into the eps locus (capsule synthesis) of the marine bacterium Pseudomonas atlantica [5], and the insertion of IS1301 into the siaA gene of the human pathogen Neisseria meningitidis [6]. In each of these cases, insertion occurs into a single hotspot, leading to inactivation of the operon. Subsequent precise excision restores the sequence and the original level of operon expression.
In the examples cited above, the tacit assumption has been made that the underlying IS insertion and excision events are stochastic, with the environment playing no role in regulating them, an assumption that to our knowledge has not been rigorously tested. Challenging the idea that IS hopping is always stochastic is emerging evidence that IS insertions can be selectively targeted to specific chromosomal loci by environmental stress conditions, and that such targeting has the effect of relieving the stress. This concept, sometimes referred to as "directed" mutation, is controversial because it has been misinterpreted as invoking a Lamarckian process. However, as considered further elsewhere in this communication, the underlying processes do not violate fundamental genetic principles, but merely illuminate previously underappreciated aspects of the evolved relationship between transposons and their host genomes. Table 1 is a non-exhaustive list of experimental systems that have been used to suggest that IS element insertion is elevated under stressful conditions. In this communication we focus on five systems in which IS5 insertion leads to gene activation or inactivation.
Transposable genetic elements were long assumed to have little preference for a specific DNA target sequence, but more recent evidence has suggested that subtle structural features are important in target-site acquisition [3,20,21]. For example, in the E. coli genome, a preference is seen for intergenic regions. This situation could have arisen because selection preserves only those insertions that are either neutral or beneficial. It is additionally possible that transposition target selection favors specific structural features [1,20]. For example, AT-rich sequences were proposed to be preferential targets for transposition. Another example is provided by transposons that appear to target replication forks, showing a preference for inserting into the lagging strand template [22]. Insertion mechanisms of many transposable elements involve introducing staggered cuts at short sequences (2-14 bp) leading to "target site duplication" (TSD), often considered to be the signature of transposition. Indeed, for a few transposons there is a high degree of target sequence specificity: for example, Tn7 has a single preferred site in the E. coli genome, although it can be forced to insert at certain other regions [23]. However, for a majority of transposons, the short sequences are neither highly conserved nor rare, and they therefore are unlikely to account for non-random target selection. Thus, DNA structural features outside of the potential target sites may play important roles [1].
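The target-site duplication described above follows directly from the staggered-cut geometry. The string-level sketch below uses a hypothetical helper, `insert_with_tsd`, and a placeholder string standing in for the element; it abstracts away strand chemistry and element orientation, and only illustrates why the short target sequence ends up on both sides of the insertion.

```python
def insert_with_tsd(target: str, cut_pos: int, element: str, tsd_len: int) -> str:
    """Return `target` with `element` inserted at a staggered cut (hypothetical model).

    The top strand is cut at `cut_pos` and the bottom strand `tsd_len` bp
    further along. Joining the element to the staggered ends and filling in
    the resulting single-stranded gaps copies the `tsd_len`-bp segment between
    the two cuts, so it flanks the element on both sides as a direct repeat.
    """
    if tsd_len < 0 or not 0 <= cut_pos <= len(target) - tsd_len:
        raise ValueError("staggered cut must lie within the target sequence")
    site = target[cut_pos:cut_pos + tsd_len]
    return target[:cut_pos] + site + element + site + target[cut_pos + tsd_len:]


# IS5 duplicates a tetranucleotide; here the target is a CTAA site.
product = insert_with_tsd("GGATCCTAAGGCC", 5, "[IS5]", 4)
print(product)  # GGATCCTAA[IS5]CTAAGGCC
```

For IS5, `tsd_len` would be 4; elements with longer duplications (up to the 14 bp mentioned above) differ only in that parameter.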
A number of non-canonical DNA structures such as A-DNA, Z-DNA, H-DNA, G-quartets, SIDD (superhelical stress-induced DNA destabilization) regions and other "non-B DNA structures" have been identified in bacterial as well as eukaryotic genomes [24,25]. Here we describe an association between IS5 element insertion and SIDD sequences in experimental systems in which environmental stress leads to increased transposition. With specific focus on IS5, we propose and test a speculative model on the roles of DNA structure and DNA binding proteins on transposition target selection, and how target selection could be affected by environmental stress.

SIDD mapping of DNA sequences
All analyses were carried out on Escherichia coli K12 (MG1655 or BW25113) DNA sequences from GenBank (Table 2). Each sequence was between 4 and 5 kb long, and contained the gene in question at the center along with a 2 kb sequence upstream of the ATG start codon, and the

Construction of nfsB alleles with altered SIDD profiles
We used a recombineering protocol based on GalK positive selection and counter-selection [31] to modify the nfsB gene in situ. To construct the DNA duplex-stabilizing PROM-3 mutation, three bases, AA (+6 and +7 from the nfsB transcription start) and T (+14 from the nfsB transcription start), were changed to CC and G, respectively (the larger sequence context for these changes is shown in Results and Discussion). To accomplish this, the galK gene together with its constitutive em7 promoter (em7-galK) was first amplified from plasmid pGalK [31] using primers GalK-PROM3-P1 and GalK-PROM3-P2, each consisting of a 20 bp region at its 3' end that is complementary to the em7-galK sequence and a 50 bp region at its 5' end that is homologous to the nfsB gene (see S2 Table). The PCR products were gel purified, treated with DpnI, and electroporated into BW25113 ΔgalK cells (Gal−) expressing the Lambda-Red proteins encoded by plasmid pKD46 [32]. The cells were plated on minimal M9 agar with galactose (0.5%, w/v) as the sole carbon source. After 3 days of incubation at 30 °C, several Gal+ colonies were purified, and replacement of the 9-bp region (AAATTACTT) located between +6 and +14 by em7-galK was verified by PCR followed by DNA sequencing. One resulting strain was named BW_K. A 100-bp synthetic DNA oligonucleotide with the sequence ctcgcttaccatttctcgttgaaccttgtaatctgctggcacgcaaCCttactGtcacatggagtctttatggatatcatttctgtcgccttaaagcgtc, containing the three nucleotide substitutions (uppercase letters), was PCR amplified. The DNA products were purified and electroporated into BW_K cells expressing the Lambda-Red proteins. After incubation at 30 °C for 1 h, the cells were pelleted, washed once by resuspension in 1x M9 salts, and spread on M9 agar supplemented with 0.2% glycerol and 0.2% 2-deoxygalactose (a galactose analog that is toxic when phosphorylated by GalK).
After 3 to 4 days of incubation at 30 °C, about 10 Gal− colonies were purified on the same agar plates and then verified by PCR and DNA sequencing for replacement of em7-galK with the 100-bp DNA fragment (containing the desired 3 nucleotide substitutions).
To construct the DNA duplex-destabilizing ORF-3 mutation, three bases, C (+38 from the nfsB transcription start), G (+56) and G (+71), were each changed to A using methods similar to those described above. The region containing the sequence CATTTCTGTCGCCTTAAAGCGTCATTCCACTAAG (bases +38 to +71 from the nfsB transcription start) in strain BW25113 ΔgalK was first replaced by em7-galK. A 100-bp DNA fragment (caaaattactttcacatggagtctttatggatatAatttctgtcgccttaaaAcgtcattccactaaAgcatttgatgccagcaaaaaacttaccccg) containing the three desired substitutions (uppercase letters) was then substituted for this em7-galK region. The larger sequence context for these changes is shown in Results and Discussion.
Isolation and sequence analyses of furazolidone-resistance (FZDr) mutations in the nfsB gene
Three strains, BW25113 (ΔnfsA nfsB+, carrying a wild-type SIDD region of the nfsB gene), PROM-3 (ΔnfsA nfsB+, carrying mutations that increase duplex stability in the SIDD region of the nfsB gene) and ORF-3 (ΔnfsA nfsB+, carrying mutations that further decrease duplex stability in the SIDD region of the nfsB gene), were used to isolate FZDr (ΔnfsA nfsB) mutants.
Overnight cultures grown at 30 °C in LB medium were diluted to an OD600 of 2. The diluted cultures (100 μl) were spread on LB agar plates containing FZD (6 or 8 μg/ml). The plates were incubated at 30 °C for 36 to 40 h, and FZDr colonies were counted. At least 100 colonies for each tested strain were picked onto new LB + FZD plates, which were then incubated at 30 °C overnight. To determine which FZDr mutants carried IS insertions, the region containing nfsB and its promoter from over 100 FZDr mutants derived from each strain was amplified by PCR using primers NfsB-ver-F and NfsB-ver-R (S2 Table). The PCR products were subjected to agarose gel electrophoresis, and any mutant yielding a ~2-kb DNA band (compared with a 1-kb band for wild type) was considered to be an IS insertion mutant. To identify which IS insertion mutants bore IS5 elements, two rounds of PCR were performed. The first round used primers NfsB-ver-R and IS5-ver-F (specific to IS5 and oriented in the same direction as the transposase gene ins5A); an IS5 insertion in the direct orientation yields a dominant ~1-kb band. The second round used primers NfsB-ver-F and IS5-ver-F; an IS5 insertion in the reverse orientation yields a dominant band of about 1 kb, the exact size depending on the IS5 insertion site. The IS5 insertion frequency was calculated by dividing the total number of IS5 insertion mutants identified in the two rounds of PCR by the total number of FZDr mutants examined.
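The scoring scheme above can be expressed as a small classifier. This sketch assumes a hypothetical per-colony record (`FzdMutant`) and a hypothetical ±0.3 kb tolerance for calling the ~2-kb insertion band; it mirrors the stated rule that the IS5 insertion frequency is the number of IS5 insertions detected in either PCR round divided by the total number of FZDr mutants examined.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

INSERTION_BAND_KB = 2.0   # ~2-kb nfsB amplicon signals an IS insertion (wild type ~1 kb)
BAND_TOLERANCE_KB = 0.3   # hypothetical gel-sizing tolerance


@dataclass
class FzdMutant:
    """PCR read-outs for one FZDr colony (hypothetical record layout)."""
    locus_band_kb: float  # NfsB-ver-F + NfsB-ver-R amplicon size
    round1_band: bool     # IS5-ver-F + NfsB-ver-R: IS5 in the direct orientation
    round2_band: bool     # NfsB-ver-F + IS5-ver-F: IS5 in the reverse orientation


def has_is_insertion(m: FzdMutant) -> bool:
    return abs(m.locus_band_kb - INSERTION_BAND_KB) <= BAND_TOLERANCE_KB


def is5_orientation(m: FzdMutant) -> Optional[str]:
    """'direct' or 'reverse' for IS5 insertions, None otherwise."""
    if not has_is_insertion(m):
        return None          # no size shift: not scored as an IS insertion
    if m.round1_band and not m.round2_band:
        return "direct"
    if m.round2_band and not m.round1_band:
        return "reverse"
    return None              # another IS element, or an ambiguous result


def is5_insertion_frequency(mutants: Sequence[FzdMutant]) -> float:
    if not mutants:
        raise ValueError("no FZDr mutants to score")
    n_is5 = sum(1 for m in mutants if is5_orientation(m) is not None)
    return n_is5 / len(mutants)


# Hypothetical read-outs for four colonies.
colonies = [FzdMutant(2.0, True, False), FzdMutant(2.1, False, True),
            FzdMutant(1.0, False, False), FzdMutant(2.0, False, False)]
print(is5_insertion_frequency(colonies))  # 0.5
```

Colonies with both orientation bands are left unscored here; a real analysis would resolve them by sequencing.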

Stress-induced DNA duplex destabilization (SIDD) regions
In most autonomous organisms studied to date, DNA is maintained in a negatively supercoiled, or under-wound, state through the combined actions of topoisomerases, DNA-binding proteins, and enzymes responsible for transcription and DNA replication. In E. coli, topoisomerase II (also known as DNA gyrase) and, to a lesser extent, topoisomerase IV are the principal enzymes that cut, unwind and re-ligate DNA strands, thereby reducing the linking number. The stress of under-winding is relieved by supercoiling as well as by other localized changes in DNA structure. A second set of topoisomerases, exemplified in E. coli by topoisomerase I (ω protein), counteracts excessive under-winding by re-winding DNA strands, so that DNA is maintained in a slightly under-wound state. DNA transactions such as transcription and replication introduce positive supercoils ahead of the moving transcription bubble or replication fork. Transcription in particular is believed to be a major driver of the superhelical state of DNA, because it also leaves negative supercoils in its wake [33]. In addition, DNA-binding proteins such as histones in eukaryotes and histone-like nucleoid proteins in bacteria sequester and maintain superhelical domains [33].
The negative superhelical status of intracellular DNA is the sum of the contributions of topoisomerases (the unconstrained portion) and those of DNA-binding proteins such as histones or nucleoid proteins (the constrained portion). In actively dividing E. coli cells, negative supercoiling is maintained at around σ = -0.06 (σ is the specific linking difference, or superhelical density; [34,35]), with σ ranging from -0.03 to -0.09 under different conditions. Within this range, the supercoiling status is regulated in part by the intracellular energy charge of the adenylate pool, approximated as the ATP/ADP ratio [36]. The energy stored in negative supercoils is not only essential for DNA transactions such as transcription and replication, but also drives localized sequence-dependent structural transitions with important biological consequences. Such superhelical stress-induced structural transitions (SIST) include SIDD regions, Z-DNA, cruciform extrusions, H-DNA, and G-quadruplexes. Benham and co-workers have developed statistical-mechanical computational procedures capable of accurately predicting SIDD regions, grounded in experimentally determined DNase-sensitivity data [26-29]. They have applied these methods to map SIDDs, as well as other DNA structural transitions such as H-DNA, Z-DNA and cruciform structures, across the entire E. coli genome [25]. They have further shown that SIDD sequences are associated with both transcriptional regulatory sequences [26,37] and eukaryotic replication origins [38]. Their analysis of E. coli K12 genomic DNA showed that about 1% of the genomic DNA is strongly destabilized, 12% moderately destabilized, and >75% stable under physiological levels of superhelical stress [37].
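For reference, σ = ΔLk/Lk0 = (Lk - Lk0)/Lk0, where Lk0 is the linking number of the relaxed molecule, approximately N/h for an N-bp domain with helical repeat h. The sketch below assumes h = 10.5 bp per turn (a standard B-DNA value, not taken from this paper) and treats a fragment as a single topologically closed domain, which is an assumption. It converts the physiological range of σ into the number of missing turns for a domain the size of the 4.8 kb segments analyzed here.

```python
HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat (bp per turn)


def relaxed_linking_number(n_bp: int, h: float = HELICAL_REPEAT_BP) -> float:
    """Lk0: linking number of the relaxed closed domain."""
    return n_bp / h


def linking_difference(n_bp: int, sigma: float, h: float = HELICAL_REPEAT_BP) -> float:
    """Delta Lk = sigma * Lk0; negative values mean under-winding."""
    return sigma * relaxed_linking_number(n_bp, h)


def superhelical_density(lk: float, n_bp: int, h: float = HELICAL_REPEAT_BP) -> float:
    """sigma = (Lk - Lk0) / Lk0."""
    lk0 = relaxed_linking_number(n_bp, h)
    return (lk - lk0) / lk0


# Missing turns in a closed 4.8 kb domain across the physiological range of sigma.
for sigma in (-0.03, -0.06, -0.09):
    print(sigma, round(linking_difference(4800, sigma), 1))
```

At σ = -0.06, a 4.8 kb domain is thus under-wound by roughly 27 turns; SIDD calculations ask where along the sequence part of that deficit is most cheaply absorbed by strand separation.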
In this communication, we have analyzed all five known experimental systems in which IS5 insertion events occurred under conditions of starvation or nutrient depletion (glpF, bglG, fucA and flhD) or under antibiotic stress (nfsB), and in which sites of IS5 insertion have been determined at the sequence level. Here we summarize salient features of each system, along with an analysis of the DNA structural context of IS5 insertions.

IS5-mediated activation of the glycerol utilization operon glpFK under starvation stress
The best-characterized experimental system in which environmental conditions appear to stimulate IS5 hopping is the E. coli glpFK/Crp experimental system [7,8]. In wild type E. coli cells, glycerol utilization requires expression of glp genes that are poorly expressed in the presence of glucose. The first operon in the glycerol utilization pathway, glpFK, codes for the glycerol uptake facilitator, GlpF, and the kinase, GlpK, that phosphorylates glycerol to glycerol-3-phosphate (G3P). G3P has two functions: it is the inducer of the five-operon glp regulon, and it feeds into glycolysis after conversion to dihydroxyacetone phosphate, catalyzed by G3P dehydrogenase (GlpD). The glpFK operon is repressed by the binding of the GlpR repressor to four binding sites (operators O1-O4) in the promoter region, and is activated by the binding of the cAMP-Crp complex to two binding sites that overlap two of the GlpR binding sites, O2 and O3 [7]. G3P binds to GlpR and releases it from the DNA, and concomitant binding of the cAMP-Crp complex to its two binding sites (OCrpI and OCrpII) in the promoter activates transcription of the glpFK operon. Thus, cells lacking either the crp (Crp) or the cyaA (adenylate cyclase; cAMP biosynthesis) gene are Glp−: because the glpFK operon is essentially silent, they cannot utilize glycerol as a carbon source and do not form colonies on glycerol minimal agar plates.
Prolonged incubation of Glp− cells on glycerol minimal agar plates, however, led to the continuous generation of Glp+ colonies starting from day 3 and extending over a period of 9 days. In every case, IS5 was found to be inserted upstream of the glpFK promoter at the same specific location and in the same orientation. Subsequent work established several key features of this glpFK-activating insertional mutation. (1) The mutation leads to high-level activation of the native glpFK promoter [7,8]. (2) Presence of glycerol is necessary to stimulate IS5 insertion in glpR+ cells, but not in ΔglpR cells. Moreover, IS5 insertion was significantly decreased upon over-expression of GlpR, indicating that GlpR binding hinders IS5 insertion. (3) GlpR inhibition of IS5 insertion is independent of the role of GlpR in repressing glpFK transcription [7]. (4) Increased IS5 insertion due to the presence of glycerol is specific to the glpFK promoter, and is not observed at other tested loci where insertion events can be monitored [7]. Furthermore, the mutagenic process could be demonstrated independently of the selection procedure [7,39]. (5) IS5 insertion into the glpFK promoter region is blocked by the binding of the cAMP-Crp complex to its two operators (OCrpI and OCrpII) [39].
Any hypothesis that seeks to implicate an environmental condition in controlling beneficial IS insertion at a specific locus requires a plausible chain of molecular events linking the environmental condition to the transposition event. Despite the wealth of information available about IS5 activation of the glpFK operon, as summarized above, the mechanisms that promote IS5 insertion at this specific locus have not been explored. Fig 1 shows an analysis of a 4.8 kb segment of the E. coli MG1655 genome encompassing the glpFK locus, obtained using the SIST codes generously made available by Prof. C. J. Benham. The destabilization energy G(x) is the incremental free energy (in kcal/mol) needed to guarantee that base pair x is unpaired under physiological conditions of superhelical density (σ = -0.06), temperature (37 °C), and ionic strength (0.01 M). Stable duplex DNA regions have high G(x) values (about 12 kcal/mol in Fig 1), whereas regions susceptible to strand separation have lower G(x) values. In Fig 1, panel B, a G(x) plot for the entire 4.8 kb fragment is shown at the top, with a segment expanded underneath to show greater detail. In Fig 1, panel A, the red text identifies base pairs where G(x) values fall below 6 kcal/mol. Fig 1 shows that the activating IS5 insertion in the glpF upstream region targets a CTAA tetranucleotide embedded within a stretch of very low G(x) values (dipping below 2 kcal/mol) inside an extended SIDD region.
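Once a G(x) profile is in hand, marking the destabilized stretches and asking whether an insertion site falls inside one is a simple scan. This sketch assumes the per-base-pair G(x) values (kcal/mol) have already been computed, for example with Benham's SIST programs; it does not reimplement the statistical-mechanical model. The 6 kcal/mol cut-off mirrors the red-text threshold used in Fig 1, and the function names and toy profile are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

SIDD_THRESHOLD = 6.0  # kcal/mol; base pairs below this are marked red in Fig 1


def destabilized_runs(gx: Sequence[float],
                      threshold: float = SIDD_THRESHOLD) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges where G(x) < threshold."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, g in enumerate(gx):
        if g < threshold and start is None:
            start = i                      # entering a destabilized stretch
        elif g >= threshold and start is not None:
            runs.append((start, i))        # leaving it
            start = None
    if start is not None:                  # stretch runs to the end of the profile
        runs.append((start, len(gx)))
    return runs


def site_in_sidd(site_start: int, site_len: int,
                 runs: Sequence[Tuple[int, int]]) -> bool:
    """True if the whole target site (e.g. a 4-bp CTAA) lies inside one run."""
    site_end = site_start + site_len
    return any(s <= site_start and site_end <= e for s, e in runs)


def site_min_gx(gx: Sequence[float], site_start: int, site_len: int) -> float:
    """Lowest G(x) value across the target site."""
    return min(gx[site_start:site_start + site_len])


# Toy profile (not real data): a deep well at indices 2-6 and a shallow dip at 9.
gx = [12.0, 12.0, 5.0, 3.0, 1.5, 1.8, 4.0, 12.0, 7.0, 5.9]
runs = destabilized_runs(gx)
print(runs)                                             # [(2, 7), (9, 10)]
print(site_in_sidd(3, 4, runs), site_min_gx(gx, 3, 4))  # True 1.5
```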
The occurrence of the glpFK-activating IS5 insertion within a SIDD sequence offers an opportunity to address two questions: (1) How can one link an environmental stress condition (carbon starvation in the presence of glycerol, in this case) to IS5 hopping into a specific locus, and (2) what exactly is the role of low G(x) values in target selection for IS5 insertion? We propose that the CTAA sequence embedded within the SIDD region constitutes an IS5 insertional hotspot that is, however, masked and inaccessible in wild type cells under normal growth conditions. During growth in rich or defined glucose medium, the glpFK operon is repressed by the binding of the GlpR repressor protein to the DNA sequences upstream of the glpF promoter, and this binding serves to mask the IS5 site in the sense discussed below. Under conditions where glycerol is the sole carbon source, in normal wild type (crp+) cells, cAMP synthesis leads to an accumulation of the cAMP-Crp complex, which then binds to the same upstream region as the GlpR protein and activates the glpFK operon. The initial burst of GlpF and GlpK synthesis leads to facilitated glycerol uptake and phosphorylation to produce G3P, which then physically binds to and inactivates GlpR, thereby allowing the expression of the glpFK operon. In these cells, even though GlpR has been inactivated, occupation of the same DNA region by the cAMP-Crp complex continues to mask the IS5 site. When either the crp gene or the cyaA (cAMP biosynthesis) gene is deleted, there is no cAMP-Crp complex to mask the IS5 insertion site, but GlpR is still present and continues to mask the insertion site.
Support for this idea is provided by the finding that in ΔglpR cells growing in rich liquid medium (LB), loss of the cAMP-Crp complex elevates IS5 insertion rates some 100-fold, and the presence or absence of glycerol in the medium has no effect; conversely, over-expression of GlpR essentially eliminates IS5 insertions [7,39].
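The masking logic of the proposed model can be summarized as a Boolean caricature. In the sketch below, the `Cell` fields are hypothetical, and the rule that cAMP-Crp is bound whenever crp and cyaA are intact and glucose is absent is a simplifying assumption. The sketch ignores partial occupancy, H-NS and actual insertion rates; it only checks that the model reproduces the qualitative observations described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical genotype/condition record for the masking model."""
    glpR: bool = True        # glpR gene present
    crp: bool = True         # crp gene present
    cyaA: bool = True        # cyaA gene present (cAMP synthesis)
    glucose: bool = False    # glucose present (cAMP assumed low)
    g3p_high: bool = False   # enough G3P to release GlpR from its operators


def glpR_bound(c: Cell) -> bool:
    return c.glpR and not c.g3p_high


def camp_crp_bound(c: Cell) -> bool:
    return c.crp and c.cyaA and not c.glucose


def hotspot_unmasked(c: Cell) -> bool:
    """The IS5 site is exposed only when neither GlpR nor cAMP-Crp occupies it."""
    return not glpR_bound(c) and not camp_crp_bound(c)


def glpFK_transcribed(c: Cell) -> bool:
    """Native promoter: needs GlpR released and cAMP-Crp bound."""
    return not glpR_bound(c) and camp_crp_bound(c)


scenarios = {
    "wild type, glucose": Cell(glucose=True),
    "wild type, glycerol (G3P made)": Cell(g3p_high=True),
    "crp deletion, glycerol, early": Cell(crp=False),
    "crp deletion, glycerol, G3P accumulated": Cell(crp=False, g3p_high=True),
    "glpR crp double deletion, LB": Cell(glpR=False, crp=False),
}
for name, cell in scenarios.items():
    print(f"{name:<42} unmasked={hotspot_unmasked(cell)}")
```

Only the last two scenarios expose the hotspot, matching the proposal that Δcrp cells become insertion-competent once G3P accumulates, and that ΔglpR Δcrp cells are competent regardless of glycerol.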
We propose that prolonged incubation of Δcrp cells on glycerol minimal agar leads to a gradual release of the GlpR repressor from its operators, and the consequent unmasking of the IS5 site (Fig 2A). Although GlpF facilitates glycerol uptake, glycerol is known to also diffuse passively into the cell from the medium [40]. We propose that this small pool of glycerol is phosphorylated to G3P either through low-level (leaky) expression of the glpK gene or through the activity of an unknown sugar kinase capable of acting on glycerol with low efficiency.
Fig 2. Panel A. Crp, when complexed with cyclic AMP (cAMP), activates transcription (→) but inhibits IS5 insertion (⊣). GlpR in the free form inhibits both transcription and IS5 insertion (⊣), but when present in sufficient quantities, glycerol 3-phosphate (G3P) binds to GlpR, inducing a conformational change that causes the protein to dissociate from the DNA, relieving the inhibition of transcription and allowing IS5 insertion [7,39]. Panel B. Models for unmasking of the IS5 insertion hotspot with and without regional DNA amplification. When Δcrp or ΔcyaA cells are subjected to prolonged incubation on minimal agar with glycerol as the sole carbon source, some glycerol diffuses into cells and is converted to G3P at low efficiency (see text). Accumulation of G3P over time leads to dissociation of GlpR from the glpFK promoter region, unmasking the IS5 insertion hotspot. In an alternative scenario, IS5 insertion occurs in a subset of cells in which the glpFK region is amplified, leading to accelerated release of GlpR due to multicopy titration of GlpR. Note that the gene encoding GlpR is located 12 minutes away, and is unlikely to be co-amplified with glpFK in this scenario. Furthermore, to the extent that G3P formation depends on leaky expression of glpK, amplification will also lead to a more rapid accumulation of G3P. Finally, multiple copies of the IS5 target site resulting from glpFK amplification increase the likelihood of capturing an insertion event in these cells.
There is at least one other mechanism that can achieve a similar effect in a subset of cells. Roth and co-workers have argued that when E. coli FC40 (F′ lac−) cells are subjected to prolonged incubation on lactose minimal agar, the F′ plasmid copy number increases in discrete subpopulations of non-dividing cells, and the higher gene copy number provides greater opportunities to acquire a random mutation [41]. If a lac+ reversion does arise in a cell with multiple lac gene copies, that cell will divide using lactose and eventually form a stable lac+ colony. The idea of selective gene amplification is directly extensible to chromosomal loci, because discrete subpopulations of starving cells are known to have amplified different chromosomal segments [41,42]. Extending this idea to the glpFK/Crp experimental system, one can propose that in a subset of cells, regional amplification of the glpFK locus creates multiple copies of the gene without co-amplifying glpR, which is located at a distant chromosomal locus 12 minutes away; the additional GlpR operators then simply titrate the cellular GlpR. Lack of sufficient GlpR in turn promotes IS5 insertion both by unmasking the target site and by providing multiple copies of the unmasked target site in the same cells (Fig 2B). The most interesting aspect of these speculative mechanisms is that they provide a conceptual framework for mutation directed by a specific stress environment. While these mechanisms remain to be experimentally verified, each yields testable predictions.
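The titration and target-copy arguments can be made semi-quantitative with a textbook single-site binding equilibrium. The sketch below is a toy model, not a fitted description of the glpFK system: it lumps the four GlpR operators of each glpFK copy into one effective site (expressed in the same arbitrary units as GlpR), ignores G3P, and uses hypothetical values for total GlpR, the dissociation constant and the per-site insertion probability. It illustrates how amplifying the target locus lowers the fraction of occupied (masked) sites and, together with the extra copies, raises the chance that at least one unmasked copy captures an insertion.

```python
import math


def site_occupancy(repressor_total: float, sites_total: float, kd: float) -> float:
    """Fraction of operator sites bound by repressor at equilibrium (single-site model).

    Solves C**2 - (R + S + Kd) * C + R * S = 0 for the complex concentration C,
    taking the physically meaningful smaller root in a numerically stable form.
    """
    r, s = repressor_total, sites_total
    if r < 0 or s <= 0 or kd <= 0:
        raise ValueError("need R >= 0, S > 0 and Kd > 0")
    b = r + s + kd
    disc = (r - s) ** 2 + kd * (2 * (r + s) + kd)  # algebraically equal to b**2 - 4*r*s
    bound = 2 * r * s / (b + math.sqrt(disc))
    return bound / s


def capture_probability(copies: int, p_per_free_site: float, free_fraction: float) -> float:
    """P(at least one copy receives an insertion), copies treated independently."""
    p = p_per_free_site * free_fraction
    if not 0 <= p <= 1:
        raise ValueError("per-copy probability must lie in [0, 1]")
    return 1 - (1 - p) ** copies


GLPR_TOTAL = 10.0    # hypothetical, arbitrary units
KD = 1.0             # hypothetical, same units
P_INSERT = 0.01      # hypothetical per-site probability when unmasked

for copies in (1, 5, 20):
    occ = site_occupancy(GLPR_TOTAL, copies, KD)
    p_capture = capture_probability(copies, P_INSERT, 1 - occ)
    print(f"{copies:>2} copies: occupancy {occ:.2f}, capture probability {p_capture:.4f}")
```

Both effects push in the same direction, which is the qualitative point of the amplification scenario; the absolute numbers carry no meaning.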
We next consider what constitutes an IS5 hotspot, and what roles SIDD sequences might play in creating one. IS5 insertion leads to the duplication of a short tetranucleotide with the consensus sequence Py-T-A-Pu; the most common target is CTAG, a tetranucleotide that is under-represented in bacterial genomes [43]. However, as shown in Table 3, the sequence requirement is not strict, and one or another of the known target tetranucleotides occurs approximately once every 50 bp. Thus a target sequence plays a role, but additional DNA (structural) features are apparently required for insertion [3,20,21].
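Counting candidate IS5 target tetranucleotides is a simple pattern search. In the sketch below, Py-T-A-Pu is written as the regular expression `[CT]TA[AG]`. Because the set {CTAA, CTAG, TTAA, TTAG} is closed under reverse complementation, scanning one strand finds every site, and matches cannot overlap. For a sequence with equal base frequencies the expected density is (1/2)(1/4)(1/4)(1/2) = 1/64, one site per 64 bp, the same order as the once-every-50-bp figure above. The helper names are hypothetical.

```python
import re
from typing import List, Optional

YTAR = re.compile(r"[CT]TA[AG]")  # Py-T-A-Pu; occurrences cannot overlap


def ytar_sites(seq: str) -> List[int]:
    """0-based start positions of Py-T-A-Pu tetranucleotides on the given strand."""
    return [m.start() for m in YTAR.finditer(seq.upper())]


def mean_spacing(seq: str) -> Optional[float]:
    """Average distance (bp) between consecutive sites, or None if fewer than two."""
    sites = ytar_sites(seq)
    if len(sites) < 2:
        return None
    return (sites[-1] - sites[0]) / (len(sites) - 1)


demo = "GGCTAGAATTAAGCTTAGCC"
print(ytar_sites(demo), mean_spacing(demo))  # [2, 8, 14] 6.0
```

Combined with the G(x) scan sketched earlier, this makes it straightforward to list which of the many candidate tetranucleotides in a window actually sit inside a destabilized stretch.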
Although the IS5 transposition mechanism has not been directly investigated, IS5 encodes a "DDE type" transposase, by far the most common as well as the best-studied class of bacterial transposases. Target acquisition by Mu transposase, a well-studied example of a DDE transposase, requires substantial (140°) bending of the target sequence [44]. Even though the detailed mechanisms of different DDE type transposases differ substantially, DNA bending (deformation) appears to be a general feature of this class. The significance of SIDD sequences could be that they are easily deformed when their duplex status is not stabilized by DNA-binding proteins such as GlpR or cAMP-Crp. Thus, an easy-to-deform sequence context (such as a SIDD sequence) surrounding a Py-T-A-Pu target tetranucleotide might constitute an IS5 hotspot. SIDD sequences might be attractive IS5 targets for other reasons as well. As noted previously, although SIDD sequences are AT-rich, AT content alone does not define them: base composition, the actual sequence and the sequence context together determine the depth and width of a SIDD destabilization "well" [27,28,37]. It has been shown that the E. coli nucleoid protein H-NS (encoded by hns) binds to AT-rich sequences, and that H-NS binding often leads to gene silencing [45,46]. Interestingly, H-NS was found to be required for efficient transposition of several transposons, including IS903, Tn10 and Tn552, as transposition rates are significantly reduced in Δhns cells [47]. Thus, H-NS bound to a SIDD sequence might be an important feature in generating an IS5 hotspot. We speculate that the inhibition of IS5 transposition by GlpR and cAMP-Crp may be mediated partly by exclusion of H-NS, and partly by their ability to stabilize the DNA duplex structure. Thus, a favored IS5 target might be a Py-T-A-Pu tetranucleotide embedded in a SIDD sequence, because such sequences can (1) be easy to deform, and/or (2) have the potential to bind H-NS.
Other possible roles for SIDD sequences cannot be eliminated. For example, some transposable elements insert into transient single-stranded regions created during replication [48]. The lower energetic cost of melting duplex DNA within a SIDD sequence may thus create similar transiently single-stranded targets. It is also possible that DNA melting at SIDD sequences enables binding by HU, another nucleoid protein, which in turn may play a role in transposition of some elements such as bacteriophage Mu [44].

Activation of bglGFB, the cryptic aromatic β-glucoside utilization operon
The E. coli bgl system offers a well-characterized early example of gene activation by transposon insertion [13,49-51], but it differs from the glpFK system in two key characteristics: whereas IS hopping in the glpFK/Crp system occurs at a single locus and is promoted in non-growing or slowly growing cells, IS insertion in the bgl system occurs at several loci, in both dividing and non-dividing cells. The bgl system therefore offers a somewhat different paradigm. The bglGFB operon is cryptic, but when activated by IS insertion, its expression enables cells to utilize aromatic β-glucosides such as arbutin and salicin. The operon is regulated by a transcription-attenuation mechanism such that bglG, the first gene in the operon, must be expressed in order to overcome the transcriptional attenuation. In wild type cells, the bglG promoter is flanked by inhibitory DNA sequences that prevent expression of bglG. Two classes of mutations can activate the bgl operon: the first and major class involves insertion of an IS element in the promoter-flanking inhibitory sequences, and the second and minor class involves mutational inactivation of hns, the gene encoding the histone-like nucleoid protein H-NS [13,51,52]. In wild type cells, H-NS is believed to silence the bglG promoter by binding to the promoter sequences; thus, IS insertions and hns mutations both appear to act by relieving H-NS repression. Reynolds et al. showed that the bgl operon is activated by transposon hopping into the bglG promoter region at exceptionally high frequencies (10⁻⁵), almost always by insertion of IS5 or IS1 and, less frequently, of IS2 or IS3 [52], suggesting that this DNA region constitutes a transposition hotspot. In a subsequent study, Hall showed that when wild type (Bgl−) cells are subjected to prolonged incubation on minimal media with arbutin as the sole carbon source, Bgl+ colonies arise from non-dividing cells, apparently at high frequencies.
A major fraction of these so-called adaptive mutations were due to insertion of IS1 or IS5 within the same 19 bp hotspot found previously in dividing cells [49]. Fig 3 shows an analysis of a 4.8 kb segment of the E. coli MG1655 genome encompassing the bglG locus using the SIST codes. In Fig 3, panel A, red text identifies base pairs whose G(x) values fall below 6, and panel B shows the G(x) energy plot of the region. Most IS insertions in growing cells [13] as well as in starved cells [49] fall within the SIDD region, as also shown graphically by red arrows in panel B. In addition to the large number of insertions that occurred in the -70 to -150 region (relative to the transcriptional start site in Fig 3), Schnetz and Rak [13] identified two rare IS5 insertions, isolated from growing cells, that fell downstream of the bglG promoter (blue arrows in panel B). Schnetz and Rak reported that these downstream insertions occurred at a frequency of <0.01% of the Bgl+ mutants. We interpret this finding to mean that even though insertions outside the SIDD region can confer the Bgl+ phenotype, most insertions are nevertheless targeted to the SIDD region, strengthening the evidence that insertion events are favored there.
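The comparison drawn from Fig 3 reduces to a simple test: for each mapped insertion site, does the base pair lie where G(x) is below the cutoff of 6 used in panel A? The sketch below is a minimal illustration, assuming a per-base-pair G(x) profile has already been computed (for example with SIST) and loaded as a list of floats; the function names and the toy profile are hypothetical and are not SIST output.

```python
from typing import Sequence

# Cutoff used to flag destabilized base pairs in Fig 3, panel A.
SIDD_CUTOFF = 6.0


def in_sidd(gx: Sequence[float], position: int, cutoff: float = SIDD_CUTOFF) -> bool:
    """True if the 1-based base pair `position` has G(x) below `cutoff`."""
    if not 1 <= position <= len(gx):
        raise ValueError(f"position {position} is outside a profile of length {len(gx)}")
    return gx[position - 1] < cutoff


def fraction_in_sidd(gx: Sequence[float], sites: Sequence[int], cutoff: float = SIDD_CUTOFF) -> float:
    """Fraction of insertion sites (1-based positions) lying in destabilized base pairs."""
    if not sites:
        raise ValueError("no insertion sites given")
    return sum(in_sidd(gx, s, cutoff) for s in sites) / len(sites)


# Hypothetical toy profile: base pairs 4-7 are destabilized (G(x) < 6).
toy_gx = [12.0, 11.5, 9.0, 5.5, 3.2, 4.8, 5.9, 8.7, 12.0, 12.0]
toy_sites = [5, 6, 9]  # hypothetical insertion positions
print(fraction_in_sidd(toy_gx, toy_sites))  # 2 of the 3 toy sites lie in the SIDD run
```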
The exceptionally high frequencies of IS5 hopping at the bgl locus suggest that H-NS binding to promoter-flanking sequences creates constitutive IS5 hotspots at several Py-T-A-Pu sequences embedded in the SIDD region. This also implies that no duplex-stabilizing proteins equivalent to GlpR bind to these constitutive hotspots under laboratory growth conditions. The bglG upstream region does have binding sites for Fis, another nucleoid protein, and for transcription-activating proteins such as Crp, RcsB/BglJ and LeuO [53]. Whether these proteins affect IS hopping remains to be investigated.
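The hotspot argument above combines two filters: a Py-T-A-Pu tetranucleotide, and a location inside the SIDD region. A minimal sketch of that intersection follows, assuming a per-base-pair G(x) profile aligned to the sequence and the same cutoff of 6 used for Fig 3. Requiring all four base pairs of the site to be destabilized is an illustrative choice, not a rule stated in the text, and the function names and toy sequence are hypothetical. Because the Py-T-A-Pu class is its own reverse complement (CTAA pairs with TTAG, CTAG with CTAG), scanning one strand finds candidate sites on both.

```python
import re
from typing import Sequence

# Py-T-A-Pu: pyrimidine (C/T), T, A, purine (A/G). The lookahead reports overlapping matches.
PY_T_A_PU = re.compile(r"(?=([CT]TA[AG]))")


def target_sites(seq: str) -> list[int]:
    """1-based start positions of every Py-T-A-Pu tetranucleotide on the given strand."""
    return [m.start() + 1 for m in PY_T_A_PU.finditer(seq.upper())]


def target_sites_in_sidd(seq: str, gx: Sequence[float], cutoff: float = 6.0) -> list[int]:
    """Py-T-A-Pu sites whose four base pairs all have G(x) below `cutoff`."""
    if len(seq) != len(gx):
        raise ValueError("sequence and G(x) profile must have the same length")
    return [p for p in target_sites(seq) if all(g < cutoff for g in gx[p - 1:p + 3])]


toy_seq = "ACTAGTTTAAGG"  # hypothetical: CTAG at 2-5, TTAA at 7-10
toy_gx = [12.0, 5.0, 4.0, 3.0, 5.0, 9.0, 10.0, 11.0, 12.0, 12.0, 12.0, 12.0]
print(target_sites(toy_seq))                  # [2, 7]
print(target_sites_in_sidd(toy_seq, toy_gx))  # [2]
```

Only the CTAG site passes both filters here; the TTAA site is a valid target motif but sits in stable DNA.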

Fig 3 (partial caption): ... Table 2) for a 4.6 kb DNA sequence from E. coli K12 MG1655. For clarity, only a segment of the plot (bp 1000 to 3000) is shown. The locations of two IS5 insertion sites representing two of the most frequent events are indicated by red arrows, and the much less frequent outlier insertion sites (IS5-S6 and IS5-517) are shown by blue arrows (see text). The translational start site for bglG is indicated by a black arrow. https://doi.org/10.1371/journal.pone.0180156.g003

Activation of swarming motility by IS element insertion into the flhD promoter region
In E. coli, swarming motility is believed to confer survival advantages in toxic or nutritionally poor environments, and requires the expression of flagella. The flagellar synthesis regulon, a very large and complex system of some 50 genes [54], is regulated by two master regulators, FlhD and FlhC, which are expressed from the flhDC operon. In many strains of E. coli, such as BW25113, the flhDC operon, whose expression is under the control of an exceptionally complex regulatory region, remains largely cryptic: flagellar expression is low, and the cells do not swarm when grown either in liquid media, where swarming confers no benefit, or on solid agar media, where swarming is disallowed. Two recent studies demonstrated, however, that when cells are plated on semi-solid agar media ("motility agar"), where swarming is allowed, mutants with increased swarming ability appear [10,11], and their emergence accelerates as the cells approach stationary phase [11]. The activating mutations are due to insertion of IS elements into the flhDC upstream regulatory region. These findings have been interpreted to mean that specific environmental conditions promote mechanisms that lead to beneficial mutations in the flhDC operon. Because such mutations do not occur (or occur at very low levels) when they are not beneficial, as during growth in liquid medium, where swarming is unnecessary, or on solid agar, where swarming is disallowed, mutations that arise only when useful have been proposed as examples of "quasi-Lamarckian" [10] or "directed" [11] mutations.
Analysis of a 4.4 kb sequence embedding the divergently transcribed flhDC-uspC intergenic region in E. coli BW25113 showed an extensively segmented SIDD region upstream of the flhD start codon (Fig 4, panel A). IS5 insertion points occurred within the segmented SIDD sequence at positions with G(x) values ranging from ~1 to ~7, compared with a baseline value of 12 for the region shown. A number of negative regulatory proteins (RcsAB, YjjQ, OmpR, Fur, IHF, LrhA, MatA, H-NS) and positive regulatory proteins (Crp) bind to this region [55,56,57,58,59]. Of note is the RcsAB protein, which, as part of the RcsC phosphorelay system that responds to a variety of environmental inputs including surface sensing [60], could influence DNA structural features relevant to transposon target acquisition. Regardless of whether the RcsC phosphorelay system plays a specific role, the flhDC system offers an opportunity to examine whether mechanosensing can influence transposition by altering the structural states of DNA.
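A "segmented" SIDD region, as described for the flhDC upstream sequence, can be read off a G(x) profile as several destabilized runs separated by short, more stable stretches. The sketch below splits a profile into such runs and reports the minimum G(x) of each. The cutoff is a free parameter and is hypothetical here, since the text reports insertion-point values of ~1 to ~7 against a baseline of 12 rather than a cutoff; the function name and toy profile are likewise invented for illustration.

```python
from typing import Sequence


def sidd_segments(gx: Sequence[float], cutoff: float) -> list[tuple[int, int, float]]:
    """Return (start, end, min G(x)) for each run with G(x) < cutoff; positions 1-based, inclusive."""
    segments = []
    start = None
    for i, g in enumerate(gx, start=1):
        if g < cutoff and start is None:
            start = i  # a destabilized run begins here
        elif g >= cutoff and start is not None:
            segments.append((start, i - 1, min(gx[start - 1:i - 1])))  # run ended at i - 1
            start = None
    if start is not None:  # a run extends to the end of the profile
        segments.append((start, len(gx), min(gx[start - 1:])))
    return segments


# Hypothetical toy profile with three destabilized runs at a cutoff of 7.5.
toy_gx = [12.0, 6.5, 2.0, 1.1, 8.0, 7.2, 3.3, 12.0, 4.0, 4.5]
print(sidd_segments(toy_gx, cutoff=7.5))
# [(2, 4, 1.1), (6, 7, 3.3), (9, 10, 4.0)]
```

Lowering the cutoff would drop shallow runs first, so the segmentation reported for a region depends on the threshold chosen.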

Reversible activation of a cryptic catabolic function by IS5 insertion
Metabolism of fucose by wild type E. coli cells produces the waste product 1,2-propanediol (PPD), which is excreted into the medium. PPD cannot be used as a carbon source by wild type cells, but acquisition of an IS element in the intergenic region between the divergently transcribed fucAO and fucPIK operons confers the ability to use PPD as a sole carbon source [14]. However, this benefit comes at the cost of simultaneously losing the ability to utilize fucose. In wild type E. coli cells, utilization of fucose requires the sequential actions of a permease (fucP), an isomerase (fucI) and a kinase (fucK) to yield the sugar phosphate L-fuculose-1-phosphate. An aldolase (fucA) then converts this sugar phosphate to dihydroxyacetone phosphate plus L-lactaldehyde [14]. Under non-respiratory conditions, L-1,2-propanediol oxidoreductase (FucO), an iron-dependent group III dehydrogenase, converts L-lactaldehyde to PPD, which is excreted. The inducer of the fucAO-fucPIK system is L-fuculose-1-phosphate. The lactaldehyde ⇌ PPD conversion by the FucO oxidoreductase is reversible, but wild type cells cannot utilize PPD as a sole carbon source because FucO itself is not induced and is therefore unavailable. Prolonged incubation of wild type cells on PPD minimal agar, however, yields PPD+ mutants that can grow on PPD because the fucAO operon is constitutively expressed [14,15]. All PPD+ mutants acquire an IS5 element at a specific locus in the fucAO-fucPIK intergenic region (Fig 5, panel A). The insertion occurs at a CTAG sequence within a broad SIDD region that encompasses complex, but poorly described, regulatory sequences for the divergent fuc operons. The acquisition of the IS element, while rendering the fucAO operon constitutive, also renders the divergently transcribed fucPIK operon non-inducible, effectively making the cells PPD+ but fucose-negative (Fuc-).
Zhang et al. showed that when PPD+/Fuc- mutants were plated on fucose minimal agar, PPD-/Fuc+ back-mutants arose in which precise excision of the original IS5 insertion had occurred [15]. We propose that transposition and precise excision of IS5 might constitute a reversible adaptive system that helps cells compete with other enteric bacteria in the gut by acquiring the ability to utilize a carbon source that competing bacteria cannot use, as hypothesized for Salmonella enterica serovar Typhimurium [61]. By allowing selection for both insertion and precise excision of IS5, the fucAO-fucPIK experimental system offers an opportunity to dissect the DNA structural features important for insertion as well as excision [15].

IS-mediated gene-inactivation under antibiotic stress
Furazolidone (FZD) is a nitroheterocyclic aromatic compound that, upon activation, damages DNA and kills bacteria [62]. Activation is believed to require the action of several nitroreductases, the most important of which are encoded by the genes nfsA and nfsB. The toxic products of FZD nitroreduction are believed to be hydroxylamine derivatives [63]. Resistance to FZD develops in two steps: first, nfsA is inactivated by mutation, leading to decreased sensitivity; second, nfsB is inactivated [62], leading to much greater resistance. In both genes, large fractions of the inactivating mutations are due to IS element insertions. The nfsB gene contains a strong hotspot for IS5 insertion [64]. The nfsB system is especially interesting because inactivating mutations conferring FZD resistance could occur at a large number of known inactivating sites throughout the reading frame, yet all IS5 insertions occur at a single locus in the N-terminal portion of the reading frame. We therefore asked whether this insertional hotspot correlates with a SIDD region. The major IS5 hotspot is a CTAA sequence (bases 38-41 in the N-terminal portion of the reading frame; blue lettering in Fig 6, panel A) that lies in a region with G(x) values of 6.5, within a SIDD "well". Several other IS elements target the same SIDD region (not shown). Whiteway et al. [64] previously reported that nfsB-inactivating IS1 and IS2 insertions occurred at a number of sites throughout the nfsB reading frame. Multiple potential IS5 target tetranucleotide sequences (Py-T-A-Pu) occur throughout the nfsB reading frame, but IS5 insertion was not observed at these other sites. In our experiments, consistent with prior findings, the major insertion event was IS5 insertion at the single location in the N-terminal region of the nfsB reading frame (blue lettering in Fig 6, panel A).
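Placing the hotspot in codon coordinates links it to the mutagenesis that follows, which targets the first 15 codons. A minimal sketch, assuming that reading-frame positions are numbered from the A of the start codon as base 1 (the function name is hypothetical):

```python
def codon_index(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based ORF nucleotide position to (codon number, position within codon)."""
    if nt_pos < 1:
        raise ValueError("ORF positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# CTAA hotspot at bases 38-41 of the nfsB reading frame (from the text).
print([codon_index(p) for p in range(38, 42)])
# [(13, 2), (13, 3), (14, 1), (14, 2)]
```

Under this numbering, the CTAA site spans codons 13 and 14, inside the 15-codon window analyzed in silico.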
If duplex destabilization allows frequent IS5 insertion into a Py-T-A-Pu target site, DNA sequence changes that decrease the G(x) values would be expected to increase IS5 insertion. To test this possibility, we first analyzed the effect of in silico mutations in the first 15 codons of the nfsB gene on the G(x) values of the SIDD region. Introducing three third-position GC→AT base substitutions, which replace codons 3, 9 and 14 with synonymous codons, decreased the G(x) values further (Fig 7A, blue plot line). We introduced these three changes into the chromosomal copy of the nfsB gene as described in Materials and Methods, and tested their effects on the appearance of total FZD-resistant (FZDr) mutants, total IS insertional mutants, and IS5 insertional mutants. Our data (Table 4 and Fig 7B) show that decreasing the G(x) values (ORF-3 mutant) significantly increased the fraction of nfsB mutants carrying IS5 insertions at the hotspot locus, whereas increasing the G(x) values (Prom-3 mutant) had the opposite effect.
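The design constraint in the ORF-3 construct is that each substitution falls at a codon third position and leaves the encoded protein unchanged. The sketch below checks that constraint with the standard genetic code and reports the resulting drop in G+C content. It is only an illustration under stated assumptions: the ORF is a hypothetical toy sequence (not nfsB), the function names are hypothetical, and G+C content is at best a crude proxy for duplex stability. Computing G(x) itself requires a SIDD algorithm such as SIST, which is not reproduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in TCAG order
# with the first base varying slowest, matching the order of the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF; '*' marks stop codons."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def mutate_third_positions(orf: str, changes: dict[int, str]) -> str:
    """Set the third base of selected codons (1-based codon number -> new base)."""
    bases = list(orf.upper())
    for codon, new_base in changes.items():
        bases[3 * codon - 1] = new_base.upper()  # third base of codon n is ORF position 3n
    return "".join(bases)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


toy_orf = "ATGGCGAAAGGCCTG"  # hypothetical ORF: Met-Ala-Lys-Gly-Leu
changes = {2: "A", 4: "T", 5: "A"}  # GCG->GCA, GGC->GGT, CTG->CTA
mutant = mutate_third_positions(toy_orf, changes)
print(translate(toy_orf) == translate(mutant))  # True: all three changes are synonymous
print(gc_content(toy_orf), gc_content(mutant))  # 0.6 0.4
```

Applied to the actual nfsB sequence, such a check would only confirm synonymy at codons 3, 9 and 14; whether G(x) decreases still has to be established with the SIDD calculation, as was done for Fig 7A.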
Potential evolutionary significance of IS5 targeting to SIDD regions
Our preliminary analyses (not shown) of an extensive set of unselected IS5 insertion sites in evolved cultures confirmed that while IS5 hopping is favored at SIDD sequences, it is not exclusive to SIDD sites, indicating that other, as yet undiscovered, DNA structural elements also modulate IS5 insertion. Given the immense diversity of transposable elements, a large variety of structural determinants are likely to create hot and cold spots for transposition by different elements. In the examples discussed in this communication, we have focused on known systems in which environmental stress conditions appeared to influence IS5 insertion. In all five cases, insertion events target tetranucleotide sequences embedded within SIDD sequences that are also enriched for H-NS binding.
A long-standing puzzle concerning the evolutionary forces that retain cryptic operons such as the bgl operon may be relevant to understanding how such IS5 hotspots could have been selected. It has been speculated [65] that the bgl operon is retained for its potential benefit in utilizing aromatic β-glucosides encountered in the environment, but is nevertheless maintained in a cryptic form in a majority of cells because the same enzymes could also act on lethal substrates such as the environmentally ubiquitous cyanogenic glucosides [66,67]. The unstated presumption in this speculation is that mechanisms exist for reactivating, and subsequently deactivating, cryptic operons on an "as-needed" basis. As discussed above, IS5 can not only activate a gene by hopping into a hotspot within its control region, but can also reverse the activation by precise excision from the control region, as demonstrated previously in the fucAO-fucPIK system [15]. We propose that some IS5 hotspots evolved as a population-level regulatory mechanism for turning genes on and off in response to environmental stimuli. In addition to relatively non-specific nucleoid proteins such as H-NS that are required for transposition, locus-specific DNA-binding proteins such as GlpR and Crp may regulate transposon targeting in response to cognate environmental stimuli by masking or unmasking hotspots, as discussed above for the glpFK model system. The ubiquity of transposable elements has long been interpreted as evidence of a highly evolved and mutually beneficial relationship between transposable elements and the host genome, and this communication sheds light on one such relationship.
Fig 7 (partial caption): ... Fig 6), whereas the ORF-3 mutant has one G:C to T:A transversion and two G:C to A:T transition mutations. SIDD plots for the wild-type (black line), Prom-3 (red line) and ORF-3 (blue line) sequences are shown. Relative to the wild type sequence, the G(x) values are increased in the Prom-3 mutant (red), meaning that the DNA duplex is more stable than wild type in this region. The G(x) values are decreased for the ORF-3 mutation (blue), meaning that the duplex is further destabilized relative to the wild type sequence. Note that all shifts are localized to the same SIDD "well" as found in the wild type sequence. Panel B. Effect of Prom-3 or ORF-3 mutations on relative IS5 insertion frequencies (see text and Table 4). Color coding is the same as in Panel A: black shows insertion frequencies for the wild type strain, red for the Prom-3 (re-stabilized duplex) mutant, and blue for the ORF-3 (further destabilized duplex) mutant. https://doi.org/10.1371/journal.pone.0180156.g007
Supporting information S1