Genetic toolbox for controlled expression of functional proteins in Geobacillus spp.

Species of genus Geobacillus are thermophilic bacteria and play an ever increasing role as hosts for biotechnological applications both in academia and industry. Here we screened a number of Geobacillus strains to determine which industrially relevant carbon sources they can utilize. One of the strains, G. thermoglucosidasius C56-YS93, was then chosen to develop a toolbox for controlled gene expression over a wide range of levels. It includes a library of semi-synthetic constitutive promoters (76-fold difference in expression levels) and an inducible promoter from the xylA gene. A library of synthetic in silico designed ribosome binding sites was also created for further tuning of translation. The PxylA was further used to successfully express native and heterologous xylanases in G. thermoglucosidasius. This toolbox enables fine-tuning of gene expression in Geobacillus species for metabolic engineering approaches in production of biochemicals and heterologous proteins.


Introduction
For decades, thermophilic bacteria have been used in biotechnology. Their applications have mainly been confined to their thermostable enzymes, one of the most prominent examples being the Taq polymerase from Thermus aquaticus [1]. The global market for industrial enzymes in 2015 was estimated to be US$ 4.4 billion [2], with thermostable enzymes playing an ever-increasing role [3]. The genus Geobacillus, comprising some thermophilic species previously belonging to the genus Bacillus [4], has also been used in this regard. Examples of industrially relevant enzymes isolated from Geobacillus species include proteases [5], amylases [6], lipases [7], and xylanases [8], to mention just a few.
Recently, however, there has been a growing interest in thermophiles as biotechnological hosts [9]. Thermophilic species of Bacteria and Archaea are promising candidates for a number of applications, from production of chemicals [10]; [11]; [12] to extraction of metals from mineral ores [13]. Geobacillus species are also keeping up with this trend. For example, successful metabolic engineering of ethanol [14]; [15] and isobutanol [16] production in G. thermoglucosidasius was achieved. Some Geobacillus strains have also been used for heterologous protein expression. A protein from an archaeon, which was insoluble when expressed in Escherichia coli, was successfully folded in G. kaustophilus [17]. This illustrates an important PLOS ONE | DOI: 10

DNA manipulations
Genomic DNA was extracted using the Wizard® Genomic DNA Purification Kit (Promega) according to the manufacturer's specifications. Plasmid extractions were performed using the NucleoSpin® Plasmid EasyPure kit (Macherey-Nagel).

PCR and cloning
Primers used in this study are listed in Table 2. PCR of DNA fragments for USER cloning was performed with uracil-containing primers using the Phusion U Hot Start DNA Polymerase (Thermo Fisher Scientific). Colony PCR was performed with Taq 2x Master Mix (New England Biolabs) to detect positive colonies. Reactions were done according to the manufacturers' recommendations, with elongation times and annealing temperatures adjusted for specific targets and primers. In most cases the annealing temperature was 60˚C and the elongation time was set to 30 seconds per 1 kb. DNA cloning was performed using USER (uracil-specific excision reagent) technology, a simple and robust method that allows seamless DNA insertions [33]. PCR-amplified DNA fragments containing a primer-incorporated uracil close to both of their 5'-ends were mixed (purification after PCR was not necessary) and treated with DpnI enzyme (Thermo Fisher Scientific) for 30 min at 37˚C to digest template DNA. USER™ enzyme (New England Biolabs) was then added, and the mixture was incubated in three steps: 1) 37˚C for 15 min; 2) 12˚C for 15 min; 3) 10˚C for 10 min. It was then transferred to ice and mixed with chemically competent E. coli cells.

Transformation of E. coli and G. thermoglucosidasius
The procedure was based on the protocol described by [25] with some steps modified. G. thermoglucosidasius was grown overnight on an mTGP agar plate at 60˚C. A single colony was inoculated into 50 mL of pre-warmed liquid mTGP in a 250 mL flask and incubated at 60˚C and 250 rpm until the culture reached an OD600 of 1.6-2. Cells were cooled on ice for 10 min and harvested by centrifugation at 4000 × g for 10 min. They were washed three times (4000 × g for 10 min) with freshly prepared ice-cold electroporation buffer. The buffer contained, per 100 mL: 17.12 g sucrose, 0.042 g MgCl2·6H2O, 5 mL glycerol.
After the last washing step, the cell pellet was suspended in 1 mL of electroporation buffer, distributed in 60 μL aliquots and stored at -80˚C until further use. For the transformation, an aliquot was thawed on ice and mixed with DNA. It was transferred into an electroporation cuvette with a 2 mm gap between electrodes and subjected to a discharge at 2 kV, with typical time constants of 4-5 ms, using the MicroPulser™ (Bio-Rad). Cells were resuspended in 3 mL mTGP and recovered at 52˚C for 2 hours at 200 rpm. Afterwards they were spun down and plated on selective agar media. Transformation efficiencies were typically 10^1 to 10^2 colonies per microgram of DNA.
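The electroporation buffer recipe above is given in mass per 100 mL; for comparison with other protocols it can be convenient to express it as molar concentrations. The following is a minimal sketch of that arithmetic. The molar masses (sucrose 342.30 g/mol, MgCl2·6H2O 203.30 g/mol) are standard literature values assumed here, not figures stated in the protocol, and the function name is illustrative only.

```python
# Convert the per-100-mL buffer recipe into molar concentrations.
# Molar masses are standard literature values (an assumption of this
# sketch; they are not stated in the protocol text).
MOLAR_MASS_G_PER_MOL = {
    "sucrose": 342.30,
    "MgCl2_6H2O": 203.30,
}


def molarity(mass_g, molar_mass_g_per_mol, volume_ml):
    """Return the concentration in mol/L of mass_g made up to volume_ml."""
    return mass_g / molar_mass_g_per_mol / (volume_ml / 1000.0)


BUFFER_VOLUME_ML = 100.0

sucrose_M = molarity(17.12, MOLAR_MASS_G_PER_MOL["sucrose"], BUFFER_VOLUME_ML)
mgcl2_mM = 1000.0 * molarity(0.042, MOLAR_MASS_G_PER_MOL["MgCl2_6H2O"], BUFFER_VOLUME_ML)
glycerol_percent_v_v = 100.0 * 5.0 / BUFFER_VOLUME_ML

print(f"sucrose:  {sucrose_M:.2f} M")
print(f"MgCl2:    {mgcl2_mM:.1f} mM")
print(f"glycerol: {glycerol_percent_v_v:.0f}% (v/v)")
```

Under these assumptions the recipe corresponds to roughly 0.5 M sucrose, about 2 mM MgCl2 and 5% (v/v) glycerol.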

sfGFP measurement
The sfGFP [34] was used as a reporter to assess the expression levels. It was previously shown to be active in Geobacillus species [27]. For quantification of sfGFP expression driven by different promoters and RBS's, Geobacillus strains carrying the respective constructs were grown overnight at 60˚C in TMM with 0.05% yeast extract and 0.2% glucose. 2 μL of these cultures were inoculated into 100 μL of fresh pre-heated medium in a flat-bottom 96-well microtiter plate (Greiner Bio-One), which was sealed airtight with VIEWSeal (In Vitro) to prevent water evaporation. Plates were incubated at 60˚C and 200 rpm. Fluorescence was measured periodically with the ELx808™ Microplate Reader (BioTek) with excitation at 485 nm and emission at 535 nm. Values at the middle of log phase were taken for analysis. Fluorescence was normalized to the OD600 measured at the same time.
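The normalization step described above (fluorescence divided by the OD600 measured at the same time point) can be sketched as follows. The optional blank subtraction is an assumption of this sketch and is not described in the text; the function name and the example readings are hypothetical.

```python
def normalized_fluorescence(fluorescence, od600,
                            blank_fluorescence=0.0, blank_od600=0.0):
    """Return fluorescence per unit OD600 for a single well.

    Blank subtraction (medium-only well) is optional and is an assumption
    of this sketch; with the default blanks of 0.0 the function reduces to
    the plain fluorescence / OD600 ratio described in the text.
    """
    corrected_od = od600 - blank_od600
    if corrected_od <= 0:
        raise ValueError("blank-corrected OD600 must be positive")
    return (fluorescence - blank_fluorescence) / corrected_od


# Hypothetical mid-log readings for one well (arbitrary units).
raw_ratio = normalized_fluorescence(1200.0, 0.4)
blanked_ratio = normalized_fluorescence(1200.0, 0.45,
                                        blank_fluorescence=200.0,
                                        blank_od600=0.05)
print(raw_ratio, blanked_ratio)
```

Dividing by OD600 makes constructs comparable even when cultures reach mid-log phase at slightly different cell densities.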

Xylanase assay
Xylanase activity was measured with the EnzChek® Ultra Xylanase Assay Kit (Life Technologies) according to the manufacturer's instructions. Briefly, cells were grown for 21 hours, reaching similar densities, harvested and lysed with the CelLytic™ B Plus Kit (Sigma-Aldrich). Cell lysate and supernatant from cultures were diluted, and 50 μL of each dilution was mixed with 50 μL of xylanase substrate working solution in a flat-bottom 96-well microtiter plate (Greiner Bio-One). They were incubated at room temperature for 40 min and the release of reaction products was measured with the ELx808™ Microplate Reader (BioTek) with excitation at 360 nm and emission at 460 nm. Total protein content was measured with the Novagen® BCA Protein Assay Kit (Merck) and xylanase activity was normalized to it.

Growth of Geobacillus strains on various carbon sources
In order to assess the biotechnological potential of Geobacillus spp., we analyzed the ability of four strains to utilize a number of carbon sources. G. thermoglucosidasius 2542T was previously used in metabolic engineering of isobutanol production [16]. G. thermoglucosidasius M10EXG is a natural isolate with high tolerance to ethanol and is thus a promising host for its production [31]. G. thermoglucosidasius C56-YS93 is another strain of the same species, whose genome is sequenced and annotated [32]. G. stearothermophilus NUB3621 has been used in a number of applications [35], and its genome has also recently been sequenced [27]. Bacterial cultures were grown in minimal medium (TMM) supplemented with different carbon sources. These included sugars (glucose, xylose, arabinose) and more complex carbohydrates (cellobiose and xylan), which constitute a major part of lignocellulosic biomass. Glycerol and acetate were also included in the screening because they are cheap and suitable as industrial carbon sources. Most strains utilized several of the investigated carbon sources, but showed poor growth on xylan (Fig 1). Of the four strains, G. thermoglucosidasius C56-YS93 had the highest growth yields on glucose, combined with good growth on glycerol, acetate and cellobiose. In addition, its genome has been sequenced and annotated and is available online [32], which makes it easier to design genetic modifications in this strain. Therefore, we chose it for further studies.

Promoter library
In order to facilitate effective metabolic engineering strategies in Geobacillus, it is desirable to have access to a number of promoters with different strengths. A library of semi-synthetic promoters was therefore constructed using a method described by Jensen and Hammer [36]. The method varies promoter strength by randomizing the promoter regions between the -35 and -10 elements while leaving these elements intact. It has been used to construct promoter libraries for E. coli [37], Lactococcus lactis [36], and Saccharomyces cerevisiae [38]. Its advantages include the ease of library construction and gradual increments in strength among the resulting promoters [37].
Here we created a library of synthetic promoters for Geobacillus spp., based on the strong native promoter of the groESL operon from Geobacillus sp. GHH01 (locus tag GHH_c02820, RefSeq GHH_RS01420). Its regulatory CIRCE sequence [39]; [40] was deleted and the sequences between and around its -35 (TTGCAA) and -10 (TAATAT) elements were randomized using a degenerate oligonucleotide sequence (PNJ388 in Table 1). The ribosome binding site (RBS) was left intact. Fusion of these constructs with sfGFP produced a library that was transformed into G. thermoglucosidasius C56-YS93. To evaluate the strength of the different promoters, superfolder GFP (sfGFP) fluorescence was measured at the middle of log phase. The low transformation efficiency of G. thermoglucosidasius limited the library to 17 constructs, which nevertheless covered a 76-fold range of expression levels (Fig 2). Two promoters in the library exhibited higher expression than the groESL promoter, two gave expression levels comparable to that of the native promoter, and the rest were weaker.
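The randomization principle can be illustrated with a short sketch: the -35 and -10 hexamers are kept fixed while the spacer and flanking positions are drawn at random. Only the two hexamer sequences come from the text. The spacer length of 17 nt, the flank length of 5 nt, and all function names are assumptions made for illustration; the actual degenerate oligonucleotide (PNJ388) is not reproduced here.

```python
import random

MINUS_35 = "TTGCAA"  # -35 element, from the text
MINUS_10 = "TAATAT"  # -10 element, from the text
SPACER_LEN = 17      # ASSUMPTION: spacer length is not given in the text
FLANK_LEN = 5        # ASSUMPTION: number of randomized flanking bases


def degenerate_template(spacer_len=SPACER_LEN, flank_len=FLANK_LEN):
    """Return a template in which 'N' marks every randomized position."""
    return ("N" * flank_len + MINUS_35 + "N" * spacer_len
            + MINUS_10 + "N" * flank_len)


def sample_variant(template, rng):
    """Replace each 'N' with a random base; fixed positions are untouched."""
    return "".join(rng.choice("ACGT") if base == "N" else base
                   for base in template)


def fold_range(strengths):
    """Ratio between the strongest and weakest promoter in a library."""
    return max(strengths) / min(strengths)


template = degenerate_template()
rng = random.Random(0)  # seeded only to make the sketch reproducible
variants = [sample_variant(template, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Every sampled variant keeps TTGCAA and TAATAT in place, so differences in strength arise only from the surrounding sequence context, which is the idea behind the gradual strength increments reported for this method.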

RBS library
Modulation of translation initiation is often used as a tool to regulate the level of protein production. An array of ribosome binding sites (RBS's) was therefore constructed using the RBS Calculator [41]; [42]. This software calculates the thermodynamics of interactions between the ribosome and the mRNA. Based on this model, it generates an RBS sequence with a given theoretical translation initiation rate. The model takes into account not only the Shine-Dalgarno sequence, but also the sequences flanking it. Since the consensus sequence of the bacterial RBS consists of only six nucleotides, it is problematic to use the RBS Calculator to compare its strength to that of the resulting RBS. In an alternative randomization approach, Bonde et al. [43] constructed a comprehensive library of almost all possible permutations of six nucleotides acting as RBS (the consensus sequence in E. coli being AGGAGG) and studied their effect on protein expression. We hypothesized that for a thermophilic organism like Geobacillus sp. it is worthwhile using rational RBS design because several of its strains are available in the database of the RBS Calculator. The RBS libraries described by Bonde et al. were created for E. coli and might not work in the different genetic context (and at the different temperature) of Gram-positive bacteria.
A set of RBS's with a range of different predicted translation initiation rates was created and fused with the promoter Ppfl of the pyruvate-formate lyase gene (pfl) of G. thermoglucosidasius C56-YS93 (locus tag Geoth_3895, RefSeq GEOTH_RS19245), where the native RBS was replaced by a synthetic one. sfGFP was again used as a reporter for the screening. Two of the tested RBS's showed low expression levels, while the rest resulted in medium to high expression levels (Fig 3).

Inducible promoter
Inducible promoters are valuable tools for various applications in molecular biology, because they enable the modulation of gene expression as a function of the concentration of the inducing factor. Here we investigated a xylose-inducible promoter of the xylose isomerase gene (xylA), because its homologues in Bacillus species have been extensively studied [44]; [45] and used for protein production [46]. The operator sequence of the xylA gene in G. thermoglucosidasius (5'-TTAGTTTATATGATAGACAAAC-3') shares 73% similarity with that of B. subtilis.
The promoter from the G. thermoglucosidasius C56-YS93 xylose isomerase (xylA, locus tag Geoth_2243) was examined by fusing a 160 bp region immediately upstream of the xylA gene to a gene encoding sfGFP on a plasmid. The expression of sfGFP was measured for cells exposed to a range of xylose concentrations from 0 to 0.5% (w/v) with either 0.5% (w/v) glucose or 0.5% (w/v) glycerol as the main carbon source. For the glycerol medium, a step-wise increase in sfGFP expression was observed as a function of increasing xylose concentration, while the level of induction was less pronounced when glucose was present in the medium (Fig 4). The dynamic range of expression also differed considerably: a 2-fold difference was observed in glucose medium, compared to 6.5-fold when cells were grown on glycerol medium. The basal expression from the non-induced promoter was lower in glucose medium than in glycerol medium.
A considerable basal expression from the uninduced PxylA was observed for both carbon sources. We hypothesized that it might be due to a repressor protein being titrated out by multiple copies of the extrachromosomal PxylA-sfGFP construct. Based on the homology of xylA and its operator to those in B. subtilis [45], the regulation mechanism of xylA expression in G. thermoglucosidasius is likely similar to that in B. subtilis, where XylR is a repressor of xylA gene expression [44]. Hence, in order to make a tighter promoter system, we expressed a putative xylR gene (Geoth_1256) with its native promoter and terminator on the same plasmid. This resulted in a decrease in basal sfGFP expression, although some expression still remained (Fig 4). At zero or low inducer concentrations, additional copies of xylR decreased sfGFP expression. However, the effect was reversed at higher concentrations (Fig 4). Under these conditions overexpression of XylR surprisingly resulted in higher expression from PxylA.
The sfGFP expression levels in pIP26 (xylR + PxylA::sfGFP) differed almost 12-fold between uninduced and fully induced conditions when cells were grown in medium containing glycerol as a carbon source.
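The fold differences quoted above are ratios of normalized sfGFP signal at a given xylose concentration to the signal without inducer. A minimal sketch of that calculation is given below; the dose-response values are hypothetical placeholders chosen only to show the arithmetic and are not measurements from this study.

```python
def fold_induction(dose_response, uninduced_key=0.0):
    """Return {xylose %: signal / uninduced signal} for a dose-response series.

    dose_response maps xylose concentration (% w/v) to OD600-normalized
    fluorescence. The uninduced sample is the entry at uninduced_key.
    """
    basal = dose_response[uninduced_key]
    if basal <= 0:
        raise ValueError("uninduced signal must be positive")
    return {conc: signal / basal for conc, signal in dose_response.items()}


# HYPOTHETICAL values (arbitrary units), for illustration only.
example_series = {0.0: 100.0, 0.1: 300.0, 0.5: 650.0}

folds = fold_induction(example_series)
dynamic_range = max(folds.values())
print(folds, dynamic_range)
```

Because the ratio depends directly on the basal signal, lowering the uninduced expression (for example through additional repressor copies) widens the dynamic range even if the fully induced level is unchanged.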

Xylanase production using the PxylA expression system
Many Geobacillus species possess a conserved cluster of about 200 kb within the genome containing the genes for xylan utilization, notably a number of xylanases [19]. Xylanases are widely used in the paper mill industry, animal feed processing and bakery, and the use of thermostable enzymes is also advantageous in certain fields. Therefore, we sought to use the strain and tools characterized above to overexpress two enzymes: an endo-1,4-β-xylanase native to G. thermoglucosidasius C56-YS93 (locus tag Geoth_2264, RefSeq GEOTH_RS11140) and the xylanase T-6 encoded by xynA in G. thermodenitrificans NG80-2 (locus tag GTNG_1761, RefSeq GTNG_RS09220) as relevant models for homologous and heterologous protein expression. In addition, xylanase T-6 has a putative N-terminal 28-amino acid signal peptide (MLKRSRKAIIVGFSFMLLLPLGMTNALA) predicted by the SignalP 4.1 server [47] that potentially enables it to be secreted from the cell. Endo-1,4-β-xylanase (Geoth_2264) lacks a signal peptide.
To demonstrate the applicability of the inducible PxylA for protein expression, the two xylanases were put under control of this promoter and expressed in the presence of the inducer (xylose). As shown in Fig 5, most of the xylanase T-6 activity (70%) was observed in the supernatant, indicating that it was secreted from the cell. Thus, in this case, a signal peptide from one species (G. thermodenitrificans NG80-2) was active in the other (G. thermoglucosidasius C56-YS93). The endo-xylanase was also successfully overexpressed and showed relatively high intracellular activity.
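The share of activity found in the supernatant can be expressed as a simple fraction of the total (supernatant plus cell lysate). The sketch below assumes that both activities are given in the same units and refer to the same culture volume; the function name and the example numbers are hypothetical and were chosen only to reproduce a 70% share.

```python
def secreted_fraction(supernatant_activity, lysate_activity):
    """Fraction of total xylanase activity recovered in the supernatant.

    ASSUMPTION: both activities are in the same units and scaled to the
    same culture volume, so that they can be added to give a total.
    """
    total = supernatant_activity + lysate_activity
    if total <= 0:
        raise ValueError("total activity must be positive")
    return supernatant_activity / total


# HYPOTHETICAL activities (arbitrary units), for illustration only.
fraction = secreted_fraction(7.0, 3.0)
print(f"{fraction:.0%} of activity in the supernatant")
```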

Discussion
In this study we generated a set of tools for gene expression in G. thermoglucosidasius and characterized their use for homologous and heterologous protein production.
A library of ribosome binding sites was developed using the RBS Calculator [41]. It was previously shown that a computational model based on the thermodynamics of RNA binding to the ribosome does not always accurately predict the actual translation efficiency [48]. Factors other than the strength of the Shine-Dalgarno sequence might play a role [49]; [50]. In this study the sfGFP expression levels generally correlated with predicted translation initiation rates, except for one outlier. One of the factors that may have influenced the accuracy of prediction in this study was the default settings of the RBS Calculator v1.1. In this version the default temperature is 37˚C and could not be adjusted to the growth optimum of 60˚C of the thermophilic G. thermodenitrificans NG80-2, which was used as a model. Although the number of RBS sequences tested (six) may be too low to make general statements on the predictability of the RBS Calculator for Geobacillus species, the library is large enough for practical purposes of controlling gene expression, as it covers a relatively wide range of translation efficiencies. Future research may use the RBS's designed here in the context of the bicistronic architecture [51] to improve the precision of protein biosynthesis, especially in cases of difficult-to-express proteins.
The inducible promoter of the xylA gene studied here showed a 12-fold dynamic range between uninduced and fully induced states, while at the same time demonstrating a significant basal activity. It is desirable for an inducible promoter to be tightly regulated, which means that it should have a very low level of expression when not induced. In B. subtilis, the active repressor protein (XylR) binds to its motif in the promoter region. Additionally, the xylA gene is also negatively regulated by catabolite repression by CcpA in the presence of glucose-6-phosphate, where the cis-acting element is the catabolite responsive element (CRE), a 14-bp sequence within the upper part of xylA [52]. In addition, glucose-6-phosphate can act on the activity of XylR itself [52]. However, CRE is absent from the xylA gene in G. thermoglucosidasius, although the gene product is highly homologous (75% amino acid identity) to XylA from B. subtilis. Therefore, we could not use CRE to decrease the leakiness of the xylA promoter (e.g. by fusing it to the heterologous gene). Importing the catabolite repression system from B. subtilis is hindered by its possibly lower thermostability. A homologue of the B. subtilis ccpA gene is present in the G. thermoglucosidasius C56-YS93 genome (Geoth_0851). However, to the best of our knowledge, its target sequence is currently unknown.
Additional copies of the putative xylR gene did, on the other hand, reduce basal expression from PxylA. The repression by XylR was significantly more pronounced in the presence of glucose, which is in agreement with the B. subtilis model. However, at higher concentrations of the inducer xylose, the presence of additional XylR resulted in increased PxylA activity, i.e. its repressor activity was reversed. This may be due to an unknown mechanism of XylR-mediated regulation in Geobacillus spp., so that at zero or low xylose concentrations XylR acts as a repressor, while at high concentrations it becomes an activator. Similar cases of such dual repressors/activators are known in some bacteria, for example the Cra protein [53] and the AraC regulator [54] in E. coli.
One possible way to decrease basal expression from an inducible promoter is to subject it to directed evolution. It involves applying error-prone PCR to a parent promoter in order to generate a library of promoters with random mutations. This library can then be screened for desirable properties. Apart from tighter promoter versions, a number of other useful properties could be searched for. These could include wider dynamic ranges, sensitivity (the rate at which induction increases with inducer), etc. [55].
Another possible candidate for an inducible system is the promoter of the araD gene. AraD is part of the arabinose utilization system, and in B. subtilis its expression and the expression of other genes in the same operon are induced by arabinose. It is controlled by the regulatory protein AraR, which binds to the operator sequence and acts as a repressor [56]. In the presence of arabinose, AraR is released from the DNA, which makes transcription possible. The arabinose utilization operons with regulatory and structural genes, including araR and araD, were characterized in at least one species of Geobacillus [57]. We also found that a putative araD with a respective operator sequence (5'-ATTGTACGTACAA-3') and araR are present in G. thermodenitrificans NG80-2 and G. kaustophilus HTA426. Future work will be needed to characterize this and other inducible promoter systems in Geobacillus strains.
Apart from the inducible xylA promoter, a library of 17 constitutive promoters was created and quantified in this study. Importantly, the dynamic range of the inducible PxylA falls within the expression range of the library. This feature might find an application, e.g., in cases where it is necessary to find the optimal expression level of a certain gene. The optimum could first be determined by varying the activity of the inducible promoter, after which the gene would be placed under a constitutive promoter of comparable strength.
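The last step of this workflow, choosing a constitutive promoter whose strength is closest to the optimum found with the inducible system, can be sketched as a nearest-match lookup. Comparing on a logarithmic scale is a design choice of this sketch (expression levels span orders of magnitude), not a procedure from the study; the promoter names and strengths below are hypothetical.

```python
import math


def closest_promoter(library, target_level):
    """Return the name of the promoter whose strength is nearest the target.

    library maps promoter name -> measured strength (same units as
    target_level). Distances are taken on a log scale so that a 2-fold
    difference counts equally whether above or below the target.
    """
    if target_level <= 0:
        raise ValueError("target level must be positive")
    return min(
        library,
        key=lambda name: abs(math.log(library[name]) - math.log(target_level)),
    )


# HYPOTHETICAL library strengths (arbitrary units), for illustration only.
example_library = {"P_weak": 10.0, "P_medium": 80.0, "P_strong": 760.0}

# Suppose titration of the inducible promoter suggested an optimum near 100.
best_match = closest_promoter(example_library, 100.0)
print(best_match)
```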
This study provides a toolkit for controlled gene expression in G. thermoglucosidasius. Since there is a growing interest in Geobacillus spp. in both academia and industry, these tools would be valuable instruments for a number of different applications.