Exploring novel bacterial terpene synthases

Terpenes are the largest class of natural products with extensive structural diversity and are widely used as pharmaceuticals, herbicides, flavourings, fragrances, and biofuels. While they have mostly been isolated from plants and fungi, the availability and analysis of bacterial genome sequence data indicate that bacteria also possess many putative terpene synthase genes. In this study, we further explore this potential for terpene synthase activity in bacteria. Twenty-two potential class I terpene synthase genes (TSs) were selected to represent the full sequence diversity of bacterial synthase candidates and recombinantly expressed in E. coli. Terpene synthase activity was detected for 15 of these enzymes, and included mono-, sesqui- and diterpene synthase activities. A number of confirmed sesquiterpene synthases also exhibited promiscuous monoterpene synthase activity, suggesting that bacteria are potentially a richer source of monoterpene synthase activity than previously assumed. Several terpenoid products not previously detected in bacteria were identified, including aromadendrene, acora-3,7(14)-diene and longiborneol. Overall, we have identified promiscuous terpene synthases in bacteria and demonstrated that terpene synthases with substrate promiscuity are widely distributed in nature, forming a rich resource for engineering terpene biosynthetic pathways for biotechnology.


Introduction
Terpenoids, or isoprenoids, are a large class of structurally diverse natural products, with more than 80,000 compounds described in the Dictionary of Natural Compounds (http://dnp.chemnetbase.com). The vast majority of terpenoids have been isolated from plants and fungi; however, bacteria are also known producers of volatile odoriferous metabolites. All terpenoids are synthesised from the universal C5 isoprenoid precursors isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), which are joined by isoprenyl transferases to form isoprenyl diphosphate substrates of varying lengths, such as geranyl diphosphate (GPP, C10), farnesyl diphosphate (FPP, C15) and geranylgeranyl diphosphate (GGPP, C20). Terpene synthases (TSs) convert the linear isoprenyl diphosphate substrates into structurally diverse recently increased to more than 600 regular class I terpene synthases, 400 geosmin synthases and over 120 2-methylisoborneol synthases, due to the ever-increasing availability of genome sequencing data [22,24,26-28].
In this study, we further explore the potential for terpene synthase activity in bacteria. We constructed a neighbour-joining tree containing 2,167 putative bacterial terpene cyclases/synthases, including geosmin synthases, 2-methylisoborneol synthases, and regular TSs, by using the Pfam HMM motif containing the two signature domains of the class I terpene synthase to analyse publicly available bacterial genome sequences. We were especially interested in determining the biochemical function of presumptive TSs that the neighbour-joining analysis separates into isolated clusters containing no other assigned function and that belong to diverse bacterial species other than Streptomyces. We selected 22 potential class I TSs from different clusters and a diverse range of bacterial species, e.g. thermophilic or thermotolerant Actinobacteria (Actinomycetes), Proteobacteria, Firmicutes, Flavobacteria and Myxobacteria, for recombinant expression in Escherichia coli. We screened for potential TS activity with GPP, FPP, and GGPP substrates using a combination of in vitro assays on purified recombinant proteins and in vivo assays employing an engineered E. coli strain containing a heterologous mevalonate (MVA) pathway [7,29]. TS activity was detected for 15 enzymes, and included mono-, sesqui- and di-terpene synthase activities. Active TSs were obtained from several Proteobacteria species, as well as radio- or thermo-tolerant Actinobacteria species. Interestingly, several enzymes were active on more than one prenyl-pyrophosphate substrate. In particular, a number of sesquiterpene synthases also exhibited monoterpene synthase activity, suggesting that bacteria are potentially a much richer source of monoterpene synthase activity than previously assumed. Several new terpenoid products not seen before in bacterial species were detected, including aromadendrene, acora-3,7(14)-diene and longiborneol.

Bioinformatic screening and selection of terpene synthases
Based on previous work [30], the HMM motif of Terpene_synth_C, which contains the two signature domains of class I TSs (DDXXD and (N,D)D(L,I,V)X(S,T)XXXE), was searched (HMM score larger than 23) against 73,714 total protein sequences from 8,509 complete genomes of bacteria in the NCBI database (https://www.ncbi.nlm.nih.gov/genome/, 20 June, 2016), using the HMM search module in HMMER ver. 3.1b2 (hmmer.org). From the hits, a neighbour-joining tree of 2,167 known and unknown mono-, sesqui- and di-TSs was generated using the MAFFT program, ver. 7.299 [31], with the options --treeout --globalpair --reorder --distout, and visualized with iTOL (https://itol.embl.de/) [32] (Fig 1).
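
As a minimal illustration of the two signature motifs named above, the following Python sketch scans a protein sequence for DDXXD and (N,D)D(L,I,V)X(S,T)XXXE using regular expressions. It is not the HMM-based search used in this study; the function names and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Class I terpene synthase signature motifs as quoted in the text.
# "X" (any residue) is written as "." in regular-expression syntax.
ASP_RICH = re.compile(r"DD..D")                 # DDXXD
NSE_TRIAD = re.compile(r"[ND]D[LIV].[ST]...E")  # (N,D)D(L,I,V)X(S,T)XXXE


def find_signature_motifs(seq):
    """Return 1-based start positions of each motif (non-overlapping matches)."""
    seq = seq.upper()
    return {
        "DDXXD": [m.start() + 1 for m in ASP_RICH.finditer(seq)],
        "NSE": [m.start() + 1 for m in NSE_TRIAD.finditer(seq)],
    }


def has_both_motifs(seq):
    """True if the sequence contains at least one copy of each motif."""
    hits = find_signature_motifs(seq)
    return bool(hits["DDXXD"]) and bool(hits["NSE"])


if __name__ == "__main__":
    toy = "MAAADDLLDGGGGNDLISAAAEKK"  # hypothetical toy sequence, not a real TS
    print(find_signature_motifs(toy))
    print(has_both_motifs(toy))
```

A plain pattern match like this is far less sensitive than a profile HMM, which scores every position of the domain, so it is best treated as a quick sanity check on individual hits rather than as a replacement for the search.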

Plasmid construction
For in vivo diterpenoid synthase activity, 8 TSs listed in Table 1 were amplified with primers pETM11-fw and pETM11-rev from a pETM11-TS template, with homologous sequences at both ends for Gibson assembly [33]. IspAM22 (D2G, C155G) inserts were created by amplification of the ispA gene from E. coli DH5α with primers containing the desired base changes (S1 Table). A primer extension method was used to amplify the full-length gene, which was inserted into pBbA2k-ispAM22-TS (for full list see S1 Table) by annealing at 50˚C using Gibson assembly mix (NEB). For mono- or sesqui-terpene production, the pBbB2a-GPPS plasmid backbone was used for cloning, and TSs (S1 Table) were amplified with primers pETM11-fw and pETM11-rev from a pETM11-TS template, with homologous sequences at both ends for Gibson assembly. Genes encoding selected TSs from different bacteria were codon optimised for expression in E. coli (S1 Table), synthesized, and sub-cloned into pETM11 with an N-terminal His-tag (GeneArt, Life Technologies). Protein sequences of the selected TSs are shown in S2 Table.

Bacterial strains and growth conditions
For in vitro analysis, pre-cultures were grown overnight at 37˚C in 5 mL LB medium (10 g/L tryptone, 5 g/L yeast extract, 5 g/L NaCl; pH 7.0) containing kanamycin at 50 mg/L. For protein expression, precultures were diluted (1/1000) in 5 mL 2xYT medium (16 g/L tryptone, 10 g/L yeast extract, 5.0 g/L NaCl) and 50 μL was inoculated into Auto Induction Medium Terrific Broth (AIMTB, Formedium) and incubated at 37˚C until OD600 = 0.4-0.6 was reached. At this OD, cultures were cooled to 16˚C, induced with 50 μM IPTG and incubated for 18-20 hours. Cultures were centrifuged at 10,000 rpm (JA10 rotor) for 10 minutes and the cell pellets were stored at -20˚C until further use [6].

Fig 1. Neighbor-joining analysis of bacterial TSs. Twenty-two TSs selected in this study are named. TSs in red are those whose functionality could not be characterized in this work. The functionality of the selected TSs characterized in this study is indicated in colour: they produce either sesquiterpenoids (purple), diterpenoids (blue), or a mixture of sesqui- and mono-terpenoids (green). Known TSs were annotated in the tree and branches were coloured based on their functionality: monoterpenoids (light blue), 2-methylisoborneol (orange), sesquiterpenoids (purple) and diterpenoids (blue); they are also listed in S4 Table. https://doi.org/10.1371/journal.pone.0232220.g001

Heterologous expression and protein purification of selected bacterial TSs
Cells from one litre of culture of E. coli BL21 (DE3) or ArcticExpress (DE3) harbouring a pET-TS plasmid (S3 Table) were defrosted, re-suspended in 10 mL buffer A (binding buffer: 25 mM Tris-HCl, 0.5 M NaCl, 20 mM imidazole, 5 mM MgCl2 and 1 mM tris(2-carboxyethyl)phosphine (TCEP), pH 7.8) and sonicated on ice using a 50% duty cycle at 50% power for 5 min. Cell debris was removed by centrifugation at 40,000 g for 30 min at 4˚C to separate the soluble protein fraction from the insoluble fraction. His-tagged recombinant proteins were purified by Ni-NTA affinity chromatography (Qiagen). Bound fractions were washed with binding buffer (2 × 10 mL/L culture) and eluted with buffer B (10 mL/L culture; 25 mM Tris-HCl, 0.5 M NaCl, 0.5 M imidazole, 1 mM MgCl2, and 1 mM TCEP, pH 7.8) as described [6]. The obtained fractions were analysed by SDS-PAGE to confirm purity. The pure fractions were pooled and desalted using a PD-10 column according to the manufacturer's instructions [34].

FPP and GPP synthesis
Farnesol or geraniol (150 mg) was dissolved in trichloroacetonitrile (0.6 mL) and stirred for 30 min. Acetonitrile (20 mL) was then added, followed by the ammonium phosphorylation salt (0.7 g), which was added in aliquots over 10 min. The reaction was stirred for 4 h and monitored by TLC using a propanol/concentrated ammonia/water (6/2/1) solvent system, with phosphomolybdic acid (PMA) staining. The organic phase (ether layer) was washed with 1 M aqueous ammonia (3 × 30 mL). The ammonia washes were combined and washed 3 times with fresh diethyl ether (3 × 100 mL). The aqueous layer was reduced to a residue using a rotary evaporator and purified using silica gel. 50 mg of FPP and 50 mg of GPP were synthesized for use in the in vitro enzyme activity analysis.

In vitro enzyme assays, compound extraction
All enzyme reactions were performed with freshly prepared protein. For monoterpenoid (C10) and sesquiterpenoid (C15) enzyme assays, 10 μg of purified TS protein was incubated with 100 μM of GPP or 75 μM of FPP in 1 mL of 25 mM Tris-HCl (pH 7.8) with 5 mM MgCl2 [33]. A 20% (v/v) organic layer (nonane for mono- and sesqui-terpenoids, n-hexane for di- and sesqui-terpenoids) was added to the reaction mixtures to trap the volatile terpene products, followed by overnight incubation at 28˚C with continuous shaking at 50 rpm. After incubation, the organic layer was removed and the reaction mixture was extracted twice with 1 mL of hexane; the extract was dried over anhydrous MgSO4 and further concentrated to 100 μL before analysis by gas chromatography-Quadrupole Time-of-Flight Mass Spectrometry (GC-QToF).
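
The assay quantities above can be checked with a short calculation. The Python sketch below converts the stated substrate concentrations into absolute amounts per 1 mL reaction and estimates the overlay volume, assuming (an assumption not stated in the text) that the 20% (v/v) overlay is expressed relative to the aqueous reaction volume. The function names are illustrative only.

```python
def substrate_nmol(conc_uM, reaction_mL):
    """Amount of substrate in nmol; 1 uM corresponds to 1 nmol per mL."""
    return conc_uM * reaction_mL


def overlay_uL(reaction_mL, percent_vv):
    """Overlay volume in uL, ASSUMING percent (v/v) refers to the aqueous volume."""
    return reaction_mL * 1000 * percent_vv / 100


if __name__ == "__main__":
    print("GPP per reaction (nmol):", substrate_nmol(100, 1))
    print("FPP per reaction (nmol):", substrate_nmol(75, 1))
    print("Organic overlay (uL):", overlay_uL(1, 20))
```

Under these assumptions, each 1 mL reaction contains 100 nmol of GPP or 75 nmol of FPP and carries a 200 μL overlay.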

In vivo terpenoid production and extraction
Due to GGPP insolubility in the preferred assay buffer, identification of TS activity with GGPP was performed in vivo. For in vivo diterpenoid production, cells containing the pBbA2k-EcispAM22 plasmid with the selected TS (S3 Table) were grown in 2xYT with 0.4% (w/v) glucose until OD600 = 0.6, induced with 50 nM anhydrotetracycline (aTc), overlaid with 20% (v/v) nonane and incubated at 30˚C for 36 hours. For in vivo production of mono- and sesqui-terpenoids, cells containing the pBbB2a-GPPS-TS and pMVA plasmids were grown in TB medium with 0.4% (w/v) glucose, induced with 50 μM IPTG and 50 nM aTc at OD600 = 0.6, overlaid with 20% (v/v) nonane and incubated at 30˚C for 48 hours. Nonane layers were harvested and clarified by centrifugation (14,000 rpm, 3 min, 4˚C), dried over anhydrous MgSO4 and analysed by GC-QToF.

Compound GC-MS analyses
The extracted terpenoids were subjected to GC-QToF (Agilent 7020) equipped with an Agilent Technologies 5977A MSD (Mass Selective Detector). The products were separated on an HP5 capillary column (30 m, 0.25 mm i.d., 0.50 μm film, Agilent). For sesquiterpenoids, a temperature program of 50-280˚C with a gradient of 20˚C/min and a 5 min hold at 280˚C was used. For monoterpenoids, a temperature program of 50-230˚C with a gradient of 20˚C/min and a 2 min hold at 230˚C was used. For diterpenoids, a temperature program of 50-300˚C with a gradient of 20˚C/min and a 5 min hold at 300˚C was used. The injector temperature was set at 250˚C with a split ratio of 10:1 (1 μL injection). The carrier gas was helium with a flow rate of 1 mL/min and a pressure of 5.1 psi. The ion source temperature of the MS was set to 250˚C, and spectra were recorded from m/z 50 to m/z 450 in electron impact mode (70 eV). Compound identification was carried out using authentic standards and by comparison of mass spectra to library spectra [35] and to the NIST (National Institute of Standards and Technology) library of MS spectra and fragmentation patterns, as described previously [36]. Authentic standards of 3-carene (115576-25ML), limonene (183164-5ML), linalool (L2602-5G), ocimene isomers (CRM40748), nerolidol (H59605) and geosmin (G5908-1ML), purchased from Sigma-Aldrich, were used to validate the compounds produced in vitro and in vivo.
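
For orientation, the run time implied by each oven program can be worked out from the start and end temperatures, the ramp rate and the final hold. The Python sketch below does this, assuming (not stated in the text) a single linear ramp with no initial hold and no post-run time. The dictionary and function names are illustrative.

```python
# (start ˚C, end ˚C, ramp ˚C/min, final hold min) as given in the text
OVEN_PROGRAMS = {
    "monoterpenoids": (50, 230, 20, 2),
    "sesquiterpenoids": (50, 280, 20, 5),
    "diterpenoids": (50, 300, 20, 5),
}


def run_time_min(start_c, end_c, ramp_c_per_min, hold_min):
    """Total run time, ASSUMING one linear ramp and no initial hold."""
    return (end_c - start_c) / ramp_c_per_min + hold_min


if __name__ == "__main__":
    for name, program in OVEN_PROGRAMS.items():
        print(f"{name}: {run_time_min(*program)} min")
```

Under these assumptions the programs take 11, 16.5 and 17.5 min for mono-, sesqui- and diterpenoids, respectively.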

Ethical statement
To the best of our knowledge, we have provided all the data and followed the relevant ethical guidelines.

Identification of TS homologues
TSs from different origins show substantial differences in overall primary amino acid sequence, but possess a strongly conserved metal binding domain consisting of an acidic amino acid (AA)-rich motif, (D/N)DXX(D/E) or DDXXXE, located within 80-120 or 230-270 AA of the N-terminus and an Asn/Ser/Glu triad closer to the C-terminus, which are the signature domains of the class I TS. Of 73,714 proteins, 2,167 protein sequences were identified as TSs and their homologues based on the HMM search described in the Materials and Methods. In a previous study, 262 presumptive bacterial TSs were identified and 27 proteins were functionally characterised in vivo [37]. Several other bacterial TSs have been characterised and their activities identified with single or multiple substrates [24,28,36,38-41]. In this study, presumptive TS homologue protein sequences were clustered by pairwise similarity, and from the resulting neighbour-joining tree (Fig 1) we identified TSs from Gram-positive bacteria, mainly from the order Actinomycetales, as well as Gram-negative bacteria belonging to numerous orders. We annotated 2,167 TSs in the neighbour-joining tree, including geosmin synthases, 2-methylisoborneol synthases and several presumptive TSs whose functionality could not be assigned based on their protein sequences (Fig 1). TSs already identified in the MIBiG database (125 TSs; https://mibig.secondarymetabolites.org) [42] were marked when they had e-values of 0 in a BLASTp search against the 2,167 HMM hit proteins. Many bacterial strains in the phyla Proteobacteria and Firmicutes have TS-like genes, but more than 70% of these contain the phytoene synthase motif. Enzymes containing this motif are mostly involved in the biosynthesis of phytoene, a tetraterpene (C40) precursor for lycopene biosynthesis.
Geosmin synthases and 2-methylisoborneol synthases from bacteria that have been well studied were excluded, and presumptive TSs from Actinobacteria and Proteobacteria were selected for functional characterization in this study. In total, 22 TSs were selected from different clades (Fig 1) based on one of the following criteria: (i) the sequences contain a single terpene synthase domain (330-350 AA) which belongs to the Isoprenoid Biosynthesis C1 superfamily; (ii) they are from Gram-positive, Gram-negative, thermophilic or thermo-tolerant bacteria; (iii) they are from bacteria that contain other known TSs and have at least 30-40% identity to known TSs, but their functionality has not been characterized. Functional characterisation of these TSs and identification of their products based on their acyclic allylic diphosphate substrate specificity requires experimental validation, which is shown in this work.

In vitro functional characterization of TSs
The selected TSs were codon optimised, synthesised and cloned into pETM11 with a TEV protease cleavable N-terminal His-tag by GeneArt (S3 Table). For the heterologous expression of these TSs, the plasmids were transformed into either E. coli BL21 (DE3) or ArcticExpress (DE3) (S1 Table) and grown in 2xYT medium with optimal inducer concentrations to obtain soluble protein. Recombinant proteins were purified using nickel affinity chromatography, and salt and imidazole were subsequently removed on a desalting column prior to in vitro activity assays. Terpenoids resulting from the incubation of the purified recombinant TSs from different bacteria with FPP were extracted using hexane (Fig 2; Table 1), while products formed with GPP were extracted using nonane (Fig 3; Table 1). Products were identified using authentic standards where possible. Purified recombinant geosmin synthase from Streptomyces coelicolor A3(2) [48] and crude extracts with overexpressed limonene synthase from Mentha spicata (LimS) [29] were used as positive controls to validate the in vitro assays for sesquiterpenoids and monoterpenoids, respectively. As expected, purified geosmin synthase yielded the sesquiterpenoids geosmin, germacradienol and germacrene D in the assay mixture upon incubation with FPP (S2 Fig and S3 Fig), whereas the monoterpene limonene was formed when GPP was added to crude extracts containing limonene synthase (S4 Fig and S5 Fig).
Incubation of a recombinant TS (NCBI accession number AHY47823.1, RrNerS) from the radiation resistant, thermotolerant actinobacterium Rubrobacter radiotolerans [43] with FPP yielded a single product that was identified as trans-nerolidol by GC-QToF analysis (Fig 2A, Table 1), and this was further confirmed using standards (S6 Fig). Incubation with GPP yielded both R- and S-linalool isomers, which were confirmed using standards (Fig 3A and S7 Fig, Table 1). Therefore, this TS was annotated as linalool/nerolidol synthase.
Another terpene cyclase homologue, AHY45426 (RrBerS), from R. radiotolerans also showed activity with both FPP and GPP, yielding α-bergamotene (Fig 2B, S10A Fig, Table 1) and linalool (Fig 3B, S8 Fig, Table 1), respectively. These two TSs are therefore the first enzymes identified in the thermotolerant bacterium R. radiotolerans that have both mono- and sesqui-terpene synthase activities, and they are active up to 60˚C (S9 Fig). These thermostable TSs can aid in engineering industrially important TSs to tolerate cultivation at higher temperatures [50].
The terpene cyclase (KYF56472.1, ScAroS) from the soil-dwelling Gram-negative bacterium Sorangium cellulosum converted FPP into aromadendrene only in the in vitro analysis (Fig 2C and S10B Fig; Table 1). Aromadendrene is mostly observed in eucalyptus oil and is also produced by a citrus TS, CsSesquiTPS5 [51]; it has been shown to have antibacterial activity against multidrug resistant Gram-negative bacteria [52]. KYF56472.1 was therefore named aromadendrene synthase.
The sesquiterpene cyclase (KFG92939, BpLonS) from Burkholderia paludis was incubated with FPP to yield two sesquiterpenoids, (±)-cadinene and longiborneol (Fig 2D, S10C Fig and S10D Fig; Table 1), and was therefore assigned as longiborneol synthase. Incubation with GPP did not yield any observable products. Longiborneol synthases are mostly found in fungi [53] and Norway spruce [54]. Fusarium uses longiborneol as a precursor for producing the tricyclic mycotoxin culmorin [55]. This enzyme from B. paludis is the first bacterial longiborneol synthase identified.
AHH94051.1 (KaGerS) from Kutzneria albida DSM 43870 has 50% identity to germacradienol/geosmin synthase (fragment), and we tested its functionality with FPP in vitro. This yielded germacradienol and germacrene D (Fig 2E and S10E Fig; Table 1); no monoterpenes were detected when the enzyme was incubated with GPP. Production of germacradienol was confirmed by comparison to spectra published by Agger et al. (2008) [56].
Geosmin synthase from Streptomyces coelicolor A3(2), when expressed in E. coli, produced various monoterpenes (β-myrcene, β-ocimene, linalool and geraniol; S13 Fig), which was unexpected and previously unexplored for this enzyme, as well as the sesquiterpenes germacradienol and germacrene D. Geosmin was not detected, presumably due to inactivity of the C-terminal domain in vivo (S13 Fig).

In vivo production of possible diterpenes
For diterpene synthase activity, the substrate GGPP was synthesized in house, but due to its hydrophobic nature the substrate was insoluble in aqueous solution and could not be used in in vitro enzyme assays. For rapid identification of diterpene synthase activity it is essential to generate abundant amounts of GGPP for in vivo production in E. coli. For this purpose, the native farnesyl pyrophosphate synthase (ispA) from E. coli with two mutations, D2G and C155G (ispAM22), which can function as a geranylgeranyl diphosphate synthase [65], was employed to generate GGPP in vivo. GGPP synthase activity of the variant prenyltransferase was confirmed by overexpression of spata-13,17-diene synthase from Streptomyces xinghaiensis [41] together with the GGPP synthase (ispAM22). This yielded the diterpene spata-13,17-diene, which was extracted using a nonane layer (S15 Fig). ABU58787.1 (RcDTPS) from Roseiflexus castenholzii DSM 13941 (Fig 5A, S16 Fig), EJL71407.1 (CsDTPS) from Chryseobacterium sp. CF314 (Fig 5B, S17 Fig), and KFE96946 (ClDTPS) from Chryseobacterium luteum (Fig 5C and S18 Fig) produced various possible diterpene products when expressed together with ispAM22 in E. coli DH5α. Due to low product yields, the detected compounds could not be annotated by a NIST library search alone. Co-expressing the heterologous MVA pathway enhanced the precursor supply for diterpene production, which led to a 10-fold increase in production. However, this was still not enough to determine the compounds produced. Large scale production and purification of the diterpenes would be required for structural determination by NMR analysis.
In this study, we identified 15 putative bacterial TSs out of the 22 tested, which produced structurally diverse mono-, sesqui- and di-terpenoids in in vitro and/or in vivo assays.
We have shown TSs with new activity: 7 sesqui-terpenoid synthases, 3 di-terpenoid synthases and 5 mono-/sesqui-terpenoid synthases, which were identified from Gram-positive, Gram-negative and thermophilic or thermotolerant bacterial species. Many sesquiterpene synthases were shown to also have activity as monoterpene synthases, which suggests that dual substrate specificity is very common for bacterial TSs.
In vivo production of all types of terpenoids by E. coli makes it an attractive platform for rapid identification of enzymes as well as for better and cheaper production yields through metabolic engineering. In particular, the newly discovered TSs from the thermophilic or thermotolerant bacterial species are very promising both for further protein evolution studies to design the end terpene product, which we have started [50], and for exploiting the enzyme stability at higher temperatures. In addition, production of diterpenoids in E. coli is a useful alternative screening method to in vitro assays because of the difficulties with the solubility of GGPP.
The results presented in this study suggest that further exploration of putative TSs from different bacteria, in addition to Actinomycetes, could expand the structural diversity of known terpenes. The majority of the terpene products identified in this study are known to be produced by plants or fungi, which reveals that TSs are widely distributed in bacteria. Given the number of uncharacterized bacterial enzymes that exist in nature, it is likely that there remains a wealth of chemistry to be discovered and exploited. Expanding the search for novel terpenoid biosynthesis will provide numerous structures with unexplored properties that could potentially help to develop novel compounds for pharmacological or industrial applications.
Supporting information S1