Rapid Discovery and Functional Characterization of Terpene Synthases from Four Endophytic Xylariaceae

Endophytic fungi are ubiquitous plant endosymbionts that establish complex and poorly understood relationships with their host organisms. Many endophytic fungi are known to produce a wide spectrum of volatile organic compounds (VOCs) with potential energy applications, which have been described as "mycodiesel". Many of these mycodiesel hydrocarbons are terpenes, a chemically diverse class of compounds produced by many plants, fungi, and bacteria. Due to their high energy densities, terpenes such as pinene and bisabolene are actively being investigated as potential "drop-in" biofuels for replacing diesel and aviation fuel. In this study, we rapidly discovered and characterized 26 terpene synthases (TPSs) derived from four endophytic fungi known to produce mycodiesel hydrocarbons. The TPS genes were expressed in an E. coli strain harboring a heterologous mevalonate pathway designed to enhance terpene production, and their product profiles were determined using Solid Phase Micro-Extraction (SPME) and GC-MS. Out of the 26 TPSs profiled, 12 were functional, with the majority of them exhibiting both monoterpene and sesquiterpene synthase activity.


Introduction
Endophytic fungi have evolved to live within plant tissues without causing overt harm to their hosts. This endosymbiotic relationship involves continual interactions between host and fungi using a variety of signals, including exchange of secondary metabolites, that elicit specific biological responses [1]. Recent studies aimed at characterizing the various secondary metabolites produced by endophytic fungi revealed that many of these fungi emit a wide spectrum of volatile organic compounds (VOCs) while growing on plant and agricultural residues [2][3][4][5][6][7][8][9]. Not only do these VOCs play important roles in the biology of these fungi, they also supply a rich reservoir of potential compounds for medicinal and industrial applications. Many of these [...] These precursors are then converted by terpene synthases (TPSs) into monoterpenes (C10), sesquiterpenes (C15), diterpenes (C20), and other compounds. Most terpene synthases belong to either the type I or the type II terpene synthase superfamily, which can be distinguished by distinct motifs [1,18]. The catalytic reaction of type I terpene synthases involves carbocation formation by abstraction of two diphosphate groups from the substrate through complexation to two highly conserved motifs: the aspartate-rich motif (DDXXD) and the NSE/DTE triad ND(L/I/V)XSXXXE. The type II terpene synthase superfamily has a highly conserved DXDD motif that facilitates the formation of a carbocation by protonation of an epoxide or olefin [1]. To date, genome sequencing has uncovered more than a thousand different genes encoding terpene synthases in bacteria [19,20], fungi [21,22], and plants [23][24][25]. Recently, endophytic fungi have also been reported to produce a diverse spectrum of terpenes, including monoterpenes, sesquiterpenes, diterpenes, and other derivatives [2,5,7,26,27].
These terpenes are not only biologically active secondary metabolites with great pharmaceutical potential, but they also have a high energy density, making them attractive renewable alternatives to fossil fuels [2,5,28-30]. However, there are few reports describing the discovery and characterization of the terpene synthase genes that produce these compounds [31].
In this study, we undertook a systematic approach combining genome dataset mining, terpene biosynthetic pathway construction in E. coli, Solid Phase Micro-Extraction (SPME), and GC-MS analysis to rapidly discover and characterize endophytic terpene synthases. We sequenced four endophytic fungi in the order Xylariales (Hypoxylon sp. CI4A, Hypoxylon sp. CO27, Hypoxylon sp. EC38, and Daldinia eschscholzii EC12) and mined their genomes for potential TPS genes. A total of 26 putative TPS genes were identified, of which 12 were functionally expressed in E. coli and produced a wide array of monoterpenes and sesquiterpenes.

Materials and Methods
The discovery and phylogenetic tree analysis of the putative endophytic terpene synthases
The putative endophyte TPS genes were identified by searching the endophyte genomes for terpene synthase Pfam functional domains. The protein sequences of the putative TPSs were downloaded from the endophyte genomes published by the Joint Genome Institute [32]. Secretion signal peptides were predicted using the SignalP 4.1 online tool [33]. The endophytic TPS protein sequences were compared to each other and to several TPSs from plants and other fungi. All protein sequences were aligned by Clustal W in MEGA 6.0 [34]. Neighbor-joining trees were built in MEGA 6.0 using the bootstrap method and the Poisson model, with bar = 0.2 substitutions per amino acid residue. The sequence comparison of the endophytic TPSs with other plant and fungal TPSs was presented as rectangular trees, while the comparison amongst the endophytic TPSs was presented as a radiation tree.
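As a brief illustration of the Poisson model named above: it corrects the observed fraction of differing residues, p, for multiple substitutions at the same site using d = -ln(1 - p). The sketch below is illustrative only and is not the MEGA implementation; the function name and the choice to ignore gapped columns pairwise are assumptions.

```python
import math


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance between two aligned protein sequences.

    Assumption: columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only alignment columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    # p is the proportion of compared sites at which the residues differ.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is the kind of input a neighbor-joining algorithm works from.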

Strains and plasmids
E. coli strains DH10B and DH1 were used for cloning and production, respectively. Plasmids pJBEI-3122, pBbE1a, and pBbE2k were previously reported [35]. Plasmid pJBEI-3122 contains genes encoding seven enzymes (AtoB, HMGS, HMGR, MK, PMK, PMD, and IDI) of the mevalonate pathway. The protein sequences of the TPSs in this study and the geranyl pyrophosphate synthase (GPPS, GenBank: AF513112.1, GPPS Ag) from Abies grandis (with the chloroplast signal peptide truncated) were used to generate codon-optimized genes for expression in E. coli. A Ribosome Binding Site (RBS) for each putative terpene synthase gene was designed and optimized using an online RBS calculator [36]. All the DNA sequences containing the RBS and the TPS or GPPS gene, flanked by BamHI and EcoRI sites, were synthesized by Genscript.

Reconstruction of the terpene biosynthetic pathway in E. coli strain DH1
Each synthesized TPS ORF, including the optimized RBS, was digested by the restriction enzymes BamHI and EcoRI and ligated by T4 DNA ligase (New England Biolabs, CA) into plasmid pBbE1a to create vector pBbE1a-TPS. The synthesized GPPS Ag DNA fragment was digested by BamHI and EcoRI, and ligated into vector pBbE2k using T4 DNA ligase to generate the plasmid pBbE2k-GPPS Ag. The complete terpene biosynthetic pathway was reconstructed in E. coli strain DH1 by co-transforming all three plasmids pJBEI-3122, pBbE1a-TPS, and pBbE2k-GPPS Ag. The plasmids pJBEI-3122 and pBbE2k-GPPS Ag were also co-transformed into strain DH1 as a negative control.
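Because each insert is excised with BamHI and EcoRI, a codon-optimized ORF should not contain either recognition sequence internally. The following is a minimal sketch of such a check, not a step reported by the authors; the function name and the example sequence are hypothetical. Both recognition sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition sequences of the two cloning enzymes named in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def internal_sites(orf: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition site in an ORF."""
    seq = orf.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Step forward by one base so overlapping sites are also reported.
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits


# Hypothetical toy sequence containing one site for each enzyme.
print(internal_sites("ATGGGATCCAAAGAATTCTAA"))
```

An ORF for which both lists are empty can be flanked by the two sites without being cut internally during digestion.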

Production of the terpene compounds in E. coli
The transformants containing each TPS gene were cultured in 15 mL of LB medium with 100 μg/L of ampicillin, 34 μg/L of chloramphenicol, and 25 μg/L of kanamycin. The cultures were incubated overnight at 37°C with shaking at 220 rpm. One mL of overnight culture was then inoculated into 20 mL of fresh EZ-rich medium (Teknova, CA) containing 20 g/L glucose as well as the three aforementioned antibiotics and incubated at 37°C with shaking at 220 rpm until an OD600 of 0.8 was reached. Terpene production was then induced by adding isopropyl-β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM and incubating for another 20 hours at 30°C with shaking at 180 rpm. Terpenes were extracted after 48 hours.

GC-MS analysis of terpenes
The volatile terpene compounds in the headspace of each culture were analyzed by extracting VOCs with a preconditioned Solid-Phase Micro-Extraction (SPME) syringe consisting of 50/30 divinylbenzene/carboxen on polydimethylsiloxane on a Stable Flex fiber, followed by GC-MS. The SPME fiber was exposed to the headspace of each culture flask for an hour to saturate it with the volatile terpene compounds produced by the various TPS-expressing strains. The syringe was then inserted into the injection port of a Varian 3800 gas chromatograph containing a 30 m × 0.25 mm i.d. DB waxer capillary column with a film thickness of 0.25 μm. The column temperature was programmed as follows: 60°C for 4 min, increasing to 120°C at 10°C/min and holding for 5 min, then increasing to 220°C at 20°C/min and holding for 2 min, then increasing to 250°C at 50°C/min and holding for 4 min. The carrier gas was ultra-high purity helium at a constant flow rate of 1 mL/min, and the initial column head pressure was 50 kPa. A two-minute injection time was used to desorb the terpene compounds from the sampling fiber into the injection port (splitless mode, injection temperature 220°C) of the chromatograph, which was coupled to a Saturn 2000 ion trap mass spectrometer. The MSD parameters were EI at 70 eV, a mass range of 30-500 Da, and a scan speed of 2 scans/sec. GC-MS data deconvolution was performed using the Automated Mass Spectral Deconvolution and Identification System (AMDIS) software package (v. 2.70, NIST, Gaithersburg). AMDIS deconvolution settings were as follows: resolution (medium), sensitivity (low), shape requirement (medium), and component width at 10. Spectral components were searched against the NIST 2011 mass spectral library, and only components with mass spectral match factors > 85% were reported as tentatively identified compounds. Only compounds with peak areas > 1% of the total peak area in the chromatogram are reported.
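The oven program above fixes the length of each chromatographic run: every ramp takes (target - start) / rate minutes, and the holds are added on top. The short sketch below works through that arithmetic; the tuple layout and function name are illustrative assumptions, while the temperatures, rates, and hold times are those stated in the text.

```python
# Each step: (target temperature in °C, ramp rate in °C/min, hold time in min).
# The first step has no ramp because the run starts at that temperature.
OVEN_PROGRAM = [
    (60.0, None, 4.0),
    (120.0, 10.0, 5.0),
    (220.0, 20.0, 2.0),
    (250.0, 50.0, 4.0),
]


def total_run_time(program):
    """Sum ramp and hold durations (in minutes) for a GC oven program."""
    total = 0.0
    current = program[0][0]
    for target, rate, hold in program:
        if rate is not None:
            total += (target - current) / rate  # time spent ramping
        total += hold  # isothermal hold at the target temperature
        current = target
    return total


print(f"{total_run_time(OVEN_PROGRAM):.1f} min")
```

The ramps contribute 6, 5, and 0.6 min and the holds 4, 5, 2, and 4 min, for a run of about 26.6 min per sample.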
A large number of terpenes were identified by GC-MS. To confirm their identity, several commercially available terpene standards were purchased from Sigma-Aldrich and analyzed using the same methodology; they are listed in S7 Table. All terpenes mentioned in this manuscript that do not appear in S7 Table are considered to be only putatively identified.
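The reporting rules in this section (library match factor > 85%, peak area > 1% of the total peak area, and confirmation only for compounds run as standards) can be expressed as a simple filter. This is a sketch under stated assumptions rather than the authors' processing script: the record layout, function name, and example values are hypothetical, and no AMDIS output format is implied.

```python
def reportable(components, confirmed_names, min_match=85.0, min_area_pct=1.0):
    """Apply the match-factor and peak-area thresholds to deconvoluted peaks.

    components: list of dicts with 'name', 'match' (percent), and 'area'.
    confirmed_names: compounds verified against purchased standards.
    """
    total_area = sum(c["area"] for c in components)
    if total_area <= 0:
        return []
    rows = []
    for c in components:
        area_pct = 100.0 * c["area"] / total_area
        if c["match"] > min_match and area_pct > min_area_pct:
            status = "confirmed" if c["name"] in confirmed_names else "putative"
            rows.append((c["name"], round(area_pct, 1), status))
    return rows


# Hypothetical peaks for illustration only; these are not data from the study.
peaks = [
    {"name": "beta-pinene", "match": 92, "area": 600},
    {"name": "alpha-guaiene", "match": 88, "area": 390},
    {"name": "unknown", "match": 70, "area": 5},
    {"name": "trace", "match": 95, "area": 5},
]
print(reportable(peaks, confirmed_names={"beta-pinene"}))
```

Note that the area percentage is computed against all peaks in the chromatogram, including those that are later excluded by the thresholds.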

Results and Discussion
Identification of terpene synthase genes in four endophytic fungal genomes
TPS genes were identified in the genomes of Hypoxylon sp. CI4A, Hypoxylon sp. CO27, Hypoxylon sp. EC38, and Daldinia eschscholzii EC12 by homology searches against conserved TPS domains. A total of 26 putative TPSs were identified in the genomes of these four endophytes, including 7 TPSs from CI4A, 5 from CO27, 6 from EC38, and 6 from EC12. Analysis of the protein sequences determined that none of these TPSs harbor signal peptides. Protein sequence alignments with known TPSs determined that all the putative fungal TPSs fall into the type I terpene synthase superfamily and harbor a highly conserved aspartate-rich motif (DDXXD/E). In addition, all TPSs except those in cluster 5 (Fig 2) have a (N/D)DXX(S/T)XX(K/R)(D/E) NSE/DTE triad consensus sequence, whereas the cluster 5 TPSs possess an X(D/K)XXXSXXRE triad (Table 1). Phylogenetic analysis of the 26 putative TPSs grouped all but four of them into five distinct clusters, suggesting that these four endophytic fungi may possess at least five distinct functional categories of terpene synthases (Fig 2A). The endophytic TPSs were also compared to several plant and fungal TPSs and were found to have low sequence similarity with all the plant TPSs and most of the fungal TPSs, except for two uncharacterized putative TPSs from Trichoderma virens (EHKY27518) and Neurospora tetrasperma (EGZ75309) that shared higher sequence similarity with the three endophytic caryophyllene synthases and EC12-GS, respectively (Fig 2B and 2C). However, due to the low sequence similarity to well-characterized TPSs (Fig 2B and 2C), it is difficult to predict function using sequence information alone, necessitating a functional characterization of each putative TPS in order to determine its catalytic activity.
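The motif patterns quoted in this section translate directly into regular expressions, with X matching any residue. The sketch below scans a protein sequence for the aspartate-rich motif and the two triad variants; it is an illustration, not the alignment-based analysis used in the study, and the function name and toy sequence are hypothetical.

```python
import re

# Regular expressions transcribed from the consensus patterns in the text.
MOTIFS = {
    "aspartate_rich": re.compile(r"DD..[DE]"),  # DDXXD/E
    "nse_dte_triad": re.compile(r"[ND]D..[ST]..[KR][DE]"),  # (N/D)DXX(S/T)XX(K/R)(D/E)
    "cluster5_triad": re.compile(r".[DK]...S..RE"),  # X(D/K)XXXSXXRE
}


def find_motifs(protein: str) -> dict:
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    seq = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, not one of the TPSs described in this study.
print(find_motifs("MAAADDLLDGGGNDVHSYDKEAAA"))
```

Because these patterns are short and degenerate, a hit is only suggestive; here they serve to describe conserved positions within aligned sequences rather than to predict function on their own.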

Expression of endophytic TPSs in E. coli
To determine their function, the 26 predicted TPS genes were codon optimized and expressed in E. coli along with the geranyl pyrophosphate synthase (GPPS) gene from Abies grandis (GenBank: AF513112.1, GPPS Ag) and a plasmid harboring the entire mevalonate pathway [35]. This plasmid was used to increase the flux of carbon through the terpene pathway with the aim of enhancing productivity and increasing the chance that even poorly expressed TPSs would produce detectable levels of terpenes. The VOC products of each TPS present in the headspace of the culture flask were extracted by SPME and analyzed with GC-MS. Out of the 26 putative endophytic TPSs tested, 12 were active, producing a mixture of mono- (C10) and sesquiterpenes (C15) (Tables 2 and 3). TPSs in the same cluster tended to produce a similar spectrum of terpene compounds and are therefore discussed by cluster. The profiles of each TPS cluster are summarized in this section and described in detail below. No terpene compounds were produced by the TPSs in cluster 1, and they are not discussed further. The TPSs in cluster 2 primarily produced monoterpenes, including pinene (1a, 1b), ocimene (1c), and limonene (1d), and a lower abundance (<20%) of sesquiterpenes. The TPSs in cluster 3 yielded a wide spectrum of sesquiterpenes and some monoterpenes. Caryophyllene (2d, 2d, 2g) and its isomers were the major products of these enzymes, accounting for up to 80% of total peak abundance. The terpene profiles from TPSs in clusters 4 and 5 are less complex than those of cluster 3 TPSs, and include the sesquiterpenes chamigrene (3f) and gurjunene (2a, 2b). The non-clustered TPSs EC12-SS (SS: Selinene Synthase) and EC12-ILS (ILS: IsoLedene Synthase) primarily produced the sesquiterpenes selinene (2h) and isoledene (5a), respectively. The activity of these TPSs correlated well with the terpene products produced by their native hosts.
All the major terpenes (pinene, limonene, caryophyllene, chamigrene, gurjunene, selinene, and isoledene) produced from the functional TPSs were detected in the VOC profiles of the four endophytes grown on potato dextrose. The functional endophytic TPSs had low protein sequence similarity compared to other type I TPSs from plants, but retained a conserved DDXXD motif.
An examination of other reports that describe recombinantly expressed TPSs indicates that these enzymes tend to produce a single class of terpene, i.e. monoterpenes or sesquiterpenes [1,13,37-42]. There are a few reports using in vitro assays that show the production of both mono- and sesquiterpenes from high concentrations of GPP or FPP substrates [43]. However, it was never demonstrated that this bi-functionality extends to in vivo activity, so it is unclear whether or not this phenomenon would actually occur in nature. This study is the first to demonstrate that TPSs can be bi-functional in vivo, producing both mono- and sesquiterpenes. It could be argued that the E. coli strain used in this study has artificially altered levels of GPP and FPP, but several other TPSs that have been expressed in this strain do not exhibit this characteristic, and many other recombinant strains also have altered isoprenoid precursor levels without any of their recombinant TPSs exhibiting this behavior. Therefore, this phenomenon appears to be enzyme specific. It will be interesting to further investigate these enzymes to identify the structural features that enable this bi-functionality and to determine the impacts of GPP and FPP levels on product distribution. It will also be interesting to determine whether or not this is a widespread phenomenon that extends to the other TPSs that have exhibited bi-functionality in vitro.
Cluster 2: bi-functional α-, β-pinene/α-guaiene synthases
In cluster 2, the TPS EC12-PGS (PGS: Pinene and Guaiene Synthase, Tables 2 and 3) from Daldinia eschscholzii EC12 and the TPS EC38-PGS from Hypoxylon sp. EC38 were active and produced β-cis-ocimene (C10, 1c), β-pinene (C10, 1a), and 1s-α-pinene (C10, 1b) as major compounds (Fig 3A, S1 Table).
The α-pinene, β-pinene, and β-cis-ocimene accounted for 55.6% and 75% of the total peak area of the GC spectra from strains expressing EC12-PGS and EC38-PGS, respectively, which indicates that these two enzymes are pinene synthases. Additionally, the existence of two stereoisomeric products of pinene suggests that these two pinene synthases fall into the class II pinene cyclases [42,44]. Supplementary S1 Fig outlines the mechanism of pinene biosynthesis [42]. β-cis-ocimene may be the product of deprotonation followed by intramolecular electrophilic attack of a linalyl cation derived from GPP [44], or it may be an artifact, as it has been reported to form via thermal rearrangement of pinene in the injection port of the gas chromatography instrument used to analyze the products [45]. Further analysis is required to determine whether this is a bona fide TPS product or an artifact. Interestingly, a significant amount of the sesquiterpene α-guaiene (C15, 1d) was also produced by EC12-PGS (11.026% of total peak area) and EC38-PGS (8.16% of total peak area). Other minor products were detected as well, including α-selinene (C15, 2h), alloaromadendrene (C15, 2l), and its oxidation product viridiflorol (1e) [46] (Table 1 and S1 Table). The GPPS from Abies grandis used in this study was reported to specifically produce GPP, accepting only one DMAPP and one IPP as co-substrates [47]. However, the E. coli strain used in this study harbors a native farnesyl pyrophosphate synthase (FPPS) gene (ispA), which is therefore the likely source of the FPP used to synthesize these sesquiterpenes [48]. The production of multiple monoterpenes and sesquiterpenes by these two TPSs indicates that they are bi-functional mono-/sesquiterpene synthases.

Potential applications for endophyte-derived monoterpenes and sesquiterpenes
Next generation biofuels are expected to have high energy density and physicochemical properties compatible with current engine design, transportation systems, and storage infrastructure. Hydrocarbons derived from terpenes meet most of these criteria, as they are structurally similar to the compounds in petroleum distillate fuels and often have similar combustion properties [66]. For example, hydrogenated pinene (C10) dimers were reported to have a high volumetric energy content similar to that of the jet fuel JP-10 [67]. The hydrogenated product of the sesquiterpene bisabolene (C15) was shown to have better properties than D2 diesel, such as a lower cloud point and a higher flash point and API gravity [14]. The most abundant terpenes produced in this study are pinenes and sesquiterpenes, such as guaiene, caryophyllene, chamigrene, gurjunene, and selinene. These terpenes are hydrocarbons or hydrocarbon-like compounds with a carbon content in the C10 and C15 range and are therefore good candidates for "drop-in" aviation fuels. Simultaneous satisfaction of combustion specifications and specifications for physical properties such as density, energy content, and viscosity often requires blending of different types of hydrocarbons. The use of terpenes and terpene derivatives as blendstocks for renewable aviation and diesel fuels has recently been discussed by Harvey et al. [68], who determined that blending hydrogenated sesquiterpenes with synthetic branched paraffins could raise cetane numbers and reduce viscosity, producing biosynthetic fuels that meet applicable jet and diesel specifications.
In addition to their potential use as biofuels, most of the terpenes reported here are major components of essential oils used in the fragrance and flavoring industries (α-guaiene, β-chamigrene, α-gurjunene, etc.), and many have potential pharmaceutical applications. For example, it has been reported that pinene can act as an anti-tumor [69,70] and anti-depression [71] agent. Also, the sesquiterpene caryophyllene has not only been considered one of the top three most promising high energy "drop-in" jet fuels [72], it also has multiple potential pharmaceutical applications, such as anti-cancer activity [73], anti-inflammatory activity [73][74][75], life-span extension [76], neuroprotection [77], moderation of insulin secretion [78], attenuation of acute and chronic pain [79], and reduction of alcohol dependency [80].

Conclusion
Previously, GC-MS was used to analyze the VOCs produced by four fungal endophytes (Hypoxylon sp. EC38, CI4A, CO27, and Daldinia eschscholzii EC12), and hundreds of terpene compounds were detected [2,9,81,82]. However, most of the TPS enzymes that synthesize these compounds have not been identified. In this study, we leveraged an E. coli strain harboring a synthetic mevalonate pathway for enhanced terpene production as a synthetic biology platform to screen 26 putative TPSs from these four fungi. The TPSs were identified and characterized by a combination of genomic data mining, phylogenetic analysis, protein sequence alignment, fast product extraction with SPME, and rapid chemical characterization with GC-MS. This approach avoids time-consuming and challenging conventional enzyme discovery routes, such as functional genomics library construction and screening or biochemical purification of native enzymes, as well as challenges specific to TPS enzymes, such as terpene compound purification and identification. It thereby establishes a valuable and rapid process for novel TPS discovery. Using this approach, we discovered 12 novel TPSs clustered into 4 homology groups that have potential uses in medicine and other industries, including the nascent biofuels sector [83].
Supporting Information
S1 Fig. Mechanism for the biosynthesis of monoterpenes: α- and β-pinene, α-limonene, 2-carene, β-ocimene, and γ-terpinene. The biosynthesis of pinene can be rationalized by postulating that GPP ionizes to a stable allylic cation, followed by collapse to linalyl diphosphate (LPP). The reionization of the LPP cisoid conformer followed by intramolecular electrophilic addition generates the transient α-terpinyl cation. Alternatively, an additional electrophilic attack on the newly formed cyclohexenoid double bond of the α-terpinyl cation generates the pinane skeleton, which is deprotonated by terpene cyclase II to form both α- and β-pinene. (DOCX)
S2 Fig. Mechanism for the biosynthesis of sesquiterpenes: α- and β-caryophyllene, humulene, α-selinene, α-guaiene, and γ-gurjunene. Generally, FPP is ionized to generate an allylic cation, which then undergoes either an 11,1 closure to form the humulyl cation and a subsequent deprotonation to yield α-caryophyllene, or an 11,1 closure to form the humulyl cation followed by a 2,10 closure to generate the caryophyllyl cation, which is further deprotonated to form humulen-(v1) and β-caryophyllene. Also, FPP can be ionized and, through a subsequent 10,1 closure and deprotonation, form germacrene A, which can be an intermediate for further intramolecular electrophilic attack, hydride shift, and deprotonation to yield α-selinene, α-guaiene, and γ-gurjunene. Chamigrene biosynthesis could begin with the ionization and subsequent allylic rearrangement of the diphosphate moiety of FPP, allowing for the formation of nerolidyl diphosphate (NPP, cisoid conformation). Reionization of the cisoid conformation of NPP and subsequent intramolecular electrophilic attack would form a bisabolyl cation, which, following a secondary intramolecular electrophilic attack and 1,4-hydride shift, would create a cuprenyl cation. A subsequent methylene migration would yield the chamigrenyl cation, which could undergo a direct proton abstraction to form β-chamigrene. (DOCX)
S1