Biosynthesis of Sandalwood Oil: Santalum album CYP76F Cytochromes P450 Produce Santalols and Bergamotol

Abstract
Sandalwood oil is one of the world’s most highly prized essential oils, appearing in many high-end perfumes and fragrances. Extracted from the mature heartwood of several Santalum species, sandalwood oil is composed mainly of sesquiterpene olefins and alcohols. Four sesquiterpenols, α-, β-, and epi-β-santalol and α-exo-bergamotol, make up approximately 90% of the oil of Santalum album. These compounds are the hydroxylated analogues of α-, β-, and epi-β-santalene and α-exo-bergamotene. By mining a transcriptome database of S. album for candidate cytochrome P450 genes, we cloned and characterized cDNAs encoding a small family of ten cytochrome P450-dependent monooxygenases annotated as SaCYP76F37v1, SaCYP76F37v2, SaCYP76F38v1, SaCYP76F38v2, SaCYP76F39v1, SaCYP76F39v2, SaCYP76F40, SaCYP76F41, SaCYP76F42, and SaCYP76F43. Nine of these genes were functionally characterized, using in vitro assays and yeast in vivo assays, as encoding santalene/bergamotene oxidases and bergamotene oxidases. These results provide a foundation for production of sandalwood oil for the fragrance industry by means of metabolic engineering, as demonstrated with proof-of-concept formation of santalols and bergamotol in engineered yeast cells, simultaneously addressing conservation challenges by reducing pressure on supply of sandalwood from native forests.


Introduction
Sandalwood is the general name for woody perennials of the Santalum genus (Santalaceae), which are exploited for their fragrant heartwood. Sandalwoods are slow-growing hemiparasitic trees distributed throughout the tropical and temperate regions of India, Indonesia, Australia and the Pacific Islands [1,2]. The oil extracted from the stems and roots is highly sought after by the fragrance and perfume industry. Santalum album, also known as tropical or Indian sandalwood, is the most valuable of the commercially used species due to its high heartwood oil content (6-10% by dry weight) and desirable odor characteristics. Approximately 90% of S. album essential oil is composed of the sesquiterpene alcohols α-, β-, and epi-β-santalol and α-exo-bergamotol (Figure 1). The α- and β-santalols are the most important contributors to sandalwood oil fragrance [3][4][5]. Lanceol and α-bisabolol are also found in modest concentrations [6]. While the demand for sandalwood oil is increasing, disease, grazing animals and unsustainable exploitation of sandalwood trees have led to the demise of many natural populations. Plantations provide a more sustainable alternative to wild harvesting; however, slow growth rates, high potential for disease and substantial variation in oil yield hamper productivity. Alternatively, chemical approaches to synthesize the santalols have been attempted [7][8][9], but multiple low-recovery steps make chemical synthesis uneconomical at an industrial scale.
Investigations into alternative, more sustainable strategies to produce sandalwood oil include improved plantation systems through development of predictive marker systems for oil biosynthesis in developing heartwood of the slow-growing trees, and metabolic engineering of heterologous production systems. Key to these approaches is the elucidation of the biosynthesis of the santalols, bergamotols, and other sesquiterpene compounds characteristic of sandalwood oil. The first step in santalol and bergamotol biosynthesis is the generation of farnesyl diphosphate (FPP) from dimethylallyl diphosphate and isopentenyl diphosphate, catalyzed by FPP synthase (FPPS). FPP is cyclized by santalene synthase (SaSSy), a previously characterized sesquiterpene synthase [10], which produces a mixture of santalenes (α-, β- and epi-β-santalene) and α-exo-bergamotene. Since SaSSy generates four structurally similar products, it seemed plausible that a single, multi-substrate cytochrome P450-dependent monooxygenase (P450) could oxidize α-, β-, epi-β-santalene and bergamotene to produce α-, β-, epi-β-santalols and bergamotol, respectively (Figure 1). Alternatively, different cytochromes P450 could be involved in the oxidation of the different santalenes and bergamotene.
Here, we describe the discovery, cloning and functional characterization of a family of ten S. album P450s of the new CYP76F subfamily and an NADPH-dependent cytochrome P450 reductase (CPR) involved in santalol/bergamotol biosynthesis.

Results
Gene Discovery and Full-Length (FL) cDNA Cloning
A S. album transcriptome assembly of 31,461 isotigs was searched with blastx for candidate CPRs and P450s potentially involved in the hydroxylation of santalenes and bergamotene. Two SaCPRs were identified using Arabidopsis thaliana CPRs (CAB58575.1, CAB58576.1) as search sequences. FLcDNAs SaCPR1 and SaCPR2 were 70% identical and 82% similar at the amino acid level. Searches for P450s were performed with a set of known plant P450s of the CYP71, CYP72 and CYP76 families, which include P450s with known functions in terpenoid biosynthesis [11][12][13].
Transcripts of the CYP76 family were among the most abundant P450s in the S. album transcriptome and assembled into two different isogroups and two individual isotigs (Table S1). Isogroup 1 consisted of 2,143 reads including 1,107 unique reads assembled into three isotigs. It generated a consensus sequence of 1,917 base pairs and an open reading frame (ORF) of 1,530 bp. Isogroup 2 consisted of 228 reads including 140 unique reads assembled into two isotigs. Both isotigs share a consensus ORF of 1,530 bp. A separate isotig consisted of 11 reads generating a partial sequence of 1,200 bp. Another separate isotig contained one partial sequence of 277 bp with several stop codons. Isogroups 1 and 2 were selected for FLcDNA cloning. PCR amplification with primers designed according to isogroup 1 resulted in a single unique FLcDNA clone designated as SaCYP76F38v1. PCR amplification with primers based on isogroup 2 resulted in nine different cDNA clones designated as SaCYP76F37v1, SaCYP76F37v2, SaCYP76F38v1, SaCYP76F38v2, SaCYP76F39v1, SaCYP76F39v2, SaCYP76F40, SaCYP76F41, SaCYP76F42, and SaCYP76F43. The predicted CYP76F proteins were 94-99% identical to each other and contained motifs characteristic of eukaryotic P450s, including a proline-rich region near the N-terminal membrane-anchoring domain, the oxygen-binding domain and the highly conserved heme-binding motif (Figure S1). A blastp search of the deduced amino acid sequences against the NCBI GenBank protein database identified best matches to a putative P450 from Vitis vinifera (XP_002281735) with 62-64% identity, and CYP76B6 geraniol hydroxylase (CAC80883) from Catharanthus roseus [14] with 53-54% identity. A phylogeny with related plant P450s (Figure 2) showed that the S. album CYP76F proteins form two separate clades, I and II, and are closest to the CYP76B cluster of other species.

Expression of Recombinant SaCYP76Fs in Yeast
SaCYP76F FLcDNAs were expressed together with SaCPR2 in yeast cells, and microsomes were isolated for in vitro P450 enzyme assays. Microsome preparations for all ten SaCYP76Fs, except SaCYP76F43, displayed characteristic P450 CO difference spectra. The P450 content of the microsomal preparations ranged from 0.2 to 1.6 µM (Figure S2).
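For readers unfamiliar with how a P450 concentration is read off a CO difference spectrum, the following is a minimal sketch of the standard calculation. It assumes the widely used extinction coefficient of 91 mM⁻¹ cm⁻¹ for the absorbance difference between 450 and 490 nm; the function name, the absorbance readings and the dilution factor are hypothetical illustrations, not values from this study.

```python
def p450_concentration_um(a450, a490, path_cm=1.0, dilution=1.0, eps_mm=91.0):
    """Estimate P450 concentration (µM) from a reduced CO difference spectrum.

    a450, a490 : absorbance of the difference spectrum at 450 and 490 nm
    path_cm    : cuvette path length in cm
    dilution   : fold dilution of the microsomes in the cuvette
    eps_mm     : extinction coefficient in mM^-1 cm^-1 (assumed value: 91)
    """
    if path_cm <= 0 or eps_mm <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    delta_a = a450 - a490                      # baseline-corrected peak height
    conc_mm = delta_a / (eps_mm * path_cm)     # Beer-Lambert law, result in mM
    return conc_mm * 1000.0 * dilution         # convert mM to µM, undo dilution


# Hypothetical reading: a peak height of 0.0455 in a 1 cm cuvette, undiluted.
example_um = p450_concentration_um(0.0455, 0.0)
```

A preparation that shows no 450 nm peak, as described for SaCYP76F43, would give a value near zero with this calculation.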

In Vitro Functional Identification of Clade I SaCYP76Fs using a Blend of Sesquiterpenes
Microsome preparations were screened for sesquiterpene oxidase activity using NADPH and a defined sesquiterpene mixture of α-, β- and epi-β-santalene and α-exo-bergamotene as substrate. These sesquiterpenes are not commercially available and were produced by expression of SaSSy in yeast (Figure S3A). Product formation was measured by gas chromatography mass spectrometry (GCMS).

In Vitro Functional Identification of Clade II SaCYP76Fs using a Blend of Sesquiterpenes
In contrast to the clade I SaCYP76Fs, which each gave the same eight sesquiterpenol products, microsomes containing clade II members SaCYP76F37v1, SaCYP76F37v2, SaCYP76F38v1, and SaCYP76F38v2 gave only three products, identified as (E)-α-exo-bergamotol (8) as the major product and (E)-α-santalol (7) and (E)-β-santalol (12) as minor products (Figure 5A-D). No activity was found with SaCYP76F43 (Figure 5E), possibly due to low expression in yeast as evidenced by the corresponding CO difference spectrum (Figure S2).

Substrate Specificity and Kinetic Properties of SaCYP76Fs
To test the range of substrates potentially converted by the clade I and clade II SaCYP76F enzymes, we assayed SaCYP76F37v1 and SaCYP76F39v1 with a set of sesquiterpenes which resemble santalenes in the acyclic isoprenyl side chain (Figure 6). Of the nine different substrates tested, SaCYP76F39v1 efficiently converted only the two santalenes, while it showed low activity with α-bisabolol and was not active with α-curcumene, zingiberene, β-bisabolene, β-sesquiphellandrene, farnesene, and trans-nerolidol. These results demonstrated a narrow substrate selectivity of SaCYP76F39v1 with sesquiterpenes relevant for sandalwood oil biosynthesis. Similarly, SaCYP76F37v1 was selectively active with the two santalenes and trans-nerolidol. Apparent

Formation of Santalols and Bergamotol in Transformed Yeast Cells
To test the potential for using SaCYP76F cDNAs to produce santalols and bergamotol in vivo, we first expressed the previously characterized SaSSy and SaFPPS cDNAs [10] in yeast to form the known SaSSy products α-santalene (1), α-exo-bergamotene (2), epi-β-santalene (3) and β-santalene (4). These four sesquiterpenes were detected in transformed yeast cells (Figure S6), but were not released in detectable amounts into the culture medium. No differences were observed between cells expressing SaSSy with or without the additional SaFPPS, suggesting that endogenous yeast FPP is accessible for SaSSy to produce santalenes and bergamotene. We then tested product formation with the additional expression of SaCPR2 and SaCYP76F candidate cDNAs. Figure 3A). The product peak for (Z)-α-exo-bergamotol (6) overlapped with a peak corresponding to (E,E)-farnesol, which was produced in yeast independently of SaCYP76F39v1 (Figure 7B).
Apparently, a fraction of the sesquiterpenols produced by recombinant yeast expressing SaSSy, SaCPR2 and SaCYP76F39v1 was modified to unknown compounds (identified with hash marks in Figure 6A). When untransformed yeast cells were incubated with authentic sandalwood oil, we found the same unknown compounds (Figure S7), implying that these compounds are not direct products of SaCYP76F39v1, but are produced by an endogenous yeast activity that converts sandalwood sesquiterpenols.
In vivo analysis of the other SaCYP76F clade I members gave product profiles with ratios nearly identical (Figure S8) to those observed in the corresponding in vitro assays with the microsomal preparations (Figure 4). Yeast cells expressing clade II SaCYP76Fs produced mostly (E)-α-exo-bergamotol (8), similar to the products formed in the in vitro assays, but only traces of santalols (7 and 12) (Figure S9). Again, no activity was found with SaCYP76F43.

Effect of CPR1 and CPR2
To test whether substituting SaCPR1 for SaCPR2, which are 70% identical at the protein level, could change product profiles, we tested both CPRs in yeast in vivo experiments with representative clade I and clade II SaCYP76Fs, SaCYP76F39v1 and SaCYP76F38v1. No differences were observed in the products and their relative abundances.

Discussion
Using transcriptome analysis, cloning and functional characterization of recombinant P450s, we identified a new CYP76F subfamily in S. album involved in the biosynthesis of α-, β- and epi-β-santalols and bergamotols. The different SaCYP76Fs catalyze hydroxylations of santalenes and/or bergamotene products of SaSSy at the terminal allylic methyl groups. Clade I SaCYP76F enzymes produced both (Z) and (E) stereoisomers of α-, β- and epi-β-santalols and bergamotols. The P450 product ratios of (Z) and (E) stereoisomers of α- and β-santalol were approximately 1:5 and 1:4, respectively, while the oil harvested from the mature heartwood of S. album trees contained mainly the (Z) alcohols [17,18]. There are several possible explanations for the difference in the ratio of stereoisomers found in the enzyme product profile and in the oil extracted from trees. Importantly, we excluded the possibility that the activity of SaCYP76Fs was non-specific towards a range of different substrates, since only products of SaSSy were preferred substrates when compared with other similar sesquiterpenes. However, it is important to note that conditions in yeast cells and in vitro assays differ from the physiological conditions in planta, which might explain the differences in product stereoisomers observed. It is possible that subtle changes in the shape and size of the active site under different conditions might result in the olefin precursors being oxidized in different configurations. It is also important to note that the products detected in in vitro microsome assays and in yeast in vivo assays were formed and accumulated over a period of minutes to hours. In contrast, the oil extracted from mature heartwood is the product of biosynthesis and accumulation that occurs over a much longer time period of many years.
Isomerization, perhaps catalyzed by an isomerase, may occur in the trees but may not have been reproduced under the conditions of the in vitro or yeast in vivo enzyme assays used here. Although the ten P450s isolated in this work are the most abundant P450s in the sandalwood transcriptome sequences, it is also possible that additional sandalwood P450s exist that are similarly active on the santalene and bergamotene substrates, but generate predominantly the (Z) stereoisomer. We will be exploring this possibility with further screening of the S. album P450 family.
The CYP76 gene family is part of the CYP71 clan, which includes P450 families involved in plant primary and secondary metabolism. Previously functionally characterized CYP76 members are involved in xenobiotic detoxification [19], oxidation of iridoid monoterpenoids [14,20], and oxidation of diterpenes [21,22]. The CYP76F members described here for sesquiterpene hydroxylation add a new dimension to the known functional space of the CYP76 family. The number of CYP76 genes is highly variable in different plant species. For example, papaya (Carica papaya) contains three CYP76 genes, A. thaliana has nine CYP76 genes, and grapevine (Vitis vinifera) has 24 CYP76 genes [11,23]. The ten S. album CYP76F members described here were identified based on transcriptome sequencing and may not represent the full complement of CYP76 genes of this species. In the absence of a genome sequence of S. album, it is not clear if any of these genes represent pairs of allelic variants. The S. album CYP76F members separate into two clades, clade I and II. Although there is overlap in their product profiles, clade I members preferentially formed santalols, whereas clade II members preferentially produced (E)-α-exo-bergamotol.
The CYP76 and CPR cDNAs described here, combined with previously cloned santalene synthases [10], provide a biotechnology opportunity to produce valuable components of sandalwood oil. Our initial results demonstrate the potential of transformed yeast cells for production of santalols and bergamotols. As a proof-of-concept, we reconstructed the pathways for biosynthesis of santalols and bergamotols in yeast cells using the multi-product SaSSy and SaCPR in combination with different multi-substrate SaCYP76Fs. These results provide a foundation for further metabolic engineering to improve yields and target product specificities.
The cloned terpene synthases [10,24] and P450s (this study) of sandalwood oil biosynthesis can also be explored as biomarkers to monitor the onset of oil formation in sandalwood plantations or for the development of genetic markers for tree improvement. In this context, it is important to note that very little is known about the cell types and the molecular events that control the spatial and temporal patterns of the onset of sandalwood oil biosynthesis, beyond the association of oil accumulation with the aging heartwood of sandalwood stems and roots. The aging heartwood of sandalwood trees is an extremely difficult system to study with biochemical tools. Thus, the genes described here and in previous work [10], and their possible applications for metabolic engineering of sandalwood oil biosynthesis and the development of molecular markers, are likely to become more important as worldwide demand for sandalwood products increases and as natural resources of S. album continue to decline.

Materials
The Saccharomyces cerevisiae yeast strain used in this study was BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0). Escherichia coli α-Select Chemically Competent Cells (Bioline) were used for routine cloning and plasmid propagation. The sesquiterpene olefins α-, β- and epi-β-santalene, and α-exo-bergamotene are not commercially available, but can be produced by expression of SaSSy in yeast [10]. A sesquiterpene oil containing α-, β- and epi-β-santalene, and α-exo-bergamotene was produced in an industrial scale fermentation system by Allylix, Inc. (Kentucky, USA). The mixture was separated using silver nitrate impregnated TLC plates according to Daramwar et al. [25]; fractions were scraped from TLC plates and sesquiterpenes eluted with pentane followed by GCMS analysis for purity. Other sesquiterpenes, specifically bisabolol, trans-β-farnesene and trans-nerolidol, were purchased from Sigma. Zingiberene, α-curcumene, β-bisabolene and β-sesquiphellandrene were from our in-house collection of sesquiterpene standards isolated from natural sources.

Transcriptome Sequences
A cDNA library made from Santalum album xylem was sequenced with Sanger technology, generating 11,520 paired-end sequences [10]. 454 Titanium sequencing of the cDNA library generated an additional 902,111 sequence reads. The transcriptome assembly was done using both the 454 and Sanger sequences with Roche Newbler assembler version 2.6 under default parameters, which generated a total of 31,461 isotigs.

Cloning of P450 and CPR FLcDNAs and Yeast Transformation
FLcDNAs were amplified by PCR using Phusion Hot Start II DNA Polymerase (Thermo Scientific) with gene-specific primers (Table S2) and cDNA prepared from S. album wood cores and leaves as template. PCR conditions included initial denaturing at 98°C for 3 min, two cycles at 98°C for 10 sec, Tm − 2°C for 20 sec, and 72°C for 30 sec, followed by 30 cycles at 98°C for 10 sec, Tm for 20 sec and 72°C for 30 sec, and termination for 7 min at 72°C. PCR products were gel purified and cloned into the pJET1.2 vector (Fermentas). Constructs designated pJET1.2-SaCYP76F37 through pJET1.2-SaCYP76F43, pJET1.2-SaCPR1 and pJET1.2-SaCPR2 were sequence verified. SaCYP76F FLcDNAs were subcloned into yeast expression vector pYEDP60 following the USER cloning method [26]. SaSSy (HQ343276) and SaFPPS (HQ343283) cDNAs [10] were cloned, respectively, into the NotI-BglII and BamHI-XhoI sites of the dual expression vector pESC-LEU2d by In-Fusion Cloning (Clontech). SaCPR1 and SaCPR2 were cloned individually into the EcoRI-NotI sites of the dual expression vector pESC-HIS (Stratagene). Plasmid transformation of yeast strain BY4741 was done using the LiCl method of Gietz et al. [27]. Transformed yeast strains were selected on plates with appropriate synthetic complete drop-out selection medium and grown at 30°C for 48 h.
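The two-stage cycling scheme described above (two cycles annealing 2°C below the primer Tm, then 30 cycles at Tm) can be written down as a small data structure, which makes it easy to check the programmed hold time. This is only an illustrative sketch: the function names and the example Tm of 60°C are assumptions, and the total ignores the thermocycler's ramping time between steps.

```python
def pcr_program(tm_c):
    """Return the cycling program as blocks of (temperature in °C, seconds) steps."""
    def cycle(anneal_c):
        # denature, anneal, extend
        return [(98.0, 10), (anneal_c, 20), (72.0, 30)]

    return [
        {"name": "initial denaturation", "cycles": 1, "steps": [(98.0, 180)]},
        {"name": "first cycles (Tm - 2)", "cycles": 2, "steps": cycle(tm_c - 2.0)},
        {"name": "main cycles (Tm)", "cycles": 30, "steps": cycle(tm_c)},
        {"name": "final extension", "cycles": 1, "steps": [(72.0, 420)]},
    ]


def hold_time_s(program):
    """Sum of all programmed hold times in seconds (ramping not included)."""
    return sum(
        block["cycles"] * sum(seconds for _temp, seconds in block["steps"])
        for block in program
    )


# Hypothetical primer pair with a Tm of 60 °C.
program = pcr_program(60.0)
total_min = hold_time_s(program) / 60.0
```

Because only the annealing temperature depends on the primers, the programmed hold time is the same for every primer pair in Table S2.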

Microsome Preparation
For microsome isolation, BY4741 cells were transformed with plasmids harboring P450 or CPR. Microsome membranes were prepared from 250 ml cultures according to Pompon et al. [28]. In brief, a 5 ml overnight culture was used to inoculate 50 ml of SD-selective media starting at an OD600 of 0.2 and grown at 30°C, 170 rpm for 24 h. A volume of 200 ml YPDE medium (1% yeast extract, 2% bacto-peptone, 5% ethanol, 2% dextrose) was inoculated with the 50 ml culture and incubated for another 24 h at 30°C, 170 rpm. Cells were collected by centrifugation for 10 min at 1,000 × g and induced with 2% galactose in 250 ml YP medium at 30°C, 170 rpm for 12-16 h. Yeast cells were pelleted by centrifugation at 2,000 × g for 10 min, washed once with 5 ml TEK (50 mM Tris-HCl pH 7.5, 1 mM EDTA, 100 mM KCl) and suspended in TES2 buffer (50 mM Tris-HCl pH 7.5, 1 mM EDTA, 600 mM sorbitol, 5 mM DTT and 0.25 mM PMSF). All subsequent steps were performed at 4°C. Yeast cells were disrupted mechanically using acid-washed glass beads (425-600 µm, Sigma) and vigorous manual shaking for 3 × 30 sec. The cell homogenate was centrifuged at 10,000 × g for 15 min followed by ultracentrifugation of the supernatant at 100,000 × g for 1 h. Microsomes were suspended and homogenized in a buffer containing 50 mM Tris-HCl pH 7.5, 1 mM EDTA and 30% (v/v) glycerol, and used directly for enzyme assays or stored at −80°C.

CPR Activity and P450 CO Spectra
Activity of recombinant SaCPRs was assayed using the Cytochrome C Reductase (NADPH) assay kit (Sigma). CO difference spectra of recombinant P450s were measured according to Guengerich et al. [29].

In Vitro P450 Assays
Microsome preparations containing candidate P450 and CPR were assayed for their capacity to oxidize sesquiterpenes. The reaction mixtures contained 50 mM potassium phosphate pH 7.5, 0.8 mM NADPH and 40 µM of substrate in a total volume of 400 µl. Enzyme reactions were initiated by adding 50 µl of the microsome preparation, incubated at 30°C for 2 h with shaking and stopped by adding 500 µl of hexane. The organic layer was transferred to a new GC vial and concentrated under N2 gas to about 100 µl followed by GCMS analysis. For kinetic analysis, enzyme assays were performed as above with the following modifications: assays were performed in a total volume of 400 µl with either 17 pmol of SaCYP76F39v1 protein or 35 pmol of SaCYP76F37v1 protein, and substrate concentrations of 12 to 138 µM of α-santalene or β-santalene; assays were incubated for 20 min.
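The fitting procedure used to derive apparent kinetic constants is not described in this section. As one simple possibility, the sketch below estimates apparent Vmax and Km from initial rates with a Hanes-Woolf linearization ([S]/v plotted against [S]), using only the Python standard library. The substrate series spans the 12 to 138 range quoted above, but the intermediate concentrations, the rates and the parameter values are synthetic placeholders, not measurements from this study; nonlinear regression on the untransformed data is generally preferred for real data.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Fit [S]/v = [S]/Vmax + Km/Vmax by ordinary least squares.

    Returns (vmax, km) in the units of the input rates and concentrations.
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two matching (substrate, rate) points")
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    return 1.0 / slope, intercept / slope


# Synthetic, noise-free example: hypothetical Vmax = 2.0 and Km = 40.0.
concentrations = [12.0, 25.0, 50.0, 75.0, 100.0, 138.0]
synthetic_rates = [michaelis_menten(s, 2.0, 40.0) for s in concentrations]
fitted_vmax, fitted_km = hanes_woolf_fit(concentrations, synthetic_rates)
```

Dividing a fitted Vmax by the amount of P450 in the assay (17 or 35 pmol in the conditions above) would give an apparent turnover number, provided the rate is expressed per assay.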

Yeast Metabolic Engineering
To assess the production of santalols/bergamotol in a yeast system, the yeast strain BY4741 was co-transformed with plasmids containing cDNAs for SaFPPS, SaSSy, SaCPR, and a candidate CYP76F. Recombinant yeast was initially grown overnight at 30°C in 5 ml of 2% dextrose in minimal selective media. The next day, a 50 ml culture was initiated at a starting OD600 of 0.2 and grown at 30°C with shaking at 170 rpm until the culture reached an OD600 of 0.6-0.8. Expression was initiated by transfer into minimal selective media with 2% galactose and grown for 14-16 h. Yeast cells were harvested by centrifugation at 1,000 × g for 10 min and washed once with 5 ml sterile ddH2O. Cells were extracted twice by vortexing for 1 min with 2 ml hexane and 250 µl acid-washed glass beads (425-600 µm, Sigma). Pooled extracts were transferred to a clean test-tube containing anhydrous Na2SO4 and evaporated under a gentle stream of N2 gas to about 200 µl. The samples were transferred to a GC glass vial for GCMS analysis or stored at −80°C.
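Starting a culture at a defined OD600 is a simple dilution calculation (C1V1 = C2V2). The sketch below shows it for the 50 ml culture started at OD600 0.2; the function name and the overnight OD600 of 4.0 are hypothetical, and the calculation assumes that OD600 scales linearly with cell density, which in practice usually requires measuring a diluted sample of the dense overnight culture.

```python
def inoculum_volume_ml(overnight_od, target_od, final_volume_ml):
    """Volume of overnight culture needed to start a culture at target_od.

    Applies C1 * V1 = C2 * V2, treating OD600 as proportional to cell density.
    """
    if overnight_od <= 0 or target_od <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if overnight_od <= target_od:
        raise ValueError("overnight culture must be denser than the target")
    return target_od * final_volume_ml / overnight_od


# Hypothetical overnight culture at OD600 = 4.0, diluted into 50 ml at OD600 = 0.2.
volume_needed = inoculum_volume_ml(overnight_od=4.0, target_od=0.2, final_volume_ml=50.0)
```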

GCMS Analysis
GCMS analysis was carried out on an Agilent 7890A/5975C GCMS system operating in electron ionization selected ion monitoring (SIM)-scan mode. Samples were analyzed on both an HP5 (non-polar; 30 m × 0.25 mm ID × 0.25 µm thickness) and a DB-Wax fused silica column (polar; 30 m × 0.25 mm ID × 0.25 µm thickness). In both cases, the injector was operated in pulsed splitless mode with the injector temperature maintained at 250°C. Helium was used as the carrier gas with a flow rate of 0.8 ml min⁻¹ and pulsed pressure set at 25 psi for 0.5 min. Scan range: m/z 40-500; SIM: m/z 93, 94, 105, 107, 119, 122 and 202 [dwell time 50 msec]. The oven program for the HP5 column was: 40°C for 3 min; ramp of 10°C min⁻¹ to 130°C, 2°C min⁻¹ to 180°C, 50°C min⁻¹ to 300°C; 300°C for 10 min. The oven program for the DB-Wax column was: 40°C for 3 min; ramp of 10°C min⁻¹ to 130°C, 2°C min⁻¹ to 200°C, 50°C min⁻¹ to 250°C; 250°C for 15 min. Chemstation software was used for data acquisition and processing. Compounds were identified by comparison of mass spectra with those of authentic standards and the NIST/EPA/NIH mass spectral library v2.0, and by comparison of retention indices with those appearing in other publications [15,16].
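The two oven programs differ mainly in the length of the slow 2°C min⁻¹ ramp, which is where the sesquiterpenes and sesquiterpenols are expected to be resolved. As a quick consistency check, the sketch below computes the total run time implied by each program; the function and variable names are illustrative only, and the calculation assumes ideal linear ramps with no equilibration time.

```python
def oven_run_time_min(start_temp, initial_hold, ramps, final_hold):
    """Total oven program time in minutes.

    ramps is a list of (rate in °C per min, target temperature in °C).
    """
    total = initial_hold
    temp = start_temp
    for rate, target in ramps:
        if rate <= 0 or target <= temp:
            raise ValueError("each ramp must heat at a positive rate")
        total += (target - temp) / rate    # duration of this ramp segment
        temp = target
    return total + final_hold


# Oven programs as described in the text.
HP5_PROGRAM = dict(start_temp=40, initial_hold=3,
                   ramps=[(10, 130), (2, 180), (50, 300)], final_hold=10)
DB_WAX_PROGRAM = dict(start_temp=40, initial_hold=3,
                      ramps=[(10, 130), (2, 200), (50, 250)], final_hold=15)

hp5_minutes = oven_run_time_min(**HP5_PROGRAM)
db_wax_minutes = oven_run_time_min(**DB_WAX_PROGRAM)
```

Under these assumptions the HP5 run takes about 49.4 min and the DB-Wax run about 63 min per injection.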

Phylogenetic Analysis
Phylogenetic analysis was performed using the software MEGA version 4 [30] employing the neighbor-joining (NJ) algorithm with default parameters. Bootstrap (500 replications) confidence values over 50% are displayed at branch points.