Identification and in-silico characterization of taxadien-5α-ol-O-acetyltransferase (TDAT) gene in Corylus avellana L.

Paclitaxel® (PC) is one of the most effective and profitable anti-cancer drugs. The most promising sources of this compound are natural materials such as tissue cultures of Taxus species and, more recently, hazelnut (Corylus avellana L.). A large part of the PC biosynthetic pathway in the yew tree and a few steps in the hazelnut have been identified. Since understanding the biosynthetic pathway of plant-based medicinal metabolites is an effective step toward their development and engineering, this paper aimed to identify taxadien-5α-ol-O-acetyltransferase (TDAT) in hazelnut. TDAT is one of the key genes involved in the third step of the PC biosynthetic pathway. In this study, the TDAT gene was isolated using the nested-PCR method and then characterized. A cotyledon-derived cell mass induced with 150 μM methyl jasmonate (MeJA) was used to isolate RNA and synthesize the first-strand cDNA. The full-length cDNA of TDAT is 1423 bp long and contains a 1302 bp ORF encoding 433 amino acids. The phylogenetic analysis of this gene revealed high homology with its orthologs in Quercus suber and Juglans regia. Bioinformatics analyses were used to predict the secondary and tertiary structures of the protein. Because no signal peptide was detected, protein structure prediction suggested that this protein may function in the cytoplasm. The homologous superfamily of the T5AT protein, encoded by TDAT, has two domains. The highest and lowest hydrophobicity values were found at proline 142 and lysine 56, respectively. The T5AT protein had 24 hydrophobic regions. The tertiary structure of this protein was modeled using Modeller software (V.9.20), and the structure was verified based on the results of the Verify3D (89.46%) and ERRAT (90.3061) programs. The T5AT enzyme belongs to the transferase superfamily, and the amino acids histidine 164, cysteine 165, leucine 166, histidine 167, and aspartic acid 168 reside at its active site.
More characteristics of TDAT, which would aid PC engineering programs and maximize its production in hazelnut, were discussed.


Introduction
Cancer is a leading cause of death worldwide, and the number of cancer patients is expected to increase to more than 22 million cases in the next 20 years [1]. Consequently, the demand for anticancer drugs is growing rapidly. Paclitaxel (PC), sold under the brand name Taxol, is a chemotherapy medication used to treat various types of cancer, such as ovarian and breast cancers, as well as AIDS-related Kaposi's sarcoma [2][3][4]. PC has a unique effect on cancer cells compared to other similar compounds: it inhibits cell proliferation by binding to microtubules and promotes their stabilization at the G2-M phase of the cell cycle [3,5]. PC is a diterpenoid compound that was first extracted from the western yew tree (Taxus brevifolia). Hazelnut (Corylus avellana L.) [6] and some microorganisms, such as yew endophytic fungi and hazelnut endophytic fungi [7,8], are also sources of PC. All natural sources produce low levels of PC [9][10][11]. Although PC can be produced by total chemical synthesis, this route is time-consuming, expensive, and low-yielding because of the complex chemical structure of the compound [12,13]. Hence, new alternative approaches are required to meet the increasing demand for PC. Taxus and C. avellana L. cell cultures have been well investigated as promising sources of PC production [4,14,15]. Metabolic engineering introduces rational changes in the genetic makeup of an organism to alter or improve its metabolic profile, and consequently to develop new "non-natural" products. The control of these complex biosynthetic processes has been enabled by the understanding of the metabolic pathways and by advances in molecular biology techniques [16]. However, the amount of PC produced by tissue and cell culture of C. avellana is low.
A deep and detailed understanding of the PC biosynthetic pathway in hazelnut, especially of the genes encoding rate-limiting enzymes and the reactions these enzymes catalyze, along with developments in metabolic engineering, can support the development of practical and effective biotechnological methods that considerably increase the amount of PC produced in hazelnut [17,18].
The PC biosynthesis pathway contains 19 enzymatic steps. This pathway starts from the universal diterpenoid precursor geranylgeranyl diphosphate (GGPP). GGPP is formed from farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP) by GGPP synthase (GGPPS) [19]. A few PC biosynthetic genes have been identified in hazelnut, such as GGPPS (GenBank Accession No: EF206343) and CgHMGR (GenBank Accession No: EF553534) [18,20].
The first main precursor in the PC biosynthesis pathway is taxa-4(5),11(12)-diene, which is produced from GGPP by taxadien synthase (TS) [21]. Taxadien is hydroxylated at the C-5 position and produces taxa- 13α-hydroxylase (TαH). TαH, like T5AT, converts taxa-4(20),11(12)-diene-5α-ol, in this case to taxa-4(20),11(12)-diene-5α,13α-diol, but its next steps are unknown. Thus, the alternative branch of this pathway, which involves taxadien-5α-ol-O-acetyltransferase (TDAT), could be an important target for metabolic engineering [23,24].
In this article, we present the results of a novel study on the isolation and characterization of TDAT from methyl jasmonate (MeJA)-induced cell suspensions of hazelnut.

In vitro cell culture chemical and biological reagents
In this research, the plant materials were hazelnut (C. avellana L.) cotyledon-derived calli and cell suspensions. For this purpose, hazelnut seeds were obtained from Gilan Province of Iran, (37 were harvested after 72 h [26]. Subsequently, the harvested cells were kept in liquid nitrogen until they were used for the subsequent molecular studies.

EST assembly strategy for identification of TDAT
The TDAT coding DNA sequence (CDS) was obtained from C.

Extraction of RNA and cDNA synthesis
Total RNA was extracted from MeJA-treated hazelnut cells (calli) with a Total RNA isolation kit, DENAZ II ASIA, (cat No.: S-1010, Iran). The concentration and quality of the extracted RNA were analyzed using a NanoDrop spectrophotometer (OneC, Thermo Scientific, USA) and confirmed with agarose gel electrophoresis. Genomic DNA was removed from the extracted RNA with RNase-free DNase I (Thermo Scientific (Fermentas), cat no.: EN0521, USA). First-strand cDNA was synthesized using a RevertAid first-strand cDNA synthesis kit (Thermo Scientific (Fermentas), cat no.: K1621, USA).

RT-PCR analysis and gene isolation with Nested-PCR
Because gene identification and isolation from cDNA depend on gene expression, the cell suspension elicited They would amplify overlapping fragments to confirm the ORF specification (Fig. 1). The RT-PCR experiment was performed using the forward-1 and reverse-1 primers (Table 1)
The results of cDNA sequencing were analyzed using Chromas V.1.14 software. Overlapping sequences were edited using the CLC sequencing V.6.1 program, and the final sequence was used for subsequent studies. Protein sequences sharing identity with more than 70% of the coding region of the consensus sequences from different species were chosen for sequence alignment and phylogenetic analysis. The secondary structure of the T5AT protein was predicted and assessed using the software. The conserved region of the T5AT gene was identified from the alignment of ortholog sequences for greater precision and confidence. Peptide program (http:www.cbs.tdu.dk/service/signal/) was performed to predict the signal sequence of T5AT. Epestifind software (emboss.bioinformatics.nl/cgi-bin/emboss/epestifind/) was used to rapidly identify the PEST motif (at least one proline (P), one aspartate (D) or glutamate (E), and at least one serine (S) or threonine (T)).

Elicitor treatment of cell suspension and RT-PCR analysis
In the primary studies, no TDAT gene expression was observed in the control (non-elicited) calli or cell suspensions (Fig. 2). MeJA elicitor treatments (0, 50, 100, and 150 µM) were then used to induce TDAT gene

Bioinformatics analysis and modeling of T5AT protein
To construct the phylogenetic tree and identify conserved regions, the amino acid sequences were aligned with Clustal Omega software. The HXXXDG motif is very stable and highly conserved; it occupies amino acid positions 164 to 169 in C. avellana (Fig. 3).
The results of the phylogenetic analyses suggested that the TDAT gene in C. avellana is most closely related to HHT in Quercus suber and that C. avellana belongs to the same order as Quercus suber and Juglans regia (Fig. 4). One of the statistical parameters used to evaluate the evolutionary process is the omega (ω) value, the ratio of the nonsynonymous to the synonymous substitution rate (dN/dS). ω = 1 suggests neutral evolution; ω < 1 indicates negative (purifying) selection; and ω > 1 indicates positive (diversifying) selection [29]. As shown in Supplementary Figure
Based on the SignalP program, no signal peptide was detected for the T5AT protein. According to the Porter prediction, the T5AT protein consists of 29.6% helix, 26.1% extended (beta) strand, and 44.34% coil (Fig. 6). The secondary structure of the T5AT protein was confirmed by the matching result of the PSIPRED program, which reports confidence on a scale of 0 to 9. The PSIPRED prediction of the subcellular location of the T5AT protein covered 6 locations with different scores. The site of activity of this protein in the plant cell was predicted to be the cytoplasm (50%), nucleus (21.5%), mitochondrion (7.1%), peroxisome (7.1%), endoplasmic reticulum (7.1%), and chloroplast (7.1%).
Fig. 6 Secondary structure of the T5AT protein predicted with the PORTER program. Query length: 433. H = helix. E = strand. C = coil. B = very buried. b = somewhat buried. e = somewhat exposed. E = very exposed.
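The interpretation of ω given above can be sketched as a small helper. This is a minimal illustration, not the procedure used in this study: the function name, the tolerance, and the dN and dS values in the usage note are hypothetical.

```python
def classify_selection(dn, ds, tol=1e-9):
    """Return (omega, label) for a pair of substitution rates.

    dn: nonsynonymous substitution rate (dN)
    ds: synonymous substitution rate (dS), must be positive
    tol: hypothetical tolerance for treating omega as exactly 1
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        label = "neutral"
    elif omega < 1.0:
        label = "negative (purifying) selection"
    else:
        label = "positive (diversifying) selection"
    return omega, label


if __name__ == "__main__":
    # Hypothetical rates, for illustration only
    print(classify_selection(0.02, 0.10))
```

For example, hypothetical rates of dN = 0.02 and dS = 0.10 give ω of about 0.2, which falls in the purifying-selection class.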

Epestifind software was used to find PEST motifs as potential proteolytic cleavage sites. Altogether, 4 PEST motifs were identified in the T5AT protein between positions 1 and 433. A potential PEST motif with 14 amino acids was found between positions 14 and 29, and three poor PEST motifs were found between amino acids 136 to 150, 319 to 337, and 378 to 398 (Fig. 7).
Fig. 7 Positions of the PEST motifs detected with Epestifind software.
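The candidate-selection step behind a PEST search can be sketched as follows. This is a simplified illustration and not the Epestifind implementation: it only finds stretches of at least 12 residues that lie between positively charged residues (K, R, H) and contain at least one P, one D or E, and one S or T. The scoring that separates potential from poor motifs is omitted, and the function name and the toy sequence are hypothetical.

```python
POSITIVE = "KRH"  # positively charged residues that flank a candidate


def pest_candidates(seq, min_len=12):
    """Return (start, end, segment) tuples with 1-based inclusive positions.

    A candidate is the stretch between two consecutive positively charged
    residues, provided it is long enough and has the required composition.
    """
    seq = seq.upper()
    flanks = [i for i, aa in enumerate(seq) if aa in POSITIVE]
    hits = []
    for left, right in zip(flanks, flanks[1:]):
        segment = seq[left + 1:right]
        if len(segment) < min_len:
            continue
        has_p = "P" in segment
        has_de = "D" in segment or "E" in segment
        has_st = "S" in segment or "T" in segment
        if has_p and has_de and has_st:
            # left + 2 and right are the 1-based bounds of the segment
            hits.append((left + 2, right, segment))
    return hits


if __name__ == "__main__":
    toy = "AK" + "PSEDTPSEDTPSA" + "RGGGGK"  # hypothetical sequence
    print(pest_candidates(toy))
```

In the toy sequence, only the 13-residue stretch between the first K and the R qualifies; the GGGG stretch is too short and lacks the required residues.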

Using ProtParam software, the molecular weight and theoretical pI of the TDAT protein were estimated to be 48017.62 Da and 6.43, respectively (Table 2). The ProtScale analysis showed that T5AT has 51 positively charged and 53 negatively charged residues, owing to the presence of lysine/arginine and aspartate/glutamate, respectively. The details of the abundance of amino acids participating in the T5AT structure are summarized in software. The Z-score reflects the overall energy of the structure and shows the degree of consistency between the sequence and the tertiary structure of the model [35]. This value was -9.94 and -8.72 for the template and MO323, respectively. The negative values support the accuracy of the simulated structure; accordingly, there is a high similarity between MO323 and the template. As shown in Fig. 9, the structure of MO323 was located in the region corresponding to X-ray structures. Thus, the obtained model is sufficiently reliable and is close to the experimentally determined 3D-structure of the template. Verify3D calculations showed that 60.51% of the amino acids belonging to MO323 had a score higher than 0.2. This value was 89.46% for the template. This parameter measures the compatibility of the 3D-structure of the protein with its 1D-structure. Based on the obtained score, the accuracy of the simulated structure and its high quality can be the template and MO323 were superimposed, and the overall similarity and minor differences between them were examined with YASARA software. The result showed that the folding topology of the two proteins is highly similar, although minor differences between the model and the template were seen (Fig. 10). The differences are more pronounced in the turn and coil regions. As shown in Fig. 11a plants and fungi. The HXXXD motif forms part of the active site of the enzymes in this superfamily [36].
This motif in T5AT, at positions 164 to 168 of the protein, consists of the His-Cys-Leu-His-Asp amino acid sequence, which forms a turn between the eighth beta-strand and the tertiary alpha-helix (Fig. 11b).
In this study, a taxadien-5α-ol-O-acetyltransferase (TDAT or TAT) ortholog was detected and identified in C. avellana L. using experimental and in-silico analyses. C. avellana is known as a candidate source for PC production [37]. The full-length cDNA encoding TDAT from C. avellana (gene accession number: cases) and GAG (2 cases) for Glu. Thr106 in the 3D-structure of the T5AT protein is located in a beta structure, away from the active site of the enzyme, so alteration of this amino acid probably has little effect on the active site. A review of the 3D-structure of this enzyme showed that Gly24, Phe32, Phe34, Pro46, Lys47, Ala79, Gly161, Ser307, and Pro376 are close to the active site, and alteration of these amino acids may affect the function of the enzyme.
The taxadien-5α-ol-O-acetyltransferase gene has been identified in the yew, another main source of PC. This gene, called TmTAT, was identified in T. media. The ORF of the TmTAT gene was 1317 bp, encoding 439 amino acids [38]. The molecular weight of the T5AT protein was 48.17 kDa. The theoretical pI of T5AT was 6.43, and the instability index for this protein was 37.94. The BLASTp results indicated that the T5AT protein was closely similar to the HHT protein in Juglans regia and Ziziphus jujuba. The isoelectric point of HHT in J. regia