Modeling of paclitaxel biosynthesis elicitation in Corylus avellana cell culture using adaptive neuro-fuzzy inference system-genetic algorithm (ANFIS-GA) and multiple regression methods

Paclitaxel as a microtubule-stabilizing agent is widely used for the treatment of a vast range of cancers. Corylus avellana cell suspension culture (CSC) is a promising strategy for paclitaxel production. Elicitation of paclitaxel biosynthesis pathway is a key approach for improving its production in cell culture. However, optimization of this process is time-consuming and costly. Modeling of paclitaxel elicitation process can be helpful to predict the optimal condition for its high production in cell culture. The objective of this study was modeling and forecasting paclitaxel biosynthesis in C. avellana cell culture responding cell extract (CE), culture filtrate (CF) and cell wall (CW) derived from endophytic fungus, either individually or combined treatment with methyl-β-cyclodextrin (MBCD), based on four input variables including concentration levels of fungal elicitors and MBCD, elicitor adding day and CSC harvesting time, using adaptive neuro-fuzzy inference system (ANFIS) and multiple regression methods. The results displayed a higher accuracy of ANFIS models (0.94–0.97) as compared to regression models (0.16–0.54). The great accordance between the predicted and observed values of paclitaxel biosynthesis for both training and testing subsets support excellent performance of developed ANFIS models. Optimization process of developed ANFIS models with genetic algorithm (GA) showed that optimal MBCD (47.65 mM) and CW (2.77% (v/v)) concentration levels, elicitor adding day (16) and CSC harvesting time (139 h and 41 min after elicitation) can lead to highest paclitaxel biosynthesis (427.92 μg l-1). The validation experiment showed that ANFIS-GA method can be a promising tool for selecting the optimal conditions for maximum paclitaxel biosynthesis, as a case study.


Introduction
Plants are a rich source of active pharmaceutical components used in treatment of many diseases [1][2][3][4][5][6][7]. Some plant derived natural products (NP) such as aspirin have simple structure which could produce by chemosynthesis. However, chemical synthesis of some valuable plant NPs e.g. paclitaxel is difficult because of their complex structure [8]. On the other hand, the extraction of NPs from intact plant limit the commercial production of these compounds owing to low yield, environmental restrictions and extinction risk of these valuable pharmaceutical sources [9,10]. Therefore, using biotechnological approaches particularly plant cell culture named "green cell factories" is a promising bioproduction platform to overcome these limitations and produce plant NPs on a large scale [1,9,11]. Paclitaxel as the most well-known anticancer drug is widely used for the treatment of a vast range of cancers [12]. Taxus species are the main source of this fantastic diterpene alkaloid, paclitaxel. Nevertheless, Taxus recalcitrant behavior under in vitro culture is a drawback for fast-growing in vitro culture establishment of these valuable species [13]. Corylus avellana is a promising alternative for paclitaxel production because of its advantages including easy in vitro cultivation and fast-growing cells, and also extensive availability [13][14][15][16][17][18][19]. Large scale production of secondary metabolite (SM) through plant cell culture needs to apply several strategies including high-yielding cell line, growth medium optimization, precursor feeding, elicitation, etc. [10,20]. Amongst the available strategies for boosting the biosynthesis of SMs in plant in vitro cultures, the elicitation is considered as the most effective one [21,22].
The combined treatment of abiotic and biotic elicitors in Corylus avellana [23] and Taxus [24] cell cultures highly boosted paclitaxel biosynthesis. Methyl-β-cyclodextrin (MBCD) has lately absorbed striking attention as an agent eliciting paclitaxel in vitro biosynthesis [25,26]. On the other hand, our previous researches [9,15,27] indicated the positive influences of different elicitors derived from endophytic fungi, cell extract (CE), culture filtrate (CF) and cell wall (CW), on paclitaxel biosynthesis in C. avellana cell culture. Several factors including elicitor concentration level, cell culture age and elicitor exposure time affect the process of paclitaxel biosynthesis elicitation in C. avellana cell culture [5,11,21], and optimal selection of these factors is a determinative issue for maximum biosynthesis of this valuable SM. Nevertheless, optimization of elicitation process is time-consuming and costly. Modeling of paclitaxel biosynthesis elicitation using mathematical methods can effectively identify the non-explicit relationships among mentioned factors and predict optimal conditions for maximum paclitaxel biosynthesis.
Multivariate statistical methods including multiple liner regression (MLR), stepwise regression (SR), ordinary least squares regression (OLSR), principal component regression (PCR) and partial least squares regression (PLSR) have been used to model biological process [28][29][30][31]. MLR studies the relationship between two or more independent variables and one dependent variable [32]. SR is a well-known data-mining method selecting the explanatory variables for regression model from a group of input variables [33]. OLSR, PCR, and PLSR are three methods to model dependent variable when there is a large number of independent variables which are highly correlated. OLSR is a statistical method estimating the relationship amongst independent variable(s) and dependent variable by minimizing sum of square differences among the predicted and observed values of dependent variable [34]. PCR is a regression method established on principal component analysis (PCA) [35]. PLSR, combining PCA and multiple regression, is a powerful modeling technique especially when the factors (input variables) are highly collinear [36].
Different predictive and fitting abilities of MLR (R 2 = 0.32-0.91) [37] and SR (R 2 = 0.17-0.89) [38] were demonstrated in pear rootstocks tissue culture. Also, SR was used to model phenolic profile of grapevine foliar wastes, and displayed different ability (R 2 = 0.05-0.78) for predicting various phenolic compounds [28]. Additionally, great performance of OLSR was reported for predicting rainfed soybean and maize yields [39]. PCR (R 2 = 0.30-0.49) was likewise applied to model grain yields based on soil properties [40]. Moreover, PLSR model was successfully used to predict important management goals in land application [41].
Poor non-linear predictive and fitting abilities of traditional modeling methods [18,19,38,[42][43][44] have shifted the studies to the use of data mining techniques such as artificial neural network (ANN) and neuro-fuzzy models. These models are able to identify and learn correlated patterns between input variables and corresponding target values in a complex and nonlinear process.
ANN is a brain-inspired method that imitates the way that the human brain works [45]. It processes information and makes decision in systems involving vagueness and uncertainty [44,46].
Adaptive neuro-fuzzy inference system (ANFIS), hybrid of fuzzy logic and neural network, incorporates the advantages of both methods including learning capabilities, interpretability, quick convergence, adaptability and high accuracy, having no disadvantages of them [47]. ANFIS displays excellent performance in approximation and prediction of nonlinear relationships in various fields [47][48][49][50]. The successful application of ANFIS as an effective modeling method in various fields explains why it has been a popular modeling approach for years. It is highly regretful that ANFIS has not been used to model SM biosynthesis in plant in vitro culture.
Mathematical optimization has been successfully used in plant science [19,49,[51][52][53]. Genetic algorithm (GA) as a robust optimizing tool has been successfully used to optimize culture medium composition for pear rootstock proliferation [37,38], in vitro rooting of Prunus rootstock [54] and melon differentiation [55]. GA is the search algorithms inspired by natural selection and genetics concepts [56]. The fundamental principles of GA are the creation of an initial population of search solutions and then elite search solutions were selected for crossover using a roulette wheel selection method, which will ultimately be the best solution among them (Fig 1).
The objectives of this research were (a) to develop MLR, SR, OLSR, PCR, PLSR and ANFIS models to predict output variable "paclitaxel biosynthesis" based on input variables "concentration levels of MBCD and fungal elicitor, fungal elicitor adding day and harvesting time of cell suspension culture (CSC)" in C. avellana cell culture responding to fungal elicitors (CE, CF and CW), either individually or combined treatment with MBCD, (b) to compare performance of various mentioned models in term of prediction accuracy of paclitaxel biosynthesis, and (c) to optimize the mentioned factors for maximum paclitaxel biosynthesis by GA (Fig 2).

Elicitation of cell suspension culture
C. avellana CSC was established as described by Salehi et al. [9,14,15]. Three fungal elicitors, CE, CF and CW, were used for paclitaxel biosynthesis elicitation in C. avellana CSC. Endophytic fungus applied in this research was a strain of Coniothyrium palmarum isolated from inner bark of Taxus baccata. CE, CF and CW were prepared as described previously [9,27]. For elicitation, 1.5 ± 0.1 g of C. avellana cells (fresh mass) was cultured in 100 ml flasks containing 30 ml MS medium supplemented with 2 mg l −1 2,4-D and 0.2 mg l −1 BAP, then treated with fungal elicitors, either individually or a combined treatment with 50 mM MBCD.
Four concentrations (1, 2.5, 5 and 10% (v/v)) of fungal elicitors including CE, CF and CW, and also mid (day 13) and late (day 17) log phase of C. avellana cell cultures were selected for adding fungal elicitors. Control received an equal volume of water (for CE)/ potato dextrose broth (PDB) (for CF)/ water containing 1% (v/v) acetic acid (for CW).

Quantification of paclitaxel
The extraction of intracellular and extracellular paclitaxel, and also HPLC analysis were performed with a procedure described by Salehi et al. [9,14,15].

Experimental design
The experiments were planned based on a randomized complete block design (RCBD) with factorial arrangement, four factors containing MBCD, fungal elicitor type, fungal elicitor concentration, fungal elicitor adding day, and three replicates. The cultures were harvested in two-day intervals after elicitation until 23 rd day.

Model development
Regression and ANFIS models were individually developed for each of fungal elicitors. It is noteworthy that the models developed for each of fungal elicitors "CE, CF and CW" were designated as FCE-MOD, FCF-MOD and FCW-MOD, respectively. The data (S1 Table-S3  Table) were randomly divided into a training subset (75%) and testing one (25%), respectively. Training subset was applied to develop regression and ANFIS models, and testing subset was applied to test the predictability of developed models [57].

Adoptive neuro-fuzzy inference system (ANFIS) model
ANFIS models were developed for each of fungal elicitors (CE, CF and CW), either individually or in a combined treatment with MBCD, to define the influences of MBCD and fungal elicitor concentration levels, fungal elicitor adding day and CSC harvesting day on paclitaxel biosynthesis.
ANFIS is made up of "if-then" rules with suitable membership functions to obtain preliminary stipulated input-output pairs (Fig 3).
Basic rule structure of ANFIS with two inputs x, y, and one output "f" can be defined as follows:

PLOS ONE
where x and y denote input variables, f is output specified by the fuzzy rules, A i and B i are the parameters of fuzzy sets, and p i , q i , and r i are the linear parameters determined during the training process. As shown in Fig 3, ANFIS is made up of five layers and two types of nodes, indicated by square and circle. The adaptive node (square node) accepts parameters. The fixed nodes (circle node) accept no parameter.
Layer 1 (fuzzification layer): it calculates membership values or membership degrees (μ Ai and μ Bi ) for each of input variables by membership functions. In this research, Gaussian membership function (Eq (3)) was used as membership functions. The outputs of first layer are defined by Eqs (4) and (5). {a, b, c} is called premise parameter set, and determine membership function form.
Layer 2 (rule layer): each node (P) computes firing strengths (w i ) (Eq (6)) for fuzzy rules via the multiplication of input signals of corresponding fuzzification nodes.
Layer 3 (normalisation layer): each node (N) computes the ratio of its own rule firing strength to sum of all rules's firing strengths, normalized firing strengths, as given in Eq (7).
Layer 4 (defuzzification layer): each node computes weighted values of rules by the multiplication of normalized firing strength ( � w i ) and consequent function as given in Eq (8). Consequent function is first order polynomial of consequence parameters {pi, qi, ri}.
Layer 5 (summation layer): This single-node layer (S) computes final output by the summation of all incoming signals from all defuzzification layer nodes as follows: Hybrid learning rules, back-propagation algorithm and least-squares estimate [49,58] were used to tune premise parameter set and consequent parameters. The performance of ANFIS models is determined by three statistical criteria including root mean square error (RMSE) (Eq (10)), mean absolute error (MAE) (Eq (11)) and coefficient of determination (R 2 ) (Eq (12)).

RMSE ¼
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi jy est À y act j ð11Þ Where "y act " are the actual values, "y est " are the predicted values, and "n" is the number of data.

Optimization process by genetic algorithm (GA)
GA was applied to optimize the value of input variables (MBCD and fungal elicitor concentration levels, fungal elicitor adding day and CSC harvesting time) in developed ANFIS models for maximum paclitaxel biosynthesis. An initial population of 200, crossover rate of 0.7, generation number of 1000, mutation rate of 0.03 and uniform function as a mutation function, two-point crossover function, and a roulette wheel selection function were set to select the optimal levels of input variables (Fig 1).

Sensitivity analysis of the models
The sensitivity of paclitaxel biosynthesis against input variables (MBCD and fungal elicitor concentration levels, fungal elicitor adding day and CSC harvesting time) was determined by the criteria including variable sensitivity error (VSE) value displaying the performance (RMSE) of ANFIS model when that particular input variable is unavailable in the model. Variable sensitivity ratio (VSR) value was calculated as ratio of VSE and ANFIS model error (RMSE value) when all input variables are available. The input variable with higher VSR was considered as higher important variable in model. The mathematical codes for the development and evaluation of ANFIS and regression models were written using MATLAB software [59] and XLSTAT [60], respectively, and the graphs were made by GraphPad Prism 5 [61] software.

Validation experiment
CE, CF, CW and MBCD concentration levels, fungal elicitor adding day, and CSC harvesting time optimized by ANFIS-GA were examined to evaluate the efficiency of ANFIS-GA model for forecasting and optimizing paclitaxel biosynthesis in C. avellana cell culture responding to used elicitors. The culture conditions for C. avellana cell growth remained the same as mentioned above.

Modelling of paclitaxel biosynthesis using regression and ANFIS models
Elicitation is the most important and promising approach for increasing SMs biosynthesis in plant cell culture platform. However optimization of elicitation process is a key step for achieving this goal. The type, concentration level and adding day of elicitors, and also CSC harvesting time are effective factors in paclitaxel biosynthesis in C. avellana CSC elicited by different elicitors [9,15,27]. Predicting the optimal amount of these mentioned factors is highly promising and essential for paclitaxel biosynthesis increment and cost decrement.
To model paclitaxel biosynthesis in C. avellana cell culture responding to fungal elicitor and MBCD by regression and ANFIS methods, fungal elicitor and MBCD concentration levels, fungal elicitor adding day and CSC harvesting time were used as input variables, and paclitaxel biosynthesis as output variable. Here, various regression methods (MLR, SR, OLSR, PCR and PLSR) were tested to find the best regression model to predict paclitaxel biosynthesis in C. avellana responding to fungal elicitors and MBCD. All developed regression models for three elicitors "CE, CF and CW" showed statistically significant relationships between output variable "paclitaxel biosynthesis" and input variables (fungal elicitor and MBCD concentration levels, fungal elicitor adding day and CSC harvesting day)," (Table 1). MLR, SR, OLSR, PCR and PLSR models developed for paclitaxel biosynthesis regarding fungal elicitor and MBCD concentration levels, fungal elicitor adding day and CSC harvesting time were shown in Table 1. Goodness-of-fit of MLR, SR, OLSR, PCR and PLSR models was performed to detect the best model for predicting paclitaxel biosynthesis. High significant R 2 value and low RMSE and MAE values displayed the model capability.
As shown in Table 1 Table 1

Models
Training

PLOS ONE
subset suggested the best mentioned models can explain 17, 54 and 38% variability in paclitaxel biosynthesis in FCE-MOD, FCF-MOD and FCW-MOD, respectively, when they face with unseen data (Table 1).   (Figs 4-6). The great accordance between the predicted and observed values of paclitaxel was observed for both training and testing subsets (Table 1). Goodness-of-fit of developed ANFIS models showed that the developed models could accurately predict paclitaxel biosynthesis of testing subset (0.88, 0.90 and 0.94 in FCE-MOD, FCF-MOD and FCW-MOD, respectively), not used during training processes (Table 1). Also, developed ANFIS models displayed the balanced statistical values for both training and testing subsets ( Table 1). The statistical values for ANFIS models displayed the higher prediction accuracy as compared to regression models, as estimated R 2 for ANFIS vs. best regression models were 0.94 vs. 0.17 in FCE-MOD; 0.95 vs. 0.54 in FCF-MOD and 0.97 vs. 0.38 in FCW-MOD (Table 1). Comparing ANFIS and MLR, SR, PLSR, PCR and OLSR models performed regarding prediction accuracy displayed a higher accuracy of ANFIS models as compared to regression models ( Table 1). The superiority of artificial intelligence (AI) models than regression method was demonstrated in culture medium formulation for pear rootstock proliferation [37,38]. Also, AI displayed a higher accuracy as compared to regression method in predicting targeted phenolic profile of grapevine foliar wastes [28].
Our results suggested that ANFIS models could accurately predict paclitaxel biosynthesis in C. avellana CSC (Table 1). Very small absolute error values (Table 1) showed the high potential of ANFIS models in predicting output variable, paclitaxel biosynthesis.
Paclitaxel biosynthesis is the complex biological processes which require the accurate techniques for modeling and optimization. As observed in this study, AI technology has exhibited high powerful potential for modeling the complex relationships in biological systems, and displaying the superior prediction performances as compared to classical statistics [19,28,44].

Sensitivity analysis of the developed models
Regardless of previous studies on the effects of CE and CF concentration levels, fungal elicitor adding day and CSC harvesting time on paclitaxel biosynthesis, there remains the question to be answered; which input variables are the most important in paclitaxel biosynthesis. Sensitivity analysis determines the role of input variables in ANFIS models. Indeed, sensitivity analysis shed light on relationships between "MBCD and fungal elicitor concentration levels, fungal elicitor adding day and CSC harvesting time" and paclitaxel biosynthesis. To rank input variables based on their relative importance in the model, VSRs were estimated. VSRs were obtained for paclitaxel biosynthesis regarding fungal elicitor (CE, CF and CW) and MBCD concentration levels, fungal elicitor adding day and CSC harvesting time ( Table 2). Analysis of FCE-MOD indicated that paclitaxel biosynthesis was more sensitive to CSC harvesting time (VSR = 3.32), followed by fungal elicitor (CE) concentration level (VSR = 2.41), fungal elicitor adding day (VSR = 2.24) and MBCD concentration level (VSR = 1.89) ( Table 2). According to analysis of FCF-MOD, paclitaxel biosynthesis displayed more sensitivity to CSC harvesting time (VSR = 2.62), followed by fungal elicitor (CF) concentration level (VSR = 1.88), fungal elicitor adding day (VSR = 1.28) and MBCD concentration level (VSR = 1.05). Accordingly, paclitaxel biosynthesis in FCW-MOD exhibited more sensitivity to CSC harvesting time (VSR = 2.43), followed by fungal elicitor (CW) concentration level (VSR = 2.04), fungal elicitor adding day (VSR = 1.42) and MBCD concentration level (VSR = 0.98) ( Table 2). Overall, sensitivity analysis displayed that CSC harvesting time and fungal elicitor concentration level are the most important variables affecting paclitaxel biosynthesis ( Table 2).

Model optimization and validation experiment
GA has been efficiently used to solve problems with extremely difficult and unknown solution in various fields [19,38,54,62]. The optimization analysis on developed ANFIS models was performed using GA to determine optimal levels of input variables for achieving maximum paclitaxel biosynthesis in C. avellana CSCs ( Table 2). The optimization results of paclitaxel biosynthesis in FCE-MOD showed that adding 4.30% (v/v) of C. palmarum CE on 16 th day to cell culture pre-treated with 2 mM MBCD, and harvesting CSC 98 h and 53 min after elicitation   (Table 2).
To test the efficiency of ANFIS-GA models for forecasting and optimizing paclitaxel biosynthesis in C. avellana CSC responding to fungal elicitors and MBCD, C. avellana cell culture exposed to optimized input variables in ANFIS models using GA. C. avellana cell culture pretreated with 2 mM MBCD exposed to 4.30% (v/v) CE on 16 th day, and harvesting it 98 h and 53 min after elicitation, produced 278.12 ± 23.64 μg l -1 paclitaxel. Also, adding 8.4% (v/v) CF on 17 th day to C. avellana cell culture pre-treated with 42.35 mM MBCD and harvesting CSC 126 h after elicitation resulted in paclitaxel biosynthesis of 283.83 ± 21.39 μg l -1 . Validation experiment of FCW-MOD showed that C. avellana cell culture exposed to optimized input variables by GA (CW concentration: 2.77% (v/v); MBCD concentration: 47.65 mM; CW adding day: 16 th day; CSC harvesting time: 139 h and 41 min after elicitation) biosynthesized 402.92 ± 34.48 μg l -1 paclitaxel (Table 3). These results show validity of developed ANFIS-GA for forecasting and optimizing paclitaxel biosynthesis in C. avellana CSC responding fungal elicitors and MBCD. This is the first study on predicting the optimal conditions for maximum paclitaxel biosynthesis in C. avellana CSC exposed to fungal elicitors in combine with MBCD using ANFIS-GA.

Conclusion
This research applied regression and ANFIS-GA models for forecasting and optimizing paclitaxel biosynthesis in C. avellana cell culture treated with fungal elicitor and MBCD for the first time. The great accordance between the predicted and observed values of paclitaxel biosynthesis supports excellent performance of developed ANFIS models for modeling paclitaxel biosynthesis. Overall, AI models like ANFIS, effectively handle complex input-output patterns and display a supreme ability for modeling and forecasting results, and present an effective guidance for improving the biosynthesis of SMs in plant in vitro culture. Since this research focused on ANFIS as one of AI method for modeling paclitaxel biosynthesis in C. avellana cell culture, as a case study, it is recommended to evaluate other AI methods to model SM biosynthesis in plant in vitro culture in future studies.
Supporting information S1 Table. Levels of input variables related to paclitaxel biosynthesis in Corylus avellana cell culture responding to cell extract (CE) and methyl-β-cyclodextrin (MBCD). (DOCX) Table 3. Paclitaxel biosynthesis in Corylus avellana cell suspension culture (CSC) exposed to optimized fungal elicitor and methyl-β-cyclodextrin concentration levels, fungal elicitor adding day and CSC harvesting time in adaptive neuro-fuzzy inference system (ANFIS) models using genetic algorithm (GA), and predicted paclitaxel biosynthesis via ANFIS-GA.