Design Space Development for the Extraction Process of Danhong Injection Using a Monte Carlo Simulation Method

A design space approach was applied to optimize the extraction process of Danhong injection. Dry matter yield and the yields of five active ingredients were selected as process critical quality attributes (CQAs). Extraction number, extraction time, and the mass ratio of water and material (W/M ratio) were selected as critical process parameters (CPPs). Quadratic models between CPPs and CQAs were developed with determination coefficients higher than 0.94. Active ingredient yields and dry matter yield increased as the extraction number increased. Monte Carlo simulation with models established using a stepwise regression method was applied to calculate the probability-based design space. Step length showed little effect on the calculation results. A higher simulation number led to results with lower dispersion. Data generated in a Monte Carlo simulation following a normal distribution led to a design space with a smaller size. An optimized calculation condition was obtained with 10000 simulations, a calculation step length of 0.01, a significance level value of 0.35 for adding or removing terms in a stepwise regression, and a normal distribution for data generation. The design space with a probability higher than 0.95 to attain the CQA criteria was calculated and verified successfully. Normal operating ranges of 8.2-10 g/g of W/M ratio, 1.25-1.63 h of extraction time, and two extractions were recommended. The optimized calculation conditions can conveniently be used in design space development for other pharmaceutical processes.


Introduction
Danhong injection is a botanical injection used in the treatment of coronary heart disease, angina, myocardial infarction, and cerebral diseases [1]. The injection is made from Salviae miltiorrhizae Radix et Rhizoma (Danshen) and Carthami Flos (Honghua) using unit operations of extraction, concentration, ethanol precipitation, adsorption, etc. The water extraction process of mixed Danshen and Honghua affects both the drug efficacy and the drug safety of the Danhong injection. The optimization of the extraction process will contribute to an increase in the batch-to-batch consistency of the Danhong injection.
The quality by design concept has become an essential part of the modern approach to ensure pharmaceutical product quality [2,3]. Design space development plays an important role in the implementation of the quality by design concept [4,5]. To develop a reliable design space, process critical quality attributes (CQAs) and critical process parameters (CPPs) must be identified, and the mathematical models between CPPs and process CQAs must be built. Then, the design space can be calculated and verified. Model building is an important consideration in design space development. Both statistical models and mechanistic models can be used to calculate process design space [6,7]. However, for the separation processes in the manufacture of botanical drugs, building mechanistic models is usually very difficult because of the complex composition and lack of fundamental physical and chemical data. Therefore, statistical models between CPPs and process CQAs are usually used. Typically, quadratic models containing linear terms, nonlinear terms, and interaction terms are applied. These quadratic models are simple and can be a good approximation of the true relationships. The models can be developed after a design of experiments (DOE). Response surface designs such as the central composite design and the Box-Behnken design are widely used to establish the quadratic models [6,8-10]. To remove insignificant terms, stepwise regression is usually used in model building. The significance level values in stepwise regression should be selected carefully.
Recently, probability-based design space has come to the forefront because it can provide assurance that the design space meets all process specifications. Rozet et al. reinterpreted the definition of design space as "a multivariate domain of input factors ensuring that critically chosen responses are included within predefined limits with an acceptable level of probability" [11]. Bayesian modeling, bootstrapping techniques, and Monte Carlo simulation are three methods for calculating the probability of meeting the specifications imposed on the CQAs [11]. Peterson et al. gave several examples using the Bayesian predictive method to calculate the probability-based design space [4,12]. Our group developed design spaces for the ethanol precipitation process [13], the water precipitation process [14], and the extraction process [15] using a Monte Carlo simulation method. The Monte Carlo simulation method was also used successfully in the design space development for several analytical methods [16-19]. In the Monte Carlo simulation, random data following a given distribution are generated. The data distribution type and simulation number are important factors that may affect the calculation results. However, the influence of these calculation parameters has not been reported.
In this work, process CQAs and CPPs of the extraction process were selected. Quadratic models were built. Simulation number, calculation step length, data distribution type, and the significance level value of the stepwise regression were optimized in the Monte Carlo simulation. Different simulation results were calculated and compared. The probability-based design space was obtained using the optimized Monte Carlo simulation conditions. Then, the design space was verified. The characteristics of this calculation method are also discussed.

Materials and Chemicals
Danshen was purchased from Nepstar Drugstore (Hangzhou, Zhejiang, China). Honghua was purchased from Daily Healthy Drugstore (Hangzhou, Zhejiang, China). No specific permission was required for the field studies described in this paper. The locations are neither privately owned nor protected by the Chinese government. No endangered or protected species were sampled.
Standard substances, including rosmarinic acid, Danshensu, and lithospermic acid, were purchased from Winherb Medical S&T Development Co., Ltd. (Shanghai, China). Salvianolic acid B was purchased from Chengdu Biopurify Phytochemicals Ltd. (Chengdu, Sichuan, China). Hydroxysafflor yellow A was obtained from Aladdin Industrial Inc. (Shanghai, China). Deionized water was produced using an academic water purification system (Milli-Q, Milford, MA, USA). HPLC-grade formic acid was obtained from ROE SCIENTIFIC INC. (Newark, DE, USA). HPLC-grade acetonitrile was purchased from Merck (Darmstadt, Germany). HPLC-grade ammonium formate was obtained from Alfa Aesar China (Tianjin, China) Co., Ltd. All materials were used as received without any further purification.

Procedure
After 45 g of Danshen and 15 g of Honghua were placed in a round bottom flask, water was added. The flask was then heated using a heating jacket (TC-15, Haining Huaxing Instrument Co. Ltd, China). After reflux extraction for a period of time, the extract was collected by filtration. If the extraction number was more than 1, water was added to the flask again after filtration to extract the mixed Danshen and Honghua once more. The extracts were combined before the measurement of active ingredient content and dry matter content.

Design of experiments
In this work, the three parameters of extraction time, extraction number, and the mass ratio of water and material (W/M ratio) were investigated. Table 1 shows the coded and uncoded values of the parameters. The run order is listed in Table 2. The ranges of the three parameters were set based on production experience. After the development of the design space, verification experiments were repeated three times with a reflux time of 1.6 h, W/M ratio of 8.3 g/g, and extraction number of 2.

Data processing
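Eqs 1 and 2 were used to calculate the dry matter yield (DMY) and the active ingredient yield (ACY), respectively.
$$\mathrm{DMY} = \frac{\mathrm{DM}_{\mathrm{ext}} \times M_{\mathrm{ext}}}{M_{\mathrm{mat}}} \quad (1)$$
where DM is dry matter content, and M is the mass. Subscripts ext and mat refer to extract and material, respectively.
$$\mathrm{ACY}_{i} = \frac{\mathrm{AC}_{i,\mathrm{ext}} \times M_{\mathrm{ext}}}{M_{\mathrm{mat}}} \quad (2)$$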
where AC is the active ingredient content and subscript i (i = 1, 2, . . ., 5) represents Danshensu, hydroxysafflor yellow A, rosmarinic acid, lithospermic acid, and salvianolic acid B, respectively. The experimental data were analyzed using Design-Expert 8.0.6 software (Stat-Ease Inc., MN, USA) to obtain response surface models. The mathematical model, with X_1, X_2, and X_3 denoting the three coded CPPs, is shown in Eq 3.
$$Y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_1 X_2 + a_5 X_1 X_3 + a_6 X_2 X_3 + a_7 X_1^2 + a_8 X_2^2 + a_9 X_3^2 \quad (3)$$
where a₀ is a constant, a₁, a₂, . . ., a₉ are regression coefficients, and Y is a CQA. All the variables were coded before modeling. Insignificant terms were removed using stepwise regression. The significance levels to remove or add a term were both set to 0.35.
The design space was calculated using a Monte Carlo method with an in-house program written in Matlab (R2011b, MathWorks, USA). In the calculations, the uncertainty of the measured data was considered. Random data for the active ingredient content and dry matter content in the extract were generated following a given distribution. The given distribution was assumed to be a normal distribution, a lognormal distribution, a square-root-normal distribution, or a reciprocal normal distribution. For the center point, the average value and the corresponding standard deviation of each process CQA were taken as the mean and standard deviation of the given distribution. For the other experimental points, the measured values were taken as the mean values of the given distribution, and the relative standard deviations (RSDs) of the measured data were assumed to be the same as the RSD of the center point. The standard deviations for these points were then calculated as the product of the RSD values and the measured values. After the generation of random data for all of the experimental conditions, each data set was used to develop a model using stepwise regression. All the models developed were used to predict process CQA values under a given set of conditions. The significance level values were set the same for adding and removing model terms. The acceptable level of probability for the design space was set as 0.95. In the investigation of the Monte Carlo simulation conditions, the design space when the extraction number is 2 was calculated as an example.
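The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab program used in this work: the design is a hypothetical Box-Behnken layout, the "measured" yields are invented values from an arbitrary function, a single RSD of 3% is applied to every run, a full quadratic model is fitted by ordinary least squares in place of stepwise regression, and the probability at an operating point is taken as the fraction of simulated models whose prediction meets a hypothetical lower limit.

```python
import random

def quadratic_terms(x1, x2, x3):
    """Terms of a full second-order model in three coded factors."""
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3,
            x1 * x1, x2 * x2, x3 * x3]

def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for k in range(p):
            if k != c:
                f = aug[k][c] / aug[c][c]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def probability_of_meeting_limit(design, measured, rsd, points,
                                 lower_limit, n_sim, seed=0):
    """Fraction of simulated models predicting >= lower_limit at each point."""
    rng = random.Random(seed)
    rows = [quadratic_terms(*d) for d in design]
    hits = [0] * len(points)
    for _ in range(n_sim):
        # Normal data: mean = measured value, SD = RSD * measured value.
        simulated = [rng.gauss(m, rsd * m) for m in measured]
        coef = fit_least_squares(rows, simulated)
        for j, pt in enumerate(points):
            pred = sum(c * t for c, t in zip(coef, quadratic_terms(*pt)))
            if pred >= lower_limit:
                hits[j] += 1
    return [h / n_sim for h in hits]

# Hypothetical Box-Behnken design in coded units: 12 edge points, 3 center runs.
design = ([(a, b, 0) for a in (-1, 1) for b in (-1, 1)]
          + [(a, 0, c) for a in (-1, 1) for c in (-1, 1)]
          + [(0, b, c) for b in (-1, 1) for c in (-1, 1)]
          + [(0, 0, 0)] * 3)
# Invented "measured" yields (illustration only, not data from this work).
measured = [30 + 5 * x1 + 4 * x2 - 2 * x3 - 3 * x2 * x2 for x1, x2, x3 in design]

probs = probability_of_meeting_limit(design, measured, rsd=0.03,
                                     points=[(1, 1, 0), (-1, -1, 0)],
                                     lower_limit=30.0, n_sim=500)
print(probs)
```

Evaluating the same function over a grid of coded W/M ratio and extraction time values, and keeping the grid points whose probability is at least 0.95 for every CQA, would give a probability-based design space of the kind described here.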
Two criteria were calculated using coded values of CPPs, namely, average dimensionless size of the design space (ADSS) and the relative standard deviation of the dimensionless design space size (RSDDSS). Calculations were repeated 10 times to obtain the RSDDSS value.
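The two criteria can be illustrated with a short Python sketch. The exact definition of the dimensionless size is not spelled out in the text, so the version below is an assumption: the size is taken as the area, in coded units, of the grid cells whose probability reaches the acceptance level at a fixed extraction number, ADSS is the mean of the sizes over repeated calculations, and RSDDSS is their sample standard deviation divided by the mean. The function names are hypothetical.

```python
import statistics

def design_space_size(prob_grid, step, threshold=0.95):
    """Area in coded units covered by grid points with probability >= threshold.

    Assumption: each grid point stands for a square cell of side `step`.
    """
    n_inside = sum(1 for row in prob_grid for p in row if p >= threshold)
    return n_inside * step * step

def adss_and_rsddss(sizes):
    """Mean size and relative standard deviation (%) over repeated calculations."""
    adss = statistics.mean(sizes)
    rsddss = statistics.stdev(sizes) / adss * 100.0
    return adss, rsddss

# Toy example: three repeated probability maps on a 2 x 2 grid with step 0.5.
maps = [
    [[0.96, 0.50], [0.99, 0.95]],
    [[0.96, 0.50], [0.99, 0.94]],
    [[0.97, 0.40], [0.98, 0.96]],
]
sizes = [design_space_size(m, step=0.5) for m in maps]
print(adss_and_rsddss(sizes))
```

In this reading, a smaller RSDDSS indicates that repeated Monte Carlo runs agree more closely on the extent of the design space.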

CQA selection
According to the risk assessment results in our previous work [13], the extraction process significantly affects the active ingredient content, fingerprint similarity, and dry matter content of the Danhong injection. Phenolic acids from Danshen such as salvianolic acid B, lithospermic acid, Danshensu, and rosmarinic acid, and flavones from Honghua such as hydroxysafflor yellow A are considered active ingredients of the Danhong injection [21,22]. These active ingredients can be extracted easily with hot water. However, some active ingredients such as salvianolic acid B or hydroxysafflor yellow A also easily degrade in the extraction process due to hydrolysis or other reactions [23-25]. Accordingly, the yields of active ingredients in the extraction process are prone to fluctuations. Therefore, in this work, the yields of Danshensu, hydroxysafflor yellow A, rosmarinic acid, lithospermic acid, and salvianolic acid B are considered process CQAs. Impurities of saccharides, tannins, pigments, and inorganic salts are extracted simultaneously with the active ingredients, which leads to an increase in dry matter yield. The similarity of fingerprints is affected by both the active ingredient content and the impurity content in injections. Therefore, dry matter yield is considered another process CQA in this work. According to industry experience and literature results [26,27], the criteria for all of the CQAs were obtained and are listed in Table 3.

CPP selection
Ishikawa diagram analysis was performed to obtain an initial list of potential factors that affect the results of the extraction process, as shown in Fig 1. Four main causes are involved, including environment, material attributes, equipment, and extraction procedure. Water is the recommended extractant because a higher ethanol content in the mixed ethanol-water solvent results in a lower safflower yellow yield [28]. Extraction time, extraction number, extraction temperature, and solvent amount had a significant impact on safflower yellow yield [28,29]. Solvent amount, extraction time and extraction number were also considered to be important factors for the extraction of salvianolic acid B from Danshen [30]. Extraction temperature is determined mainly by the solvent composition for reflux extraction at atmospheric pressure. Therefore, the W/M ratio, extraction number, and extraction time are selected as CPPs of the extraction process of Danhong injection.

Effects of CPPs on CQAs
The experimental results of six CQAs in the extraction process are listed in Table 2. Salvianolic acid B yield was between 18.76 and 45.67 mg/g Danshen, which was much more than that of Danshensu, rosmarinic acid, or lithospermic acid. Rosmarinic acid yield and lithospermic acid yield were lower than 3 mg/g Danshen. Hydroxysafflor yellow A yield varied from 2.588 to 7.707 mg/g Honghua. Dry matter yield values varied from 233.3 to 586.0 mg/g material. Because dry matter yield values are much higher than the sum of the five active ingredient yields, most of the dry matter extracted appeared to be impurities.
In this work, second-order polynomial models were applied to describe the nonlinear effects of parameters. Models were simplified using stepwise regression. The estimated values of the regression coefficients are listed in Table 4. The determination coefficients (R²) are higher than 0.94 for all the models, which means that most variations can be explained by these models. Analysis of variance (ANOVA) was applied to determine the impact of the W/M ratio, extraction number, and extraction time on all the CQAs. As shown in Table 4, the linear terms of the W/M ratio and extraction number are significant for all the CQAs. The linear term of extraction time is insignificant for the yields of rosmarinic acid and hydroxysafflor yellow A. For the yields of Danshensu, rosmarinic acid, lithospermic acid, and salvianolic acid B, the quadratic term of the extraction number is very significant because the p values are less than 0.01. Contour plots of the active ingredient yields are shown in Figs 2-6. All the active ingredient yields increase as the extraction number increases. Danshensu yield, rosmarinic acid yield, and lithospermic acid yield also increase as the W/M ratio increases. Danshensu and lithospermic acid are two of the hydrolyzates of salvianolic acid B [24,25]. Accordingly, their yields both increase as extraction time increases, as shown in Fig 2 and Fig 5. Salvianolic acid B will hydrolyze and form many compounds [24,25]. Therefore, an increase in extraction time results in a lower salvianolic acid B yield, as shown in Fig 6. Compared with the degradation rate of salvianolic acid B, the degradation rate of rosmarinic acid is much slower [31]. Hydroxysafflor yellow A is a hydrolyzate of anhydrosafflor yellow B [23]. Hydroxysafflor yellow A can also hydrolyze and form p-coumaric acid [23]. As shown in Fig 3, when extraction time increases, hydroxysafflor yellow A yield first increases and then decreases.
Fig 7 provides the contour plots of parameter interactions on dry matter yield. Dry matter yield increases as the extraction number, W/M ratio, and extraction time increase. Different types of saccharides such as sucrose, fructose, and glucose are found in Danshen [32]. These saccharides are easily soluble in water [33-35]. Phenolic acids usually exist in medicinal plants in their salt forms [36]. The phenolic acids can also be extracted using hot water. Accordingly, the dry matter yield can be higher than 500 mg/g material.

Design space development
3.4.1 Simulation number and calculation step length. The results for different simulation numbers and calculation step lengths were calculated. The significance level value was 0.15 for both adding and removing a term. Random data were generated following a normal distribution. The results are shown in Table 5. Variations in calculation step length did not affect the ADSS or RSDDSS values. ADSS changed little as the simulation number increased. An increase in simulation number led to a smaller RSDDSS, which means that more reliable simulation results can be obtained. When the simulation number was more than 10000, RSDDSS values were less than 0.5%. Therefore, the simulation number was set at 10000, and the calculation step length was set as 0.01 in the following calculations.
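A rough way to see why dispersion falls as the simulation number rises is to treat each simulation as an independent pass or fail trial for meeting the CQA criteria at a single operating point. This is a simplifying assumption for illustration only, since RSDDSS in this work measures the dispersion of the design space size rather than of a single probability. Under that assumption, the standard error of the estimated probability from N simulations is

$$\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{N}}$$

For p = 0.95 and N = 10000, this gives

$$\sqrt{\frac{0.95 \times 0.05}{10000}} \approx 0.0022$$

and each tenfold increase in N reduces the standard error by a factor of about 3.2 (the square root of 10).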

3.4.2 The significance level value in stepwise regression. Different model criteria, namely R², the Akaike information criterion, the Bayesian information criterion, R²predict, and R²adj, were used to evaluate the quadratic models after stepwise regression with different significance level values. The simulation was repeated 10000 times. The average results for each criterion were obtained and are plotted in Fig 8. In Fig 8a, average R² values increase as the significance level value increases for all the models. In Fig 8b and 8c, the average model R²adj and R²predict values increase first but then decrease as the significance level value increases. Overfitting occurs when too many terms are included in the models. Average Akaike information criterion and Bayesian information criterion values both decrease first and then increase slightly as the significance level value increases, as shown in Fig 8d and 8e. Higher R²adj, higher R²predict, lower Akaike information criterion, or lower Bayesian information criterion values are favored in the selection of models. Because the turning points of average R²adj, Akaike information criterion, and Bayesian information criterion values were all between 0.3 and 0.4, the significance level in the stepwise regression for adding or removing terms was set as 0.35 in the following calculations.
[...] position of the design space. In Fig 11, the smallest design space was obtained when the data distribution was normal. Therefore, normal distribution is favored in design space calculation.
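The model criteria compared in section 3.4.2 can be computed from the residuals of a fitted model. The Python sketch below uses common least-squares forms of these criteria, which is an assumption about conventions: the Akaike and Bayesian information criteria are written up to an additive constant, the parameter count includes the intercept, and the predicted R² is omitted because it requires leave-one-out residuals. The function name is hypothetical.

```python
import math

def model_criteria(y, y_hat, n_params):
    """R2, adjusted R2, AIC and BIC for a least-squares fit.

    y        : observed responses
    y_hat    : model predictions at the same runs
    n_params : number of fitted coefficients, including the intercept
    """
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # residual sum of squares
    sst = sum((a - mean_y) ** 2 for a in y)             # total sum of squares
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    aic = n * math.log(sse / n) + 2 * n_params
    bic = n * math.log(sse / n) + n_params * math.log(n)
    return {"r2": r2, "r2_adj": r2_adj, "aic": aic, "bic": bic}

# Toy data (illustration only): same fit quality, different term counts.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.1, 3.9]
print(model_criteria(y, y_hat, n_params=2))
print(model_criteria(y, y_hat, n_params=3))
```

With the same residuals, a model with more terms receives a lower adjusted R² and higher information criteria, which is the penalty that produces the turning points seen as the significance level is relaxed.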
3.4.4 Design space and verification. The optimized Monte Carlo simulation conditions were obtained as follows: the simulation number was 10000; the calculation step length was 0.01; the data were generated following a normal distribution; the significance level used in the stepwise regression was 0.35. The design space can be obtained when the extraction number is 2 or 3, as shown in Fig 12. Considering the consumption of solvent and time, an extraction number of 2 is recommended. The recommended normal operation range is 8.2-10 g/g of W/M ratio and 1.25-1.63 h of extraction time. The results of the verification experiments are listed in Table 6. Most of the experimental results agree well with the prediction results. All the results of the verification experiments are within the limits of the CQAs.
3.4.5 Discussion of the present calculation method. Compared with the Bayesian method or the bootstrap method, the present method simulates the uncertainty of the measured data. This method is easy to understand from the perspective of classical statistics. It is also promising for design space development in other pharmaceutical processes. However, there are several possible drawbacks. First, new data are generated from a given distribution. The assumptions used to obtain the mean value and standard deviation of the given distribution facilitate calculation. However, the generated data are only a rough approximation of the actual situation and will result in some deviations in probability prediction. Second, only the RSD value of the center point is used in calculations when the other experimental conditions are not repeated. If the RSD value of the center point happens to be determined incorrectly, the calculated design space may be affected dramatically. Third, many new data sets are generated in this method. For each data set, a new equation is developed for prediction. Therefore, a large amount of computation is required. An Intel Xeon CPU (E7-4820, 2.00 GHz) was used to calculate the design space, requiring 239 min to complete the calculation at the optimized conditions. A total of 498,000 KB of memory was occupied. To obtain a more reliable predicted probability, multiple repetitions of the experiments for each condition are suggested so that more reliable mean values and standard deviations can be obtained.

Conclusions
The probability-based design space for the extraction process of the Danhong injection was developed in this work using a Monte Carlo simulation with models built using stepwise regression. The dry matter yield and the yields of Danshensu, rosmarinic acid, lithospermic acid, hydroxysafflor yellow A, and salvianolic acid B were selected as process CQAs. Extraction time, W/M ratio, and extraction number were selected as CPPs. The effects of the CPPs were investigated using a three-level experimental design. After stepwise regression, the R² values of all the models are higher than 0.94. Hydroxysafflor yellow A yield increases first and then decreases as the extraction time increases. Salvianolic acid B yield decreases as extraction time increases. More active ingredients can be extracted when the extraction number increases. Increases in extraction time, extraction number, and W/M ratio all result in higher dry matter yield. The influence of the calculation step length on calculation results was small. A higher simulation number led to results with lower dispersion. The smallest design space was obtained on the assumption of a normal distribution. The optimized Monte Carlo simulation conditions were obtained: a normal distribution for concentration data, 10000 simulations, a calculation step length of 0.01, and a significance level of 0.35 for adding and removing terms in the model development. The design space for the Danhong extraction process was calculated using these conditions with a probability higher than 0.95 of attaining CQA criteria. Normal operation ranges of 8.2-10 g/g of W/M ratio, 1.25-1.63 h of extraction time, and two extractions were also calculated. Verification experiments were carried out on a larger scale. All the results are within the CQA limits, which means that the calculated design space is accurate.
Determination of all the RSD values of results under different conditions is encouraged. With these RSD values used in Monte Carlo simulation, more reliable probability is expected. The use of the present method with optimized conditions to develop design space for other pharmaceutical processes appears to be promising.