Optimization of multiplex quantitative polymerase chain reaction based on response surface methodology and an artificial neural network-genetic algorithm approach

Multiplex quantitative polymerase chain reaction (qPCR) has found an increasing range of applications, so constructing a reliable and dynamic mathematical model for multiplex qPCR that analyzes the effects of interactions between variables is especially important. This work aimed to analyze the effects of interactions between variables in uni- and multiplex qPCR using response surface methodology (RSM), and to further optimize the parameters by constructing two mathematical models, one using RSM and one using a back-propagation neural network-genetic algorithm (BPNN-GA). The statistical analysis showed that Mg2+ was the most important factor for both uni- and multiplex qPCR. Dynamic models of uni- and multiplex qPCR could be constructed using both RSM and BPNN-GA. However, RSM performed better than BPNN-GA in prediction, as evaluated by the mean absolute error (MAE), the mean square error (MSE), and the coefficient of determination (R2). The optimal parameters for uni- and multiplex qPCR were ultimately determined by RSM.


Introduction
Real-time quantitative PCR (qPCR) quantifies a reaction template (nucleic acid) by continuously monitoring, in real time, the fluorescence signal generated in each cycle of PCR amplification. The technique is highly specific, highly sensitive, reproducible, accurately quantifiable, and highly automatable [1,2]. Real-time PCR has therefore been widely applied in fields such as molecular diagnostics, life sciences, agriculture, medicine, and food science [3][4][5]. Despite this broad application, practical implementation remains difficult, especially in multiplex qPCR systems. Because multiple primer and probe pairs are added, changes in factors such as annealing temperature, elongation temperature, and number of cycles can produce non-specific products and different effects on amplification. Furthermore, data interpretation lacks uniformity, and standardization needs to be improved [6]. These issues arise mainly from the external factors that affect qPCR amplification and from the practical difficulty of controlling them.
Prior studies [7][8][9] have attempted to optimize various qPCR parameters to ensure accuracy and stability. However, most of these studies used single-factor tests when optimizing qPCR systems [9]. Although these studies determined the effects of individual factors on qPCR amplification, they required many repeated tests and overlooked interactions between parameters, so the parameters they identified may not be truly optimal.
Recent studies on optimizing qPCR parameters have focused on uniplex qPCR. The widening range of applications for multiplex qPCR has drawn increased attention to improving its amplification efficiency and optimizing its parameters, and a reliable, dynamic mathematical model for multiplex qPCR that can analyze the effects of interactions between parameters is urgently needed. Response surface methodology (RSM) uses statistical and mathematical analysis to design experiments and combines several experimental design techniques, including Plackett-Burman (PB) design, central composite design (CCD), and Box-Behnken design. By analyzing experimental data, RSM can evaluate the effects of variables on test results (yield). RSM can also analyze interactions between factors and can be used to construct a mathematical model for determining optimal conditions or ranges for a desired response [10]. RSM overcomes the disadvantages of single-factor tests, including the time required, the limited number of factors investigated, and unreliable conclusions [11]. Because RSM is easy to implement and allows the interplay of different factors to be investigated, it is commonly used in fields such as pharmacy, architectural science, biology, agriculture, and microbiology [12][13][14].
As an alternative to RSM, artificial neural networks (ANNs) [15], an integral component of artificial intelligence, can be applied for data analysis and prediction. Back-propagation neural networks (BPNNs), a typical class of ANN, are trained under a learning rule that propagates the output error backward through the network to adjust its weights. BPNNs can approximate any continuous function and have robust non-linear mapping capabilities. Genetic algorithms (GAs) simulate the natural principle of survival of the fittest to search for the globally optimal combination of parameters for a given system, and they can be applied to discontinuous, non-differentiable, stochastic, or highly non-linear objective functions. ANN prediction models can therefore be developed from experimental data, and GA can then optimize the experimental parameters on those established models. An increasing number of recent studies have optimized experimental conditions using ANNs and RSM [16][17][18], but little work has applied these two methods to qPCR systems. A few studies have noted that, as a predictive tool, ANNs can in principle be used to understand complex systems and help optimize conditions for biological experiments and related techniques such as PCR [19].
This study builds on previously developed AllGlo multiplex PCR systems for respiratory syncytial virus (RSV), influenza virus (INF), and human metapneumovirus (HMPV). CCD is used to design experiments with different concentrations of primers, probes, DNA polymerase, Mg2+, and dNTPs in uni- and multiplex qPCR systems. Based on the experimental results, the interplay of the tested factors and their effects on the cycle threshold (Ct) values of uni- and multiplex qPCR are analyzed by RSM. Prediction models relating the tested factors to Ct values are then constructed using RSM and BPNN-GA, respectively, and their prediction performance is evaluated using the coefficient of determination (R2), the mean absolute error (MAE), and the mean square error (MSE) [20][21][22]. The model with the better prediction performance is further used for condition optimization, and the optimal conditions for uni- and multiplex qPCR of the three viruses are then determined.

Preparation of qPCR templates
E. coli DH5α samples containing RSV, INF, or HMPV target-gene plasmids were cultured separately at 37°C under 5% CO2. A single colony was then transferred with an inoculation needle into Luria-Bertani (LB) medium and shaken overnight at 200 rpm and 37°C. Plasmids were extracted using a TaKaRa plasmid extraction kit following the manufacturer's protocol (Takara Biomedical Technology Co., Ltd., Dalian, China, lot number 9760). The extracted plasmids were dissolved in 50 μL of eluent. A NanoDrop 2000 spectrophotometer was used to measure the absorbance of the plasmid DNA at 260 nm and 280 nm. Based on the copy number determined from these measurements, the plasmids for the three viruses were diluted separately, mixed in equal ratios to 10^4 copies/mL, and stored at -20°C for later use. Repeated re-preparation of the template due to insufficient storage and material preparation could bias the results [23]. In this study, the amount of template needed was calculated and prepared before the experiments, and the templates were stored in aliquots to improve test stability.
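The text does not state how copy number was derived from the NanoDrop reading. A common convention converts a dsDNA mass concentration to copies using Avogadro's number and an average of about 660 g/mol per base pair. The Python sketch below assumes that convention; the plasmid length and concentration in the example are hypothetical, and the function names are illustrative only.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS = 660.0            # assumed mean molar mass of one dsDNA base pair, g/mol

def plasmid_copies_per_ul(conc_ng_per_ul: float, plasmid_length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into copies per microlitre."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (plasmid_length_bp * BP_MASS)
    return moles_per_ul * AVOGADRO

def dilution_factor(stock_copies_per_ul: float, target_copies_per_ml: float) -> float:
    """Fold dilution needed to go from a stock (copies/uL) to a target (copies/mL)."""
    return stock_copies_per_ul * 1000.0 / target_copies_per_ml

# Hypothetical example: a 100 ng/uL prep of a 3,000 bp plasmid.
stock = plasmid_copies_per_ul(100.0, 3000)
print(f"{stock:.3e} copies/uL")                    # 3.041e+10 copies/uL
print(f"{dilution_factor(stock, 1e4):.3e}-fold")   # 3.041e+09-fold
```

In practice such a large fold dilution would be carried out as a serial dilution rather than in a single step.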

Primers and probes
The gene sequences of the three viruses were downloaded from the GenBank database. The NCBI primer-picking tool and Oligo 6.22 were used for comparison and optimization. Conserved regions with high homology were selected for primer and probe design, and NCBI BLAST was used to check the specificity of the primer and probe sequences. Primers were synthesized by Intragen Trading (Shanghai) Co., Ltd., China, and probes by Shanghai Yiyue Biotechnology Co., Ltd. (Table 1).

Experimental design
In this study, methods for uniplex qPCR are given in supplementary S1 Text. The concentrations of primer (Factor A), probe (Factor B), DNA polymerase (Factor C), Mg2+ (Factor D), and dNTPs (Factor E) were selected as independent variables, each tested at five levels in an RSM-CCD design. Baseline levels and ranges were determined from the concentrations recommended for the Vazyme LAmp DNA Polymerase PCR kit (Vazyme Biotech Co., Ltd, Nanjing, China). The coded and actual values of the selected RSM design factors are provided in supplementary S1 Table. This study used a 50-test-point second-order RSM design. The uni- and multiplex qPCR experiments for the three viruses (RSV, INF, and HMPV) used the same design; the uniplex qPCR designs are provided in S2 Table and the multiplex qPCR designs in S3 Table.

Quantitative PCR amplification
qPCR was performed in a final volume of 50 μL containing 5 μL of 10× Vazyme LAmp Buffer (Mg2+-free), 0.5 μL of ROX Reference Dye II (50×), and 4 μL of mixed template. MgCl2 (25 mmol/L), dNTPs (10 mmol/L), primers (10 μmol/L), probes (10 μmol/L), and LAmp DNA Polymerase (5 U/μL) were added at the amounts given in S2 and S3 Tables (Vazyme LAmp DNA Polymerase, Vazyme). Amplification was carried out on an ABI 7500 Real-Time PCR System using a two-step program: pre-denaturation at 95°C for 30 s, followed by 40 cycles of denaturation at 95°C for 5 s and elongation at 60°C for 32 s. Each test was run in three parallel repeats, and the mean Ct value of the three repeats was taken as the result.
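A 50-run, five-factor CCD is consistent with a full 2^5 factorial core (32 runs), 10 axial runs, and 8 center runs; that split is our assumption, since the text gives only the total. The Python sketch below builds such a design in coded units with the rotatable axial distance α = (2^5)^(1/4) ≈ 2.378, which yields five levels per factor (-α, -1, 0, +1, +α). Mapping coded values to actual concentrations would follow S1 Table, which is not reproduced here, and the function name is hypothetical.

```python
import itertools
import numpy as np

def central_composite_design(k: int, n_center: int) -> np.ndarray:
    """Rotatable CCD in coded units: 2^k factorial + 2k axial + n_center center runs."""
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    alpha = (2 ** k) ** 0.25          # rotatable axial distance
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha      # low star point for factor i
        axial[2 * i + 1, i] = alpha   # high star point for factor i
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])

# Columns A-E: primer, probe, DNA polymerase, Mg2+, dNTPs (coded levels).
design = central_composite_design(k=5, n_center=8)
print(design.shape)              # (50, 5)
print(round(design.max(), 3))    # 2.378
```

Design software would normally also randomize the run order; this sketch lists the runs in construction order.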

RSM
The mean Ct value from the three parallel tests was treated as the response value (Y). Design-Expert V8.0.6 was used for RSM analysis of the test data. The statistical significance of the RSM-based model (model I) was checked by analysis of variance (ANOVA). The variables were treated as continuous random factors. The complete CCD matrices and experimental Ct values for uni- and multiplex qPCR are given in S2 and S3 Tables. Optimized conditions for Ct values were obtained in combination with 3D response surface graphs. The mathematical expression used by model I to describe the response value, Y (Ct value), as a function of the five factors studied (A, B, C, D, and E) can be written as follows [25]:

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \sum_{i=1}^{k} \beta_{ii} X_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} X_i X_j + \varepsilon$$

where $Y$ is the response value (Ct value), $X_i$ and $X_j$ denote the coded levels of the independent variables, $\beta_0$ is the intercept, $\beta_i$, $\beta_{ii}$, and $\beta_{ij}$ are the linear, quadratic, and interaction coefficients, $\varepsilon$ is the test error, and $k$ is the number of independent factors. In the following descriptions, Y1, Y2, and Y3 represent the corresponding multiplex qPCR RSM polynomial equations for RSV, INF, and HMPV, respectively.
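To illustrate the second-order polynomial model described above, the Python sketch below builds the intercept, linear, squared, and pairwise-interaction columns from coded factor levels and estimates the β coefficients by ordinary least squares. It is a minimal stand-in, not a reproduction of Design-Expert's analysis: it performs no ANOVA and no elimination of non-significant terms, and the function names are hypothetical.

```python
import itertools
import numpy as np

def quadratic_terms(X: np.ndarray) -> np.ndarray:
    """Columns [1, X_i, X_i^2, X_i*X_j for i<j] for coded factor levels X (n x k)."""
    n, k = X.shape
    cols = [np.ones(n)]                                   # intercept (beta_0)
    cols += [X[:, i] for i in range(k)]                   # linear terms (beta_i)
    cols += [X[:, i] ** 2 for i in range(k)]              # quadratic terms (beta_ii)
    cols += [X[:, i] * X[:, j]
             for i, j in itertools.combinations(range(k), 2)]  # interactions (beta_ij)
    return np.column_stack(cols)

def fit_rsm(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares estimates of the beta coefficients."""
    beta, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    return beta

def predict_rsm(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted response (e.g. Ct) at coded levels X."""
    return quadratic_terms(X) @ beta
```

For k = 5 the model has 21 coefficients (1 + 5 + 5 + 10), so a 50-run design leaves 29 residual degrees of freedom for error estimation.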

BPNN-GA
A three-layer BPNN-GA, comprising input, hidden, and output layers, was used to construct a mathematical model (model II). The concentrations of primers, probes, DNA polymerase, Mg2+, and dNTPs were used as the five inputs, and the Ct value was the output. The R package neuralnet was used for the analysis, and Ct values were scaled to the range -2.378 to +2.378 in BPNN-GA. Sigmoid activation functions were used for all models, and five-fold cross-validation was employed to select the number of hidden-layer nodes and to train the model [26].
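The study used the R package neuralnet, which is not reproduced here. The Python/NumPy sketch below illustrates the same idea under stated assumptions: one sigmoid hidden layer, a linear output node, plain full-batch gradient descent on the squared error (neuralnet's default training algorithm is resilient back-propagation, so results would differ), and five-fold cross-validation that reports a fitting (training) error and a predictive (held-out) error for each candidate hidden-node count, selecting the count with the smallest predictive error. Inputs and outputs are assumed to be scaled beforehand; the learning rate, epoch count, and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpnn(X, y, n_hidden, epochs=5000, lr=0.05, seed=0):
    """One-hidden-layer network (sigmoid hidden, linear output) trained by back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                  # forward pass: hidden activations
        err = H @ W2 + b2 - y                     # residuals of the linear output node
        # Backward pass: gradients of the loss 0.5 * mean(err**2).
        gW2 = H.T @ err / n
        gb2 = err.mean()
        dZ1 = np.outer(err, W2) * H * (1.0 - H)   # error propagated to the hidden layer
        gW1 = X.T @ dZ1 / n
        gb1 = dZ1.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def predict_bpnn(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(X @ W1 + b1) @ W2 + b2

def cv_errors(X, y, n_hidden, folds=5, seed=0):
    """Mean fitting and predictive MSE over a k-fold split."""
    idx = np.random.default_rng(seed).permutation(len(y))
    fit_err, pred_err = [], []
    for test_idx in np.array_split(idx, folds):
        train_idx = np.setdiff1d(idx, test_idx)
        params = train_bpnn(X[train_idx], y[train_idx], n_hidden)
        fit_err.append(np.mean((predict_bpnn(params, X[train_idx]) - y[train_idx]) ** 2))
        pred_err.append(np.mean((predict_bpnn(params, X[test_idx]) - y[test_idx]) ** 2))
    return float(np.mean(fit_err)), float(np.mean(pred_err))

def select_hidden(X, y, candidates=(2, 3, 4, 5, 6)):
    """Pick the hidden-node count with the smallest cross-validated predictive error."""
    scores = {h: cv_errors(X, y, h) for h in candidates}
    return min(scores, key=lambda h: scores[h][1]), scores
```

Returning both errors per candidate mirrors the way a table of fitting and predictive errors can be inspected before committing to a node count.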
Optimization of uni- and multiplex qPCR conditions for the three viruses (RSV, INF, and HMPV) was performed by applying GA to the trained model obtained through five-fold cross-validation. The optimized conditions were obtained along with the corresponding predicted Ct values.
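GA can then search the coded factor space for the combination that minimizes the Ct predicted by the trained model. The Python sketch below is a generic real-coded GA (elitism, binary tournament selection, blend crossover, per-gene Gaussian mutation, clipping to bounds); these operators and all default settings are our assumptions, since the text does not report the GA configuration, and the function name is hypothetical. The objective `f` can be any callable that returns a scalar predicted Ct for a vector of coded factor levels.

```python
import numpy as np

def genetic_minimize(f, lower, upper, pop_size=60, generations=200,
                     mutation_rate=0.2, mutation_sd=0.05, elite=2, seed=0):
    """Real-coded GA searching the box [lower, upper] for the x that minimises f(x)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    span = upper - lower
    pop = rng.uniform(lower, upper, size=(pop_size, dim))
    for _ in range(generations):
        # Rank the population: a lower predicted Ct counts as fitter.
        fitness = np.array([f(ind) for ind in pop])
        pop = pop[np.argsort(fitness)]
        children = [pop[i].copy() for i in range(elite)]   # elitism keeps the best
        while len(children) < pop_size:
            # Binary tournament: the population is sorted, so the smaller index wins.
            p1 = pop[rng.integers(0, pop_size, size=2).min()]
            p2 = pop[rng.integers(0, pop_size, size=2).min()]
            # Blend crossover: each gene is a random convex mix of the parents' genes.
            w = rng.uniform(0.0, 1.0, size=dim)
            child = w * p1 + (1.0 - w) * p2
            # Mutate each gene with probability mutation_rate, scaled to the search span.
            mask = rng.uniform(size=dim) < mutation_rate
            child = child + mask * rng.normal(0.0, mutation_sd, size=dim) * span
            children.append(np.clip(child, lower, upper))
        pop = np.array(children)
    fitness = np.array([f(ind) for ind in pop])
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])
```

For the five factors here, the bounds would be the coded limits of each factor (for example ±2.378 for a rotatable design), so that the search stays inside the region the model was trained on.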

Model evaluation and validation
The coefficient of determination (R2), the mean absolute error (MAE), and the mean square error (MSE) were used to evaluate and compare the stability and prediction performance of model I and model II. R2 is applied to the training data set, and a larger R2 indicates that a greater percentage of the variance in the response variable is explained by the explanatory variables [27]. Smaller MAE and MSE values indicate a better model [28]. The better-performing model was selected as the predictive model for condition optimization, and validation experiments were subsequently conducted under the predicted optimal operating conditions.
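The equations for the evaluation indices used to assess model performance are provided below [27]:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(Y_{i,e} - Y_{i,p}\right)^2}{\sum_{i=1}^{n} \left(Y_{i,e} - Y_a\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left\lvert Y_{i,p} - Y_{i,e} \right\rvert$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(Y_{i,p} - Y_{i,e}\right)^2$$

where $Y_{i,p}$ is the value predicted by the model, $Y_{i,e}$ is the experimental Ct value, $Y_a$ is the mean experimental Ct value, and $n$ is the number of data points.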
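The three evaluation indices translate directly into code. The Python sketch below assumes R2 is computed as one minus the ratio of the residual sum of squares to the total sum of squares (one common definition; the squared correlation coefficient is another); the function names are illustrative.

```python
import numpy as np

def r_squared(y_exp, y_pred) -> float:
    """1 - SS_res / SS_tot, with SS_tot taken about the mean experimental value."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_exp - y_pred) ** 2)
    ss_tot = np.sum((y_exp - y_exp.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(y_exp, y_pred) -> float:
    """Mean absolute error between predicted and experimental values."""
    return float(np.mean(np.abs(np.asarray(y_pred, float) - np.asarray(y_exp, float))))

def mse(y_exp, y_pred) -> float:
    """Mean squared error between predicted and experimental values."""
    return float(np.mean((np.asarray(y_pred, float) - np.asarray(y_exp, float)) ** 2))
```

For illustrative experimental Ct values [20, 22, 24] and predictions [21, 22, 23], these give R2 = 0.75, MAE ≈ 0.667, and MSE ≈ 0.667.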

RSM
The effects of the operating variables were investigated by statistical analysis of the CCD results. According to the analysis of variance (Table 2), the multiplex qPCR models for the three viruses gave F = 6.61 and P < 0.0001 (RSV), F = 3.67 and P = 0.0009 (INF), and F = 7.89 and P < 0.0001 (HMPV). All P values were below 0.05, indicating that all the RSM models for multiplex qPCR were statistically significant. The factor-level results also provided evidence that at least one of the five predictors affected the response. For RSV, for example, the P values of DNA polymerase and Mg2+ were 0.0452 and <0.0001, respectively, showing that both had a statistically significant effect on the Ct value in multiplex qPCR, whereas the P values of primers, probes, and the other factors were >0.5, indicating no significant effect on the Ct value. The factor-level results for INF and HMPV led to the same conclusion. The analysis of variance for uniplex qPCR is given in S4 Table.

$$Y_1 = 23.600 - 0.420\,C - 1.630\,D + 0.950\,D^2 \qquad (5)$$

Eqs (5)-(7) represent the multiplex qPCR polynomial models for the three viruses (RSV, INF, and HMPV); coefficients without statistical significance were eliminated. In the polynomial equations for all three viruses, Mg2+ significantly affected the Ct value; thus, the concentration of Mg2+ was the most important factor under the conditions used in these experiments.
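Because Eq (5) is quadratic in D (Mg2+) with a positive D^2 coefficient, the Ct-minimizing Mg2+ level in coded units follows from setting dY1/dD = -1.630 + 2(0.950)D = 0, which gives D ≈ 0.858. The linear C (DNA polymerase) term has a negative coefficient, so within the tested range the predicted Ct keeps falling as C increases. The short Python check below evaluates this; it works only in coded units, and converting D back to mmol/L would require the coding given in S1 Table.

```python
# Eq (5): RSV multiplex qPCR model in coded units (C = DNA polymerase, D = Mg2+).
B0, B_C, B_D, B_DD = 23.600, -0.420, -1.630, 0.950

def y1(c: float, d: float) -> float:
    """Predicted Ct from Eq (5) at coded levels c and d."""
    return B0 + B_C * c + B_D * d + B_DD * d ** 2

# dY1/dD = B_D + 2 * B_DD * D = 0  ->  D* = -B_D / (2 * B_DD); B_DD > 0, so it is a minimum.
d_star = -B_D / (2 * B_DD)
print(round(d_star, 3))            # 0.858
print(round(y1(0.0, d_star), 3))   # 22.901 (with C at its center level)
```

Each additional coded unit of C lowers the predicted Ct by a further 0.420 cycles, independent of D, because Eq (5) retains no C x D interaction term.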
The RSM 3D plots for RSV multiplex qPCR are shown in Fig 1, and those for multiplex qPCR of the other viruses are shown in Fig 2. The x- and y-axes represent the two

BPNN-GA
The fitting and predictive errors from five-fold cross-validation of model II for multiplex qPCR, obtained by training different numbers of neurons in the hidden layer, are provided in Table 3. These two errors were the basis for selecting the number of hidden-layer nodes: a node count was chosen when both errors were relatively small, with the minimum predictive error as the primary criterion. For RSV multiplex qPCR, for example, the fitting error decreased as the number of hidden-layer neurons increased, whereas the predictive error was worse with 3 and 5 neurons, so 4 neurons were used in the final model for RSV multiplex qPCR. Similarly, 4 and 3 nodes were selected for INF and HMPV multiplex qPCR, respectively. The fitting and predictive errors from five-fold cross-validation of model II for uniplex qPCR are given in S5 Table.

RSM versus BPNN-GA
In this study, RSM and BPNN-GA were used for data analysis, and the corresponding mathematical models were referred to as model I (RSM) and model II (BPNN-GA). The R2 values of model II were closer to 1 than those of model I in both uni- and multiplex PCR for all three viruses, and the MAE and MSE of model II were much smaller than those of model I for both uni- and multiplex qPCR of the three viruses (Table 4, S6 Table). Although the R2 values of model II were all larger than those of RSM, the Ct values predicted by the ANN deviated more from the experimental values than those predicted by RSM, especially for multiplex PCR of HMPV and INF (Table 5, S7 Table). We therefore believe that RSM is more appropriate than ANN for PCR system optimization, and model I was selected as the final prediction model.

Validation
After comparing the stability and prediction performance of models I and II, model I was selected for condition optimization. The optimized conditions were subjected to validation tests with three parallel repeats, and relative errors were calculated using Eq (8). The results indicate a greater relative error for multiplex qPCR of INF, whereas the errors for multiplex qPCR of the other two viruses were within an acceptable range.
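$$\text{Relative Error} = \frac{\lvert Ct_p - Ct_e \rvert}{Ct_e} \times 100\% \qquad (8)$$

where Relative Error is the actual relative error, generally given as a percentage; $Ct_p$ is the predicted Ct value under optimal conditions; and $Ct_e$ is the experimental Ct value under optimal conditions.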
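A minimal implementation of the relative error, assuming the common absolute form |Ct_p - Ct_e| / Ct_e × 100% (the sign convention is not stated in the text); the function name is illustrative.

```python
def relative_error_percent(ct_pred: float, ct_exp: float) -> float:
    """Assumed form: |Ct_p - Ct_e| / Ct_e * 100, i.e. deviation as a percentage of the measured Ct."""
    return abs(ct_pred - ct_exp) / ct_exp * 100.0
```

Using the absolute value treats over- and under-prediction alike, which suits a check of whether errors fall within an acceptable range.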

Discussion
As a detection technique, qPCR has significant advantages [1]. However, many factors influence the process, and the interactions between them are complicated, especially in multiplex qPCR systems. Because multiple primer and probe pairs are added, changes in factors such as annealing temperature, elongation temperature, and number of cycles can produce non-specific products and different effects on amplification, which makes qPCR results difficult to compare.
Considering the important effects of primers on uni- and multiplex qPCR [29], we took into account the primer melting temperature (Tm) and GC content, as well as the homology between the primers and the target nucleic acid sequence, when designing primers. Primers with similar Tm values can improve product amplification in multiplex qPCR [30][31][32]. The RSM-based mathematical models of uni- and multiplex qPCR for the three viruses were all statistically significant. In addition, adequate precision measures the signal-to-noise ratio, and a ratio greater than 4 is desirable. The ratios for all models were greater than 4 (Table 2, S4 Table), indicating that the models are sufficiently precise to guide experiments [33]. Fig 1 also indicates that Mg2+ is the parameter with the greatest influence on the Ct value; thus, the concentration of Mg2+ is the most important factor under the conditions used in these experiments.
As reported previously [34], dNTPs and DNA polymerase compete for Mg2+ in PCR systems: free Mg2+ is necessary for DNA polymerase to exert its activity, and the formation of bonds between dNTPs and DNA also requires Mg2+. Furthermore, in multiplex PCR with a constant dNTP concentration, the amount of amplification product increases as the Mg2+ concentration increases, and non-specific bands are eliminated. However, at excessive Mg2+ concentrations, amplification slows down and may even stop, which is speculated to result from a suppression effect [35].
These reported effects on multiplex PCR amplification are consistent with those observed in this study. Mg2+ significantly influenced the formation of PCR amplification product: at appropriate concentrations it increased the amount of PCR product, but it suppressed amplification at excessively high concentrations. Under the experimental conditions used, changes in dNTP concentration did not significantly affect PCR amplification, possibly because the range of dNTP concentrations tested in this study was too narrow.
Interestingly, Mg2+ had a positive effect on uniplex PCR but a negative effect on multiplex qPCR of RSV in this study. This is concordant with prior work [32] showing that DNase concentration differentially affects the amplification of uni- and multiplex PCR: an increase in DNase concentration in uniplex PCR leads to more non-specific product but positively affects multiplex PCR amplification.
Although model I performed relatively better than model II under the experimental conditions used in this study, its predictive ability was still unsatisfactory. This could be related to the limitations of RSM, such as its restriction to quadratic polynomial functions [36] and its inability to capture the relationships among multiplex qPCR factors. Although model II was better at handling non-linear functions [37], it overfitted the data [28].
Model I was therefore selected for condition optimization. In multiplex qPCR, however, the primers and probes are template-specific and can be set per target, whereas DNA polymerase, Mg2+, and dNTPs must be used at a single shared concentration. Further adjustment of the multiplex qPCR conditions based on the parameter optimization results is therefore necessary. For multiplex qPCR, the RSM 3D plots showed that smaller Ct values were obtained at higher primer and probe concentrations. However, considering the increase in non-specificity when primer and probe concentrations are too high and the differences in optimal primer conditions among the viruses [32], the primer and probe concentrations suggested by the RSM model for each virus were adopted as the optimized conditions for that virus. DNA polymerase and Mg2+ behaved similarly across a middle range of concentrations (0.04-0.08 mmol/L and 2.0-2.5 mmol/L, respectively), which could be sufficient for all three viruses. Considering the increase in non-specificity triggered by high Mg2+ concentrations [29], the Mg2+ concentration was set at 2.35 mmol/L. dNTPs did not significantly influence the response value. The final, cost-effective optimal conditions for uni- and multiplex qPCR are given in Table 6 and S8 Table.

Conclusions
The optimal conditions for uni- and multiplex qPCR were predicted by RSM. Through designed experiments, the interaction effects among the variables were investigated, and two mathematical models were constructed using RSM and BPNN-GA. The statistical results of RSM indicated that Mg2+ was the most important factor in this study. Interestingly, the same factor played different roles in different reaction systems.
Although the R2 values of the BPNN-GA models were all larger than those of RSM, the Ct values they predicted differed too much from the experimental values. We therefore believe that RSM is more appropriate than ANN for PCR system optimization; RSM performed better in modeling both uni- and multiplex qPCR, perhaps because of the internal relationships among the factors.