Comparisons of Prediction Models of Quality of Life after Laparoscopic Cholecystectomy: A Longitudinal Prospective Study

Background
Few studies of laparoscopic cholecystectomy (LC) outcome have used longitudinal data for more than two years. Moreover, no studies have considered group differences in factors other than outcome such as age and nonsurgical treatment. Additionally, almost all published articles agree that the essential issue of the internal validity (reproducibility) of the artificial neural network (ANN), support vector machine (SVM), Gaussian process regression (GPR) and multiple linear regression (MLR) models has not been adequately addressed. This study proposed to validate the use of these models for predicting quality of life (QOL) after LC and to compare the predictive capability of ANNs with that of SVM, GPR and MLR.
Methodology/Principal Findings
A total of 400 LC patients completed the SF-36 and the Gastrointestinal Quality of Life Index at baseline and at 2 years postoperatively. The criteria for evaluating the accuracy of the system models were mean square error (MSE) and mean absolute percentage error (MAPE). A global sensitivity analysis was also performed to assess the relative significance of input parameters in the system model and to rank the variables in order of importance. Compared to SVM, GPR and MLR models, the ANN model generally had smaller MSE and MAPE values in the training data set and test data set. Most ANN models had MAPE values ranging from 4.20% to 8.60%, and most had high prediction accuracy. The global sensitivity analysis also showed that preoperative functional status was the best parameter for predicting QOL after LC.
Conclusions/Significance
Compared with SVM, GPR and MLR models, the ANN model in this study was more accurate in predicting patient-reported QOL and had higher overall performance indices. Further studies of this model may consider the effect of a more detailed database that includes complications and clinical examination findings as well as more detailed outcome data.


Introduction
Laparoscopic cholecystectomy (LC) is among the most common operations performed by general surgeons. Accurately predicting quality of life (QOL), a standard outcome measure after LC, is important when selecting treatment modality and when allocating scarce medical resources [1][2][3].
Regression analysis, one of the most widely used multivariate analysis methods, assumes linear relationships between independent and dependent variables. However, studies show that changes in biomedical variables are often non-linear [4][5][6][7][8][9][10][11]. Support vector machines (SVMs) are among the major classification methods; they solve classification problems by constructing hyperplanes in a multidimensional space that separate cases with different class labels. SVMs have also proven effective for solving regression problems because they can handle multiple continuous variables [4][5][6]. Gaussian process regression (GPR) is a kernel-based nonlinear regression technique that uses a kernel (covariance) function to implicitly transform the data into a high-dimensional reproducing kernel Hilbert space. The method has proven effective for solving various regression problems [7,8]. Artificial neural networks (ANNs) are complex and flexible nonlinear systems with properties not found in other modeling systems. These properties include robust performance in dealing with noisy or incomplete input patterns, high fault tolerance, and the capability to generalize from the input data [9][10][11]. The computational power of an ANN is derived from the distributed nature of its connections. The ANN model is a well-established data mining algorithm that is widely used in various fields, from engineering to biomedical science [9][10][11].
Although many outcome-predicting models have been developed using conventional statistical procedures, their application at the individual level is hampered by the high interdependence of the clinical variables involved, which potentially may interact with each other and have reciprocal enhancing effects [12,13]. Three major limitations of these conventional procedures are (1) the inability to capture interactions of the disease, (2) the inability to capture the process dynamics, and (3) the very large confidence interval in individual risk assessment. Hence, conventional statistical approaches have intrinsic limitations in handling this complex nonlinear information [14][15][16].
Gholipour et al. compared ANNs with linear discriminant analysis in terms of their accuracy in predicting conversion of LC to open surgery [14]. They concluded that ANNs that consider the preoperative health characteristics of patients have superior prediction performance compared to discriminant analysis models. Another retrospective analysis of the prevalence of gallbladder disease and its risk factors by Liew et al. compared logistic regression and ANN in terms of their accuracy in predicting conversion of LC to open surgery in obese patients [15]. Again, ANN models significantly outperformed the logistic regression models in predicting the risk factors and prevalence of gallbladder disease and gallstone development in obese patients on the basis of multiple variables related to laboratory and pathological features. In Eldar et al., a comparison of logistic regression, linear discriminant analysis and ANN models in predicting conversion of LC to open surgery again showed that ANN-based models are relatively more effective and practical for predicting successful LCs and their conversion [16].
Despite their contribution to the growing understanding of LC surgery outcomes, previous studies of LC outcome have had major shortcomings [12,17,18]. Few studies of LC outcome have used longitudinal data for more than two years. Moreover, no studies have considered group differences in factors other than outcome such as age and nonsurgical treatment. Additionally, almost all published articles agree that the essential issue of the internal validity (reproducibility) of ANN, SVM, GPR and multiple linear regression (MLR) models has not been adequately addressed. Therefore, the primary aim of this study was to validate the use of ANN models in predicting patient-reported QOL after LC surgery, and the secondary aim was to compare the predictive capability of ANNs with that of SVM, GPR and MLR models.

Study population
Patients provided written informed consent. Patients with cognitive impairment (n = 1), severe organ disease (n = 4) or psychiatric disease (n = 1) were excluded. Of the 518 eligible subjects who gave written consent and were enrolled in the study at baseline, twelve were excluded due to conversion of LC to open cholecystectomy (OC), and 106 were excluded because they did not undergo postoperative assessments. All 400 of the remaining LC subjects completed the preoperative and 2-year postoperative assessments.

Instruments and measurements
The SF-36 (Chinese version) was administered to measure QOL outcomes, and the score was used as a dependent variable. As described in the literature, the physical component summary scale (PCS) and mental component summary scale (MCS) were calculated using norm-based scoring methods to compare QOL in the study population with that of the general Taiwan population [19]. A PCS or MCS value of 50 was considered average for the general Taiwan population. Both PCS and MCS have been widely adopted and were used in the present study to provide an overall QOL index and for further study of longitudinal changes in generic measures as a whole [20].
The GIQLI is recognized as a valid and reliable instrument for measuring functional status, especially in patients undergoing cholecystectomy [13]. Each of its thirty-six items is scored from 0 to 4 with a higher score indicating better health status, and the total GIQLI score ranges from 0 to 144. A Chinese version of the GIQLI has demonstrated validity [2,13].
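The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes that the total GIQLI score is the plain sum of the thirty-six item scores, which is consistent with the stated 0 to 144 range; the function name `giqli_total` is hypothetical and not part of any published scoring software.

```python
def giqli_total(item_scores):
    """Sum 36 item scores (each 0 to 4) into a total between 0 and 144."""
    if len(item_scores) != 36:
        raise ValueError("the GIQLI has 36 items")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("each item must be scored from 0 to 4")
    return sum(item_scores)


# Boundary cases: the lowest and highest possible totals.
worst = giqli_total([0] * 36)
best = giqli_total([4] * 36)
```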
The following patient data obtained by records review and questionnaire interview were tested as independent variables in this study: age, gender, body mass index (BMI), education, Charlson co-morbidity index (CCI) score, marital status, previous abdominal surgery, surgical factors, patient referral source, current alcohol or tobacco use, preoperative functional status, American Society of Anesthesiologists (ASA) score, current complications, operation time, length of stay (LOS) and rehospitalization within 30 days.

System model development
The factors used in the MLR model to predict long-term QOL of LC patients included both demographic and clinical characteristics. The MLR model can be formulated as the following linear equation:

$$\hat{Y} = b_0 + \sum_{i=1}^{m} b_i X_i + e_i$$

where $\hat{Y}$ is the actual output value, $b_0$ is the intercept, $b_i$ is the model coefficient parameter, $X_i$ is the independent or input variable, $e_i$ is the random error, and $m$ is the number of variables.
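As a minimal sketch of how such a linear model produces a prediction, the fitted part of the equation (the intercept plus the weighted sum of the input variables) can be evaluated for a single patient as follows. The coefficient and input values are hypothetical placeholders chosen for illustration; they are not the coefficients estimated in this study.

```python
def mlr_predict(intercept, coefficients, inputs):
    """Return intercept + sum(b_i * x_i) for one patient."""
    if len(coefficients) != len(inputs):
        raise ValueError("one coefficient is needed per input variable")
    return intercept + sum(b * x for b, x in zip(coefficients, inputs))


# Hypothetical values for illustration only (not taken from the study).
b0 = 60.0
b = [-0.10, -1.50, 0.40]   # e.g. age, CCI, preoperative functional status
x = [55.0, 1.0, 100.0]     # one hypothetical patient
predicted = mlr_predict(b0, b, x)
```

The random error term is not part of the prediction; it describes the residual between the predicted and observed values.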
The SVM model employs non-linear mapping to transform the original training data into higher-dimensional data and searches for the linear optima that define a hyperplane within the new dimension [4]. With appropriate non-linear mapping to a sufficiently high dimension, a decision boundary can separate data into two classes [4]. In the SVM model, this decision boundary is defined by support vectors and margins.
The GPR applies a Bayesian approach to nonlinear regression. The Bayesian paradigm provides probabilistic modeling of nonlinear regression. The Bayesian approach to regression specifies a priori probabilities of the parameters to be estimated, and it computes the maximum a posteriori probabilities given the observed data samples. Unlike non-Bayesian schemes, which typically choose a single parameter based on a specified criterion, the Bayesian probabilistic model obtains both the optimal estimated function and the covariance associated with the estimation. Therefore, the Bayesian paradigm provides more information about the estimated parameters compared to non-Bayesian methodology. The GPR is a memory-based method that stores some or all of the training data for use in testing. Therefore, GPR can be quickly trained, which improves the efficiency of the massive-training methodology [7].
The ANN model used in this study was a standard feed-forward, back-propagation neural network with three layers: an input layer, a hidden layer and an output layer. The multilayer perceptron (MLP) network is an emerging tool for designing special classes of layered feed-forward networks [21]. Its input layer consists of source nodes, and its output layer consists of neurons; these layers connect the network to the outside world. In addition to these two layers, the MLP usually has one or more layers of neurons referred to as hidden neurons because they are not directly accessible. The hidden neurons extract important features contained in the input data.
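As an illustration of the three-layer feed-forward structure, a minimal forward pass is sketched here. It assumes a logistic sigmoid in the hidden layer and a hyperbolic tangent at the output, as reported for the fitted networks in this study. The weights, biases and layer sizes are hypothetical, back-propagation training is omitted, and the output is assumed to be a rescaled QOL score because the hyperbolic tangent is bounded between -1 and 1.

```python
import math


def logistic(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> sigmoid hidden layer -> single tanh output."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    z_out = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return math.tanh(z_out)


# Hypothetical network: 3 inputs, 2 hidden neurons, 1 output.
x = [0.5, -1.0, 0.25]
hidden_weights = [[0.2, -0.4, 0.1], [-0.3, 0.1, 0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [0.7, -0.6]
output_bias = 0.05
y = mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
```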

Statistical analysis
The dataset was divided randomly into two sets, one set of 320 cases (80% of the overall dataset) for training the model and another set of eighty cases for testing the model. The model was built using the training set. Demographic and clinical characteristics were the independent variables, and the outcome (QOL) was the dependent variable. The SVM, GPR, MLR and ANN models were then tested using the eighty cases in the testing dataset.
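The 80%/20% random division described above can be sketched as follows. The helper name `split_cases`, the use of integer case identifiers and the fixed seed are assumptions made for illustration; the study itself performed the split in its statistical software.

```python
import random


def split_cases(case_ids, train_fraction=0.8, seed=0):
    """Shuffle the case identifiers and split them into training and testing sets."""
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 400 hypothetical case identifiers: 320 for training, 80 for testing.
train_ids, test_ids = split_cases(range(400))
```

Repeating the call with different seeds would give a series of independent random splits of the same dataset.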
The model fit and prediction accuracy of the system models were measured in terms of mean square error (MSE) and mean absolute percentage error (MAPE), respectively. The MSE, which is computed between the desired and predicted values and then averaged across all data, is used as an indicator of goodness of fit. The MAPE indicates the average deviation from the desired value and is usually expressed as a percentage [22]. The prediction accuracy of a model is considered excellent if its MAPE value is lower than 10%. Values between 10% and 20%, between 20% and 50%, and higher than 50% are considered indicators of high, average, and low prediction accuracy, respectively [22]. The formulas for calculating MSE and MAPE are

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_i - \hat{Y}_i}{Y_i}\right| \times 100\%$$

where $n$ is the number of observations, $Y_i$ is the desired (target) value of the $i$th observation, and $\hat{Y}_i$ is the actual output value of the $i$th observation. The change rates are also given. The optimal number of neurons in the hidden layer and the activation functions are iteratively determined by comparing the MSE index of the output error among several neural networks. The network training process continues as long as training and test errors decrease.
The unit of analysis in this study was the individual LC surgery patient. The data analysis was performed in several stages. Firstly, continuous variables were tested for statistical significance by one-way analysis of variance (ANOVA), and categorical variables were tested by Fisher exact analysis. Univariate analyses were applied to identify significant predictors (P<0.05). Secondly, STATISTICA 10.0 (StatSoft, Tulsa, OK) software was used to construct the MLP network model, the SVM model, the GPR model and the MLR model of the relationship between the identified predictors and QOL. Finally, sensitivity analysis was performed to assess the importance of variables in the fitted models. To simplify the training process, key variables were introduced, and unnecessary variables were excluded.
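The two accuracy criteria defined above, MSE and MAPE, together with the MAPE interpretation bands, can be sketched as follows. The function names and the example values are hypothetical. Because the text does not say which band the boundary values of exactly 10%, 20% and 50% belong to, their assignment here is an assumption.

```python
def mse(targets, predictions):
    """Mean square error between desired and predicted values."""
    n = len(targets)
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / n


def mape(targets, predictions):
    """Mean absolute percentage error, in percent; targets must be non-zero."""
    if any(y == 0 for y in targets):
        raise ValueError("MAPE is undefined when a target value is zero")
    n = len(targets)
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(targets, predictions)) / n


def accuracy_band(mape_percent):
    """Map a MAPE value to the bands described in the text (boundary handling assumed)."""
    if mape_percent < 10:
        return "excellent"
    if mape_percent <= 20:
        return "high"
    if mape_percent <= 50:
        return "average"
    return "low"


# Hypothetical target and predicted QOL scores for three patients.
targets = [100.0, 80.0, 50.0]
predictions = [90.0, 84.0, 55.0]
fit = mse(targets, predictions)
error_percent = mape(targets, predictions)
band = accuracy_band(error_percent)
```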
A global sensitivity analysis was also performed to assess the relative significance of input parameters in the system model and to rank the variables in order of importance. The global sensitivity of the input variables against the output variable was expressed as the variable sensitivity ratio (VSR), i.e., the ratio of the network error with a given input omitted to the network error with the input included. A ratio of 1 or lower indicates that the variable degrades network performance and should be removed.
Table 1 shows the patient characteristics in this study. The mean age of the study population was 55.9 ± 14.6 years. The average CCI was 0.9 ± 1.1, and 57.3% of the patients were female. Furthermore, Table 2 shows the coefficients for total GIQLI score, PCS score, and MCS score obtained by the training set in the MLR model. The selected variables included in the MLR models were age ($X_1$), CCI ($X_2$), gender ($X_3$), previous abdominal surgery ($X_4$), current complications ($X_5$), operation time ($X_6$), and preoperative functional status ($X_7$). All selected variables were statistically significant (P<0.05) (Table 3). Additionally, in forty runs of the data using an 80%-20% random split, total GIQLI score (Appendix S1), PCS score (Appendix S2) and MCS score (Appendix S3) after surgery did not significantly differ between the training set and testing set.
Table 4 shows the three-layer networks and number of support vectors of total GIQLI score, PCS score and MCS score in ANN and SVM models. The ANN-based approaches provided the three-layer networks and the relative weights of neurons used for predicting QOL. The activation functions of logistic sigmoid and hyperbolic tangent were used in each neuron of the hidden layer and output layer, respectively.
The training set was also used to calculate the VSR values for the ANN model. Table 6 presents the VSR values for the outcome variable (QOL) in relation to the three most influential variables.
In the ANN model, preoperative functional status was the most influential (sensitive) parameter in terms of its effects on total GIQLI score, PCS score, and MCS score (VSR 1.38, 1.15, and 1.07, respectively). All VSR values exceeded one, indicating that the network performs better when all variables are considered.
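The variable sensitivity ratio can be illustrated with a minimal sketch: the model error with one input omitted is divided by the error with all inputs included. How an input is "omitted" is not specified in the text; here it is approximated by replacing the input with its sample mean, which is an assumption. The stand-in predictor and the data are hypothetical and do not represent the fitted network.

```python
def mean_square_error(targets, predictions):
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


def variable_sensitivity_ratios(predict, rows, targets):
    """Return, per input, error(input replaced by its mean) / error(all inputs)."""
    baseline = mean_square_error(targets, [predict(r) for r in rows])
    if baseline == 0:
        raise ValueError("baseline error is zero, so the ratio is undefined")
    ratios = []
    for j in range(len(rows[0])):
        mean_j = sum(r[j] for r in rows) / len(rows)
        # Assumption: "omitting" input j means fixing it at its sample mean.
        masked = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        error = mean_square_error(targets, [predict(m) for m in masked])
        ratios.append(error / baseline)
    return ratios


# Hypothetical stand-in model that depends only on the first input.
def predict(row):
    return 2.0 * row[0]


rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
targets = [2.5, 3.5, 6.5, 7.5]
vsr = variable_sensitivity_ratios(predict, rows, targets)
```

In this toy example the first input yields a ratio well above one, whereas the second input, which the stand-in model ignores, yields a ratio of one.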

Discussion
This study confirmed that, compared to the SVM model, the GPR model and the MLR model, the ANN model is significantly more accurate in predicting QOL (P<0.001). To the best of our knowledge, this study is the first to use ANNs for analyzing predictors of QOL after LC surgery. This model was tested against actual outcomes obtained by a neural network model, a support vector machine model and a linear regression model constructed using identical inputs. We also showed that, given the same number of demographic and clinical inputs and the same two outcome measures, the predictive accuracy of ANN is superior to that of SVM, GPR and MLR.
Recently, SVM, GPR and ANN models have been used for non-linear modeling in many fields, particularly bioinformatics [4][5][6][7][8][9][10][11]. Although the efficacy of SVM and GPR models is well established in the field of machine learning, their performance in surgical outcome prediction and prognosis has not been measured. The ANNs are adaptive models that use a dynamic approach to analyzing the risk of outcomes. That is, they perform bottom-up computation by modifying their internal structures in relation to a functional objective (i.e., the model is generated by the data it analyzes). Despite their inability to deal with missing data, ANNs can simultaneously process numerous variables and can consider outliers and nonlinear interactions among variables. Unlike standard statistical tests, ANNs effectively manage complexity even when sample sizes are small and when ratios between variables and records are unbalanced. In this respect, ANNs avoid the dimensionality problem and can achieve a predictive accuracy superior to those of SVM, GPR and MLR. To ensure a sufficiently robust basis for network training, the present study used a large and homogeneous dataset comprising all demographic and clinical variables shown to affect patient-reported QOL in previous linear regression models [10,11].
Piaggi and colleagues demonstrated that ANN models can accurately predict weight loss in obese women treated by laparoscopic adjustable gastric banding. Their integrated multidisciplinary approach showed that ANN may be a valuable tool for selecting the best candidates for surgery [23]. Segal and colleagues compared ANN with a multiple linear regression model in terms of accuracy in predicting several different functional outcome scores at 1 year after traumatic brain injury [24]. The predictive accuracy of their sophisticated linear models was comparable to that of ANNs. Recently, Salvatore and colleagues combined multiple linear regressions with artificial neural networks to predict how relationships among lower urinary tract symptoms, anatomical findings, and baseline characteristics affect outcome in women with pelvic organ prolapse. They also found that ANNs are valuable instruments for improving understanding of complex biological models [25].
The ANN approach developed in this study extends the predictive range of the linear regression model by replacing identity functions with nonlinear activation functions. The approach is apparently superior to linear regression for describing systems. The ANNs may be trained with data acquired in various clinical contexts and can consider local expertise, racial differences, and other variables with uncertain effects on clinical outcome. The analysis need not be limited to clinical parameters. Other potentially useful variables could be tested to improve the predictive value of the model. The proposed ANN architecture with MLP can also include more than one dependent variable and can perform a non-linear transformation between dependent variables. Future studies may evaluate how other demographic or clinical characteristics affect the proposed architecture.
In ANNs, overfitting occurs when a model describes random error instead of the underlying relationship. Data overfitting is indicated by an increasing testing error concurrent with a steadily decreasing training error. The model with the best predictive performance and the best data fit is that in which testing error is at the global minimum [26]. Based on the above rule for obtaining a fitted model while avoiding overfitting, this study used the testing set as the controlling criterion for determining when to stop training. Additionally, the testing data were not included in the training data.
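The stopping rule described above can be sketched from per-epoch error curves. The sketch selects the epoch at which the testing error is lowest among the recorded epochs and flags epochs where the testing error rises while the training error still falls. The error values and function names are hypothetical, and "global minimum" is interpreted here as the minimum over the epochs actually run, which is an assumption.

```python
def select_stopping_epoch(test_errors):
    """Return the index of the epoch with the lowest recorded testing error."""
    return min(range(len(test_errors)), key=lambda epoch: test_errors[epoch])


def overfitting_epochs(train_errors, test_errors):
    """Return epochs where training error decreases while testing error increases."""
    return [
        epoch
        for epoch in range(1, len(test_errors))
        if train_errors[epoch] < train_errors[epoch - 1]
        and test_errors[epoch] > test_errors[epoch - 1]
    ]


# Hypothetical error curves over five training epochs.
train_errors = [0.90, 0.60, 0.40, 0.30, 0.25]
test_errors = [1.00, 0.70, 0.55, 0.60, 0.70]
stop_at = select_stopping_epoch(test_errors)
flagged = overfitting_epochs(train_errors, test_errors)
```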
Throughout this two-year follow-up study, the best single predictor of QOL subscale scores was preoperative functional status, which is consistent with reports that preoperative functional scores are the best predictors of postoperative QOL [2,13]. Therefore, effective counseling is essential for apprising patients of expected post-surgery impairments. If QOL outcomes are considered benchmarks, then preoperative functional status, which is a major predictor of postoperative outcome, is crucial. Patients should also be advised that their postoperative QOL might depend not only on the success of their operations, but also on their preoperative functional status.
Post-surgery QOL may be related to surgical risk. For example, Quintana et al. [27] suggested that men with low surgical risks are more likely to be diagnosed with complicated cholelithiasis compared to men with high surgical risks. Since men with low surgical risks are more likely to experience complicated cholelithiasis, they have greater potential for improvement in QOL. This suggests that gender differences in QOL outcomes may result from gender differences in the treatment of complicated presentations, i.e., in terms of QOL, men may derive a greater benefit compared to women because they have a greater potential for complications and thus a greater potential for improvement. Another possible but untested explanation is gender differences in health care received. Some authors also suggest that patient values and the reporting of health status also differ by gender [18,27]. Another factor is age. This study confirmed previous findings that QOL improvement after LC surgery is inversely related to age [18,27].
Additionally, a recent study indicates that, compared to patients in early stages of a disease, patients in advanced stages not only tend to have more co-morbidities, they also tend to have less social support [2]. Notably, the CCI score in the current study was inversely related to QOL, which is consistent with the reported association between increased comorbidity and poor postcholecystectomy QOL [28][29][30].
Although all research questions were satisfactorily addressed, several limitations are noted. First, this study collected data for LC surgery patients who had been under the supervision of four surgeons in four different medical centers, each of whom had performed the highest volume of LC surgery procedures in his respective hospital during the previous years. This sample selection procedure ensured that patient outcome data would not be affected by surgeons with limited experience. By focusing the analysis on procedures performed by these four surgeons, the results of this study are more representative of all LC patients compared to one analyzing those performed by a single surgeon. However, a notable limitation is that the first patient in the prospective patient cohort was enrolled in 2007. Therefore, depending on their inclusion date, some surveyed patients had a longer follow-up than others did, which may have caused selection bias. Nonetheless, in most QOL subscales, the characteristics of subjects who continuously participated throughout this 2-year study did not significantly differ from those of subjects who died or dropped out during the study (data not shown).

Conclusions
Compared with the SVM model, the GPR model and the MLR model, the ANN model in the study was more accurate in predicting patient-reported QOL and had higher overall performance indices. The global sensitivity analysis also showed that preoperative functional status is the most important predictor of total GIQLI score, PCS score and MCS score after LC surgery. The predictors analyzed in this study could be addressed in preoperative and postoperative health care consultations to educate candidates for LC surgery in the expected course of recovery and expected functional outcomes. Further studies of this model with differential evolution [31] may consider the effect of a more detailed database that includes complications and clinical examination findings as well as more detailed outcome data. Hopefully, the model will evolve into an effective adjunctive clinical decision making tool.

Supporting Information
Appendix S1 Forty data sets used for comparing predictions of total gastrointestinal quality of life index (GIQLI) score.