Using Multivariate Regression Model with Least Absolute Shrinkage and Selection Operator (LASSO) to Predict the Incidence of Xerostomia after Intensity-Modulated Radiotherapy for Head and Neck Cancer

Purpose The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Methods and Materials Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3+ xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R2, chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Results Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R2 was satisfactory and corresponded well with the expected values. Conclusions Multivariate NTCP models with LASSO can be used to predict patient-rated xerostomia after IMRT.


Introduction
Head and neck cancers (HNC) are a leading cause of cancer mortality in Taiwan. Radiotherapy (RT) plays an important role in the treatment of HNC. However, xerostomia is a common complication in patients with HNC after radiotherapy. According to the Late Effects of Normal Tissues -Subjective, Objective, Management, Analytic (LENT-SOMA) criteria, severe xerostomia is defined as long-term salivary function ,25% of the pre-RT baseline value [1,2]. Based on the quantitative analysis of normal tissue effects in the clinic (QUANTEC) guideline to limit the incidence of severe xerostomia to below 20%, at least one parotid gland should receive a mean dose of #20 Gy, or both parotid glands should receive a mean dose of #25 Gy [1,2]. The patientrated quality of life (QoL) questionnaire (QLQ-C30) and the xerostomia-specific QoL questionnaire (QLQ-H&N35) have been shown to allow a reliable assessment of the relationships between QoL and salivary function and xerostomia in patients receiving radiotherapy [3][4][5][6]. The normal tissue complication probability (NTCP) model was developed using either a univariate or multivariate logistic regression model to predict the incidence of xerostomia. However, the development of xerostomia as reported by patients most likely depends on a variety of prognostic factors [3][4][5]7,8]. Some variables such as clinical and treatment-related factors that may have important effects on the risk of radiationinduced complications need to be taken into consideration. Therefore, one goal of this study was to develop predictive models for patient-rated xerostomia, taking into account dose distributions in parotid glands as well as other potential clinical and treatmentrelated prognostic factors.
Developing a multivariate logistic regression model requires knowing the optimal number of prognostic factors to include. Many investigators used the log likelihood (LL), average likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC) to deal with this problem [3]. Xu et al. [9][10][11] introduced the least absolute shrinkage and selection operator (LASSO) and Bayesian model averaging (BMA) to build NTCP models of xerostomia after three-dimensional conformal radiation therapy (3D-CRT) for HNC. However, the BMA needs a priori information to build a fair predictive model. The LASSO is based on shrinkage estimation and has been widely used in the statistical field [12][13][14][15][16]. The advantages of LASSO include: 1) a smaller mean squared error (MSE) than conventional methods; 2) it handles the multicollinearity problem; 3) overall variable selection; and 4) coefficients shrink [9][10][11]. Being easy to implement is another of the merits that attract users. Xu et al. recommended the LASSO method for multivariate logistic regression NTCP modeling [9]. Beetz et al. showed that 3D-CRT-based models for patientrated xerostomia among HNC patients treated with primary RT turned out to be less valid for patients treated with intensitymodulated radiotherapy (IMRT), and that the 3D-CRT NTCP models cannot be used for IMRT cohorts [5]. The main message was that models developed in a population treated with a specific technique cannot be generalized and extrapolated to a population treated with another technique without external validation. Therefore, the purpose of this study was to develop a multivariate logistic regression model with LASSO to make valid predictions about the incidence of patient-rated xerostomia among HNC patients treated with IMRT at 3 and 12 months after treatment.

Study population
We aimed to develop a multivariate logistic regression NTCP model with LASSO to make valid predictions about the risk of moderate-to-severe patient-rated xerostomia using QoL datasets. In total, 206 patients with HNC were enrolled. All participants were treated with IMRT at the Kaohsiung Chang Gung Memorial Hospital between May 2007 and October 2010. The characteristics of the 206 patients with HNC are presented in Table 1. QoL questionnaires completed by patients prior to treatment and at 3 and 12 months after treatment were analyzed and used for multivariate logistic regression NTCP model response fitting. This study was approved by the Chang Gung medical foundation institutional review board (99-1420B, 96-1231B) and all participants gave written informed consent.

IMRT techniques
Each patient was immobilized using a commercially available thermoplastic mask and/or an individually customized bite block. Computed tomographic images (2.5-mm slice thickness, 5126512 pixels/slice) were acquired from the top of the vertex to the level of the carina. Both parotid glands were delineated by a radiation oncologist. Dose distributions were calculated and dosevolume histograms (DVHs) were generated separately for each parotid gland, enabling separate analysis. Two IMRT techniques were used: simultaneous integrated boost (SIB) and sequential mode (SQM). The prescribed total dose ranged from 54.0 to 77.4 Gy (median, 70.0 Gy). IMRT was delivered by the computer-controlled auto-sequencing segment or the dynamic multileaf collimator of a linear accelerator [Varian Clinac 21 EX (Varian Medical Systems, Palo Alto, CA) or Elekta Precise (Elekta, Crawley, UK)] as described previously [17], aiming to spare the parotid glands (predominantly contralateral side) while treating the primary targets and lymph nodes at risk. The prescribed doses were 54.0-77. According to the Radiation Therapy Oncology Group 0225 and 0615 trials, the planning objectives for PTVs were a minimum dose .95%, and no more than 5% of any PTV1 received $110% of the prescribed dose. The structural constraints for the parotid gland were a mean dose #26 Gy or V 30Gy #50%; a mean dose #40 Gy was used in the oral cavity excluding the PTV. Mean DVH values were calculated for the parotid glands for every patient. All data were based on the DVHs obtained using Pinnacle 3 H (Philips, Fitchburg, WI) with a bin size resolution of 0.01 Gy. The resolution of dose calculation was 2.5 mm for all IMRT plans.
Details about the prescribed dose and fractions for the SIB and SQM techniques can be found in previous studies [18,19].

QoL evaluation
A prospective survey of QoL using the European Organization for Research and Treatment of Cancer (EORTC) C30 and H&N35 QoL questionnaires (QLQ-C30 and QLQ-H&N35) was performed on 206 survivors of HNC. The patients were asked to complete the questionnaire prior to treatment and 3 months, 6 months, 1 year, and 2 years after IMRT. For the purposes of this analysis, the 3-month (n = 206) and 12-month (n = 128) follow-up time points were used. Chinese versions of the EORTC QLQ-C30 and QLQ-H&N35 questionnaires were obtained from the Quality of Life Unit, EORTC Data Center, Brussels, Belgium [20]. For each item on the EORTC QLQ-C30 and QLQ-H&N35 questionnaires, the following four-point Likert scale was used: none (0), a little (33), quite a lot (66), and a lot (100). All QoL scores are given in the text. A high score on the functional or global QoL scale represents a relatively high/healthy level of functioning or global QoL, whereas a high score on the symptom scale represents the presence of a symptom or problem. The EORTC QLQ-H&N35 questionnaire was used to evaluate xerostomia (i.e., the analytical endpoint). Grade 3 + xerostomia was defined as moderate (66) to severe (100) xerostomia 3 and 12 months after the completion of RT; this corresponds to the two highest scores on the four-point Likert scale. As we were primarily interested in grade 3 + xerostomia induced by RT itself, patients with moderateto-severe xerostomia at baseline were excluded from further analysis [3][4][5]21]. The primary endpoint (grade 3 + xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER 3m ) and 12 months (XER 12m ) after the completion of IMRT, and excluded patients with grade 3 + xerostomia at baseline.

Statistical analysis
NTCP models for moderate-to-severe patient-rated xerostomia were developed using a multivariate logistic regression analysis with an extended bootstrapping technique as described by El Naqa et al. [7] and Beetz et al. [3][4][5]. For each patient, predictive values were calculated for each set of prognostic factors based on the multivariate logistic regression coefficients according to the formula: in which n is the number of prognostic factors in the built model; variables x i represent different prognostic factors; and bi are the corresponding regression coefficients. For each patient, 16 candidate prognostic factors were initially included in the variable selection procedure. The candidates included 14 clinical and two dosimetric factors. The clinical candidate factors were chemotherapy (C/T), treatment mode (SIB or SQM), gender, age, AJCC stage, baseline xerostomia, T stage, node classification, education, family history, financial status, marriage, smoking, and alcohol abuse.
The dosimetric candidate factors were the mean dose given to the contralateral parotid gland (Dmean-c) and the ipsilateral parotid gland (Dmean-i) (Gy). We excluded Vx values, which were previously found to be highly correlated with each other [4]; Dmean-c and Dmean-i were the only two DVH-parameters in this study. The candidate prognostic factors are listed in Table 2. We used the LASSO process along with the Hosmer-Lemeshow test to select the optimal and suboptimal numbers of potential prognostic factors for the predictive model.
The LASSO was first proposed by Tibshirani in 1996, the details of which can be found in [22]. It uses the following equation to shrink the coefficients and select the prognostic factors: where d is the number of variables selected, and t are tuning parameters that control the degree of penalty, which can be determined by cross-validation [9,23]. Based on the nature of the constraint, LASSO tends to produce some coefficients as zero, and it improves the overall prediction accuracy by allowing a small amount of bias to reduce the variance of the predicted values. The selection of the optimal number of prognostic factors was performed by cross-validation in this study.
To account for the overfitting problem in our study, two datasets were used, i.e., a training set and a test set; a model was built based on a training set and fitted to the training set itself and also tested with a test set. We used nested 10-fold cross-validation to obtain the best prognostic factor subsets.
To generalize the use of the models, a compact model can be generated with a small suboptimal number of prognostic factors for more user-friendly handling; however, accuracy has to remain at a certain level. For the selection of a suboptimal number of prognostic factors, we first processed the LASSO to rank the correlations for the potential prognostic factors, i.e., the DVH parameters and patients' clinical data. In order to reduce the number of prognostic factors in the model, the suboptimal number of prognostic factors for a multivariate logistic regression model was determined using a bootstrapping method with the Hosmer-Lemeshow test. We set the Hosmer-Lemeshow test to show a significant agreement between predicted risk and observed outcome when the Hosmer-Lemeshow test value was $0.05. The first number needed for the model was recorded when the value was $0.05, after which we put the selected prognostic factors into the model to calculate the area under the receiver operating characteristic curve (AUC). We carried on increasing the number of prognostic factors with a higher Hosmer-Lemeshow test value and stopped when the system AUC did not increase significantly (,5%). The Hosmer-Lemeshow test revealed the accuracy of the model. The AUC was also used as a measure to evaluate prediction performance and it described the discriminatory ability of the model. The prognostic factors selected were used for the definitive NTCP model for moderate-to-severe patientrated xerostomia.

Forward selection
Beetz et al. used the forward selection method to develop NTCP models for moderate-to-severe patient-rated xerostomia using a multivariate logistic regression analysis with an extended bootstrapping technique and forward variable selection [3,4]. We compared their method with the LASSO mentioned above. For every model order, the likelihood of predictions was calculated and the number of selected variables with the highest likelihood was selected for the definitive predictive model for patient-rated xerostomia. Two thousand bootstraps were used for each analysis. The details can be found in [3,4].

Results
All patients completed QoL questionnaires at three time points (before RT, during RT, and at 3 months after RT). In addition, 128 patients completed a QoL questionnaire 12 months after IMRT. Of the 206 patients assessed at the 3-month time point, 21 who were already suffering from moderate-to-severe xerostomia at baseline were excluded, leaving 185 patients for analysis. Of the 128 patients assessed at the 12-month time point, 11 who were already suffering from moderate-to-severe xerostomia at baseline  were excluded from further analysis, leaving 117 patients for analysis. At 3 months after treatment, 42.2% of the patients reported moderate-to-severe xerostomia. After 12 months, 33.6% reported moderate-to-severe xerostomia ( Table 1).
The following steps were used to select the optimal and suboptimal numbers of prognostic factors. First, LASSO ranked how strongly the factors correlated, then chose the suboptimal number of prognostic factors using the Hosmer-Lemeshow test and AUC. LASSO of bootstrap prediction in the multivariate logistic regression analysis ranked the prognostic factors in descending order, as shown in Table 3 (the LASSO shrinking path diagrams are shown in Figs. 1a and b)  The overall performance for both time points of the NTCP model for patient-rated xerostomia in terms of scaled Brier score, Omnibus, and Nagelkerke R 2 was satisfactory and corresponded well with the expected values ( Table 6). The AUC for the suboptimal model was 0.84 (95% CI 0.80-0.86) and 0.84 (95% CI 0.81-0.86) for 3 and 12 months, respectively. The AUC for the optimal model was only slightly improved (less than 5%) compared with the suboptimal model; this was 0.86 (95% CI 0.83-0.88) and 0.87 (95% CI 0.84-0.89) for 3 and 12 months, respectively. Finally, the Hosmer-Lemeshow test showed significant agreement between predicted risk and observed outcome for both LASSO optimal and suboptimal models ( Table 6). In our population, the DVH diagrams for the mean doses delivered to the ipsilateral and contralateral parotids are shown in Fig. 2 for both the 3-and 12month time points.
For the likelihood forward selection, the numbers of prognostic factors selected were 9 and 11 for the 3-and 12-month time points, respectively. The AUC of the forward selection model was similar to that achieved by LASSO: 0.85 (95% CI 0.83-0.88) and 0.86 (95% CI 0.83-0.88) for 3 and 12 months, respectively.

Discussion
Early NTCP models, like the Lyman-Kutcher-Burman and the univariate logistic regression model [24], are based on information derived from DVHs generated from dose distributions in the target volumes and the surrounding organs at risk. For example, the mean dose received in the parotid glands is the only prognostic factor for xerostomia in univariate models. In this multivariate model study, not only the mean doses to the contralateral/ ipsilateral parotid glands were the principal components causing xerostomia; but also age, education, and smoking were also to be the effected components.
For the IMRT planning goal, the mean dose to each parotid gland should be kept as low as possible, consistent with the desired clinical target volume coverage. Sparing at least one parotid gland appears to eliminate complications [2,25]. Severe xerostomia can usually be avoided if at least one parotid gland has been given a mean dose #20 Gy or if both glands have been given a mean dose #25 Gy. A lower parotid mean dose usually results in better function with respect to the effects on patients' QoL [2,6,25].
In our population, the average mean doses to the ipsilateral and contralateral parotid were 36.3 Gy vs. 30.6 Gy, respectively. These doses were more or less similar to those observed in previous IMRT treatment reports, such as Beetz et al. [26], who found that they were 35.2 Gy and 28.0 Gy to the ipsilateral and contralateral parotid gland, respectively; Nutting et al. [27] reported that the average mean doses to the ipsilateral and contralateral parotid glands were 47.6 Gy and 25.4 Gy, respectively. In our cohort, these average dose levels were much higher in the subset of nasopharyngeal cancer patients: 41.6 Gy and 38.7 Gy to the ipsilateral and contralateral gland, respectively. A possible explanation for this is that the overlap of the PTV with parts of the parotid glands in nasopharyngeal cancer patients makes it more difficult to spare the parotid glands with IMRT [26]. This implies that more effective new technologies that could spare the parotid glands without compromising target volume coverage need to be investigated.
The current analysis, for the 3-month time point, showed that the mean dose to the contralateral parotid gland was the most important variable determining acute xerostomia. The threefactor model containing the mean dose to the contralateral gland and the ipsilateral gland performed significantly better than the single variable model. The dose to the contralateral parotid gland was more important than the dose to the ipsilateral parotid gland when treated with IMRT; similar reports can be found in a number of papers on patient-rated xerostomia [4,26,[28][29][30].
For the 12-month time point, the dose to the contralateral parotid gland was also significantly correlated with late xerostomia. However, the mean dose to the ipsilateral parotid gland was the most important variable. As the mean dose to the contralateral parotid gland was significantly lower in patients with/without xerostomia when treated with IMRT, it is not surprising that recovery can be expected during this period. Indeed, some investigators showed that the contralateral parotid recovered after  Table 2. doi:10.1371/journal.pone.0089700.g001 9 to 24 months in different recovery levels. Eisbruch et al. [29] reported the correlation between mean parotid dose and parotid salivary flow recovery in patients with non-nasopharyngeal headand-neck cancer. In the Kwong et al. [31] study, 60% of patients recovered at least 25% of their baseline stimulated parotid salivary flow at 12 months post-IMRT. In Hsiung et al. [32], 64.5% of 31 parotid glands recovered at least 25% of their baseline maximal excretion ratio at 9 months post-IMRT. This was consistent with the results of another study [33], which suggested that reducing radiotherapy doses to both parotid glands to ,26 Gy can reduce xerostomia significantly.
Although parotid gland dysfunction plays an important role in the development of patient-rated xerostomia [3], it is not the only prognostic factor. Beetz et al. recently showed that age and baseline xerostomia were independent prognostic factors for patient-rated xerostomia, in addition to the mean dose to the parotid glands [3,26]. In the current study, we likewise found that elderly patients have a higher probability of suffering from xerostomia than younger patients. Beetz et al. claimed that older patients are more likely to use medication and to have comorbidities that may influence and reduce saliva production at rest [3,34]. Fortin et al. [35] showed that in a univariate and multivariate analysis, total parotid mean dose and age were strongly associated with a lower incidence of grade $2 xerostomia at 6, 12 and 24 months.
In this study, those who had a higher financial status or a higher level of education tended to avoid the inconvenience of xerostomia. Similarly, Ramsey et al. [26] showed that lower financial status in colorectal cancer patients was associated with a worse outcome for reported pain. Fang et al. [21] found that NPC survivors with a higher annual family income and level of education presented a significantly better outcome on QoL scores. These findings suggest that the patient's individual abilities and the resources available to cope with the threat of treatment complications are powerful variables that affect their future quality of life. Financial status and education remain two of the most significant variables correlated with xerostomia in our cohort. In addition, smoking, a recognized carcinogenic factor in the overwhelming majority for head and neck cancers [36,37], was shown to be a contributory cause of xerostomia after IMRT in our cohort. Rad et al. [38] also indicated that long-term smoking would significantly reduce whole-mouth salivary flow rate.
That the risk of complications may depend on more factors than simply the dose to a single organ seems to be true. Clinical datasets on normal tissue complications often include a large number of variables, many of which need to be investigated and incorporated in a model because they may be related to a given complication. As reported by El Naqa et al., the prediction of endpoints can be improved by mixing clinical and dose-volume factors, while bootstrap-based variable selection analysis increases the reliability of the predictive models [7]. Indeed, our results showed better performance of the multivariate model compared with the univariate relationships between dose-volume prognostic factors and XER 3m or XER 12m .
Recently, we reported the results of a retrospective study that was conducted to develop a univariate NTCP model for patientrated moderate-to-severe xerostomia among HNC patients treated with IMRT [5]. The AUC values for the model were 0.68 (95% CI 0.61-0.74) and 0.72 (95% CI 0.64-0.80) for the 3-and 12month time points, respectively. The only prognostic factor was the mean dose to parotid glands. In this study, for the selection of suboptimal prognostic factors, the system performance AUC values improved from 0.68 to 0.84 for the 3-month time point, and from 0.72 to 0.84 for the 12-month time point when the number of prognostic factors used increased from one to three or four, respectively. The multivariate approach allowed the integration of different prognostic factors for estimating the risk on XER 3m and XER 12m in individual patients. In this regard, it should be emphasized that dose-effect relationships for this endpoint should be described by multiple NTCP curves rather than by a single NTCP curve. Moreover, whether the gain is worth the increased complexity requires further investigation. The problem of increased complexity is the potential limitation of this study. After all, a large number of selected prognostic factors may lead to instability in the models.
In the literature, a similar report by Xu et al. stated that the NTCP model obtained by LASSO was statistically significant [10]. Their dataset contained 185 patients who were all treated with primary 3D-CRT for HNC at 6 months after RT; 106 patients were assessed as having xerostomia. The primary endpoint was defined as Radiation Therapy Oncology Group (RTOG) grade 2 + xerostomia using the RTOG late radiation morbidity scoring system. The prognostic factors used for the NTCP modeling included five clinical and 16 dosimetric factors. The five clinical variables were chemotherapy, gender, age, treatment center, and baseline xerostomia score. The dosimetric factors were the mean dose given to the organs at risk and their volume. These organs were the lower lip, the soft palate, the contralateral and ipsilateral parotid gland, the contralateral and ipsilateral sublingual gland, and the contralateral and ipsilateral submandibular gland. The differences in comparison with our models included treatment technique (IMRT vs. 3D-CRT), time point (3-and 12-month vs. 6-month), endpoint (G3 + vs. G2 + ), and candidate prognostic factors (14 clinical and two dosimetric factors vs. five clinical and 16 dosimetric factors). However, the system performance AUC values were similar with both regression models (approximately 0.85). Therefore, LASSO was recommended for multivariate logistic regression NTCP modeling. The LL forward selection of bootstrap predictions in the multivariate logistic regression analysis was optimal with a model consisting of 9 and 11 variables for the 3-and 12-month time points, respectively. The system performance AUC values were no better than the results of LASSO. However, the selection of too many prognostic factors may result in overfitting.
There are a number of potential limitations of this study. Arjen van der Schaaf et al. [39] claimed that approximately 200 patients are required to obtain a model with high predictive power. In the current study, the number of patients evaluable at 12 months was far below the recommended 200 patients. The predictive power can be improved by increasing the sample size. Data from 6 months were not evaluated on the basis of previously published data [40][41][42], in which the dose dependency of salivary function  and recovery with time after RT was found up to 12 months post-RT. In addition, in the Kwong et al. [31] study, 60% of patients recovered at least 25% of their baseline stimulated parotid salivary flow at 12 months post-IMRT. Eisbruch et al. [28,29] showed that a parotid mean dose ,26 Gy decreases the severity of late complications and gives a better chance of functional parotid recovery at 12 months. The stability for evaluating late toxicity was more reliable at 12 months than 6 months. That chemotherapy, a non-dosimetric patient factor, may affect the risk of xerostomia is an issue of special concern. Deasy et al. [25] and Moiseenko et al. [2] stated that the use of chemotherapy was not typically correlated with xerostomia risk. This is consistent with our results, as chemotherapy was not significant among the 16 candidate prognostic factors and there was no association between chemotherapy and risk of xerostomia.

Conclusion
LASSO is recommended for multivariate NTCP modeling. However, in practice, the suboptimal LASSO method tends to incorporate fewer prognostic factors than the optimal approach. The suboptimal LASSO method is useful in practice. The prognostic factors included in the model are useful to further optimize current IMRT treatment with regard to patient-rated xerostomia and to indicate which prognostic factors are the most important, thus helping to spare the glands as much as possible, and to optimize current treatment with IMRT.