Prediction of Survival with Alternative Modeling Techniques Using Pseudo Values

Background The use of alternative modeling techniques for predicting patient survival is complicated by the fact that some alternative techniques cannot readily deal with censoring, which is essential for analyzing survival data. In the current study, we aimed to demonstrate that pseudo values enable statistically appropriate analyses of survival outcomes when used in seven alternative modeling techniques.
Methods In this case study, we analyzed survival of 1282 Dutch patients with newly diagnosed Head and Neck Squamous Cell Carcinoma (HNSCC) with conventional Kaplan-Meier and Cox regression analysis. We subsequently calculated pseudo values to reflect the individual survival patterns. We used these pseudo values to compare recursive partitioning (RPART), neural nets (NNET), logistic regression (LR), general linear models (GLM) and three variants of support vector machines (SVM) with respect to dichotomous 60-month survival, and continuous pseudo values at 60 months or estimated survival time. We used the area under the ROC curve (AUC) and the root of the mean squared error (RMSE) to compare the performance of these models using bootstrap validation.
Results Of a total of 1282 patients, 986 patients died during a median follow-up of 66 months (60-month survival: 52% [95% CI: 50%−55%]). The LR model had the highest optimism corrected AUC (0.791) to predict 60-month survival, followed by the SVM model with a linear kernel (AUC 0.787). The GLM model had the smallest optimism corrected RMSE when continuous pseudo values were considered for 60-month survival or the estimated survival time, followed by SVM models with a linear kernel. The estimated importance of predictors varied substantially by the specific aspect of survival studied and modeling technique used.
Conclusions The use of pseudo values makes it readily possible to apply alternative modeling techniques to survival problems, to compare their performance and to search further for promising alternative modeling techniques to analyze survival time.


Introduction
Predicting the survival probability of patients is important for various purposes in biomedical research, such as patient counselling, medical decision making, and benchmarking. The conventional analysis of survival problems mainly relies on Kaplan-Meier analysis and Cox regression modeling to predict the survival probability in relation to predictor variables [1,2].
Alternative modeling techniques are available, such as support vector machines and artificial neural networks [3][4][5], which might provide better predictions. For example, feed forward neural networks were already used in 1998 for the analysis of censored survival data [6]. In 2007, applications of random survival forests were described [7]. In 2009, prognostic indexes were compared using data mining techniques and Cox regression analysis in breast cancer data [8].
In 2000, Schwarzer and Vach [9] reviewed the use of artificial neural networks in medical research and found several problems.
A major problem was that some of the alternative techniques did not deal adequately with censoring, which is essential for analyzing survival data. The conventional analysis of survival outcomes requires two variables: the status of the patient (e.g. dead or alive) and the time point at which this status is measured. In 2008, Klein et al. [10,11] proposed to predict the survival at particular time points using pseudo values, which combine the variables status and time point in one outcome variable. The use of these pseudo values in generalized estimating equation modeling (GEE) using a log-minus-log link function leads to statistically appropriate analyses, which are in line with the results of Cox regression modeling.
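The agreement between the log-minus-log link and Cox regression can be made explicit with a short derivation. The notation below (survival function S, cumulative baseline hazard Λ0, coefficient vector β, predictors Z) is ours and is not taken from the cited papers; it is a sketch of the standard proportional hazards identity, not of the authors' exact GEE specification.

```latex
% Proportional hazards model written in terms of the survival function
S(t \mid Z) = \exp\{-\Lambda_0(t)\, e^{\beta^\top Z}\}
% Applying the log-minus-log transformation gives a model that is linear in Z
\quad\Longrightarrow\quad
\log\{-\log S(t \mid Z)\} = \log \Lambda_0(t) + \beta^\top Z
```

Under these assumptions, a regression of the pseudo values for S(t) with a log-minus-log link has coefficients on the same scale as the Cox log hazard ratios, with a separate intercept for each time point.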
In the current study, we aimed to study the use of pseudo values for analyses of survival outcomes with other modeling techniques, including support vector machines (SVM), neural networks (NNET), general linear models (GLM), recursive partitioning (RPART) and logistic regression (LR). To compare the performance, we applied these techniques and conventional regression analysis in the prediction of survival of 1282 Dutch patients with Head and Neck Squamous Cell Carcinoma (HNSCC), using predictors as described in earlier studies [12][13][14]. The survival of this particular population of newly diagnosed patients with HNSCC has already been studied by applying conventional Kaplan-Meier analysis, Cox regression and random survival forests (RSF) to 60-month survival and overall survival [15][16][17].

Patients and Data
We considered a cohort of 1371 patients with Head and Neck Squamous Cell Carcinoma (HNSCC) of the oral cavity, pharynx or larynx, diagnosed at Leiden University Medical Centre. The data were obtained from files used in an earlier study [16]. The same data had been used before to derive a prediction model based on the Cox regression modeling technique [15]. Predictors in this model included Tumor location, Age at diagnosis, Gender, T-N-M classification (T = the extent of the primary tumor, N = the absence or presence and extent of regional lymph node metastasis, M = the absence or presence of distant metastasis) and Prior malignancies. In 2010, Datema et al. [16,17] published an updated model including comorbidity according to the Adult Comorbidity Evaluation, based on a 27-item comorbidity index (ACE27) [18]. In our study, we excluded patients for whom comorbidity was unknown, resulting in a total of 1282 patients.

Outcome Variables
We defined three outcome variables related to patient survival:
a) The 60-month survival (dichotomous, dead or alive, ignoring censoring before 60 months)
b) The pseudo values at 60 months (continuous)
c) The estimated survival time (continuous)
We focused on 60-month survival, since this is a common time point in cancer research. We subsequently calculated pseudo values for the time points 12, 24,…, 288, and 300 months to reflect the individual survival patterns of patients using the R-package "Pseudo". The pseudo values form a new set of observations to allow for analysis as if we had time-to-event data without censoring [10,11].
The estimated survival time was calculated as the sum of the pseudo values at these time points, because this sum reflects the area under the survival curve and can be interpreted as the mean survival time. The choice for a time interval of 12 months was motivated by the wish to have around 25 time intervals per subject for sufficient accuracy in estimating the survival time. File S1 (appendix 1) gives a more detailed description of the calculation and interpretation of the pseudo values and the estimated survival time. For univariate analysis of 60-month survival and overall survival we used Kaplan-Meier analysis and Cox regression analysis.
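The pseudo value calculation can be illustrated with a minimal sketch in plain Python. It assumes the usual jackknife definition on the Kaplan-Meier estimator, that is, n times the estimate from all patients minus (n − 1) times the estimate with patient i left out; the source used the R package for this, and the function names and the optional `step` scaling below are our own assumptions, not part of the original analysis.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for death, 0 for censored."""
    surv = 1.0
    # Walk through the distinct death times up to t in increasing order.
    death_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in death_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times, events, t):
    """Jackknife pseudo values for S(t): n * S(t) - (n - 1) * S_without_i(t)."""
    n = len(times)
    full = km_survival(times, events, t)
    values = []
    for i in range(n):
        # Leave patient i out and re-estimate the survival probability.
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        values.append(n * full - (n - 1) * km_survival(loo_times, loo_events, t))
    return values


def estimated_survival_time(times, events, grid, step=1.0):
    """Per-patient sum of pseudo values over the time grid.

    The default (step=1.0) is the plain sum described in the text. Passing the
    grid spacing as `step` is our assumption for expressing the area under the
    curve in the time unit of the data.
    """
    per_time = [pseudo_values(times, events, t) for t in grid]
    return [step * sum(column) for column in zip(*per_time)]
```

Without censoring, the pseudo value of a patient at time t reduces to the indicator of being alive at t, which is a convenient sanity check for any implementation. The leave-one-out loop recomputes the estimator for every patient and is therefore slow for large cohorts; it is written for clarity only.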

Modeling Techniques
We considered the following modeling techniques: support vector machines (SVM), neural networks (NNET), recursive partitioning (RPART), general linear models (GLM) and logistic regression (LR), with their implementations as available in the software package R, version 2.14.1 [19]. The parameters of the various modeling techniques are presented in Table 1.

Tuning of the Modeling Techniques
Before applying a modeling technique, we tuned that technique by varying the parameters to create an optimal model fit. The optimal parameter setting was based on the smallest prediction error after 10-fold cross validation. The modeling technique SVM was tuned using a simultaneous grid search for the parameters cost and gamma when a radial or linear kernel was used and for the parameters cost, gamma and degree when a polynomial kernel was used. The modeling technique NNET was tuned using a simultaneous grid search for the parameter size, and the modeling technique RPART was tuned by varying the cp-value.
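The tuning step described above can be sketched as a simultaneous grid search scored by 10-fold cross validation. This is a generic illustration in plain Python, not the R tuning routine used in the study: `fit` is a hypothetical callable that returns a prediction function, the squared error is assumed as the prediction error, and `fit_shrunken_mean` is a toy stand-in for a real model such as an SVM with cost and gamma.

```python
import itertools
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_error(fit, X, y, params, k=10, seed=0):
    """Mean squared prediction error over k folds for one parameter setting."""
    sq_err, count = 0.0, 0
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Fit on the training folds only, then predict the held-out fold.
        predict = fit([X[i] for i in train], [y[i] for i in train], **params)
        for i in held_out:
            sq_err += (predict(X[i]) - y[i]) ** 2
            count += 1
    return sq_err / count


def grid_search(fit, X, y, grid, k=10, seed=0):
    """Evaluate every combination in the grid; return (best_params, best_error)."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(fit, X, y, params, k, seed)
        if best is None or err < best[1]:
            best = (params, err)
    return best


def fit_shrunken_mean(X_train, y_train, shrink, offset):
    """Toy model (hypothetical): predicts shrink * mean(y_train) + offset."""
    mean = sum(y_train) / len(y_train)
    return lambda x: shrink * mean + offset
```

A call such as `grid_search(fit_shrunken_mean, X, y, {"shrink": [0.0, 0.5, 1.0], "offset": [0.0, 0.3]})` evaluates all six combinations on the same folds, which is what makes the search simultaneous rather than one parameter at a time.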

Validation and Performance of the Modeling Techniques
For all models, internal validation was done by bootstrap resampling (200 bootstrap samples). From the original data set a bootstrap sample was drawn (randomly and with replacement). Then the modeling technique was tuned to create an optimal model fit for this bootstrap sample. With the optimal setting resulting from the tuning, we applied the modeling technique to the bootstrap sample and calculated the performance of the resulting model (bootstrap performance). We then applied the model to the original data set and calculated the performance (validated performance). This process was repeated 200 times. The 200 results were averaged to produce a single estimate of the bootstrap performance and the validated performance [28]. The difference between the mean bootstrap performance and the mean validated performance indicated the optimism of a model. The optimism corrected performance was calculated by subtracting the optimism from the apparent performance estimate, i.e. the performance when the model was optimized and assessed on the original data set. With respect to dichotomous 60-month survival, the performance measure was the area under the ROC-curve (AUC). With respect to continuous pseudo values at 60 months and estimated survival time, the performance of the models was calculated using the root of the mean squared error (RMSE).

Variable Importance
We calculated the relative importance of each of the eight predictor variables in a model by calculating the difference between the validated performance of the full model with all eight predictor variables and the validated performance of the model with seven predictor variables, leaving out each predictor variable in turn.
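The bootstrap validation described under Validation and Performance can be sketched in plain Python as follows. The sketch assumes that `fit` is a hypothetical callable that performs both tuning and fitting and returns a prediction function, and that `metric(predictions, observed)` returns a single number; these names are ours and do not correspond to the R code used in the study.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rmse(pred, obs):
    """Root of the mean squared error."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


def optimism_corrected(fit, X, y, metric, n_boot=200, seed=0):
    """Return (optimism corrected performance, optimism)."""
    rng = random.Random(seed)
    n = len(y)
    # Apparent performance: model developed and assessed on the original data.
    apparent_model = fit(X, y)
    apparent = metric([apparent_model(x) for x in X], y)
    boot_total, valid_total = 0.0, 0.0
    for _ in range(n_boot):
        # Draw a bootstrap sample of size n with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        model = fit(Xb, yb)
        boot_total += metric([model(x) for x in Xb], yb)   # bootstrap performance
        valid_total += metric([model(x) for x in X], y)    # validated performance
    optimism = (boot_total - valid_total) / n_boot
    return apparent - optimism, optimism
```

The same function serves both performance measures. For the AUC a positive optimism lowers the corrected estimate, whereas for the RMSE the optimism is typically negative, so the corrected error is larger than the apparent error. Note that `auc` divides by zero if a bootstrap sample contains only one outcome class, which a production implementation would need to guard against.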

Ethics Statement
Patient data were used that had been collected prospectively and anonymously between 1981 and 1998. According to Dutch regulations, neither medical nor ethical approval was required to conduct the study, as no interventions were initiated and the study had no influence on medical care or on decision making. The data were anonymised. The study was not supported financially in any way.

Patients and Data
Of the 1371 patients included originally, we dropped 89 patients for whom the comorbidity was unknown. As a result, we included 1282 patients in our analysis. Of these, 986 patients died during a median follow-up of 66 months (60-month survival: 52% [95% CI: 50%−55%], Figure 1). The censoring pattern of the patients (censoring rate before 60 months: 4%) is presented in Figure 2. Table 2 shows the overall number of events and the survival probabilities for each category of the predictor variables with respect to the Kaplan-Meier estimated 60-month survival. Several characteristics were associated with a poor 60-month survival: Tumor location in the Hypopharynx, Oral cavity and Oropharynx (60-month survival 0.33, 0.36 and 0.37 respectively), cancer stages T3, T4, and N3 (60-month survival 0.38, 0.27, 0.11 respectively), higher age (Age ≥ 70, 60-month survival 0.40) and severe comorbidity (Grade 3 of ACE27, 60-month survival 0.25).

Model Performance and Optimism
We evaluated the performance of the various models with respect to the three survival related outcome variables.
For the outcome 'dead or alive at 60 months', the LR model had the highest optimism corrected AUC (0.791, Table 3) followed by the SVM model with linear kernel (AUC 0.787, Table 3). The NNET model performed slightly poorer (AUC 0.785, Table 3). The RPART model had the lowest AUC (0.725, Table 3).
Considering the outcome 'pseudo values at 60 months', the GLM model had the lowest optimism corrected RMSE (0.436, Table 4). The SVM model with polynomial kernel and the NNET model performed poorly (RMSE 0.482 and 0.486 respectively, Table 4).
Analyzing the outcome 'estimated survival time', the GLM model had the lowest optimism corrected RMSE (77.7, Table 5), followed by the SVM model with a linear kernel (79.2, Table 5). The NNET model had the worst RMSE (83.7, Table 5).
The regression based models (LR and GLM) had relatively small optimism. This small optimism was also noted for the SVM models with a linear kernel. The bootstrap-estimated optimism was substantial for NNET and the more complex SVM models with polynomial and radial kernels (Table 3 to Table 5).

Variable Importance
For each model and for each outcome we calculated the variable importance (Figure 3). We chose the parameter settings of the modeling techniques based on the highest frequency (mode) resulting from the bootstrap procedure (Table 6). Figure 3 shows the variable importance for each model and for each outcome with these parameter settings.
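The two steps involved, picking the modal parameter setting and leaving out each predictor in turn, can be sketched in plain Python. Here `evaluate` is a hypothetical callable that returns the validated performance of a model built on a given list of predictors; how that performance is obtained is left to the caller, and the function names are our own.

```python
from collections import Counter


def most_frequent_setting(settings):
    """Mode of the tuned parameter settings across bootstrap samples."""
    return Counter(settings).most_common(1)[0][0]


def drop_one_importance(evaluate, predictors):
    """Performance of the full model minus that of each reduced model."""
    full = evaluate(predictors)
    importance = {}
    for left_out in predictors:
        reduced = [name for name in predictors if name != left_out]
        importance[left_out] = full - evaluate(reduced)
    return importance
```

With the AUC as performance measure, a larger positive difference marks a more important predictor. With the RMSE the sign is reversed, because leaving out an important predictor increases the error.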
Overall, the variables Tumor location, T-class and N-class were the most important predictor variables for predicting the dichotomous and continuous 60-month survival (Figure 3). Survival probability was considerably lower for patients with cancer stages T4 and N3 (File S3 (appendix 3), Table 7, Table 8). For the estimated survival time, age at diagnosis was the most important predictor variable (Figure 3). Cancer stages T1 and N0 indicated a relatively good survival probability (File S3 (appendix 3), Table 9). The relative importance of each predictor variable varied substantially by the specific aspect of survival studied and modeling technique used.
The variable plots with observed 60-month survival (dichotomous) proved to be very similar to the variable plots with pseudo values at 60 months (continuous), except for the NNET model (Figure 3).
*Cox regression was added as a reference technique.

Discussion
In this study, we demonstrated that pseudo values as described by Klein et al. [10,11] enable statistically appropriate analyses of survival outcomes when used in three variants of support vector machines (SVM), neural networks (NNET), general linear models (GLM), recursive partitioning (RPART) and logistic regression (LR). We showed that pseudo values enabled us to apply these techniques to predict survival in a case study of 1282 Dutch patients with newly diagnosed HNSCC, and to compare the performance of the resulting models.
Our analysis showed that conventional regression analysis approaches (logistic regression and the generalized linear model) outperformed relatively modern modeling techniques. However, the SVM model with an optimal setting and a linear kernel performed only slightly worse with respect to our outcomes. The NNET model and the RPART model performed relatively poorly.
We compared the performance of the alternative modeling techniques in predicting three variants of survival outcome for our case study. The first, admittedly rather simplistic, outcome variable was based on the 60-month survival in terms of dead or alive. This outcome may produce bias unless the censoring rate is small (4% in our study). The other two outcome variables were defined by means of pseudo values, which were derived from the Kaplan-Meier survival function.
A drawback of outcome definitions for 60 months is that they only consider survival at a particular point in time rather than the full survival curve. By contrast, the approach with the estimated survival time is attractive, because it considers the full survival curve. We consider the total expected survival time the most relevant to inform patients about their prognosis and to support decision making. In our study, SVM models with a linear kernel and optimal settings performed slightly worse than conventional regression modeling. These findings are in line with other studies that used support vector machines for analyzing survival [3][4][5][6]. On the other hand, our findings also support the results of previous studies that relied on Cox regression modeling to predict the five-year mortality and the overall mortality of newly diagnosed patients with HNSCC [15][16][17].
Table 6. Mode of the parameter settings identified as optimal in bootstrap samples.
None of the investigated models showed a very satisfactory performance. This may be explained by the low signal-to-noise ratio in our data. In 1998, Ennis et al. discussed the predictive performance of adaptive non-linear algorithms versus conventional statistical techniques. Based on their quite negative findings for the more modern algorithms, they postulated that adaptive non-linear methods may be most useful in problems with high signal-to-noise ratios, which sometimes occur in engineering and physical science. Since the signal-to-noise ratio is often quite low in medical prediction studies, they concluded that modern methods may have less to offer [24].
A limitation of this study is that the results were based on a single cohort of 1282 Dutch patients, diagnosed at a single center [16]. We had to rely on bootstrap validation to estimate the performance of alternative modeling techniques. On the other hand, the number of events was more than sufficient to allow for detailed statistical modeling with modern techniques for the relatively small set of candidate predictors.
We showed that the use of pseudo values opens new possibilities for analyzing survival problems with techniques other than conventional regression techniques. The validity of the pseudo value approach is supported by the concordance between Cox regression modeling for censored survival time and Generalized Estimating Equation modeling (GEE) using a log-minus-log link function [11]. Therefore, this approach deserves a central role in the ongoing search for improved prediction models for survival. On the other hand, our results also show that it may be hard to find modeling approaches that are superior to conventional regression analysis in terms of performance, applicability and simplicity. In conclusion, the use of pseudo values makes it readily possible to analyze survival time with alternative modeling techniques, to compare their performance and to search further for promising alternative modeling techniques to analyze survival time. In our case study on patients with newly diagnosed HNSCC, none of the alternative modeling techniques provided better predictions for survival than conventional regression modeling techniques. The estimated importance of predictors depends on the specific aspect of survival studied and the modeling technique used.

Supporting Information
File S1 Appendix 1.