Machine learning models for predicting post-cystectomy recurrence and survival in bladder cancer patients

Currently in patients with bladder cancer, various clinical evaluations (imaging, operative findings at transurethral resection and radical cystectomy, pathology) are collectively used to determine disease status and prognosis and to recommend neoadjuvant, definitive, and adjuvant treatments. We analyze the predictive power of these measurements in forecasting two key long-term outcomes following radical cystectomy: cancer recurrence and survival. Information theory and machine learning algorithms are employed to create predictive models using a large, prospective, continuously collected, temporally resolved primary bladder cancer dataset comprising 3503 patients (1971-2016). Patient recurrence and survival one, three, and five years after cystectomy can be predicted with greater than 70% sensitivity and specificity. Such predictions may inform patient monitoring schedules and post-cystectomy treatments. The machine learning models provide a benchmark for predicting oncologic outcomes in patients undergoing radical cystectomy and highlight opportunities for improving care through optimal preoperative and operative data collection.


Introduction
Bladder cancer (BCa) is the 6th most common cancer in the U.S., with an estimated 79,030 new cases and 16,870 deaths in 2017 [1], and has a 5-year relative survival rate of 79% [2]. BCa staging is based on the TNM system (tumor, nodes, metastasis). In BCa, the "T" stage is dictated by how deeply the tumor invades the layers of the bladder wall. Ta represents a noninvasive papillary tumor, while T1, T2, T3, and T4 represent more aggressive cancers invading the sub-epithelial tissue, muscle, peri-vesical fat, and adjacent organs, respectively. Radical surgery is the primary treatment for invasive cancer and may be augmented with other forms of therapy, such as chemotherapy, to treat more advanced and aggressive cancers [3]. Radical cystectomy, the recommended method for treating invasive BCa [4], is the surgical removal of the bladder, the regional lymph nodes, and any adjacent organs (prostate, uterus, etc.) that may contain cancer. The technical precision of this operation can dictate long-term oncologic outcomes. For instance, post-cystectomy survival is higher when negative surgical margins are obtained and more than ten pelvic lymph nodes are removed [5]. Conversely, cancer recurrence rates are higher with positive margins and removal of fewer than ten nodes [5]. Furthermore, patients with organ-confined disease are less likely to relapse beyond 5 years, and unlikely to relapse beyond 10 years after cystectomy, even without adjuvant treatment [6].
These trends are derived from focused studies with disparate cohorts. The large size of the current dataset offers a chance to confirm and refine these relations. Beyond knowledge discovery, larger and electronically managed medical databases lend themselves to predictive tool development. Consequently, machine learning techniques have been applied extensively to clinical, epidemiological, and molecular data to predict prognosis and outcome in various cancers. Cruz and Wishart [7] and, more recently, Kourou et al. [8] review studies that predict susceptibility, recurrence, and survival, in which the merit of the techniques and the quality of the data are quantified by prediction accuracy. In BCa, the most relevant existing study used a multi-institution dataset of 9000 patients, including 980 data points from the present dataset, to construct a nomogram for predicting 5-year recurrence that achieved a concordance index of 0.75 [9].
The present work focuses on (i) using preoperative and operative BCa data to uncover patterns of long-term outcomes and (ii) assessing the predictive power of BCa-specific factors in elucidating overall survival (OS) and recurrence. We employ the information-theoretic concept of mutual information (MI) to uncover correlated parameters. We then rank the predictors by their association with 5-year binary recurrence and binary OS to quantify their relative importance. The prognostic power of these variables is assessed by developing a machine-learning classification pipeline to predict recurrence and survival after radical cystectomy, urinary diversion, and extended lymphadenectomy, the standard of care for high-risk, muscle-invasive BCa. The models presented provide a quantitative method for stratifying patients into higher-resolution risk groups than is possible with current methods.

Data summary
The original dataset (details in Table A in S1 File), comprising 3503 patients, is pruned to 3499 patients (mean age 67.8 years) by removing 4 cases with missing survival data. All patients underwent radical cystectomy at the USC Institute of Urology from 1971 to 2016. Statistical results based on this dataset up to 1997 were published by Stein et al. in 2001 [10] for a subset of 1054 urothelial carcinoma patients. This is presently one of the largest known single-institution datasets of BCa cystectomy patients. That holds both for sample size and for the 45-year timespan over which the data were prospectively and continuously collected with University of Southern California Institutional Review Board (IRB) approval. Consequently, the evolution of preoperative and operative assessments is also explored. In addition to information pertinent to BCa, comorbidity data were collected to study the effect of preexisting diseases on the progression of BCa. The remainder of the data comprises demographics, clinical diagnostic information and tumor markers prior to cystectomy, and pathologic and surgical data at the time of cystectomy, including adjuvant therapy information. In the context of machine learning, these preoperative and operative measurements are called predictors. The target variables are binary indicators of recurrence and survival a given number of years after cystectomy.

Statistics and information theory
We perform survival analysis using the Kaplan-Meier estimator to compare OS across the values of various predictors. However, a network approach is better suited to understanding system-wide patterns among all the predictors, recurrence, and OS. Relevance, or correlation, networks [11-14] can be created using a similarity measure. We therefore create a mutual information (MI) network. We then perform complete-linkage agglomerative hierarchical clustering, based on Euclidean distance, of the most closely associated variables. We use normalized MI, which ranges from 0 for entirely unrelated pairs of variables to 1 for maximally related pairs; larger values indicate stronger dependence between two variables. MI is normalized by the maximum entropy of the two variables being compared [15], i.e., $\mathrm{MI}(X,Y) = I(X;Y)/\max\{H(X), H(Y)\}$, and normalized MI is hereafter abbreviated as MI. The set of all pairwise MI values makes up the adjacency matrix of the MI network, which is visualized as a clustered heat-map. For this analysis, we limit the dataset to the 2618 patients whose recurrence and survival status is known five years post-cystectomy.
The predictors are ranked by their association with the two long-term 5-year binary outcomes, recurrence and OS, using the chi-squared test of independence, which measures the association between two categorical variables. Age and other continuous variables are discretized to perform the chi-squared test. The composite assessment identifies higher-fidelity variables and encapsulates the clinical relevance of the measurements. Predictor importance is identified by a composite ranking, $\mathrm{rank}_i = \sqrt{\left(\tilde{\chi}^2_{\mathrm{Rec},i}\right)^2 + \left(\tilde{\chi}^2_{\mathrm{OS},i}\right)^2}$, based on the chi-squared values for recurrence ($\chi^2_{\mathrm{Rec}}$) and OS ($\chi^2_{\mathrm{OS}}$). The chi-squared values for both outcomes are normalized by their respective standard deviations, $\tilde{\chi}^2_{\mathrm{Rec},i} = \chi^2_{\mathrm{Rec},i}/\sigma_{\mathrm{Rec}}$ and $\tilde{\chi}^2_{\mathrm{OS},i} = \chi^2_{\mathrm{OS},i}/\sigma_{\mathrm{OS}}$ (Eqs 1-3), so that both outcomes are weighted equally. Appendix C in S1 File describes the datasets used for this analysis.
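A sketch of the composite chi-squared ranking follows. It assumes the Euclidean-norm form of the ranking and uses the population standard deviation of each outcome's chi-squared values across predictors; the exact normalization in Eqs 1-3 may differ. Continuous variables such as age are binned with `pandas.cut`, and the bin edges here are illustrative. `scipy.stats.chi2_contingency` is called with `correction=False` so that 2×2 tables are not Yates-corrected, which is an assumption about the original analysis. The data are synthetic, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_stat(x, y):
    """Pearson chi-squared statistic for independence of two categorical arrays."""
    table = pd.crosstab(np.asarray(x), np.asarray(y))
    stat, _, _, _ = chi2_contingency(table, correction=False)
    return float(stat)


def composite_rank(X, y_rec, y_os):
    """Rank predictors (columns of X) by the SD-normalized chi-squared composite."""
    chi_rec = np.array([chi2_stat(X[c], y_rec) for c in X.columns])
    chi_os = np.array([chi2_stat(X[c], y_os) for c in X.columns])
    # sqrt((chi_rec / sd_rec)^2 + (chi_os / sd_os)^2) for each predictor
    score = np.hypot(chi_rec / chi_rec.std(), chi_os / chi_os.std())
    return pd.Series(score, index=X.columns).sort_values(ascending=False)


# Synthetic cohort: "stage" drives both outcomes, age mainly OS, "noise" neither.
rng = np.random.default_rng(1)
n = 1000
stage = rng.integers(0, 4, n)
age = rng.normal(68, 10, n)
y_rec = (rng.random(n) < 0.15 + 0.15 * stage).astype(int)
y_os = (rng.random(n) < 0.2 + 0.15 * stage + 0.01 * (age - 68)).astype(int)
X = pd.DataFrame({
    "stage": stage,
    "age_band": pd.cut(age, bins=[0, 60, 70, 80, 120], labels=False),  # discretized age
    "noise": rng.integers(0, 2, n),
})
ranking = composite_rank(X, y_rec, y_os)
```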

Machine learning approach
The performance of multivariate predictive models is compared to that of univariate logistic regression models. To create the multivariate models, a set of base predictive models is trained first, and mixture-of-experts and stacking ensemble models are then built from these base models. The base models are support vector machines (SVM), bagged SVM, K-nearest neighbors (KNN), adaptive boosted trees (AdaBoost), random forest (RF), and gradient boosted trees (GBT). The mixture-of-experts models are based on hard voting among the base models. The stacking ensemble models instead perform dimensionality reduction of the base model predictions before applying a second-level logistic regression or SVM model. For each prediction task, a different triplet of models forms the final meta-classifier. The meta-classifier combines, through hard voting, one each of the best base, mixture-of-experts, and stacking classifiers. This method of combining models achieves the highest performance metrics.
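The ensemble structure can be approximated with scikit-learn's `VotingClassifier` and `StackingClassifier`. This is a sketch, not the authors' pipeline. Hyper-parameters are left at their defaults, and the stacking model passes the base models' scores (scikit-learn's default `stack_method="auto"`) through a two-component PCA before a logistic regression. The "best" base model is a hypothetical fixed choice (`best_base="rf"`); in the paper the triplet is chosen per prediction task.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def base_models(seed=0):
    """The six base learner families named in the text (untuned defaults)."""
    return [
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ("bagged_svm", make_pipeline(
            StandardScaler(), BaggingClassifier(SVC(), n_estimators=10, random_state=seed))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("ada", AdaBoostClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(random_state=seed)),
        ("gbt", GradientBoostingClassifier(random_state=seed)),
    ]


def build_meta_classifier(best_base="rf", seed=0):
    """Hard vote over one base model, a mixture of experts, and a stacking model."""
    members = base_models(seed)
    # Mixture of experts: majority (hard) vote over all base models.
    experts = VotingClassifier(estimators=members, voting="hard")
    # Stacking: base-model scores -> PCA -> second-level logistic regression.
    stacking = StackingClassifier(
        estimators=members,
        final_estimator=make_pipeline(PCA(n_components=2), LogisticRegression()),
    )
    best = dict(members)[best_base]  # ensembles clone their members, so reuse is safe
    return VotingClassifier(
        estimators=[("base", best), ("experts", experts), ("stacking", stacking)],
        voting="hard",
    )


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
meta = build_meta_classifier().fit(X, y)
pred = meta.predict(X)
```

With three voters and two classes, the outer hard vote cannot tie. Within the six-model mixture of experts, ties are possible, and scikit-learn breaks them in favour of the lower class label.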
Patients who leave the study before the target year of the survival models are removed from the dataset. This leaves n = 3201, 3066, and 2780 patients in the 1-, 3-, and 5-year survival datasets, respectively. For the recurrence models, only patients who have no recurrence and leave the study before the target year are removed, leaving n = 3071, 2955, and 2695 patients in the 1-, 3-, and 5-year recurrence datasets, respectively. To avoid class imbalance during training, patients who recur are randomly oversampled to yield equal counts of recurring and non-recurring patients in the training sets. Similarly, for the survival classifiers, surviving and non-surviving patients are balanced by random oversampling.
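The inclusion rules and class balancing can be expressed as follows. This is an illustrative sketch with hypothetical function names. It assumes each patient has a single time variable: time to death or last follow-up for OS, and time to recurrence or last follow-up for recurrence. Oversampling should be applied only to the training portion of each cross-validation split, so that duplicated patients never appear in a test fold.

```python
import numpy as np


def survival_target(time_years, died, k):
    """k-year OS label (1 = alive at year k) and a mask of usable patients.

    `time_years` is time to death or to last follow-up. Patients still alive
    whose follow-up ends before year k have unknown status and are dropped."""
    time_years = np.asarray(time_years, dtype=float)
    died = np.asarray(died, dtype=bool)
    keep = died | (time_years >= k)
    label = (time_years >= k).astype(int)
    return keep, label


def recurrence_target(time_years, recurred, k):
    """k-year recurrence label (1 = recurred by year k) and a usable-patient mask.

    `time_years` is time to recurrence or to last follow-up. Only patients with
    no recurrence by year k whose follow-up ends before year k are dropped."""
    time_years = np.asarray(time_years, dtype=float)
    recurred = np.asarray(recurred, dtype=bool)
    rec_by_k = recurred & (time_years <= k)
    keep = rec_by_k | (time_years >= k)
    return keep, rec_by_k.astype(int)


def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until both classes are equally sized."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    extra = rng.choice(np.flatnonzero(y == minority),
                       size=counts.max() - counts.min(), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


t = np.array([2.0, 7.0, 3.0, 6.0])
keep_os, y_os = survival_target(t, died=[True, True, False, False], k=5)
keep_rec, y_rec = recurrence_target(t, recurred=[True, True, False, False], k=5)
X_bal, y_bal = random_oversample(np.arange(10).reshape(-1, 1),
                                 np.array([0] * 8 + [1] * 2), np.random.default_rng(0))
```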
Feature selection has two steps: removal of redundant predictors, followed by removal of irrelevant predictors. To remove redundant predictors, the hierarchical clustering of the 73 predictors is used to define 60 predictor clusters, and the predictor with the highest MI with the target variable is selected from each cluster. To remove irrelevant predictors, those with low MI with the target variable are then excluded from the dataset (MI < 0.006 for predicting recurrence, MI < 0.003 for predicting survival). These two steps yield 52, 54, and 51 predictors for the 1-, 3-, and 5-year recurrence models, respectively, and 42, 45, and 45 predictors for the 1-, 3-, and 5-year survival models, respectively.
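The two-step feature selection can be sketched as follows. The sketch assumes that the predictor-by-predictor MI matrix and each predictor's MI with the target have already been computed, for example with the normalized MI defined earlier. The function name and toy values are hypothetical; the thresholds 0.006 (recurrence) and 0.003 (survival) come from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def select_features(pred_mi, target_mi, names, n_clusters, threshold):
    """Keep one predictor per cluster (highest MI with the target), then drop
    representatives whose MI with the target falls below `threshold`."""
    z = linkage(pdist(pred_mi, metric="euclidean"), method="complete")
    clusters = fcluster(z, t=n_clusters, criterion="maxclust")
    target_mi = np.asarray(target_mi, dtype=float)
    kept = []
    for c in np.unique(clusters):
        members = np.flatnonzero(clusters == c)
        best = members[np.argmax(target_mi[members])]  # redundancy filter
        if target_mi[best] >= threshold:               # relevance filter
            kept.append(names[best])
    return kept


names = ["a", "a_copy", "noise", "b"]
pred_mi = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1]], dtype=float)
kept = select_features(pred_mi, [0.05, 0.02, 0.001, 0.01], names,
                       n_clusters=3, threshold=0.006)
```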
Final performance scores are obtained using nested cross-validation with ten outer folds and five inner folds, in which the SVM, RF, GBT, and AdaBoost hyper-parameters are tuned. The Scikit-Learn library [16] is used to implement the models.
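Nested cross-validation can be sketched in scikit-learn by wrapping a `GridSearchCV` (five inner folds) inside `cross_validate` (ten outer folds). The random forest grid below is purely illustrative, because the tuned ranges are not given here, and specificity is computed as recall of the negative class. Random oversampling is omitted because it cannot run inside a plain scikit-learn `Pipeline`. It would have to be applied within each training fold separately, for instance with imbalanced-learn's resampling-aware pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: 5-fold grid search over illustrative RF hyper-parameters.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=0),
    param_grid={"max_depth": [3, None], "min_samples_leaf": [1, 5]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",
)
# Outer loop: 10-fold estimate of the tuned model's test-set performance.
scoring = {
    "f1": "f1",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "precision": "precision",
}
scores = cross_validate(search, X, y, scoring=scoring,
                        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1))
summary = {name: float(np.mean(scores[f"test_{name}"])) for name in scoring}
```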

Survival statistics
The OS of all 3503 patients is outlined in Fig 1. Patients with unknown recurrence status are excluded from the recurrence analyses in the rest of the study. Survival, stratified by age group, decays exponentially over the five-year period post-cystectomy (Fig 2). This suggests that the burden of BCa diminishes significantly within five years for patients undergoing radical cystectomy. Consequently, our prediction tasks focus on 1-, 3-, and 5-year survival and recurrence.
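The Kaplan-Meier estimator behind these survival curves multiplies, at each distinct death time, the conditional probability of surviving past that time: $\hat S(t) = \prod_{t_i \le t} (1 - d_i/n_i)$, with $d_i$ deaths and $n_i$ patients at risk at $t_i$. Below is a minimal NumPy sketch, not the software used for Figs 1-4. The hypothetical helper `survival_at` reads a probability, such as 5-year survival, off the resulting step function.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    `time` is follow-up time; `event` is 1 for death and 0 for censoring."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)           # still under observation at t
        deaths = np.sum((time == t) & event)  # deaths exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def survival_at(event_times, surv, k):
    """Read the KM step function at time k (S = 1 before the first event)."""
    i = np.searchsorted(event_times, k, side="right") - 1
    return 1.0 if i < 0 else float(surv[i])


times, surv = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```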
Comparing survival by clinical stage prior to surgery (Fig 3) with survival by pathologic stage at the time of cystectomy (pT stage, TNM 5th edition; Fig 4) reveals the higher fidelity of pathologic staging. Clinical staging separates patients less clearly than pathologic staging. For example, clinical staging does not separate T2b and T3a patients as clearly as pathologic staging separates pT2b and pT3a patients.
Over the study period, clinical and pathologic staging agree in 24% of patients (811/3417), and clinical staging over-estimates pathologic stage in 25% (865/3417). Since 2010, there has been a surge in stage over-estimation, with a corresponding decrease in under-estimation. Overall concordance between the two staging measures has nevertheless remained relatively constant over the decades studied (Fig A in S1 File). A graver consequence of the inferior resolution of clinical staging is that it underestimated pathologic stage in 51% of patients (1741/3417). Patients with carcinoma in situ had no discernible difference in OS or in 5-year probability of survival compared to other patients. The comorbidity factors are strongly associated with each other. In contrast, the mean MI between the predictors and the binary 5-year recurrence target variable (43 in Fig 5) is 0.01100, and the mean MI between the predictors and binary 5-year OS (44 in Fig 5) is 0.01863. These low values highlight the difficulty of predicting long-term BCa outcomes using only preoperative and operative data. The association between 5-year recurrence and OS (MI = 0.257) is much higher than the mean MI between the predictors and either long-term outcome. Both long-term outcomes lie in the blue cluster (Fig 5), which lacks strong associations among its variables, with three exceptions. These are the sets of variables related to neoadjuvant chemotherapy (18-20 in Fig 5), adjuvant chemotherapy (29-31 in Fig 5), and radiation (21-23 in Fig 5). The variables within each set are highly related because they are clinical re-classifications or sub-groupings of each other within the original variable's domain. Aside from TNM 7th edition staging (7 in Fig 5), which is nearly identical to pT staging (TNM 5th edition, 8 in Fig 5), the highest associations with pT staging are pathologic stage subgroup (2 in Fig 5) does not have equally high MI with any of the other predictors except for the regrouped clinical staging variable (41 in Fig 5).
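The staging comparison reduces to counting, per patient, whether clinical stage equals, exceeds, or falls below pathologic stage. The sketch assumes stages have been mapped to ordered integer codes, which is a hypothetical encoding. It also checks that the reported counts partition the 3417 patients and round to the quoted percentages.

```python
import numpy as np


def staging_concordance(clinical, pathologic):
    """Fractions of patients whose clinical stage equals, exceeds (over-estimates),
    or falls below (under-estimates) the pathologic stage."""
    clinical = np.asarray(clinical)
    pathologic = np.asarray(pathologic)
    return {
        "agree": float(np.mean(clinical == pathologic)),
        "over": float(np.mean(clinical > pathologic)),
        "under": float(np.mean(clinical < pathologic)),
    }


# The three categories partition the cohort: 811 + 865 + 1741 = 3417 patients.
counts = {"agree": 811, "over": 865, "under": 1741}
percent = {k: round(100 * v / sum(counts.values())) for k, v in counts.items()}
```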
The MI-based heat-map and clustering in Fig 5 provide a system-wide view of the entire medical database for BCa patients. Correlations between the predictors can also be used to assess the quality of clinical measurement techniques. Table I in S1 File.

Fig 5. MI between the set of predictors, binary recurrence, and binary OS five years post-cystectomy.
Predictors are clustered into four groups using a hierarchical clustering algorithm to discover associated predictor groups. BCa predictors and long-term outcomes are contained in the purple (anatomic staging), green (histologic staging), and blue (treatment, OS, and recurrence) clusters, whereas the comorbidity factors form a separate (red) cluster. Correlations within the purple and red clusters are high, but correlations between the comorbidity cluster and the other clusters are low. Predictors are measured at three time points: * before cystectomy, ** at the time of cystectomy, *** post-cystectomy.

Correlations with long-term outcomes
The chi-squared tests of independence for binary 5-year recurrence (vertical axis) and binary 5-year OS (horizontal axis) in Fig 6 and Table 1 show the relative importance of each predictor. The variance in chi-squared values is computed by singular value decomposition [18]. It is shown in Fig 6A as an ellipse whose axes are the standard deviations (SD) along the first (SD = 164.9) and second (SD = 26.8) principal components (green lines). Some predictors intrinsically contain more information about survival than about recurrence, and vice versa. For instance, urinary diversion (rank 11) and age (rank 13) are more strongly correlated with OS than with recurrence; they nevertheless rank high because of their large effect on survival. Pathologic stage subgroup (rank 1) has the largest $\chi^2_{\mathrm{Rec}}$ and $\chi^2_{\mathrm{OS}}$. It indicates whether patients have organ-confined (OC), extravesical (EV), or node-positive (N+) disease at the time of cystectomy. Comparing pathologic stage (rank 2) with clinical stage (rank 10) reinforces the superiority of pathologic staging in differentiating patients by outcome, as observed in the Kaplan-Meier curves (Figs 3A and 4A). The number of positive lymph nodes removed at the time of cystectomy (rank 7) is significantly more correlated with recurrence than the total number of lymph nodes removed (rank 27). The predictors ranked 4-8 in Fig 6 have similar correlations with either outcome, and these predictors belong to the highly associated (purple) cluster in Fig 5. Appendix C in S1 File shows the results of the same analysis for 1-year and 3-year binary outcomes; most of the top correlates remain the same across these three time periods.
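The SDs quoted for the ellipse in Fig 6A can be obtained from a singular value decomposition of the mean-centred ($\chi^2_{\mathrm{OS}}$, $\chi^2_{\mathrm{Rec}}$) scatter. The singular values divided by sqrt(n - 1) are the sample standard deviations along the principal components. The sketch below assumes this sample (n - 1) convention, which the text does not specify, and the function name is hypothetical.

```python
import numpy as np


def principal_sds(chi_os, chi_rec):
    """Sample SDs of the (chi2_OS, chi2_Rec) scatter along its principal axes."""
    pts = np.column_stack([chi_os, chi_rec]).astype(float)
    centered = pts - pts.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Rows of vt are the principal directions (the green lines in an ellipse plot).
    return s / np.sqrt(len(pts) - 1), vt


# Toy check: points along the OS axis only, so all spread lies on the first component.
sds, axes = principal_sds([0, 1, 2, 3], [0, 0, 0, 0])
```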
The 2757 patients who did not receive adjuvant chemotherapy had a 5-year survival rate of 0.563 (95% CI [0.543, 0.583]), and the 633 patients who did receive adjuvant chemotherapy had a 5-year E in S1 File). Adjuvant chemotherapy is prescribed only to a homogeneous set of patients who are node-positive at the time of cystectomy. Its association with recurrence and OS may therefore be artificially high in this dataset. Although predictors with below-average chi-squared values are less important than the others, they may still differentiate patients who are similar in the higher-ranked variables. The lowest correlates of 5-year binary OS and recurrence are pathologic micropapillary ($\chi^2_{\mathrm{OS}}$ = 4.52) and the presence of diabetes ($\chi^2_{\mathrm{Rec}}$ = 0.248), respectively. Overall, the predictors are more weakly associated with recurrence than with OS. Because the MI between BCa-specific predictors is small, even predictors with small chi-squared values add new information about a patient. This new information does not, however, necessarily inform OS and recurrence.

Predicting post-cystectomy recurrence
We evaluate the performance of machine learning models in predicting post-cystectomy disease recurrence using preoperative and operative data as well as the type and number of adjuvant therapy cycles administered (Table 2). Both univariate (logistic regression) and more complex multivariate models (meta-classifiers in Table 2) are used to predict 1-, 3-, and 5-year recurrence. The univariate models are built on pathologic stage subgroup (rank 3 in Fig 6) and pT stage (rank 2 in Fig 6), and they have lower precision and F1 scores than the meta-classifiers. Furthermore, the single-predictor models tend to suffer from an imbalance between sensitivity and specificity. In contrast, all recurrence meta-classifiers have sensitivities and specificities over 70%. F1 scores improve with year, perhaps because the corresponding datasets have more even numbers of positive and negative recurrence cases.

Table 2. Single-predictor (pT stage and pathologic stage subgroup classifiers) and multiple-predictor (meta-classifier) models for predicting 1-, 3-, and 5-year recurrence and survival after cystectomy.
The performance of all models for a given year is ranked by F1 score (2 × precision × recall/(precision + recall)) as well as by mean sensitivity, specificity, and precision on test sets from 10-fold cross-validation. https://doi.org/10.1371/journal.pone.0210976.t002
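The metrics used to rank the models in Table 2 follow directly from the confusion matrix. The sketch below computes them for 0/1 labels, where 1 marks the event. It adds a hypothetical helper, `precision_at_prevalence`, showing that for fixed sensitivity and specificity, precision rises with the fraction of positive cases. This is one way to read the observation that F1 improves as the datasets become more balanced.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and F1 for 0/1 labels (1 = event)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # recall, or probability of detection
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision of a classifier with fixed sensitivity and specificity."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)


m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```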

Predicting post-cystectomy survival
As with the recurrence predictions, the meta-classifiers outperform the univariate models in predicting survival (Table 2). Here the disparity is greater, as the meta-classifiers have considerably higher performance metrics for all prediction years. Additionally, and unlike the recurrence models, the survival meta-classifiers have comparable precision and sensitivity (probability of detection), except for the 1-year survival models. The combination of high precision and high sensitivity yields markedly higher F1 scores for the 3- and 5-year survival meta-classifiers.

Discussion
Although recurrence and OS are highly associated, preoperative and operative measurements generally do not relate equally to the two outcomes, so recurrence and OS should be assessed separately. The primary predictors of long-term outcomes are pathologic stage and its subgrouping into localized or metastatic disease. However, the machine learning pipeline developed here can leverage less powerful predictors to improve the accuracy of long-term predictions. Low MI between variables means that each variable offers unique information. The drawback is that each patient must be described by many variables. The prediction task therefore becomes higher dimensional, and a lack of data can greatly limit predictions of long-term outcomes. Clinical T stage offers a lower-resolution signal than the true pathologic T stage. This loss of information can be particularly impactful when disease severity is underestimated prior to surgery [19]. The sensitivity and specificity of all the survival meta-classifiers, and of the 1-year recurrence meta-classifier, are considerably higher than 70%. Recurrence meta-classifiers are less accurate, perhaps because of undetected metastatic disease at the time of cystectomy. For both outcomes, the 1-year meta-classifiers offer a better combination of sensitivity and specificity than the 3- and 5-year meta-classifiers. However, because of their higher precision scores, the later-year models may also be used in the clinical setting to differentiate lower- and higher-risk patients.
In current clinical practice, post-radical cystectomy prognostication for the individual patient is informed by one of two sources. The first is the best evidence in the literature [10,20], which reflects probable outcomes in cohorts rather than in the individual. The second is the prognostic nomogram, which calculates only a 5-year outcome [9]. To improve upon this, we employ machine learning algorithms to construct novel patient prognostication models for survival and recurrence. The international bladder cancer nomogram has already demonstrated the value of multivariate approaches for predicting long-term outcomes in the clinical setting [9,21]. The models developed here offer higher-resolution predictions that can assist post-cystectomy treatment and screening decisions.
Despite several machine learning research efforts in predicting outcomes of cancer patients, such models have low penetration in clinical practice [8]. Two specific hurdles must be cleared before the current models can be deployed in a clinical setting. First, the performance reported here reflects the quality of data collected at one center, so data from other institutions should also be studied to ensure the generalizability of the models. Combining additional datasets, such as the international BCa dataset [9], may also improve the performance of the algorithms. That is because the present data are generally sparse, and certain combinations of predictors occur only rarely. Second, the recurrence and survival models use 42-54 predictors, so the standardized collection of these parameters must be ensured before the models can be deployed successfully in a clinical setting. Once these issues are addressed, machine learning models such as the ones developed here can be trained on similar BCa radical cystectomy datasets. They could then predict the recurrence and survival of an individual patient using pre-, peri-, and post-operative predictors.
The accuracy of predicting cancer recurrence, which may depend on several evolutionary steps beyond cystectomy, could likely be improved by incorporating genomic and molecular data, and this would be a fruitful direction to pursue. The quality of the dataset, coupled with the machine learning models in the present work, offers a benchmark of the value of current preoperative and operative patient assessment standards. Specifically, it measures their value for forecasting long-term outcomes during the most vulnerable 5-year timespan post-cystectomy. Furthermore, owing to the absence of widely recognized biomarkers for BCa [22], clinicopathologic predictions of clinical outcomes such as those shown here set the standard for long-term personalized predictions in BCa. If deployed correctly, machine learning models can transform preoperative and operative data into accurate predictions and mitigate the post-cystectomy burden of BCa.
Supporting information
S1 File. Supporting information file. (PDF)