Predictors of Individual Response to Placebo or Tadalafil 5mg among Men with Lower Urinary Tract Symptoms Secondary to Benign Prostatic Hyperplasia: An Integrated Clinical Data Mining Analysis

Background: A significant percentage of patients with lower urinary tract symptoms (LUTS) secondary to benign prostatic hyperplasia (BPH) achieve clinically meaningful improvement when receiving placebo or tadalafil 5mg once daily. However, individual patient characteristics associated with treatment response are unknown.
Methods: This integrated clinical data mining analysis was designed to identify factors associated with a clinically meaningful response to placebo or tadalafil 5mg once daily in an individual patient with LUTS-BPH. Analyses were performed on pooled data from four randomized, placebo-controlled, double-blind clinical studies, including about 1,500 patients, from which 107 baseline characteristics and 8 response criteria were selected. The split set evaluation method (1,000 repeats) was used to estimate prediction accuracy, with the database randomly split into training and test subsets. Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM) and Random Forest (RF) models were then generated on the training subset and used to predict response in the test subset. Prediction models were generated for placebo and tadalafil 5mg once daily. Receiver Operating Characteristic (ROC) analysis was used to select optimal prediction models lying on the ROC surface.
Findings: International Prostate Symptom Score (IPSS) baseline group (mild/moderate vs. severe) for active treatment and placebo achieved the highest combined sensitivity and specificity, of 70% and ~50% respectively, for all analyses. This was below the sensitivity and specificity threshold of 80% that would enable reliable allocation of an individual patient to either the responder or non-responder group.
Conclusions: This extensive clinical data mining study in LUTS-BPH did not identify baseline clinical or demographic characteristics that were sufficiently predictive of an individual patient response to placebo or once daily tadalafil 5mg. However, the study reaffirms the efficacy of tadalafil 5mg once daily in the treatment of LUTS-BPH in the majority of patients and the importance of evaluating individual patient need in selecting the most appropriate treatment.


Introduction
Lower urinary tract symptoms (LUTS) secondary to benign prostatic hyperplasia (BPH) are a common problem, affecting more than 50% of men aged 50 years and older [1].
Medical treatment has focused mainly on the use of α-blocking agents and 5-α reductase inhibitors, either alone or in combination, and aims to alleviate symptoms as well as alter the course of disease progression and prevent complications [2]. Treatment options for LUTS-BPH have since increased with regulatory approval of tadalafil 5mg once daily, a long-acting phosphodiesterase type 5 (PDE-5) inhibitor, initially in the US in 2011 and subsequently in the EU and other major territories in 2012 [3]. Treatment of LUTS-BPH, either alone or with coexisting erectile dysfunction (ED), with PDE-5 inhibitors and notably tadalafil 5mg, has recently been added to EU-wide treatment guidelines for non-neurogenic LUTS [4].
The efficacy of once daily tadalafil 5mg in LUTS-BPH has been demonstrated in four randomized controlled trials (RCTs) [5; 6; 7; 8]. At a lower dose of 2.5mg per day, tadalafil did not consistently alleviate symptoms of LUTS-BPH, while higher doses of 10 and 20mg per day provided only minimal additional improvement over the 5mg once daily dose [5]. Assessment of treatment response (primary endpoint) was based primarily on the International Prostate Symptom Score (IPSS), a validated, self-administered, 1-month recall questionnaire that has good reliability for recall of obstructive and urinary problems and their global impact on quality of life (QoL). The IPSS is the most widely used instrument to assess the severity of BPH-related LUTS and gauge response to treatment [9; 10].
An integrated analysis of the four RCTs confirmed that tadalafil 5mg achieved significantly greater improvements in total IPSS score, IPSS voiding subscore, IPSS storage subscore and IPSS QoL Index score versus placebo [11]. A separate analysis of IPSS storage and voiding subscores showed that both were significantly improved in the active treatment arms compared with placebo (p<0.001) and that storage and voiding subscores made a nearly linear contribution to total IPSS in a 4:6 ratio that was maintained from baseline to endpoint [12].
A further post-hoc integrated analysis of the data from the four RCTs showed that approximately two-thirds of tadalafil-treated patients achieved a clinically meaningful improvement (CMI) in LUTS-BPH symptoms, as defined by a total IPSS improvement of ≥3 points or ≥25% from randomization to endpoint at Week 12 [14]. Moreover, tadalafil 5mg once daily demonstrated increasing benefit over placebo as the efficacy threshold was raised from ≥25% to a demanding ≥50% and ≥75% improvement in IPSS [14].
Being able to identify which individual patient is most likely to respond well to treatment with placebo or tadalafil, rather than just knowing its average benefit to a subgroup of patients, would be clinically useful and consistent with the growing trend towards more patient-tailored treatment [15]. Treatment directed at patients most likely to achieve CMI would help address the problem that for too many patients with LUTS-BPH, medical therapy achieves only a fair-to-good improvement in symptoms [16].
In this integrated clinical data mining analysis, we set out to identify the factors associated with response to placebo or tadalafil 5mg once daily in an individual patient with LUTS-BPH. Implicit in a study of this nature was the need to carefully estimate the true prediction performance of a factor for unknown patients.

Study design
This clinical data mining analysis was based on the Knowledge Discovery in Databases (KDD) process and was set up to be consistent with the underlying principles of data mining [17].
Applied data mining algorithms were considered suitable only if a graphical presentation could be obtained that could be followed by practicing physicians. We therefore focused on models that were easily visualised or those expected to yield good predictive outcomes. Our aim was to produce an output that could be displayed on paper and used by clinicians and so we decided at the outset to adopt the simplest model first. This can be seen by the inclusion of single decision rules (SDRs). These models consider just one clinical variable at a time to predict one response variable, without any additions, and they perform well.
Rigorous care was taken to evaluate the prediction error for unknown data. Every effort was made to control for potential data mining biases (i.e. those induced by applying overly flexible data mining algorithms or those stemming from the desire to achieve 100% accurate predictions). To this end we adhered to a pre-specified statistical analysis plan (SAP), which did not allow for removal of data points. We set out our experience first, wrote down our approach, and kept to it without deviation. We did not intend to optimize prediction performance beyond what had been pre-specified. To do so would only bias results towards models that are adapted and optimized for a specific combination of training algorithm and evaluation method, and which are thereby unlikely to capture the clinical information that is predictive in clinical practice.
More extensive methodological details not covered here are provided in Supporting Information.

Data sources and pre-processing
Data for this clinical data mining analysis were pooled from four randomized, placebo-controlled clinical studies (NCT00384930, NCT00827242, NCT00855582, NCT00970632), all of which had a broadly similar design and enrolled patients with LUTS-BPH (Fig 1) [6; 7; 8; 16]. Common inclusion criteria for all four studies were age ≥45 years, LUTS-BPH duration of >6 months, total IPSS ≥13, and maximum urinary flow rate (Qmax) ≥4 to ≤15ml/s prior to the placebo lead-in period. Patients were excluded if PSA was >10ng/ml (or for PSA 4-10ng/ml, prostate malignancy had to be excluded), if post-void residual (PVR) urine volume was ≥300ml, or if they had used finasteride or dutasteride within 3 or 6 months (12 months in one study), respectively. Following screening and, if needed, a washout period for LUTS-BPH or ED medications, patients entered a 4-week placebo lead-in period. On completion, patients were randomized to study treatment with tadalafil 5mg once daily for 12 weeks. Minor differences between the studies included the following: one enrolled patients with BPH and concomitant ED [7]; one was a dose-finding study in which tadalafil was administered at doses of 2.5mg, 5mg, 10mg and 20mg once daily [16]; one included a tadalafil 2.5mg treatment arm [7]; and one included an additional tamsulosin 0.4mg treatment arm [8].
For the purposes of this clinical data mining analysis the study population (N = 1,499) consisted solely of subjects in the intent-to-treat (ITT) population who had been allocated to tadalafil 5mg once daily or placebo irrespective of an IPSS baseline assessment (Table 1). Data from the tadalafil 2.5mg, 10mg and 20mg once daily treatment groups did not form part of the data mining analysis, as these doses are not approved for the treatment of LUTS-BPH.
IPSS, IPSS QoL, and BPH Impact Index (BII) were assessed in each of the four studies at baseline (after the 4 week placebo lead-in period following randomization) and after 12 weeks treatment (primary endpoint). Patient Global Impression of Improvement (PGI-I) was evaluated at baseline and endpoint in three of the four studies so as to assess the impression of change in urinary symptoms [6; 7; 8].
Overall, 107 baseline characteristics were included in the clinical data mining analysis (Table 1). Baseline characteristics were categorized as key or supportive and selected on the basis of clinical input from study authors that was derived from knowledge of the published literature and clinical experience. All IPSS, IPSS QoL and BII baseline scores and their subscores were key characteristics, in addition to age (<65 or ≥65 years), previous LUTS therapy and a history of ED (Table 1). Key characteristics were expected to be predictive for a response to treatment. Two primary and 6 secondary definitions of response were used (Table 2). The primary responder definitions were considered of equal importance and both were based on Minimal Clinically Important Differences ('overall' or 'severity MCID'), a concept validated using an anchor-based approach [19]. MCID is a threshold that represents a CMI in patients' health-related QoL as perceived by the patient [24]. 'Overall MCID' was defined as an improvement in IPSS total score of ≥3 for all patients (overall response) and 'severity MCID' was defined as an improvement in IPSS total score of ≥2 for patients with mild-to-moderate LUTS and of ≥6 for those with severe LUTS [14; 19]. Secondary definitions of response were ranked in order of decreasing validation, although to the best of our knowledge they have not been subject to formal validation.
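The two primary responder definitions can be written down as a short sketch. It assumes the thresholds listed in Table 2 (a baseline IPSS of <20 for mild/moderate and of 20 or more for severe LUTS); the function names are hypothetical and not taken from the study programs.

```python
def overall_mcid(baseline_ipss: int, endpoint_ipss: int) -> bool:
    """'Overall MCID': total IPSS improves (falls) by at least 3 points."""
    return (baseline_ipss - endpoint_ipss) >= 3


def severity_mcid(baseline_ipss: int, endpoint_ipss: int) -> bool:
    """'Severity MCID': at least 2 points if baseline IPSS < 20, else at least 6."""
    required = 2 if baseline_ipss < 20 else 6
    return (baseline_ipss - endpoint_ipss) >= required


# A patient improving from 22 to 17 meets 'overall' but not 'severity' MCID.
print(overall_mcid(22, 17), severity_mcid(22, 17))
```

The example illustrates why the two definitions can classify the same patient differently: a 5-point improvement clears the overall threshold but not the stricter threshold applied to severe baseline symptoms.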

Implementation
Bias stemming from the desire to achieve 100% prediction accuracy was controlled by following the pre-specified SAP as described earlier, which was approved by all study authors and peer reviewed by Lilly data mining experts prior to programming. A non-clinical benchmark data mining dataset was used for program development. Results from the clinical dataset were produced after program peer review, which was carried out by an independent statistician. All modifications of the analysis after this run were reported as post-hoc. LR and DT models were selected as our data mining models as both can be presented visually and translated into easy decision rules or scores for practical use in medical applications [25; 26; 27] (S1 Technical Appendix). To avoid bias from an overly complex prediction model when a simple one would suffice [17], we compared all models against SDRs. These were implemented using the DT algorithm that was allowed to generate a single decision. In addition, SVM [28] (S2 Technical Appendix) and RF classifiers [29] were applied to obtain estimates for best prediction accuracy (S3 Technical Appendix).
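A single decision rule can be illustrated with a minimal sketch. The study implemented SDRs with a DT algorithm restricted to one decision; the version below is a stand-in that instead searches cut-offs on one numeric predictor and keeps the one with the largest sum of sensitivity and specificity. The selection criterion, function names and toy data are assumptions made for illustration.

```python
from typing import List, Tuple


def fit_single_decision_rule(values: List[float], responses: List[bool]) -> Tuple[float, bool]:
    """Return (cut_off, positive_above) for the best one-variable rule.

    Assumption: 'best' means the largest sensitivity + specificity on the
    data passed in, which is not necessarily the criterion used in the study.
    """
    n_pos = sum(responses)
    n_neg = len(responses) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both responders and non-responders")
    distinct = sorted(set(values))
    # Candidate cut-offs are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for cut in candidates:
        for positive_above in (True, False):
            preds = [(v > cut) == positive_above for v in values]
            tp = sum(p and r for p, r in zip(preds, responses))
            tn = sum((not p) and (not r) for p, r in zip(preds, responses))
            score = tp / n_pos + tn / n_neg
            if best is None or score > best[0]:
                best = (score, cut, positive_above)
    if best is None:
        raise ValueError("predictor has a single distinct value")
    return best[1], best[2]


def predict(rule: Tuple[float, bool], value: float) -> bool:
    """Apply a fitted rule to one predictor value."""
    cut, positive_above = rule
    return (value > cut) == positive_above


# Toy data: responders have the larger predictor values.
rule = fit_single_decision_rule([1, 2, 3, 10, 11, 12], [False, False, False, True, True, True])
print(rule, predict(rule, 8))
```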
The split set evaluation method was used to estimate prediction accuracy on unknown data. To this end, the database was randomly split into training (60% of the database) and test (40% of the database) subsets (Fig 2). Then LR, DT, SVM, RF and SDR models were generated on the training subset and used to predict the response of patients in the held-out test subset. Prediction models were generated for the tadalafil 5mg once daily and placebo groups. Prediction accuracy was measured by sensitivity (true positive rate) and specificity (true negative rate), for which 95% confidence intervals were calculated. Sensitivity and specificity were calculated as follows:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}$$

In the equation, TP and TN denote the true positive and true negative predictions and FP and FN denote the false positive and false negative predictions on the test split.

Table 2. Definition of treatment response on the IPSS, BII and PGI-I after 12 weeks treatment with tadalafil or placebo as used in the clinical data mining analysis.

| Instrument | Definition of response |
| --- | --- |
| Primary objectives | |
| IPSS | Reduction of ≥3 points in overall IPSS score [19; 20] |
| IPSS | Improvement of ≥2 points in patients with IPSS baseline score <20 and of ≥6 points in patients with baseline score ≥20 [19] |
| Secondary objectives | |
| BII | Total score of <9 |
| BII | Reduction of >1 point [19] |
| PGI-I | Any improvement from baseline [23] |

BII, BPH Impact Index; IPSS, International Prostate Symptom Score; PGI-I, Patient Global Impression of Improvement; QoL, quality of life.
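As a worked illustration of these accuracy measures, the sketch below computes sensitivity and specificity from the four counts on a test split and attaches a 95% confidence interval to a proportion. The Wilson score interval is an assumption chosen for illustration, since the method used for the confidence intervals is not stated here, and the counts are invented.

```python
import math
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, z = 1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Invented counts: 100 responders and 100 non-responders in the test split.
sens, spec = sensitivity_specificity(tp=70, fn=30, tn=50, fp=50)
low, high = wilson_interval(70, 100)
print(f"sensitivity {sens:.2f} (95% CI {low:.2f} to {high:.2f}), specificity {spec:.2f}")
```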
Receiver Operating Characteristic (ROC) analysis was used to identify optimal prediction models lying on the ROC surface [30] (Fig 2). For ROC curve interpretation we adopted a systematic approach in which models on the ROC surface were first documented by their respective sensitivity and specificity, after which the model on the ROC surface that gave equal weight to false positive and false negative errors was discussed in detail. For the primary objectives, the resulting sensitivity and specificity were then compared to the Q1-Q3 range of 1,000 repeated runs of the 60:40 split set evaluation to ensure consistency (non-random data) (S4 Technical Appendix). Additionally, these results were compared with results obtained from 1,000 repeated runs with a randomly permuted response variable (random data). Finally, sensitivity and specificity findings were compared against an 80% cut-off, representing a performance threshold suitable for routine clinical use.
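The repeated 60:40 evaluation and its permuted-response counterpart can be sketched as follows. This is a simplified illustration, not the study program: the "model" is a rule on one binary predictor whose direction is chosen on the training split, and all names, the seed and the toy data are assumptions.

```python
import random
import statistics


def sens_spec(predictor, response, idx, direction):
    """Sensitivity and specificity on rows `idx` for the rule
    'predict responder when predictor == direction'."""
    tp = fn = tn = fp = 0
    for i in idx:
        predicted = predictor[i] == direction
        if response[i]:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    if tp + fn == 0 or tn + fp == 0:
        return None  # one class is absent from this subset
    return tp / (tp + fn), tn / (tn + fp)


def train_rule(predictor, response, train_idx):
    """Choose the direction with the larger sensitivity + specificity on the training split."""
    best_direction, best_score = True, float("-inf")
    for direction in (True, False):
        result = sens_spec(predictor, response, train_idx, direction)
        score = sum(result) if result is not None else float("-inf")
        if score > best_score:
            best_direction, best_score = direction, score
    return best_direction


def repeated_split_evaluation(predictor, response, repeats=1000, seed=0, permute=False):
    """Return the Q1-Q3 ranges of test-split sensitivity and specificity."""
    rng = random.Random(seed)
    n = len(response)
    n_train = int(round(0.6 * n))  # 60% training, 40% test
    sens_runs, spec_runs = [], []
    for _ in range(repeats):
        y = list(response)
        if permute:
            rng.shuffle(y)  # random data: breaks any predictor-response link
        idx = list(range(n))
        rng.shuffle(idx)
        train_idx, test_idx = idx[:n_train], idx[n_train:]
        direction = train_rule(predictor, y, train_idx)
        result = sens_spec(predictor, y, test_idx, direction)
        if result is None:
            continue
        sens_runs.append(result[0])
        spec_runs.append(result[1])
    q_sens = statistics.quantiles(sens_runs, n=4)
    q_spec = statistics.quantiles(spec_runs, n=4)
    return (q_sens[0], q_sens[2]), (q_spec[0], q_spec[2])


# Toy data in which the predictor determines the response exactly.
predictor = [i % 2 == 0 for i in range(200)]
response = list(predictor)
print(repeated_split_evaluation(predictor, response, repeats=50))
print(repeated_split_evaluation(predictor, response, repeats=50, permute=True))
```

Comparing the two printed ranges mirrors the consistency check described above: an informative predictor should give Q1-Q3 ranges that sit clearly apart from those obtained with a permuted response.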
Post-hoc sensitivity analyses were conducted to determine whether or not excluding a minimised combination of characteristics affected primarily by missing data would allow the generation of improved models (S5 Technical Appendix). Again, emphasis was placed on those models being optimal when false positive and false negative errors were of equal importance (i.e. a sensitivity and specificity threshold of >80%).

Overall findings
Analyses were based on pooled data from four randomized, placebo-controlled trials that primarily compared the effect of 12 weeks treatment with tadalafil 5mg once daily with placebo on symptomatic LUTS improvement in men with LUTS-BPH. Baseline characteristics of patients in the two treatment groups were well balanced (Table 1). There was negligible heterogeneity across the four studies.
The complete ITT population was used in all our models. However, depending on the algorithm, there were exclusions due to missing response or incomplete data from the run. LR, SVM and RF implementation could not be used with incomplete patient records, whereas DTs were able to handle missing predictor, but not missing response variable information, by using 'surrogate splits', for which we allowed 5. Post-hoc sensitivity analysis was used to explore the influence of missing data on the primary result. The set of predictors was reduced such that a sufficient number of complete records were available to the logistic regression, SVM and RF training algorithms. In the end, all patients included in the ITT population were available for inclusion in the data mining algorithms and no patient was excluded for reasons other than technical ones.
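The effect of reducing the predictor set on the number of usable records can be illustrated with a small sketch. The field names and values below are hypothetical, and missing values are represented as `None` or absent keys.

```python
from typing import Dict, List, Optional


def complete_records(records: List[Dict[str, Optional[float]]],
                     predictors: List[str]) -> List[Dict[str, Optional[float]]]:
    """Keep only records with a non-missing value for every requested predictor."""
    return [r for r in records if all(r.get(p) is not None for p in predictors)]


# Hypothetical records: testosterone is missing for two of the three patients.
records = [
    {"ipss_total": 18, "testosterone": None},
    {"ipss_total": 22, "testosterone": 4.1},
    {"ipss_total": 15},
]
print(len(complete_records(records, ["ipss_total", "testosterone"])))
print(len(complete_records(records, ["ipss_total"])))
```

Dropping a sparsely recorded predictor increases the number of complete records available to algorithms that cannot handle missing values, at the cost of losing whatever information that predictor carried.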
Based on these data, the output from our clinical data mining analysis did not find sufficiently good predictors of treatment response to placebo or tadalafil. None of the 107 preselected baseline characteristics achieved a combined sensitivity and specificity of >80% that would enable reliable allocation of an individual patient to either the tadalafil responder or non-responder group.
As the detailed results presented below demonstrate, IPSS baseline (mild/moderate vs. severe group) for both placebo and tadalafil 5mg once daily was found several times on the ROC surface and generated the highest combined sensitivity and specificity of 70% and ~50%, respectively, for all analyses.

Significance of outliers
Outliers were assessed in this clinical data mining study but were not removed for the reasons described earlier. The assessment of outliers led to relatively few observations. It is worth noting that 3 baseline characteristics had skewed distributions. These were maximum urinary flow rate (Qmax), body mass index (BMI), and frequency of alcohol intake, all of which had >23 outliers in the upper range of their respective scales. Full outlier results are given in the accompanying Supporting Information (S6 Technical Appendix).

Primary Objectives
In our ROC curve analyses, models on the ROC surface represented an optimal trade-off between prediction errors (false positive vs. false negative predictions). Here we describe results from the model in which we observed an equal trade-off between both errors as determined by ROC curve analysis. Only SDR models were obtained for the pre-specified analyses predicting 'severity MCID' and 'overall MCID' response. A reduction of ≥3 points in overall IPSS score, or improvement of ≥2 points in patients with IPSS baseline score <20 and of ≥6 points in patients with baseline score ≥20 were the primary objectives.
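One way to picture this selection step is sketched below. It assumes that "lying on the ROC surface" can be read as lying on the upper convex hull of the points (1 - specificity, sensitivity), and that equal weighting of the two error types corresponds to maximising sensitivity + specificity. Both readings are assumptions, and the model names and values are invented.

```python
from typing import Dict, List, Tuple


def _cross(o, a, b) -> float:
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def models_on_roc_hull(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of models on the upper convex hull in ROC space.

    `models` maps a name to (sensitivity, specificity). The trivial
    classifiers (0, 0) and (1, 1) anchor the hull.
    """
    points = [(0.0, 0.0, None), (1.0, 1.0, None)]
    points += [(1.0 - spec, sens, name) for name, (sens, spec) in models.items()]
    points.sort(key=lambda p: (p[0], p[1]))
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return [name for _, _, name in hull if name is not None]


def equal_weight_choice(models: Dict[str, Tuple[float, float]]) -> str:
    """Hull model with the largest sensitivity + specificity."""
    return max(models_on_roc_hull(models), key=lambda name: sum(models[name]))


# Invented (sensitivity, specificity) pairs for three candidate rules.
models = {"rule_a": (0.70, 0.48), "rule_b": (0.95, 0.15), "rule_c": (0.40, 0.50)}
print(models_on_roc_hull(models), equal_weight_choice(models))
```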
Prediction of 'severity MCID' response in the tadalafil 5mg once daily group produced SDR models on the ROC surface for IPSS severity group (mild/moderate vs. severe) and IPSS voiding subscore only (Table 3). The model with equal importance for FP and FN error was based on IPSS severity group. The results from this model were supported by repeat evaluations, which lay within the Q1-Q3 ranges for sensitivity and specificity of 68-72% and 45-50%, respectively. Q1-Q3 ranges for random data were 34-66% for sensitivity, and as such did not overlap with the runs on non-random data, and 34-66% for specificity. For subjects in the mild/moderate group, this model predicted a positive 'severity MCID' response.
'Severity MCID' response in the placebo group was predicted by six SDR models lying on the ROC surface that included bioavailable testosterone, ED etiology, IPSS severity, cluster of lipid-lowering medications, antidepressants, and use of 5-α-reductase inhibitors (Table 3). Again, IPSS severity achieved the best combination of sensitivity and specificity when positive and negative prediction errors were of equal importance. The Q1-Q3 range for all evaluations was 71-74% for sensitivity and 39-44% for specificity, while random data yielded sensitivities of 32-65% and specificities of 36-68%. Again, there was no overlap with evaluations on non-random data, increasing confidence that the effect was not simply due to chance. This model also predicted a positive 'severity MCID' response for subjects in the mild/moderate group.
SDR models predicting 'overall MCID' response in the tadalafil 5mg once daily group were based on ethnicity, IPSS severity, and IPSS voiding subscores (Table 3). Here, the IPSS voiding subscore SDR model achieved optimal predictions when false positive errors were assumed to have the same importance as false negative errors. Q1-Q3 ranges were 77-96% for sensitivity and 13-29% for specificity. For random data these were 8-89% for sensitivity and 11-91% for specificity.
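The overlap comparison between Q1-Q3 ranges is simple interval arithmetic, sketched here with the ranges quoted above for the tadalafil 'severity MCID' model. The helper name is hypothetical.

```python
from typing import Tuple


def ranges_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Q1-Q3 ranges in percent: non-random data vs. random (permuted) data.
print(ranges_overlap((68, 72), (34, 66)))  # sensitivity
print(ranges_overlap((45, 50), (34, 66)))  # specificity
```

With these figures the sensitivity ranges are disjoint, whereas the specificity range for non-random data falls inside the range obtained with random data.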
'Overall MCID' for the placebo group was best predicted by SDR models that included cluster of anti-diabetic drugs, IPSS severity, alcohol usage, and IPSS voiding subscore (Table 3). Giving equal importance to false positives and to false negatives, IPSS voiding scores obtained the best predictions. Subjects with an IPSS voiding subscore >5.5 were predicted to have a higher likelihood of 'overall MCID' response. Q1-Q3 ranges for this model in all evaluations were 93-95% for sensitivities and 19-23% for specificities. The corresponding results for random data were 10-88% for sensitivities and 13-90% for specificities.
The IPSS severity categories (mild/moderate vs. severe) based on a cut-off of 20 were part of the ROC surface regardless of MCID definition and regardless of treatment group (i.e. tadalafil 5mg once daily or placebo). IPSS voiding subscore was found on the ROC surface for 'overall MCID' prediction.

Secondary Objectives
Estimates for sensitivities and specificities for each of the secondary objectives for the two treatment groups are presented in Tables 4 and 5. The SDR models achieving optimal prediction performance when false positive predictions are given the same importance as false negative predictions are marked with an asterisk (*) and are the results on which we have focused. A reduction of ≥1 point on the IPSS QoL question was the first secondary objective. SDR models found on the ROC surfaces included number of anti-hypertensive medications for the tadalafil 5mg once daily group, and ED etiology (mixed or psychogenic) for the placebo group to predict improvements. A reduction in the IPSS total score of 25% from baseline to 12 weeks was the next secondary objective, and SDR models on the ROC surface included presence of hypertension during treatment for the tadalafil group and PGI-S (<5) at baseline for the placebo group. Achieving an IPSS total score <12 points at 12 weeks was the third secondary objective. An IPSS score <12 was predicted using IPSS total score for the tadalafil 5mg once daily and placebo groups. Cut-off for response was selected as <16 for tadalafil 5mg once daily and placebo by SDR models on the ROC surface giving equal importance to false positive and false negative predictions.
A reduction to <9 points on the BII total score after 12 weeks treatment was the fourth secondary objective. IPSS severity (mild/moderate) was used to predict BII <9 after 12 weeks treatment for the placebo group, while the BII total score (<6.5) was used by the SDR predicting response/improvement in tadalafil-treated patients.
A reduction of >0.5 point on the BII scale was the fifth secondary objective. BII total score at baseline was used to predict any improvements in BII by the SDR models. The cut-offs employed were ≥1.5 and ≥2.5 for response in the tadalafil 5mg once daily and placebo groups, respectively.
The final secondary objective was any improvement on the PGI scale. SDR models lying on the ROC surface that gave equal importance to false positives and false negatives in predicting improvements were percentage bioavailable testosterone (≥35%) for the tadalafil 5mg once daily group and sex hormone binding globulin (SHBG) (<42nmol/l) for the placebo group.

Post-hoc Sensitivity Analysis
All pre-specified analyses returned only SDR models. LR, SVM, RF and DT approaches did not yield models because missing values, which included parameters that were either not measured or not intended for collection, resulted in an insufficient number of complete patient records. Testosterone measurements were the key driver, responsible for 79% of incomplete records, while missing PSA assessments accounted for 70% of records, followed by frequency of alcohol intake and SHBG assessments (both missing in >30% of cases). Finally, PGI assessment (PGI-I was assessed in only 3 of the 4 studies), previous overactive bladder therapy, ED characteristics and assessment of Qmax were missing for 20% to 30% of patients. Table 6 details sensitivities and specificities on held-out test data from non-SDR models lying on the ROC surface when testosterone, alcohol intake, Qmax, SHBG, albumin, PGI-S and PSA were excluded. For 13 of these models, pre-selection via a t-test filter improved prediction performance (S7 Technical Appendix). In these cases the pre-selected variables are given in the last column of the table. Only 4 of the models were RF; not a single SVM was observed. Of the better performing models, sensitivity and specificity were best with respect to BII total score of <9. DTs for the tadalafil 5mg once daily group achieved a sensitivity of 77% (95% CI: 0.72, 0.82) and specificity of 62% (95% CI: 0.35, 0.85).
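A t-test filter of the kind mentioned above can be sketched as follows. The details of the filter used in the study are given in the S7 Technical Appendix and are not reproduced here; this version ranks numeric predictors by the absolute Welch t statistic between responders and non-responders and keeps the top few. The statistic, the `keep` parameter and the toy data are assumptions.

```python
import math
import statistics
from typing import Dict, List


def welch_t(x: List[float], y: List[float]) -> float:
    """Welch's t statistic for two samples with at least two values each."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0:
        # No within-group spread: identical means give 0, otherwise +/- infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def t_test_filter(columns: Dict[str, List[float]], response: List[bool], keep: int = 2) -> List[str]:
    """Names of the `keep` predictors with the largest |t| between response groups."""
    scores = {}
    for name, values in columns.items():
        responders = [v for v, r in zip(values, response) if r]
        non_responders = [v for v, r in zip(values, response) if not r]
        scores[name] = abs(welch_t(responders, non_responders))
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Toy data: one predictor separates the groups, the other does not.
columns = {
    "informative": [1, 2, 1, 2, 9, 8, 9, 8],
    "noise": [5, 6, 5, 6, 5, 6, 5, 6],
}
response = [False, False, False, False, True, True, True, True]
print(t_test_filter(columns, response, keep=1))
```

In a split set evaluation, a filter like this would normally be fitted on the training subset only, so that the held-out test subset does not influence which variables are selected.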

Discussion
Identifying predictors of response to drug therapy can be beneficial, especially where significant improvements in patient health-related QoL are sought, such as in LUTS-BPH where symptom relief is the primary goal of treatment for the majority of men. It also has benefits in an era where patients are encouraged to take an active role in treatment decisions alongside their physician.
The objective of this clinical data mining study was to identify prediction models and associated patient baseline characteristics that could be used in clinical practice to predict treatment response to tadalafil 5mg once daily among patients with a diagnosis of LUTS-BPH. To the best of our knowledge, this is the first clinical data mining analysis to use mathematical modelling in studies of patients with LUTS-BPH.
To meet this objective, we adopted a rigorous data mining approach involving commonly used models and evaluated their discriminative ability on held-out data using eight different measures of treatment response and 107 possible predictors. These were chosen from a large patient population enrolled in a series of almost identical, placebo-controlled, randomized studies of the same duration of randomized treatment and with similar inclusion/exclusion criteria. Results were backed up by repeated evaluations and comparison to non-informative data to control for bias.
As our results have demonstrated, we did not to obtain any sensitivities or specificities above an 80% threshold for the specified baseline characteristics. In other words, at this threshold there would be a 20% risk of an incorrect prediction, which we would argue is an acceptable basis on which to predict treatment response in a non-malignant condition in clinical practice. Thus, using our data from four clinical trials and modelling methods, no single predictive rule emerged from which a treatment algorithm could be developed to clinically guide the use of tadalafil 5mg once daily in patients with LUTS-BPH. Similarly, we found no characteristics that determined response to LUTS-BPH treatment when placebo is used. These findings applied to both primary and secondary objectives.
Across the 107 baseline characteristics, there was evidence that with respect to 'severity MCID', LUTS severity at baseline as measured by IPSS score (mild-moderate 20 vs. severe >20) had sensitivity and specificity levels that approached 70% and 50%, respectively. While this level of prediction is marginally better than random guessing, it is still too low for clinical use. However, IPSS continues to underpin assessments with respect to baseline symptom severity and monitoring symptom progression in cases of "watchful waiting" [34]. This may be due to the fact that during its validation, care was taken to generate a predictive questionnaire [10; 21]. Several analyses of pooled data from the four clinical trials of tadalafil versus placebo that were used in this clinical data mining study have shown that tadalafil significantly improves symptoms of LUTS-BPH, including small but significant improvements in Q max [35] with concomitant improvements in QoL [11; 36]. Subsequent analyses revealed improvement in both IPSS storage and voiding subscores [12], and that improvements in LUTS occurred irrespective of the presence of co-existing ED [37]. Thus, tadalafil has therapeutic benefit beyond its effects on ED in men with comorbid LUTS-BPH. These findings have been confirmed in a prospective, naturalistic observational study (TadaLutsEd), which closely mirrors routine clinical practice. In this non-selective study, 86% of men aged 50 years and older with LUTS-BPH saw an improvement in urinary symptoms following 6 weeks treatment with tadalafil 5mg once daily [38].
A subgroup analysis of the effects of tadalafil in various patient subgroups concluded that tadalafil improves LUTS-BPH symptoms, as measured by the IPSS, across all clinical subgroups that included LUTS severity (IPSS 20/>20) and previous use of α-blocking agents [13]. However, while this analysis looked at the various subgroups from a population perspective and, as such, evaluates improvement on average, our work crucially looks at it from the perspective of the physician and the individual patient (i.e. predicting the improvement on an individual basis). Both analyses are consistent in that efficacy occurred across all subgroups in the pooled analysis of data from the four clinical trials, while no reliable predictor of response was found in our analysis of the same trials on an individual patient basis.
Given that tadalafil provides early symptomatic relief [6] across a wide range of men with LUTS-BPH, including those with ED and other significant comorbidities, it is perhaps not surprising that we were unable to identify individual predictors of response to placebo or tadalafil 5mg once daily despite rigorous data mining. Many examples exist in the literature of predictors of response (or failure to respond) to drug therapy that include the use of drugs for LUTS-BPH. For example, large prostate volume and more severe symptoms at baseline have been identified as predictive factors for failure to respond to first-line medical therapy for LUTS-BPH [39]. Severity of symptoms is a strong influence on the extent to which patients judge treatment to give clinically meaningful improvement [19]-greater severity requires a proportionately greater improvement in symptom relief for patients to perceive the same degree of improvement as those with less severe disease [16]. A systematic review and metaanalysis of the use of PDE-5 inhibitors in LUTS-BPH suggested that younger men with lower BMI and severe urinary symptoms were the best candidates for PDE-5 inhibitor therapy [40], a finding we were unable to demonstrate and confirm in our analysis when examining patients treated with tadalafil or using placebo.
We did, however, identify some potential candidates for predicting treatment response. In addition to IPSS-related characteristics, we found that bioavailable testosterone, ED etiology, cluster of lipid-lowering medications, antidepressants and previous use of 5-α-reductase inhibitors may have potential as predictors for treatment response, especially in relation to 'severity MCID' response. Although substantial further work is needed to test these observations, there is some independent evidence to suggest that some, if not all, may be viable candidates. A recent study on the effects of tadalafil 5mg in men with hypogonadism and LUTS-BPH showed that while tadalafil was effective in both men with and without hypogonadism, IPSS storage subscore and IPSS-QoL was appreciably greater in men without hypogonadism than those with low testosterone levels [41]. There is also evidence to suggest that depression, anxiety and somatization may influence the clinical manifestation of LUTS-BPH and that anxious patients respond less well to treatment [42]. Conceivably, treatment with antidepressants could play in role in not only alleviating symptoms of depression and anxiety but also increasing the likelihood of response to specific LUTS-BPH therapy, something for which there is now published evidence [43].
In this study we chose to use established models for prediction, such as LRs, DTs, SVMs and RFs, rather than newer and more complex models. Surprisingly, none of them was robust with regard to handling missing data, not even DTs and RFs. Current data mining research is focused on developing models that achieve ever better predictive performance (on complete datasets), while ignoring the problem of missing information, which could be informative but could also completely compromise the method. In our modelling study, even DTs, which have an integrated mechanism for dealing with missing data via surrogate splits, often failed to outperform models that made only a single decision. Only nine DTs were found on the ROC surface and, of these, six required preselection of variables via a t-test filter. This clearly highlights the importance of this issue in clinical data mining research.
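As an illustration of what a t-test filter that tolerates missing values might look like, the following minimal sketch ranks baseline variables by the absolute Welch t statistic between responders and non-responders and keeps the top-ranked ones. It is not the filter used in this study, whose details are not given here: the function names `welch_t` and `t_test_filter`, the use of `None` to mark a missing value, the choice of Welch's variant, and the `top_k` cut-off are all assumptions made for the example.

```python
import math


def welch_t(a, b):
    """Welch's t statistic for two samples; 0.0 when it cannot be computed."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / se if se > 0 else 0.0


def t_test_filter(features, responder, top_k):
    """Return the top_k feature names ranked by |t|.

    features  : dict mapping a variable name to a list of values,
                with None marking a missing value (assumption)
    responder : list of booleans, True for responders
    """
    scores = {}
    for name, values in features.items():
        # Each variable uses only the patients for whom it was observed.
        resp = [v for v, r in zip(values, responder) if v is not None and r]
        non = [v for v, r in zip(values, responder) if v is not None and not r]
        scores[name] = abs(welch_t(resp, non))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

Dropping missing values per variable keeps every patient in the analysis, but it means each variable is scored on a different subset of patients, which is one way that missing data can bias a filter of this kind.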
Despite its strengths, which include a pre-specified program of statistical analyses, this study has several limitations. Firstly, there was no subsequent independent study to validate our results. It is also possible that we may not have collected the "true factor" for predicting response, even though we examined 107 baseline characteristics. Better methods could have been employed to fine-tune model parameters, especially for the SVM. For example, a triple split set evaluation, consisting of a training split for model generation, a validation split for model selection, and a test split for hold-out evaluation, could have been used to select models that generalise better. Evaluation of the training-test set bias did not, however, indicate a need for such additional complexity. Other options include using an alternative pre-filtering step, adding clustering information as supportive predictor information, adding de-noising (whitening) data pre-processing steps, or using statistical bootstrapping. With respect to SVM, we employed the radial basis kernel, but we could have used further kernels, such as random walk kernels, optimal matching kernels, or other kernel types or kernel machines, for training this algorithm on our data. There were also limitations inherent in the trial inclusion and exclusion criteria; for example, patients with post-void residual urine volume >300ml were excluded and prostate volume was not directly assessed, although PSA can be used as a surrogate for prostate size.
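The triple split set evaluation mentioned among the limitations can be sketched as follows. This is a minimal illustration and not the procedure used in this study: the 60/20/20 proportions, the toy one-variable threshold rule standing in for a real model, the synthetic data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random


def triple_split(n_samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into three disjoint parts."""
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave a test split")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def fit_stump(xs, ys, idx):
    """Model generation: threshold at the midpoint of the two class means."""
    pos = [xs[i] for i in idx if ys[i]]
    neg = [xs[i] for i in idx if not ys[i]]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    direction = 1 if mean_pos >= mean_neg else -1
    return (mean_pos + mean_neg) / 2, direction


def accuracy(stump, xs, ys, idx):
    """Fraction of cases in idx that the rule classifies correctly."""
    threshold, direction = stump
    hits = sum((direction * (xs[i] - threshold) >= 0) == ys[i] for i in idx)
    return hits / len(idx)


def triple_split_evaluation(features, ys, seed=0):
    """Fit on train, select on validation, report once on the test split."""
    train, val, test = triple_split(len(ys), seed=seed)
    stumps = {name: fit_stump(xs, ys, train) for name, xs in features.items()}
    best = max(stumps,
               key=lambda name: accuracy(stumps[name], features[name], ys, val))
    return best, accuracy(stumps[best], features[best], ys, test)


def make_toy_data(n=200, seed=1):
    """Synthetic data (assumption): one informative and one noise variable."""
    rng = random.Random(seed)
    ys = [i % 2 == 0 for i in range(n)]
    features = {
        "informative": [(10.0 if y else 0.0) + rng.gauss(0, 1) for y in ys],
        "noise": [rng.random() for _ in ys],
    }
    return features, ys
```

Because the test split is used neither to fit nor to select a model, the accuracy returned by `triple_split_evaluation` is not inflated by the selection step, which is the motivation for the extra split.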
In conclusion, none of the approaches presented here led to a prediction model with sufficient accuracy for the development of a tailoring algorithm for tadalafil 5mg once daily or placebo in LUTS-BPH. Thus, the ideal patient profile for which tadalafil should be prescribed with respect to baseline demographics, medical history, IPSS, International Index of Erectile Function (IIEF) score and Qmax remains unknown. Although the response to treatment in an individual patient cannot be reliably predicted from the characteristics and methods we have evaluated so far, this does not mean that patients with LUTS-BPH are unlikely, on average, to respond to treatment with placebo or tadalafil 5mg once daily. Among the approximately two-thirds of men with LUTS-BPH who achieved CMI following treatment with tadalafil 5mg, over half achieved CMI after one week of therapy and over 70% within 4 weeks [44].
Although this study did not identify any pre-existing patient characteristics that might predict a treatment response, tadalafil 5mg once daily has been shown to be effective in LUTS-BPH across a range of patient subgroups. Therefore, the decision to treat an individual case of LUTS-BPH with tadalafil 5mg once daily continues to rest on medical assessment of the patient, consideration of contraindications and co-existing conditions, and the patient's expectations and preferences, leading to mutual patient-physician agreement. This approach is entirely compatible with the current concept of shared decision making, in which the patient's voice should also be heard as an integral part of the treatment decision [45], especially for a condition in which part of the symptomatic improvement is a strong placebo response [16].