Comparison of multivariable methods for determining cutpoints of biomarkers in the context of survival time analyses: A simulation study with practical applications to survival data

Jan Porthun; Andreas Wienke

doi:10.1371/journal.pone.0338425

Abstract

Introduction

Survival time models are commonly employed in medicine and health sciences when analysing data. In these time-to-event analyses, it is often necessary to dichotomise variables that are metrically measured. One example could be to assign patients to different risk groups based on an occurring event. Besides univariable methods, multivariable approaches also exist for establishing cutpoints. Up to now, these multivariable approaches have hardly been investigated.

Methods

Using a Monte Carlo simulation study, we analysed eight multivariable methods from the literature to establish a cutpoint of a biomarker in the context of a semiparametric Cox regression model. The methods are the following: maximising the chi-square statistic, maximising the chi-square statistic with a split-sample approach, maximising the c-index using either the AddFor- or Genetic algorithm, maximising the concordance probability estimator (CPE) with the AddFor- or Genetic algorithm, and minimising the Akaike information criterion (AIC). We compared these methods with each other and in addition with the univariable log-rank minimum p-value approach. The simulation parameters analysed included the cutpoint’s distance from the biomarker’s median, sample size, total censoring, censoring before the end of the follow-up time (drop-outs), and the survival time distribution. Bias and empirical standard error were used as the primary performance measures. Furthermore, each method is illustrated using two practical data examples.

Results

All analysed methods are biased towards the biomarker’s median. Multivariable methods that estimate the cutpoint by using the lowest AIC or the maximum of the chi-square statistic have the lowest bias and empirical standard error in most simulation scenarios. The difference in bias between the methods based on maximising the c-index or maximising the CPE is minimal. Regardless of the distribution used (Weibull, Gompertz, or exponential), the respective bias shows similar dependencies on the simulation parameters.

Conclusions

Multivariable methods to estimate a biomarker’s cutpoint in survival time analyses using the Cox regression model may represent a good alternative to univariable methods. Our simulation has shown that methods maximising the chi-square statistic or minimising the AIC, respectively, perform better than the univariable method using the minimum p-value approach and outperform multivariable methods based on the c-index or CPE.

Citation: Porthun J, Wienke A (2025) Comparison of multivariable methods for determining cutpoints of biomarkers in the context of survival time analyses: A simulation study with practical applications to survival data. PLoS One 20(12): e0338425. https://doi.org/10.1371/journal.pone.0338425

Editor: Vidhura S. Tennekoon, Indiana University Indianapolis, UNITED STATES OF AMERICA

Received: July 7, 2025; Accepted: November 22, 2025; Published: December 5, 2025

Copyright: © 2025 Porthun, Wienke. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Survival models are regularly used in medical research and in the health sciences [1]. They are applied not only to the survival of patients over a certain period but also to other time-to-event analyses. Examples include the time until discharge from hospital after surgery or the time between an intervention and the subsequent absence of symptoms. The semiparametric Cox proportional hazards model is often applied here [1,2]. It is not bound to specific distribution parameters, and the estimated hazard ratios (HRs) allow reliable conclusions to be drawn, provided that the effect remains constant over time [3]. In survival models, cutpoints of metric biomarkers are regularly established [4–6]. Cutpoints are used, for instance, to divide patients into groups with different survival expectations depending on the level of a specific biomarker [7]. This is utilised in clinical studies in which stratification is based on covariates [8,9]. In the literature, the terms ‘threshold’ and ‘changepoint’ are also used instead of cutpoint [10–12]. Cutpoints are also important in everyday clinical work, where metric biomarkers are often interpreted via established cutpoints. One of the best-known examples is the categorisation of systolic and diastolic blood pressure. According to the Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults of the American College of Cardiology and the American Heart Association, a systolic blood pressure <120 mmHg combined with a diastolic blood pressure <80 mmHg is considered ‘normal’ [13]. An example of the formation of cutpoints in the context of survival time analyses is the study by Otten et al. of patients with advanced-stage non-small cell lung cancer. They investigated the prognostic potential of immune checkpoint inhibitor clearance and determined the cutpoint for nivolumab clearance at ≥7.3 mL/h at first dose. Patients whose nivolumab clearance was below the cutpoint had a higher risk of death [14].

The dichotomisation or stratification of metric variables with the help of cutpoints is associated with a loss of information and power in statistical analyses. Therefore, whether the formation of cutpoints is necessary should always be weighed up carefully [15,16].

Our research addressed the question of estimating cutpoints in the context of survival analyses using the Cox regression model. We focused on application areas in which data sets of up to 1000 patients are available. In these cases, AI-based analyses have been of only limited help because of the small number of subjects. In the following, we always refer to the scenario in which a cutpoint of a biomarker is determined in the context of survival analysis using Cox regression for right-censored data.

Different methods are suggested for this purpose in the literature. One classic method uses the biomarker’s median or quartile boundaries as a cutpoint [17]. Another frequently applied method is the minimum p-value approach [18]. Two subgroups are formed at all potential cutpoints of the biomarker. Based on these two groups, a log-rank test is executed. The value of the biomarker, at which the p-value of the log-rank test is the smallest, is used as the cutpoint. This approach considers the observation time and status in addition to the distribution of the biomarker.
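The minimum p-value approach described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the R package maxstat: the function names, the toy data, and the `min_group_size` restriction are assumptions made here, and the p-value is the unadjusted one from a chi-square distribution with one degree of freedom.

```python
import math


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({s for s, e in zip(times, events) if e == 1}):
        # Numbers at risk just before t, overall and in group 1.
        n = sum(1 for s in times if s >= t)
        n1 = sum(1 for s, g in zip(times, groups) if s >= t and g == 1)
        # Events at t, overall and in group 1.
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d1 = sum(1 for s, e, g in zip(times, events, groups)
                 if s == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def min_p_value_cutpoint(times, events, marker, min_group_size=2):
    """Return (cutpoint, p-value) with the smallest log-rank p-value."""
    best = None
    for cut in sorted(set(marker))[:-1]:
        groups = [1 if x > cut else 0 for x in marker]
        n1 = sum(groups)
        if min(n1, len(groups) - n1) < min_group_size:
            continue  # hypothetical restriction: skip very unbalanced splits
        chi2 = logrank_chi2(times, events, groups)
        p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
        if best is None or p_value < best[1]:
            best = (cut, p_value)
    return best


# Toy data (hypothetical): subjects with a marker above 4 die early.
toy_marker = [1, 2, 3, 4, 5, 6, 7, 8]
toy_times = [9, 8, 7, 6, 1, 2, 3, 4]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
best_cut, best_p = min_p_value_cutpoint(toy_times, toy_events, toy_marker)
```

Because the smallest of many p-values is selected, the p-value returned by such a search is optimistic and would need an adjustment before it is interpreted.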

Studies published between 2003 and 2022 dealt with the formation of cutpoints in a multivariable setting within the framework of Cox regression. Their authors suggested considering not only the biomarker to be dichotomised but also other relevant covariates when determining the cutpoint [19–22]. Different procedures have been proposed for this. The authors argue that multivariable methods are superior to univariable methods, which consider only the biomarker itself: because the model includes other variables that contribute to the variability of the dependent variable, a more precise estimate of the true cutpoint is expected [22]. All multivariable methods have in common that the biomarker for which a cutpoint is to be determined is included together with other covariates in a multivariable Cox regression model. The methods differ in the criterion that is evaluated on these Cox regression models to choose the cutpoint. The methods are listed below together with their abbreviations.

Method A) Maximising the chi-square statistic with a twofold cross-validation approach (max χ²)

Separate Cox regression models are calculated for all possible dichotomisations of the biomarker under consideration. The cutpoint for which the Cox regression model’s chi-square statistic takes the largest value is used. This method is a modification of the approach described by Mazumdar et al. The corresponding p-values and hazard ratios are determined through a twofold cross-validation [22].

Method B) Maximising the chi-square statistic with a split-sample approach (max χ² split-sample)

This method also determines the cutpoint based on the maximum chi-square value. However, only half of the data set is used for this purpose. The p-values and hazard ratios are estimated with the other half of the dataset [22].

Method C) Maximising the c-index with the AddFor- or Genetic algorithm (c-index AddFor/ Genetic)

The cutpoint for which the c-index takes the largest value is chosen. Either the AddFor (method C1) or the Genetic algorithm (method C2) can be used [19].

Method D) Maximising the concordance probability estimator (CPE) with the AddFor- or Genetic algorithm (CPE AddFor/ Genetic)

The maximum of the CPE is the basis for determining the cutpoint. Either the AddFor (method D1) or the Genetic algorithm (method D2) can be used [19].

Method E) Minimum of the AIC (min AIC)

The cutpoint of the biomarker is the value at which the AIC of the respective Cox regression model is lowest. The dichotomised variable can be included in the Cox regression model either as a covariate (method E1) or as a strata variable (method E2) [21].
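The criteria of methods A and E1 are closely related. As a sketch, assume that the chi-square statistic is the likelihood-ratio statistic of the Cox model (an assumption made here; a Wald or score statistic would behave only approximately the same) and write ℓ for the log partial likelihood, c for a candidate cutpoint, and k for the number of model coefficients, which is the same for every c:

```latex
\begin{align*}
\mathrm{AIC}(c) &= -2\,\ell\bigl(\hat\beta(c)\bigr) + 2k, \\
\chi^2_{\mathrm{LR}}(c) &= 2\,\bigl[\ell\bigl(\hat\beta(c)\bigr) - \ell(0)\bigr], \\
\mathrm{AIC}(c) &= -\chi^2_{\mathrm{LR}}(c) - 2\,\ell(0) + 2k
\quad\Longrightarrow\quad
\arg\min_c \mathrm{AIC}(c) = \arg\max_c \chi^2_{\mathrm{LR}}(c).
\end{align*}
```

Since ℓ(0) and k do not depend on the cutpoint, both criteria select the same cutpoint under this assumption when the dichotomised biomarker enters as a covariate. The argument does not carry over to method E2, where the dichotomised biomarker defines the strata and the partial likelihood itself changes with c.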

The authors of the described methods (A, B, C1, C2, D1, D2, E1, and E2) have investigated them in simulation studies [19,21,22]. Mazumdar et al. compared methods A and B with each other and with the univariable minimum p-value approach. They concluded that the multivariable method A (max χ²) is more efficient in finding the cutpoint than the univariable method [22]. Among the multivariable methods, they prefer the cross-validation approach for estimating HRs and p-values over method B (max χ² split-sample), in which only half of the dataset is used to determine the cutpoint. The authors generated the survival times from an exponential distribution. Barrio and colleagues compared the Genetic algorithm-based methods C2 and D2 [19]; they did not examine the AddFor algorithm. Based on their simulation study, these authors recommend using either the c-index or the CPE for censoring rates below 50% and the c-index for higher censoring rates. In their simulation, the survival times were generated from a Weibull distribution.
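To make the criterion behind methods C1 and C2 concrete, the following sketch computes Harrell's c-index for right-censored data and picks the cutpoint that maximises it. It is a deliberate simplification and not the CatPredi implementation: the risk score is the dichotomised biomarker alone rather than the linear predictor of a multivariable Cox model, the search is exhaustive rather than AddFor or Genetic, and all names and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's c-index; ties in the risk score count as half concordant."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable if subject i has an observed event
            # strictly before the follow-up time of subject j.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")


def max_c_index_cutpoint(times, events, marker):
    """Exhaustive search for the cutpoint with the largest c-index."""
    best_cut, best_c = None, float("-inf")
    for cut in sorted(set(marker))[:-1]:
        # Assumption: values above the cutpoint indicate higher risk.
        scores = [1 if x > cut else 0 for x in marker]
        c = harrell_c_index(times, events, scores)
        if c > best_c:
            best_cut, best_c = cut, c
    return best_cut, best_c


# Toy data (hypothetical): subjects with a marker above 4 die early.
toy_marker = [1, 2, 3, 4, 5, 6, 7, 8]
toy_times = [9, 8, 7, 6, 1, 2, 3, 4]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
c_cut, c_value = max_c_index_cutpoint(toy_times, toy_events, toy_marker)
```

With a binary risk score many pairs are tied, which is why the c-index stays well below 1 even when the two groups are perfectly separated in time.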

According to our research, no study has been published concerning the determination of a cutpoint for a metric biomarker within the context of survival analyses, wherein various variants of multivariable methods (A to E2) are compared to one another. Furthermore, in the existing simulation studies, the generated survival times are solely based on one specific distribution: either an exponential or a Weibull distribution.

Our main objective is to compare all methods identified in the literature for determining a cutpoint based on a metric biomarker within a Monte Carlo simulation study. These include the methods previously described (A, B, C1, C2, D1, D2, E1 and E2). We aim to ascertain whether it is feasible to identify a cutpoint using these methods. Additionally, we will compare these methods to the univariable approach using the minimum p-value method (method F). We also illustrate all methods on two real clinical data examples.

Materials and methods

The simulation study follows the recommendations by Morris et al. [23].

Simulation design

The entire simulation was performed for a Cox proportional hazard model with right-censored data. The corresponding model is

$$h(t \mid X, Z) = h_0(t)\,\exp\left(\beta_X X + \beta_Z^{\top} Z\right) \tag{1}$$

with h₀(t) as the baseline hazard function at time t and the predictor variables X and Z. For the simulation, right-censored survival times T_S with a maximum follow-up time of 1 were generated with the R package simsurv [24]. Following Bender et al., Weibull and Gompertz distributions were considered in addition to the exponential distribution [25]; the formulas used to determine T_S are described in detail there. Four variables were included in the model (Formula 1): the continuous biomarker X, a binary covariate Z₁, and two continuous covariates Z₂ and Z₃ (see Fig 1). The associated coefficients are β_X = ln(3), β_Z1 = ln(2), β_Z2 = ln(0.5), and β_Z3 = ln(2). A true cutpoint θ was used to dichotomise the biomarker X into a binary variable X_D, with X_D = 0 if X ≤ θ and X_D = 1 if X > θ. The individual censoring times C follow a uniform distribution U_[0,1]. The parameter pc_t controls the total proportion of censored observations. The parameter pc_f distinguishes censoring before the end of the follow-up time (drop-outs) from administrative censoring at the end of the follow-up time. The final follow-up times are T = min(T_S, C). An overview of the simulation parameters can be found in the flow diagram (Fig 1). The combination of all these parameters results in a total of 162 simulation scenarios. Fig 2 shows three examples of generated censored survival times.
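The data-generating step can be sketched with the inverse-transform formulas of Bender et al. [25]. This is a minimal illustration under stated assumptions and not the simsurv-based code of the study: the covariate distributions (standard normal for X, Z₂, Z₃ and Bernoulli(0.5) for Z₁), the values of `lam` and `shape`, and the application of β_X to the dichotomised X_D are assumptions made here, and the tuning of pc_t and pc_f is not reproduced.

```python
import math
import random

# Coefficients as stated in the text.
BETAS = {"x_d": math.log(3), "z1": math.log(2),
         "z2": math.log(0.5), "z3": math.log(2)}


def draw_survival_time(rng, linear_predictor, dist, lam, shape=None):
    """Inverse-transform sampling of a survival time under a Cox model."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    scaled = -math.log(u) / (lam * math.exp(linear_predictor))
    if dist == "exponential":
        return scaled
    if dist == "weibull":
        return scaled ** (1.0 / shape)
    if dist == "gompertz":
        return math.log(1.0 + shape * scaled) / shape
    raise ValueError(f"unknown distribution: {dist}")


def simulate_dataset(n_obs, theta, dist, lam, shape=None, seed=1):
    """Simulate right-censored data with a true cutpoint theta."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_obs):
        # Hypothetical covariate distributions (not taken from the study).
        x = rng.gauss(0.0, 1.0)
        z1 = 1 if rng.random() < 0.5 else 0
        z2 = rng.gauss(0.0, 1.0)
        z3 = rng.gauss(0.0, 1.0)
        x_d = 1 if x > theta else 0
        lp = (BETAS["x_d"] * x_d + BETAS["z1"] * z1
              + BETAS["z2"] * z2 + BETAS["z3"] * z3)
        t_s = draw_survival_time(rng, lp, dist, lam, shape)
        c = rng.random()  # censoring time, uniform on [0, 1)
        rows.append({"time": min(t_s, c), "event": 1 if t_s <= c else 0,
                     "x": x, "z1": z1, "z2": z2, "z3": z3})
    return rows


example = simulate_dataset(200, theta=0.2, dist="weibull",
                           lam=2.0, shape=1.5, seed=42)
censoring_rate = 1 - sum(row["event"] for row in example) / len(example)
```

In such a sketch the censoring proportion follows from the chosen hazard parameters; matching a target pc_t would require tuning `lam` or the censoring distribution.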

Fig 1. Flow diagram of the simulation study and software used.

AIC, Akaike information criterion; cp, cutpoint; CPE, concordance probability estimator; n_obs, sample size; pc_t, total censoring; pc_f, censoring before end of follow-up time in percent of total censoring (pc_t); n_sim, repetitions; θ, true cutpoint.

https://doi.org/10.1371/journal.pone.0338425.g001

Fig 2. Examples of Kaplan-Meier curves with 95% CI for the three types of distributions.

Gompertz distribution with a = 0.003, b = 0.098; Weibull distribution with γ = 1.5, λ = 0.1; Exponential distribution with scale parameter = 0.2; CI, confidence interval; n_obs, sample size; pc_t, total censoring; pc_f, censoring before end of follow-up time in percent of total censoring (pc_t); θ, true cutpoint.

https://doi.org/10.1371/journal.pone.0338425.g002

The generated datasets contain the variables follow-up time, event (no = 0, yes = 1), the biomarker X in its original continuous form, and the covariates Z₁, Z₂, and Z₃. All datasets are used to estimate the true cutpoint θ of the biomarker X with the multivariable methods mentioned above (A, B, C1, C2, D1, D2, E1, and E2) as well as the univariable minimum p-value approach (method F) (see Table 1). The estimated cutpoints are labelled with the names of the methods introduced above (see Table 1 and Fig 1).

Table 1. Overview and brief description of the simulation methods used.

https://doi.org/10.1371/journal.pone.0338425.t001

The cutpoints for methods C1, C2, D1, and D2 are estimated using the R package CatPredi [26]. The cutpoint of the univariable method (minimum p-value approach) was calculated with the R package maxstat [27]. Cutpoints for methods A, B, E1, and E2 were estimated using the R package cutpoint [28]. For these methods, Cox regression models including the covariates Z₁, Z₂, and Z₃ were calculated for all possible dichotomisations of the biomarker. The cutpoints of methods A and B are taken from the Cox regression model with the highest chi-square statistic; for method B, only half of the data set was included. The cutpoints of methods E1 and E2 are taken from the Cox regression model with the lowest AIC. For method E1, X_D was a covariate in the Cox regression model; for method E2, X_D was used as a strata variable. The performance measures derived from the simulations are presented in both tabular and graphical form. In addition to boxplots, we used a nested loop plot for visual representation [29].

Performance measures

We assessed the bias, empirical standard error (EmpSE), mean squared error (MSE), and relative precision gain versus the method with the lowest EmpSE. The corresponding Monte Carlo Standard errors (MCSEs) were determined for all performance measures. The estimated performance measures are defined as follows:
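As a reference sketch, the standard estimators described by Morris et al. [23] can be written as follows. The notation is assumed here: the cutpoint estimated in repetition i is written as a hatted θ with index i, and the subscripts ref and m denote the reference method and the method being compared.

```latex
\begin{align*}
\bar\theta &= \frac{1}{n_{\mathrm{sim}}}\sum_{i=1}^{n_{\mathrm{sim}}}\hat\theta_i,
& \widehat{\mathrm{Bias}} &= \bar\theta - \theta, \\
\widehat{\mathrm{EmpSE}} &= \sqrt{\frac{1}{n_{\mathrm{sim}}-1}\sum_{i=1}^{n_{\mathrm{sim}}}\bigl(\hat\theta_i-\bar\theta\bigr)^2},
& \widehat{\mathrm{MSE}} &= \frac{1}{n_{\mathrm{sim}}}\sum_{i=1}^{n_{\mathrm{sim}}}\bigl(\hat\theta_i-\theta\bigr)^2, \\
\text{Relative precision gain (\%)} &= 100\left[\left(\frac{\widehat{\mathrm{EmpSE}}_{\mathrm{ref}}}{\widehat{\mathrm{EmpSE}}_{m}}\right)^{2}-1\right],
& \mathrm{MCSE}\bigl(\widehat{\mathrm{Bias}}\bigr) &= \frac{\widehat{\mathrm{EmpSE}}}{\sqrt{n_{\mathrm{sim}}}}.
\end{align*}
```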

For details on the estimates of the performance measures and their Monte Carlo Standard Error, see [23]. Our most important performance measures are the bias and the EmpSE. Barrio and colleagues, who investigated the methods C2 and D2 in their simulation study, reported standard deviations (SDs) ranging from 0.010 to 0.096 for the means of their cutpoint estimates [19]. Following the practical example of Morris et al. [23], we also have decided that the Monte Carlo SE of the bias should be lower than 0.005. Applying the formula in the same way as Morris et al. for calculating n_sim and using the maximum SD of Barrio et al. (SD = 0.096), we get a required number of simulations n_sim of 369. As we do not know the SDs for the other methods, we have opted for a conservative approach and set n_sim = 500.
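The calculation of n_sim and of the primary performance measures can be reproduced with a short sketch. The function names are hypothetical; the formulas are the standard ones from Morris et al. [23], where the required number of repetitions follows from setting the Monte Carlo SE of the bias, SD/√n_sim, equal to the target value.

```python
import math


def required_n_sim(sd, target_mcse):
    """Smallest n_sim for which sd / sqrt(n_sim) does not exceed target_mcse."""
    return math.ceil((sd / target_mcse) ** 2)


def performance_measures(estimates, theta):
    """Bias, EmpSE, MSE and the Monte Carlo SE of the bias."""
    n = len(estimates)
    mean = sum(estimates) / n
    emp_se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))
    return {
        "bias": mean - theta,
        "emp_se": emp_se,
        "mse": sum((e - theta) ** 2 for e in estimates) / n,
        "mcse_bias": emp_se / math.sqrt(n),
    }


# Values from the text: maximum SD of Barrio et al. and the target MCSE.
n_sim_needed = required_n_sim(sd=0.096, target_mcse=0.005)

# Hypothetical cutpoint estimates for a true cutpoint of 0.25.
measures = performance_measures([0.1, 0.2, 0.3], theta=0.25)
```

Here (0.096 / 0.005)² = 368.64, which is rounded up to 369 and matches the figure reported above.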

Results

In the 81,000 datasets (500 repetitions × 162 scenarios), at most 60 estimated cutpoints are missing per method. In these cases, no cutpoint was found using the R package CatPredi. The missing values are distributed among the methods as follows: C1: 19, C2: 48, D1: 18, D2: 60, corresponding to 0.022 to 0.074 percent per method. Methods A and E1 yield similar results for the performance measures, apart from minimal differences (Tables 2 and 3). Therefore, only the results for method E1 are presented in the following.

Table 2. Performance measures of all simulation scenarios with their Monte Carlo standard error in parentheses.

https://doi.org/10.1371/journal.pone.0338425.t002

Table 3. Bias for the Weibull distribution.

https://doi.org/10.1371/journal.pone.0338425.t003

Bias

Apart from one exception, all methods have a negative bias for each of the true cutpoints θ (Fig 3). This corresponds to a tendency towards the median of biomarker X. Method E1 shows the lowest bias in most scenarios, regardless of the true cutpoint. For all methods, the bias becomes stronger the further the true cutpoint is from the median (Table 2 and Fig 5). This is most pronounced for method E2. The AddFor algorithm and the Genetic algorithm lead to different bias values within the method pairs C1/C2 and D1/D2. However, neither of these algorithms is associated with a lower bias in general (Tables 2 and 4). Table 3 shows the bias values separated by θ, n_obs, and pc_t. In three scenarios, method B (max χ² split-sample) has a lower bias than method E1; in one scenario, method D1 (CPE AddFor) does (Table 3). Regardless of the distribution used (Weibull, Gompertz, or exponential), the bias depends on the simulation parameters in a similar way (Figs 4 and 5, Table 4). For all methods, the bias increases with decreasing sample size n_obs and with increasing total censoring rate pc_t (Figs 4 and 5). The bias is also larger if n_obs = 250 or n_obs = 500 and, at the same time, pc_t = 0.8 and pc_f = 0.25 (Fig 4 and Table 4). The proportion of censoring before the end of the follow-up time (pc_f) has only minor effects on the bias (Fig 5). If the methods are sorted from lowest to highest bias, the order for θ = 0.2 is: E1, B, C2, F, E2, C1, D1, and D2 (Table 2). For θ = 1.2, the order is different: E1, F, B, C2, D1, C1, D2, and E2. Table 4 shows the bias of method E1 for all simulation parameters. The lowest absolute bias for method E1 is 0.0001, and the largest, 0.1164, is found at n_obs = 250, pc_t = 0.8, and pc_f = 0.25. Table 4 also shows that, for method E1, none of the distributions used leads to particularly low or high bias values.

Table 4. Bias for the method E1 (min AIC).

https://doi.org/10.1371/journal.pone.0338425.t004

Fig 3. Bias shown by method categorised according to the three true cutpoints θ.

Boxplot without outliers. The estimates shown are the means of the bias, categorised after the three true cutpoints θ used in the simulation (0.2, 0.7, 1.2). The smallest values in each group are framed. Method A is not shown because the results are identical to method E1.

https://doi.org/10.1371/journal.pone.0338425.g003

Fig 4. Hybrid nested loop plot considering all 162 simulation scenarios for four simulation methods.

A combination of trellis plot and nested loop plot showing six scenarios per layer: pc_t, total censoring in percent (0.2, 0.5, 0.8); pc_f, censoring before end of follow-up time in percent of total censoring (0.25, 0.50).

https://doi.org/10.1371/journal.pone.0338425.g004

Fig 5. Means of bias for four simulation methods depending on simulation parameters.

θ, true cutpoint (0.2, 0.7, 1.2); n_obs, sample size (250, 500, 750); pc_t, total censoring in percent (0.2, 0.5, 0.8); pc_f, censoring before end of follow-up time in percent of total censoring (0.25, 0.50).

https://doi.org/10.1371/journal.pone.0338425.g005

Empirical standard error (EmpSE) and mean squared error (MSE)

When the results are evaluated separately for the true cutpoints θ = 0.2 and θ = 1.2, method E1 has the lowest EmpSE and MSE for both. The largest EmpSE for θ = 0.2 is obtained with method E2. For θ = 1.2, methods D1 and D2 have the largest EmpSE (Table 2). The largest MSE, regardless of the cutpoint, occurs with method E2. Sorted from lowest to largest EmpSE, the order for θ = 0.2 is: E1, C2, B, F, C1, D2, D1, and E2. For θ = 1.2, the order differs, as it does for the bias: E1, F, E2, B, C2, C1 and D1, D2. Accordingly, the relative precision gain with method E1 as the reference shows that method E2 for θ = 0.2 and methods D1 and D2 for θ = 1.2 have the greatest precision loss compared with the reference. For all methods, the EmpSE increases with the distance of the true cutpoint θ from the median.

Illustrations and applications with clinical examples of data

We also illustrate all methods on two freely available clinical data examples, which are available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The first dataset consists of 312 patients with primary biliary cirrhosis (PBC) stages I to IV, all of whom took part in a study conducted at the Mayo Clinic [30,31]. The data were collected prospectively to evaluate the effectiveness of D-penicillamine in treating primary biliary cirrhosis. A liver biopsy was conducted initially to assess the histological stage. Out of the initial 312 patients, 125 died, and the median trial duration was 39 months. Furthermore, 27 patients were either lost to follow-up or had undergone liver transplants. The remaining 160 patients were still alive and being monitored. Following Dickson and colleagues, we used the relevant baseline variables for the prognosis of survival time [32]. The variables comprise serum albumin (g/dl), total serum bilirubin (mg/dl), age (years), and oedema. Prothrombin time was not included because of the considerable variability inherent in laboratory measurements of this parameter [33]. The cutpoints were determined for the biomarker serum albumin. Serum bilirubin, age, and oedema were used as covariates in the multivariable Cox regression model. The follow-up time was measured in months, and the status was indicated as deceased or alive. The distribution of serum albumin is shown in Fig 6. This figure further indicates that low albumin levels are associated with an increased risk in the studied population.

Fig 6. Combined log relative hazard and density plots with the estimated cutpoints as vertical lines.

The vertical, numbered lines indicate the estimated cutpoints. Red lines represent the median. The lower part of the figure shows the cutpoints for albumin and initial heart rate for each method. bpm, beats per minute.

https://doi.org/10.1371/journal.pone.0338425.g006

The second freely available dataset, referred to as WHAS500 and taken from the R package smoothHR, contains data from 500 patients who participated in the Worcester Heart Attack Study [34]. The Worcester Heart Attack Study aimed to identify factors related to trends over time in overall survival after hospital admission for acute myocardial infarction. From this dataset, we used the variables Initial Heart Rate (HR) in beats per minute (bpm), age at hospital admission in years (age), sex, body mass index (BMI) in kg/m², follow-up time (time), and vital status at last follow-up (status). Cutpoints were estimated for the biomarker initial heart rate. The mean of HR is 87, and the median is 85. For the distribution of HR, see Fig 6. Covariates in the multivariable estimation model are age, sex, and BMI.

Results of the cutpoint estimations

Cutpoints for serum albumin in the first dataset and for Initial Heart Rate in the WHAS500 dataset could be estimated with all multivariable methods (B, C1, C2, D1, D2, E1, E2) as well as with the univariable approach, method F (Fig 6).

The results for the CPE-based methods (D1 and D2) are identical (≤ 3.55 g/dl for serum albumin and ≤ 76 bpm for HR). For initial heart rate, the cutpoints of methods C1, C2, D1, and D2 are equal. Methods E1 and E2 provide the same cutpoint for serum albumin, but different cutpoints for heart rate. Except for the cutpoint for albumin according to method C2, all cutpoints fall clearly within biomarker ranges where the relative hazard does not remain constant.

Discussion

A review of the literature was conducted prior to our simulation study. Through this, we aimed to identify publications on methods for determining cutpoints within the multivariable Cox regression framework. Four studies were identified, covering a total of eight different methods. These methods were chosen because they were the only ones found in the literature.

In the simulation, only 0.022 to 0.074 percent of the values were missing for some methods, so we could compare all results without restrictions. However, we could not determine why the R package CatPredi aborted in a few cases. All methods reviewed are biased. In most simulated scenarios, all methods tend towards the median (Tables 2–4, Figs 4 and 5). Our simulation has shown that methods A (max χ²) and E1 (min AIC), which lead to the same results, have both the lowest bias and the lowest EmpSE. This confirms the study by Mazumdar et al., who found that the univariable method F is inferior to method A [22]. Method B (max χ² split-sample) performs worse than the univariable method F if the cutpoint θ is further away from the median, both in terms of bias and EmpSE. Method E2, in which the biomarker to be dichotomised is used as a strata variable in the Cox regression, performs well when the true cutpoint is close to the median. However, if the cutpoint is not at the median, method E2 has the largest bias. For method E2, the bias in some scenarios is so large that the estimated cutpoint deviates by more than one standard deviation from the true cutpoint. If such a cutpoint is used to stratify patients or make treatment decisions, over 30% of patients could be incorrectly classified or receive the wrong treatment. The other multivariable methods exhibit a low bias on average (Fig 3). However, their bias becomes clinically relevant if the cutpoint is further from the median, the sample size is smaller, and the censoring rate is higher (Fig 4). Under these circumstances, the other methods also exhibit a bias that approaches or exceeds one standard deviation. This could lead to a relevant misclassification rate of patients in a clinical setting. As shown in the clinical examples, the cutpoints for both albumin and heart rate vary considerably depending on the chosen method. With method B, the cutpoint for heart rate is ≤ 100 bpm, which clearly differs from the cutpoint obtained with method E1 (HR ≤ 83 bpm) (Fig 6). This difference between methods B and E1 arises from the smaller sample size in the split-sample approach. If patients were classified in a clinical study, 125 patients would be placed in the above-cutpoint group according to method B, whereas 265 would be placed in that group using method E1. Therefore, using method B, which shows a higher bias in our simulation than method E1 (Fig 3), could lead to more than 100 of the 500 patients being misclassified. By contrast, in both the albumin and heart rate examples, it makes little clinical difference whether the cutpoint is based on method D1 or D2, as the cutpoints are the same for both (HR ≤ 76 bpm and albumin ≤ 3.55 g/dl).
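The misclassification figures mentioned above can be checked with a short calculation. The sketch assumes a normally distributed biomarker, which is an assumption made here for illustration and is not stated for the simulated or clinical data; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_between(true_cut, estimated_cut, mean=0.0, sd=1.0):
    """Share of patients falling between the true and the estimated cutpoint."""
    low, high = sorted((true_cut, estimated_cut))
    return normal_cdf((high - mean) / sd) - normal_cdf((low - mean) / sd)


# A cutpoint at the median that is shifted by one standard deviation
# reassigns about 34% of a normally distributed population.
shift_at_median = fraction_between(true_cut=0.0, estimated_cut=-1.0)

# WHAS500 example from the text: patients in the above-cutpoint group.
whas_reassigned = 265 - 125            # method E1 versus method B
whas_share = whas_reassigned / 500     # share of all 500 patients
```

Under this assumption, a shift of one standard deviation away from a true cutpoint at the median moves roughly a third of the patients into the other group, which is in line with the figure of over 30% given above. In the heart rate example, 140 of 500 patients (28%) change groups between methods B and E1.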

When using method C2 (c-index Genetic), the bias and EmpSE are lower than with methods C1, D1, and D2. Barrio et al. pointed out that the Genetic algorithm is supposed to perform better than the AddFor algorithm when estimating two cutpoints of a biomarker [19]. Our simulation did not demonstrate that the Genetic algorithm generally outperforms the AddFor algorithm. Of these four methods, only method C2 is superior to the univariable approach F regarding bias and EmpSE, and only if the true cutpoint is close to the median. Nevertheless, the differences between these four methods (C1, C2, D1, and D2) are minimal. This can also be observed to some extent in the practical applications.

Table 2 shows that the MCSEs for the bias and EmpSE are low enough to allow a meaningful interpretation of the values obtained for the bias and EmpSE. The EmpSE and MSE do not show the same patterns across the methods. Morris et al. have pointed out that the MSE is more sensitive to n_obs than the EmpSE [23]. That is why we focused primarily on the bias and EmpSE.

The initial selection of the regression coefficients (β_X, β_Z) was based on the simulation studies by Barrio et al. and Mazumdar et al., which used hazard ratios ranging from 0.5 to 4.0 [19,22]. Before running the simulations, we performed pre-tests to ensure that the betas were large enough to yield significant omnibus tests of model coefficients in Cox regressions (p < 0.05), even at a sample size of n = 250. However, the specific beta values chosen may limit the generalisability of our findings, as clinical studies often report hazard ratios closer to 1 for certain biomarkers. With respect to all-cause mortality in cancer patients, for example, Lena et al. reported hazard ratios of 0.91 for haemoglobin (per 1 g/dL) and 0.99 for the estimated glomerular filtration rate (GFR) (per 1 mL/min/1.73 m²) [35]. Since the betas were held constant in our simulation, we cannot evaluate how the bias depends on the beta values.

We used only one cutpoint in our simulation. In contrast to Barrio et al., who investigated the methods C1, C2, D1, and D2 for the case of one, two, and three cutpoints per biomarker, we cannot make a statement for a scenario with several cutpoints [19]. Besides, the existence of one or even several cutpoints in real data sets may be unknown.

With the R package CatPredi, computing a cutpoint with method C2 or D2 takes approximately 44 or 87 seconds, respectively, for N = 500 and three covariates. Under identical conditions and hardware specifications (Windows 11, x64-based, CPU 2.40 GHz, 8 cores), the R package cutpoint needs approximately 12 seconds for method E1. Barrio et al. have also pointed out the longer computing times of methods C2 and D2 [19]. These computing times make it difficult to carry out simulations with these methods. However, computing time should hardly be relevant when determining individual cutpoints.

Our goal was not to determine the hazard ratios of the dichotomised biomarker X and the associated p-values. Method B (max χ² split-sample), used in the simulation study, offers the possibility to determine hazard ratios and the corresponding p-values on the other half of the data. However, method B has a substantially higher bias and a negative precision gain in comparison with methods A and E1. Therefore, like Mazumdar et al. [22], we recommend the cross-validation approach, as the entire data set is used to determine the cutpoint.
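The split-sample idea behind method B can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the simulation study: the two-sample log-rank chi-square statistic stands in for the chi-square statistic of the multivariable Cox model, no covariates are included, the data are simulated from an exponential model, and all function names and parameter values are hypothetical.

```python
import math
import random

def logrank_chi2(times, events, groups):
    """Two-sample log-rank chi-square statistic (1 df); groups are 0/1."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:                 # still at risk at time t
                n += 1
                n1 += gi
                if ei and ti == t:      # event exactly at time t
                    d += 1
                    d1 += gi
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def best_cutpoint(x, times, events, lower=0.1, upper=0.9):
    """Scan cutpoints between two quantiles of x; return (cutpoint, chi2)."""
    xs = sorted(x)
    lo, hi = xs[int(lower * len(xs))], xs[int(upper * len(xs))]
    best = (None, -1.0)
    for c in sorted({v for v in xs if lo <= v <= hi}):
        chi2 = logrank_chi2(times, events, [int(v > c) for v in x])
        if chi2 > best[1]:
            best = (c, chi2)
    return best

def pick(values, rows):
    return [values[i] for i in rows]

# Simulated data (assumed values): biomarker on [0, 1], true cutpoint 0.5,
# hazard ratio 3 above the cutpoint, administrative censoring at follow_up.
rng = random.Random(1)
n, true_cut, beta, base_rate, follow_up = 200, 0.5, math.log(3), 0.1, 15.0
x = [rng.random() for _ in range(n)]
t_true = [rng.expovariate(base_rate * math.exp(beta * (xi > true_cut))) for xi in x]
times = [min(t, follow_up) for t in t_true]
events = [int(t <= follow_up) for t in t_true]

# Random split: estimate the cutpoint on one half, test it on the other half.
idx = list(range(n))
rng.shuffle(idx)
train, valid = idx[: n // 2], idx[n // 2:]
cut, chi2_train = best_cutpoint(pick(x, train), pick(times, train), pick(events, train))
chi2_valid = logrank_chi2(pick(times, valid), pick(events, valid),
                          [int(x[i] > cut) for i in valid])
print(f"cutpoint estimated on the training half: {cut:.3f}")
print(f"log-rank chi-square on the validation half: {chi2_valid:.2f}")
```

Because only half of the observations enter the cutpoint search, the estimate rests on a smaller sample, which may contribute to the higher bias of the split-sample approach.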

As the practical examples and the simulation study have demonstrated, the estimated cutpoints can vary strongly depending on the method used. Therefore, examining spline plots and consulting a medical specialist can be beneficial. In research areas that have not been established for long, however, medical specialists or physicians may have limited knowledge to contribute to the choice of a cutpoint. In any case, a metric variable should only be dichotomised if this cannot be avoided, as dichotomisation is associated with a substantial loss of information and power [15,16,22].

Conclusions

Our simulation has shown that methods maximising the chi-square statistic or minimising the AIC perform better than the univariable method using the minimum p-value approach and outperform methods based on the c-index or CPE. It remains unclear whether these two methods (A and E1) perform just as well when there are two or more cutpoints per biomarker. The method in which the dichotomised variable is used as a strata variable in the Cox regression model is potentially associated with large bias.

References

1. Mallett S, Royston P, Dutton S, Waters R, Altman DG. Reporting methods in studies developing prognostic models in cancer: a review. BMC Med. 2010;8:20. pmid:20353578
2. Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1972;34(2):187–202.
3. Bellavia A, Murphy SA. Cox Regression Model in Clinical Research: Overview of Key Properties and Interpretation. Circulation. 2025;151(6):337–9. pmid:39928715
4. Lo SN, Scolyer RA, Thompson JF. Long-Term Survival of Patients with Thin (T1) Cutaneous Melanomas: A Breslow Thickness Cut Point of 0.8 mm Separates Higher-Risk and Lower-Risk Tumors. Ann Surg Oncol. 2018;25(4):894–902.
5. Ma SJ, Yu H, Khan M, Gill J, Santhosh S, Chatterjee U, et al. Evaluation of Optimal Threshold of Neutrophil-Lymphocyte Ratio and Its Association With Survival Outcomes Among Patients With Head and Neck Cancer. JAMA Netw Open. 2022;5(4):e227567. pmid:35426920
6. Singhal S, Powles R, Treleaven J, Kulkarni S, Sirohi B, Horton C, et al. A low CD34+ cell dose results in higher mortality and poorer survival after blood or marrow stem cell transplantation from HLA-identical siblings: should 2 x 10(6) CD34+ cells/kg be considered the minimum threshold? Bone Marrow Transplant. 2000;26(5):489–96. pmid:11019837
7. Tsuruta H, Bax L. Polychotomization of continuous variables in regression models based on the overall C index. BMC Med Inform Decis Mak. 2006;6:41. pmid:17169154
8. Sima CS, Gönen M. Optimal Cutpoint Estimation With Censored Data. Journal of Statistical Theory and Practice. 2013;7(2):345–59.
9. Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med. 2000;19(4):453–73. pmid:10694730
10. Contal C, O’Quigley J. An application of changepoint methods in studying the effect of age on survival in breast cancer. Computational Statistics & Data Analysis. 1999;30(3):253–70.
11. Fong Y, Di C, Huang Y, Gilbert PB. Model-robust inference for continuous threshold regression models. Biometrics. 2017;73(2):452–62. pmid:27858965
12. Yang G, Zhang B, Haft JW, Hawkins RB, Sturmer D, Likosky DS, et al. Modeling and estimating a threshold effect: An application to improving cardiac surgery practices. Stat Methods Med Res. 2023;32(12):2318–30. pmid:38031434
13. Whelton PK, Carey RM, Aronow WS, Casey DE Jr, Collins KJ, Dennison Himmelfarb C, et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Hypertension. 2018;71(6):e13–115. pmid:29133356
14. Otten LS, Piet B, van den Haak D, Schouten RD, Schuurbiers M, Badrising SK, et al. Prognostic Value of Nivolumab Clearance in Non-Small Cell Lung Cancer Patients for Survival Early in Treatment. Clin Pharmacokinet. 2023;62(12):1749–54. pmid:37856040
15. Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25(1):127–41. pmid:16217841
16. Altman DG, Lyman GH. Methodological challenges in the evaluation of prognostic factors in breast cancer. Breast Cancer Res Treat. 1998;52(1–3):289–303. pmid:10066088
17. Williams B, Mandrekar J, Mandrekar S, Cha S, Furth A. Finding optimal cutpoints for continuous covariates with binary and time-to-event outcomes. 2006.
18. Lausen B, Schumacher M. Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Computational Statistics & Data Analysis. 1996;21(3):307–26.
19. Barrio I, Rodríguez-Álvarez MX, Meira-Machado L, Esteban C, Arostegui I. Comparison of two discrimination indexes in the categorisation of continuous predictors in time-to-event studies. SORT. 2017;1:73–92.
20. Chen Y, Huang J, He X, Gao Y, Mahara G, Lin Z, et al. A novel approach to determine two optimal cut-points of a continuous predictor with a U-shaped relationship to hazard ratio in survival data: simulation and application. BMC Med Res Methodol. 2019;19(1):96. pmid:31072334
21. Govindarajulu U, Tarpey T. Optimal partitioning for the proportional hazards model. J Appl Stat. 2020;49(4):968–87. pmid:35707820
22. Mazumdar M, Smith A, Bacik J. Methods for categorizing a prognostic variable in a multivariable setting. Stat Med. 2003;22(4):559–71. pmid:12590414
23. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102. pmid:30652356
24. Brilleman S, Gasparini A. R package simsurv: Simulate Survival Data. CRAN. 2021.
25. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005;24(11):1713–23. pmid:15724232
26. Barrio I, Rodriguez-Alvarez MX, Arostegui I. R package CatPredi: Optimal Categorisation of Continuous Variables in Prediction Models. CRAN. 2022.
27. Hothorn T. R package maxstat: Maximally Selected Rank Statistics. CRAN. 2022.
28. Porthun J. R-package Cutpoint: Estimate cutpoints in the multivariable context of survival or time-to-event data. CRAN. 2025.
29. Rücker G, Schwarzer G. Presenting simulation results in a nested loop plot. BMC Med Res Methodol. 2014;14:129. pmid:25495636
30. Dickson ER, Grambsch PM, Fleming TR, Fisher LD, Langworthy A. Cirrhosis Patient Survival Prediction. Hepatology. 1989.
31. Fleming TR, Harrington DP. Counting Processes and Survival Analysis. John Wiley & Sons. 2013.
32. Dickson ER, Grambsch PM, Fleming TR, Fisher LD, Langworthy A. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology. 1989;10(1):1–7. pmid:2737595
33. Lisman T, van Leeuwen Y, Adelmeijer J, Pereboom ITA, Haagsma EB, van den Berg AP, et al. Interlaboratory variability in assessment of the model of end-stage liver disease score. Liver Int. 2008;28(10):1344–51. pmid:18482269
34. Meira-Machado L, Cadarso-Suárez C, Gude F, Araújo A. smoothHR: an R package for pointwise nonparametric estimation of hazard ratio curves of continuous predictors. Comput Math Methods Med. 2013;2013:745742. pmid:24454541
35. Lena A, Wilkenshoff U, Hadzibegovic S, Porthun J, Rösnick L, Fröhlich A-K, et al. Clinical and Prognostic Relevance of Cardiac Wasting in Patients With Advanced Cancer. J Am Coll Cardiol. 2023;81(16):1569–86. pmid:37076211


Method A) Maximising the chi-square statistic with a twofold cross-validation approach (max χ²)

Method B) Maximising the chi-square statistic with a split-sample approach (max χ² split-sample)