Improved Confidence Intervals for the Youden Index

The Youden Index is a summary measure of the receiver operating characteristic (ROC) curve for the accuracy of a diagnostic test with ordinal or continuous endpoints. The bootstrap confidence interval based on the adjusted proportion estimate was shown to perform satisfactorily among the existing confidence intervals, including the parametric interval via the delta method. In this article, we propose two confidence intervals using the square-and-add limits based on the Wilson score method. We compare the two proposed intervals with the existing interval in extensive simulation studies. The new interval based on the empirical proportion estimate generally performs better than that based on the adjusted proportion estimate. A real example from a clinical trial of prostate cancer illustrates the application of the new intervals.


Introduction
For a diagnostic test with ordinal or continuous endpoints, the receiver operating characteristic (ROC) curve has been widely used to measure the accuracy of the test [1,2]. The ROC curve is created by plotting sensitivity (y-axis) against 1 minus specificity (x-axis) for various cut points. The cut point determines the diagnostic result, e.g., positive or negative, diseased or healthy. The range of the cut point is generally from −∞ to +∞. It is of interest to find the optimal cut point to increase the accuracy of a diagnostic test [3].
The Youden Index (J) [4] is a well-known summary of the ROC curve that measures the clinical diagnostic ability of a test. It is defined as

$$J = \max_c \{\mathrm{sensitivity}(c) + \mathrm{specificity}(c) - 1\},$$

where c is the cut point. Diagnostic tests with higher J values are preferable. The Youden Index is an optimal trade-off between sensitivity and specificity, with equal weight assigned to each. For given total sample sizes in the diseased group and the non-diseased group, the optimal cut point would lead to the maximum number of subjects being correctly diagnosed. Although the theoretical range of the Youden Index is from −1 to 1, the practical range in use is often from 0 to 1 since negative values of the Youden Index do not [...] and $z_{1-\alpha/2}^2$ are added in the numerator and denominator as compared to the estimate of J in Eq (1). When α = 0.05, $z_{1-\alpha/2}^2$ is close to 4. This may be viewed as adding two successes and two failures in the study.
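The estimators referred to as Eq (1) and Eq (2) are not legible in this copy of the text. The block below is a hedged reconstruction, not the author's typeset equations. It assumes that larger test values indicate disease, that $x_1, \ldots, x_m$ are the non-diseased and $y_1, \ldots, y_n$ the diseased measurements, and that the adjusted (AC) estimate uses the standard Agresti-Coull form, with $z_{1-\alpha/2}^2/2$ added to each numerator and $z_{1-\alpha/2}^2$ to each denominator.

```latex
% Assumed form of Eq (1): empirical estimate
\hat{J} = \max_c \left\{ \frac{\sum_{j=1}^{n} I(y_j > c)}{n}
        + \frac{\sum_{i=1}^{m} I(x_i \le c)}{m} - 1 \right\}

% Assumed form of Eq (2): adjusted (AC) estimate
\hat{J}_{AC} = \max_c \left\{
    \frac{\sum_{j=1}^{n} I(y_j > c) + z_{1-\alpha/2}^2/2}{n + z_{1-\alpha/2}^2}
  + \frac{\sum_{i=1}^{m} I(x_i \le c) + z_{1-\alpha/2}^2/2}{m + z_{1-\alpha/2}^2}
  - 1 \right\}
```

With α = 0.05, $z_{1-\alpha/2}^2 \approx 3.84$, so the adjustment adds roughly two successes and two failures to each group, which matches the remark in the text.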
We denote the samples for the non-diseased group and the diseased group by $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_n)$, respectively. Bootstrap samples from each group are used to calculate the Youden Index estimate $\hat{J}_{AC}$. Let the bootstrap samples be $x^*$ and $y^*$, where $x^*$ consists of m draws from x and $y^*$ of n draws from y, both with replacement. The Youden Index estimate $\hat{J}^*_{AC}$ can be computed from Eq (2) using the bootstrap samples $x^*$ and $y^*$. This resampling procedure is repeated B times to generate B Youden Index estimates, $\hat{J}^*_{AC,1}, \hat{J}^*_{AC,2}, \ldots, \hat{J}^*_{AC,B}$. The bootstrap mean and variance estimates of the Youden Index are calculated as

$$\bar{J}^*_{AC} = \frac{1}{B}\sum_{b=1}^{B} \hat{J}^*_{AC,b}, \qquad \widehat{\mathrm{Var}}(\hat{J}^*_{AC}) = \frac{1}{B-1}\sum_{b=1}^{B} \left(\hat{J}^*_{AC,b} - \bar{J}^*_{AC}\right)^2.$$

The corresponding bootstrap confidence interval using the AC estimate (referred to as the BAC confidence interval) is

$$\bar{J}^*_{AC} \pm z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat{J}^*_{AC})}.$$

This confidence interval can be obtained by invoking the central limit theorem.

The Youden Index can also be estimated by the empirical estimate in Eq (1). The confidence interval construction is similar to that based on the AC estimate; the only difference is the estimate of the Youden Index at each resampling step. This confidence interval based on $\hat{J}$ was studied by Schisterman and Perkins [3], who showed that it is not as good as the intervals based on parametric approaches with regard to coverage probability and width.
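The resampling procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's R program: the function names `youden_ac` and `bac_interval` are hypothetical, larger values are assumed to indicate disease, and the interval is assumed to be the bootstrap mean plus or minus a normal quantile times the bootstrap standard deviation.

```python
import random
from statistics import NormalDist, mean, variance

def youden_ac(x, y, alpha=0.05):
    """AC-adjusted Youden estimate (assumed form; larger values = diseased)."""
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    m, n = len(x), len(y)
    best = -1.0
    # Candidate cut points: below all data, then every observed value.
    for c in [float("-inf")] + sorted(set(x) | set(y)):
        sens = (sum(v > c for v in y) + z2 / 2) / (n + z2)
        spec = (sum(v <= c for v in x) + z2 / 2) / (m + z2)
        best = max(best, sens + spec - 1)
    return best

def bac_interval(x, y, B=500, alpha=0.05, seed=0):
    """Normal-approximation bootstrap interval built on the AC estimate."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    estimates = []
    for _ in range(B):
        xb = rng.choices(x, k=len(x))  # m draws with replacement
        yb = rng.choices(y, k=len(y))  # n draws with replacement
        estimates.append(youden_ac(xb, yb, alpha))
    center = mean(estimates)
    se = variance(estimates) ** 0.5    # sample variance, divisor B - 1
    return center - z * se, center + z * se
```

Calling `bac_interval(x, y)` with two lists of measurements returns the lower and upper limits. Replacing `youden_ac` with an unadjusted version would give the empirical-estimate variant mentioned in the text.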
Later, Zhou and Qin [7] compared the bootstrap confidence interval based on the AC estimate for the Youden Index with the parametric confidence interval via the delta method [3]. They also considered the percentile bootstrap confidence interval based on the AC estimate. They concluded that the two bootstrap confidence intervals based on the AC estimate are comparable to the parametric intervals when the distributional assumptions are met, and outperform the parametric intervals when the distributions are misspecified. The BAC interval is generally better than the percentile bootstrap interval. For this reason, the BAC interval is chosen for comparison in this article.
The confidence interval of J is calculated following [13]. The estimates $\hat{p}_1$ and $\hat{p}_2$ can be obtained by Eqs (1) and (2). Different estimates $\hat{p}_1$ and $\hat{p}_2$ lead to different confidence intervals for $p_1$ and $p_2$, and in turn affect the final confidence interval for the Youden Index. We refer to the confidence interval using Eq (1) as the NP method, and that using Eq (2) as the NPAC method. Unlike in the Wald-type confidence interval, the parameter in the variance is treated as unknown in the Wilson score method rather than replaced by its estimate. The Wilson confidence limits are then obtained as the roots of two equations. This method may improve the coverage probability of the confidence interval [14,15], and it has been successfully applied in many important areas of statistical research [11,15].
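The interval formula itself is not legible in this copy. As a rough sketch of the general technique, the code below computes Wilson score limits for two independent proportions and combines them with square-and-add limits for $p_1 + p_2 - 1$. It assumes $p_1$ is the sensitivity and $p_2$ the specificity at the chosen cut point. The function names are hypothetical, and the sum form of the limits is an assumption rather than a transcription of the article's equation.

```python
from statistics import NormalDist

def wilson(k, n, alpha=0.05):
    """Wilson score limits for k successes out of n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * (p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5 / denom
    return center - half, center + half

def square_and_add(k1, n1, k2, n2, alpha=0.05):
    """Square-and-add limits for p1 + p2 - 1 (assumed sum form).

    k1/n1: diseased subjects correctly classified (sensitivity).
    k2/n2: non-diseased subjects correctly classified (specificity).
    """
    p1, p2 = k1 / n1, k2 / n2
    l1, u1 = wilson(k1, n1, alpha)
    l2, u2 = wilson(k2, n2, alpha)
    j = p1 + p2 - 1
    lower = j - ((p1 - l1) ** 2 + (p2 - l2) ** 2) ** 0.5
    upper = j + ((u1 - p1) ** 2 + (u2 - p2) ** 2) ** 0.5
    return lower, upper
```

Because each Wilson interval is asymmetric around its point estimate, the combined interval is asymmetric as well, and its limits always stay within [−1, 1].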
We first compare the three methods with the same type of underlying distributions for the non-diseased group and the diseased group. The normal distribution is the most commonly used distribution in data analysis. The non-diseased group is assumed to follow a standard normal distribution, and the diseased group follows a normal distribution $N(\mu_d, \sigma_d^2)$, where $\sigma_d^2 = 0.5$, 1, 3, and 5. For each given variance $\sigma_d^2$ in the diseased group, the associated $\mu_d$ values are computed in order to attain the pre-defined Youden Index values J = 0.4, 0.6, 0.8, and 0.9. There are a total of 16 combinations for the parameter settings considered in this comparison, and the detailed values of each parameter setting can be found in the first column of Table 1. Plots of the density functions of the non-diseased group and the diseased group under these 16 parameter combinations are presented in Fig 1. It can be seen that, for a given standard deviation, the overlapping area between the two distributions decreases as the difference in location between these two groups increases.
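The text does not say how the $\mu_d$ values were computed. One possible approach, sketched here as an assumption, is to evaluate the population Youden Index on a grid of cut points and solve for $\mu_d$ by bisection. The function names and the grid size are illustrative choices.

```python
from statistics import NormalDist

STD = NormalDist()  # non-diseased group, N(0, 1)

def youden_normal(mu_d, sigma_d, steps=4000):
    """Population J = max_c {Phi(c) - F_d(c)} for N(0,1) vs N(mu_d, sigma_d^2)."""
    dis = NormalDist(mu_d, sigma_d)
    spread = 8.0 * max(1.0, sigma_d)
    lo, hi = min(0.0, mu_d) - spread, max(0.0, mu_d) + spread
    best = 0.0
    for i in range(steps + 1):
        c = lo + (hi - lo) * i / steps
        # specificity(c) + sensitivity(c) - 1 = Phi(c) - F_d(c)
        best = max(best, STD.cdf(c) - dis.cdf(c))
    return best

def solve_mu(target_j, sigma_d, tol=1e-8):
    """Bisection on mu_d >= 0; J is nondecreasing in mu_d for fixed sigma_d."""
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if youden_normal(mid, sigma_d) < target_j:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For equal variances the optimal cut point is $\mu_d/2$ and $J = 2\Phi(\mu_d/2) - 1$, which gives a closed-form check on the numerical solution.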
Tables 1-4 present the average coverage probabilities and average widths for the three methods when m = 20, 40, 60, and 80, respectively. The coverage probability is defined as the proportion of times that the computed confidence interval contains the pre-defined Youden Index value. Given the parameter setting and the sample size in the non-diseased group, the average width decreases for each method as the sample size in the diseased group increases. The coverage probability of the BAC interval is often less than the nominal level, and it is very poor when the sample sizes are small and the pre-defined J is large. The two newly proposed methods, the NP method and the NPAC method, generally have better coverage than the BAC method. The NPAC method does not perform as well as the NP method when the pre-defined J is large, such as 0.9. In such cases, the coverage probability of the NPAC method can be as low as 57.3%. The NPAC method can have higher coverage probability than the NP method when J is small, but its interval is much wider. Overall, the NP method performs satisfactorily compared with the other two methods.
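The coverage probability and average width defined above can be estimated with a small Monte Carlo loop. The sketch below is illustrative only and does not reproduce the article's simulation settings. It assumes equal-variance normal groups, and it assumes that the NP interval applies Wilson-based square-and-add limits to the empirical sensitivity and specificity at the estimated cut point. All function names are hypothetical.

```python
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # 95% nominal level

def wilson(k, n, z=Z):
    """Wilson score limits for k successes out of n trials."""
    p, d = k / n, 1 + z * z / n
    half = z * (p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5 / d
    return (p + z * z / (2 * n)) / d - half, (p + z * z / (2 * n)) / d + half

def np_interval(x, y):
    """Assumed NP interval: square-and-add at the empirical optimal cut."""
    m, n = len(x), len(y)
    best = (-2.0, 0, 0)
    for c in [float("-inf")] + sorted(set(x) | set(y)):
        tn = sum(v <= c for v in x)   # non-diseased correctly classified
        tp = sum(v > c for v in y)    # diseased correctly classified
        j = tp / n + tn / m - 1
        if j > best[0]:
            best = (j, tp, tn)
    j, tp, tn = best
    p1, p2 = tp / n, tn / m
    (l1, u1), (l2, u2) = wilson(tp, n), wilson(tn, m)
    return (j - ((p1 - l1) ** 2 + (p2 - l2) ** 2) ** 0.5,
            j + ((u1 - p1) ** 2 + (u2 - p2) ** 2) ** 0.5)

def coverage(true_j, m, n, reps=500, seed=1):
    """Estimated coverage and average width for N(0,1) vs N(mu,1)."""
    rng = random.Random(seed)
    # Equal-variance closed form: J = 2*Phi(mu/2) - 1.
    mu = 2 * NormalDist().inv_cdf((true_j + 1) / 2)
    hits, total_width = 0, 0.0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(m)]
        y = [rng.gauss(mu, 1) for _ in range(n)]
        lo, hi = np_interval(x, y)
        hits += lo <= true_j <= hi
        total_width += hi - lo
    return hits / reps, total_width / reps
```

A call such as `coverage(0.6, 20, 20)` returns the estimated coverage and the average width. A larger `reps` reduces the Monte Carlo error at the cost of run time.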
The three methods are also compared under gamma distributions, Γ(κ, θ), with the probability density function

$$\Gamma(x; \kappa, \theta) = \frac{\theta^{\kappa}}{\Gamma(\kappa)} x^{\kappa-1} e^{-\theta x}.$$

The expectation of the gamma distribution is κ/θ. The non-diseased group follows a gamma distribution with parameters $\kappa_X = 1.5$ and $\theta_X = 1$, and the diseased group follows a gamma distribution with $\kappa_Y = 1.5, 2, 2.5, 3$ and $\theta_Y$, where $\theta_Y$ is calculated to achieve the Youden Index J = 0.4, 0.6, 0.8, and 0.9 for each $\kappa_Y$. The 16 parameter settings are given in the first column of Table 5. Tables 5-8 show the coverage probabilities and average widths under the gamma distributions for m = 20, 40, 60, and 80, respectively. The results are similar to those under the normal distributions.

It is entirely possible that the two groups do not follow the same type of distribution. For this reason, the three methods are also compared with different distributions for the non-diseased group and the diseased group. In the first case, the non-diseased group follows a t distribution with df = 5, and the diseased group follows a normal distribution, $N(\mu_Y, \sigma_Y^2)$, with $\sigma_Y = 1$. The mean values $\mu_Y$ in the diseased group are calculated as 1.08, 1.75, 2.74, and 3.62 in order to attain J = 0.4, 0.6, 0.8, and 0.9, respectively. The coverage probabilities and average widths with various sample sizes are presented in Table 9. The coverage probabilities of the BAC method are much smaller than the nominal level for small J, such as 0.4, even with medium to large sample sizes. The NPAC method is very conservative for small J, and its average widths are larger than those of the other two methods in such cases. In addition, the NPAC method generally has lower coverage when J is large, e.g., 0.9. Overall, the NP method is robust to the J value and has much better overall coverage than the other two methods.
In the second case, the non-diseased group follows a normal distribution with $\mu_X = 1$ and $\sigma_X^2 = 1$, and the diseased group follows a gamma distribution with $\kappa_Y = 2$ and $\theta_Y = 0.749$, 0.481, 0.259, and 0.153. The $\theta_Y$ values are chosen to achieve J = 0.4, 0.6, 0.8, and 0.9, respectively. The results are shown in Table 10. The findings are similar to those of the previous case. Once again, the NP method is preferable because of its satisfactory coverage.

The existing methods for confidence intervals all rely on the accuracy of the Youden Index estimate and its variance estimate. The variance can be estimated either from the data or from bootstrap samples. The variance of the Youden Index estimate is a function of the Youden Index estimate. Therefore, the accuracy of the variance estimate is strongly affected by the estimate of the Youden Index. It has been observed in studies by other researchers [7] that the coverage is often unsatisfactory. The Wilson score method improves the coverage probability by treating the parameter in the variance as unknown. The confidence interval from the Wilson score method is obtained by solving an inequality. This method has been shown to improve the coverage probability in many statistical problems [11,15].

An example
We consider an example from a prostate cancer study [16] to compare the three methods for constructing the confidence interval of the Youden Index. It is very important in clinical practice to determine whether the neighboring lymph nodes have been spared or not. The gold standard is surgery, which is accurate but very expensive and may not be efficient. In addition, surgery may cause complications and add unnecessary risk for the patient. For this reason, it is of interest to predict nodal involvement from the level of acid phosphatase in blood serum. A total of 53 patients were confirmed to have prostate cancer, 20 of them with nodal involvement and the remaining 33 without. For each patient, the level of acid phosphatase in blood serum was measured, and the data can be found in Le [16].

We apply the three methods to the prostate cancer data. The 95% confidence intervals are (0.263, 0.653), (0.253, 0.698), and (0.179, 0.647) for the BAC method, the NP method, and the NPAC method, respectively. The BAC interval is the shortest (width 0.390), followed by the NP interval (0.445) and the NPAC interval (0.468). As expected, the BAC method has the shortest width since the coverage probability of the BAC method is generally less than the nominal level. The program to conduct this confidence interval calculation is written in R, and is available from the author's website at: https://faculty.unlv.edu/gshan/.

Fig 1. Density functions for a non-diseased group (a standard normal distribution) and a diseased group when the data are generated from two normal distributions as in Table 1.

Conclusion
In this article, we propose two confidence intervals for the Youden Index using the square-and-add limits based on the Wilson score method, in which, unlike in the existing Wald-type confidence intervals, the parameter in the variance is treated as unknown rather than replaced by its estimate. By conducting extensive Monte Carlo simulation studies, we show that the two new intervals generally have better coverage probabilities than the BAC interval. The BAC interval could perform very poorly for small sample sizes and low J values. The performance of the proposed NPAC interval depends on the J value, while the NP interval is robust. The coverage of the NP interval is much closer to the nominal level than that of the other two intervals. In addition, the NP interval is easier to compute than the BAC interval since no bootstrap is involved. The NP interval is recommended for use in practice.