## Abstract

The Youden Index is a summary measure of the receiver operating characteristic (ROC) curve for assessing the accuracy of a diagnostic test with ordinal or continuous endpoints. Among the existing confidence intervals, including the parametric interval via the delta method, the bootstrap confidence interval based on the adjusted proportion estimate has been shown to perform satisfactorily. In this article, we propose two confidence intervals using the square-and-add limits based on the Wilson score method. We compare the two proposed intervals with the existing interval through extensive simulation studies. The new interval based on the empirical proportion estimate generally performs better than the one based on the adjusted proportion estimate. A real example from a clinical trial of prostate cancer illustrates the application of the new intervals.

**Citation:** Shan G (2015) Improved Confidence Intervals for the Youden Index. PLoS ONE 10(7): e0127272. https://doi.org/10.1371/journal.pone.0127272

**Editor:** Fabio Rapallo, University of East Piedmont, ITALY

**Received:** December 19, 2014; **Accepted:** April 14, 2015; **Published:** July 1, 2015

**Copyright:** © 2015 Guogen Shan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper.

**Funding:** Guogen Shan's research is supported by a grant (5U54GM104944) from the National Institute of General Medical Sciences at the National Institutes of Health.

**Competing interests:** The author has declared that no competing interests exist.

## Introduction

For a diagnostic test with ordinal or continuous endpoints, the receiver operating characteristic (ROC) curve has been widely used to measure the accuracy of the test [1, 2]. The ROC curve is created by plotting sensitivity (y-axis) against 1 − specificity (x-axis) over all cut points. The cut point determines the diagnostic result, e.g., positive or negative, diseased or healthy, and generally ranges from −∞ to +∞. It is of interest to find the optimal cut point that improves the accuracy of a diagnostic test [3].

The Youden Index (J) [4] is a well known measurement for the ROC curve to measure the clinical diagnostic ability of a test. It is defined as
where *c* is the cut point. Diagnostic tests with higher *J* values would be preferable. The Youden Index is an optimal trade-off between sensitivity and specificity with an equal weight being assigned to sensitivity and specificity. For given total sample sizes in the diseased group and the non-diseased group, the optimal cut point would lead the maximum number of subjects being correctly diagnosed. Although the theoretical range of the Youden Index is from -1 to 1, the practical range in use is often from 0 to 1 since negative values of the Youden Index do not have meaningful interpretation in practice. *J* = 1 represents a prefect diagnostic test and *J* = 0 indicates that the diagnostic test is not effective to determine the disease status. The Youden Index has been applied in many statistical and medical applications [3, 1].

Fluss et al. [5] were among the first to introduce nonparametric confidence intervals for the Youden Index, using the empirical distributions to estimate the Youden Index and its associated confidence interval. The coverage probability of the bootstrap-based confidence intervals is generally less than the nominal level [3]. Later, Schisterman and Perkins [3] used the delta method [6] to improve the coverage probability of the confidence interval when the underlying distributions are normal or gamma. This parametric approach works well compared with three bootstrap confidence intervals: the bootstrap percentile interval, the bias-corrected and accelerated interval, and the asymptotic bootstrap interval based on the bootstrap mean and variance. More recently, Zhou and Qin [7] proposed two bootstrap intervals based on the adjusted estimate for a binomial proportion by Agresti and Coull [8] (referred to as the AC estimate). They showed that these bootstrap confidence intervals are comparable to the parametric interval via the delta method when the underlying distributions are correctly specified, and perform better when the distributions are misspecified. Of the two proposed intervals, the one based on the bootstrap mean and variance performs better than the bootstrap percentile interval.

In the existing methods for the confidence interval of the Youden Index, the variance of the estimated Youden Index is estimated by parametric or nonparametric approaches. An alternative is to treat the parameter in the variance as an unknown quantity and obtain the confidence interval by solving an equation for that parameter. This approach, called the variance profile method, is also known as the Wilson score method [9, 10, 11, 12]. In this article, we use the Wilson score method to construct the confidence interval of the Youden Index. Once the optimal cut point is determined, the Youden Index can be rewritten as the difference between two independent proportions. The Wilson score method is used to compute the confidence interval of each proportion, and the confidence interval of the Youden Index is then constructed by the square-and-add method [13]. Because each binomial proportion can be estimated by either the empirical estimate or the adjusted estimate, we propose two new confidence intervals using the square-and-add limits based on the Wilson score method. Extensive Monte Carlo simulation studies are conducted to compare the proposed intervals with the existing intervals.

The rest of this article is organized as follows. In Section 2, we briefly review the bootstrap confidence interval based on the adjusted proportion estimate (the AC estimate) and propose two new confidence intervals using the square-and-add limits based on the Wilson score method. In Section 3, we conduct Monte Carlo simulation studies to compare the new and existing confidence intervals with regard to coverage probability and width. An example from a clinical study of prostate cancer illustrates the use of the proposed confidence intervals at the end of Section 3. Section 4 concludes with remarks.

## Confidence intervals

Suppose *X* and *Y* are diagnostic results for patients from the non-diseased group and the diseased group, respectively, and assume that *X* and *Y* are independent. For a given cut point *c*, with larger values indicating disease, the sensitivity and specificity of a diagnostic test are defined as

$$\text{sensitivity}(c) = P(Y \ge c), \qquad \text{specificity}(c) = P(X \le c).$$

The Youden Index [4] is expressed as

$$J = \max_{c}\left\{\text{sensitivity}(c) + \text{specificity}(c) - 1\right\} = \max_{c}\left\{P(X \le c) - P(Y < c)\right\},$$

where the second equality uses *P*(*Y* ≥ *c*) = 1 − *P*(*Y* < *c*). *J* assesses sensitivity and specificity simultaneously, and it is attained at the optimal cut point *c** that maximizes *P*(*X* ≤ *c*) − *P*(*Y* < *c*).

Let *X*_{1}, *X*_{2}, ⋯, *X*_{m} and *Y*_{1}, *Y*_{2}, ⋯, *Y*_{n} be the observations from the non-diseased group and the diseased group, respectively. The Youden Index can be estimated as

$$\hat{J} = \max_{c}\left\{\frac{1}{m}\sum_{i=1}^{m} I(X_i \le c) - \frac{1}{n}\sum_{j=1}^{n} I(Y_j < c)\right\}, \tag{1}$$

where *I*(*D*) is an indicator function, with *I*(*D*) = 1 if *D* is true and 0 otherwise. In this article, we focus on the construction of two-sided confidence intervals for the Youden Index *J*.
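Eq (1) can be evaluated directly. The objective changes only at observed values and attains its maximum at one of them, so it suffices to scan the pooled observations as candidate cut points. The following Python sketch (the name `youden_empirical` is ours, not taken from the author's R program) returns the estimate, the maximizing cut point, and the two proportions inside the braces; breaking ties by keeping the smallest maximizing cut point is our own convention.

```python
from typing import Sequence, Tuple


def youden_empirical(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Empirical Youden Index, Eq (1).

    x: non-diseased observations (size m); y: diseased observations (size n).
    Returns (J_hat, c_hat, p1_hat, p2_hat) with
    p1_hat = #{X_i <= c_hat} / m and p2_hat = #{Y_j < c_hat} / n.
    """
    m, n = len(x), len(y)
    best = None
    # The maximum is always reached at an observed value, so the pooled
    # sample is a sufficient set of candidate cut points.
    for c in sorted(set(x) | set(y)):
        p1 = sum(1 for xi in x if xi <= c) / m
        p2 = sum(1 for yj in y if yj < c) / n
        # Strict ">" keeps the smallest maximizing cut point when there are ties.
        if best is None or p1 - p2 > best[0]:
            best = (p1 - p2, c, p1, p2)
    return best
```

On the prostate cancer data of Section 3.1, this sketch returns $\hat J = 24/33 - 4/20 \approx 0.527$ at cut point 66.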

### 2.1 Bootstrap confidence interval

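Zhou and Qin [7] proposed a bootstrap confidence interval for the Youden Index based on the AC estimate [8] for a binomial proportion. The Youden Index is estimated as

$$\tilde{J} = \max_{c}\left\{\frac{\sum_{i=1}^{m} I(X_i \le c) + z_{1-\alpha/2}^2/2}{m + z_{1-\alpha/2}^2} - \frac{\sum_{j=1}^{n} I(Y_j < c) + z_{1-\alpha/2}^2/2}{n + z_{1-\alpha/2}^2}\right\}, \tag{2}$$

where *z*_{1−α/2} is the 1 − *α*/2 quantile of the standard normal distribution. Compared with the estimate of *J* in Eq (1), the quantities $z_{1-\alpha/2}^2/2$ and $z_{1-\alpha/2}^2$ are added to the numerator and denominator, respectively. When *α* = 0.05, $z_{1-\alpha/2}^2 = 1.96^2 \approx 3.84$ is close to 4, so the adjustment may be viewed as adding two successes and two failures to each group.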
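A minimal Python sketch of Eq (2), mirroring the empirical version but adding $z_{1-\alpha/2}^2/2$ pseudo-successes and $z_{1-\alpha/2}^2$ pseudo-trials to each proportion. The function name `youden_ac` is hypothetical; maximizing over the pooled observations is justified as for Eq (1), because the adjusted objective is still a positively weighted difference of the two counts.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def youden_ac(
    x: Sequence[float], y: Sequence[float], alpha: float = 0.05
) -> Tuple[float, float, float, float]:
    """AC-adjusted Youden Index, Eq (2).

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials to each proportion,
    where z is the 1 - alpha/2 standard normal quantile.
    Returns (J_tilde, c_hat, p1_tilde, p2_tilde).
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    m, n = len(x), len(y)
    best = None
    for c in sorted(set(x) | set(y)):
        p1 = (sum(1 for xi in x if xi <= c) + z2 / 2) / (m + z2)
        p2 = (sum(1 for yj in y if yj < c) + z2 / 2) / (n + z2)
        # Keep the smallest cut point among ties.
        if best is None or p1 - p2 > best[0]:
            best = (p1 - p2, c, p1, p2)
    return best
```

The adjustment pulls both proportions toward 1/2, so $\tilde J$ is smaller than $\hat J$ on the same data (about 0.455 versus 0.527 for the prostate cancer data).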

We denote the samples from the non-diseased group and the diseased group by **x** = (*x*_{1}, *x*_{2}, ⋯, *x*_{m}) and **y** = (*y*_{1}, *y*_{2}, ⋯, *y*_{n}), respectively. Bootstrap samples from each group are used to approximate the sampling distribution of the Youden Index estimate $\tilde{J}$. Let **x*** be *m* values drawn from **x** with replacement and **y*** be *n* values drawn from **y** with replacement. The Youden Index estimate $\tilde{J}^*$ is computed from Eq (2) using the bootstrap samples **x*** and **y***. This resampling procedure is repeated *B* times to generate *B* Youden Index estimates, $\tilde{J}^*_1, \ldots, \tilde{J}^*_B$. The bootstrap mean and variance estimates of the Youden Index are calculated as

$$\bar{J}^* = \frac{1}{B}\sum_{b=1}^{B} \tilde{J}^*_b, \qquad \widehat{\mathrm{Var}}(\tilde{J}) = \frac{1}{B-1}\sum_{b=1}^{B}\left(\tilde{J}^*_b - \bar{J}^*\right)^2.$$

The corresponding bootstrap confidence interval using the AC estimate (referred to as the BAC confidence interval) is

$$\left(\bar{J}^* - z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\tilde{J})},\ \bar{J}^* + z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\tilde{J})}\right). \tag{3}$$

The BAC interval is a Wald-type interval justified by the central limit theorem. The Youden Index can also be estimated by the empirical estimate $\hat{J}$ in Eq (1); the confidence interval is then constructed in the same way, the only difference being the estimate of the Youden Index computed at each resampling step. This bootstrap confidence interval based on $\hat{J}$ was studied by Schisterman and Perkins [3], who showed that it is not as good as the intervals based on parametric approaches with regard to coverage probability and width.
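A minimal standard-library Python sketch of the BAC interval in Eq (3). The names `youden_ac` (a compact variant returning only the estimate of Eq (2)) and `bac_interval` are ours. The default *B* = 500 is the value used in the simulation study below, and the *B* − 1 divisor of `statistics.variance` is an assumption of this sketch.

```python
import random
from statistics import NormalDist, fmean, variance


def youden_ac(x, y, z2):
    """AC-adjusted Youden Index estimate, Eq (2), maximized over observed cut points."""
    m, n = len(x), len(y)
    return max(
        (sum(xi <= c for xi in x) + z2 / 2) / (m + z2)
        - (sum(yj < c for yj in y) + z2 / 2) / (n + z2)
        for c in set(x) | set(y)
    )


def bac_interval(x, y, alpha=0.05, B=500, seed=None):
    """BAC interval, Eq (3): bootstrap mean +/- z * bootstrap standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    rng = random.Random(seed)
    boot = []
    for _ in range(B):
        xs = rng.choices(x, k=len(x))  # m draws from x with replacement
        ys = rng.choices(y, k=len(y))  # n draws from y with replacement
        boot.append(youden_ac(xs, ys, z * z))
    mean = fmean(boot)
    sd = variance(boot) ** 0.5  # sample variance with the B - 1 divisor
    return mean - z * sd, mean + z * sd
```

Because the bootstrap is random, the limits vary slightly with the seed; fixing `seed` makes a run reproducible. Each call evaluates Eq (2) *B* times, which is the computational cost that the Wilson-based intervals avoid.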

Later, Zhou and Qin [7] compared the bootstrap confidence interval based on the AC estimate for the Youden Index with the parametric confidence interval via the delta method [3]. In addition, they considered the percentile bootstrap confidence interval based on the AC estimate. They concluded that the two bootstrap confidence intervals based on the AC estimate are comparable to the parametric intervals when the distributional assumptions are met, and outperform the parametric intervals when the distributions are misspecified. The BAC interval is generally better than the percentile bootstrap interval. For this reason, the BAC interval is chosen for comparison in this article.

### 2.2 Two new confidence intervals

The existing confidence intervals are Wald-type intervals, with the variance estimated by different methods such as the delta method and the bootstrap. According to the existing literature, their coverage is generally not satisfactory. In addition, only medium to large sample sizes have been considered in the existing literature.

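We use the square-and-add limits based on the Wilson score method [9] to construct the confidence interval for the Youden Index. Once the optimal cut point *c** is determined, the Youden Index *J* can be expressed as the difference between two independent proportions:

$$J = P(X \le c^*) - P(Y < c^*).$$

For simplicity, let *p*_{1} = *P*(*X* ≤ *c**) and *p*_{2} = *P*(*Y* < *c**). The limits (*l*_{1}, *u*_{1}) of the Wilson confidence interval for *p*_{1} are the roots of the equation

$$(\hat{p}_1 - p_1)^2 = z_{1-\alpha/2}^2\,\frac{p_1(1-p_1)}{m}.$$

It is easy to show that

$$l_1 = \frac{2m\hat{p}_1 + z_{1-\alpha/2}^2 - z_{1-\alpha/2}\sqrt{z_{1-\alpha/2}^2 + 4m\hat{p}_1(1-\hat{p}_1)}}{2\left(m + z_{1-\alpha/2}^2\right)}$$

and

$$u_1 = \frac{2m\hat{p}_1 + z_{1-\alpha/2}^2 + z_{1-\alpha/2}\sqrt{z_{1-\alpha/2}^2 + 4m\hat{p}_1(1-\hat{p}_1)}}{2\left(m + z_{1-\alpha/2}^2\right)}.$$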
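The step behind "it is easy to show" is a quadratic in $p_1$. Writing $z = z_{1-\alpha/2}$ for brevity, multiplying the Wilson score equation $(\hat p_1 - p_1)^2 = z^2 p_1(1-p_1)/m$ by $m$ and collecting terms gives:

$$
\begin{aligned}
&(\hat p_1 - p_1)^2 = \frac{z^2}{m}\,p_1(1-p_1) \\
\iff\ &(m + z^2)\,p_1^2 - (2m\hat p_1 + z^2)\,p_1 + m\hat p_1^2 = 0, \\
&(2m\hat p_1 + z^2)^2 - 4(m + z^2)\,m\hat p_1^2 = z^2\left\{z^2 + 4m\hat p_1(1-\hat p_1)\right\} \quad \text{(discriminant)}, \\
\implies\ &p_1 = \frac{2m\hat p_1 + z^2 \mp z\sqrt{z^2 + 4m\hat p_1(1-\hat p_1)}}{2(m + z^2)}.
\end{aligned}
$$

The minus sign gives $l_1$ and the plus sign gives $u_1$; replacing $m$ by $n$ and $\hat p_1$ by $\hat p_2$ gives the limits $(l_2, u_2)$ for $p_2$.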

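Similarly, the limits (*l*_{2}, *u*_{2}) of the Wilson confidence interval for *p*_{2} are the roots of

$$(\hat{p}_2 - p_2)^2 = z_{1-\alpha/2}^2\,\frac{p_2(1-p_2)}{n}.$$

It follows that

$$l_2 = \frac{2n\hat{p}_2 + z_{1-\alpha/2}^2 - z_{1-\alpha/2}\sqrt{z_{1-\alpha/2}^2 + 4n\hat{p}_2(1-\hat{p}_2)}}{2\left(n + z_{1-\alpha/2}^2\right)}$$

and

$$u_2 = \frac{2n\hat{p}_2 + z_{1-\alpha/2}^2 + z_{1-\alpha/2}\sqrt{z_{1-\alpha/2}^2 + 4n\hat{p}_2(1-\hat{p}_2)}}{2\left(n + z_{1-\alpha/2}^2\right)}.$$

The confidence interval of *J* is calculated as [13]

$$\left(\hat{J} - \delta,\ \hat{J} + \varepsilon\right),$$

where $\hat{J} = \hat{p}_1 - \hat{p}_2$ and

$$\delta = \sqrt{(\hat{p}_1 - l_1)^2 + (u_2 - \hat{p}_2)^2}, \qquad \varepsilon = \sqrt{(u_1 - \hat{p}_1)^2 + (\hat{p}_2 - l_2)^2}.$$

The estimates $\hat{p}_1$ and $\hat{p}_2$ are the two proportions inside the braces of Eq (1) or Eq (2), evaluated at the estimated optimal cut point. Different estimates of $\hat{p}_1$ and $\hat{p}_2$ lead to different confidence intervals for *p*_{1} and *p*_{2}, and hence to different confidence intervals for the Youden Index. We refer to the interval using Eq (1) as the NP method and the interval using Eq (2) as the NPAC method.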
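A minimal Python sketch of the NP and NPAC intervals: locate the optimal cut point, compute the Wilson limits for each proportion, and combine them with the square-and-add formulas for $\delta$ and $\varepsilon$. The names `wilson_limits` and `square_and_add_interval` are ours. The `adjusted=True` branch follows our reading of the text (the AC proportions of Eq (2) are plugged into the same Wilson and square-and-add formulas) and may differ in detail from the author's R program.

```python
from statistics import NormalDist


def wilson_limits(p_hat, size, z):
    """Roots (l, u) of (p_hat - p)^2 = z^2 p (1 - p) / size."""
    center = 2 * size * p_hat + z * z
    half = z * (z * z + 4 * size * p_hat * (1 - p_hat)) ** 0.5
    denom = 2 * (size + z * z)
    return (center - half) / denom, (center + half) / denom


def square_and_add_interval(x, y, alpha=0.05, adjusted=False):
    """NP interval (adjusted=False, Eq (1)) or NPAC interval (adjusted=True, Eq (2))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m, n = len(x), len(y)
    k = z * z / 2 if adjusted else 0.0  # pseudo-successes added by the AC estimate
    best = None
    for c in sorted(set(x) | set(y)):
        p1 = (sum(xi <= c for xi in x) + k) / (m + 2 * k)
        p2 = (sum(yj < c for yj in y) + k) / (n + 2 * k)
        if best is None or p1 - p2 > best[0] - best[1]:
            best = (p1, p2)  # proportions at the (smallest) optimal cut point
    p1, p2 = best
    l1, u1 = wilson_limits(p1, m, z)
    l2, u2 = wilson_limits(p2, n, z)
    j_hat = p1 - p2
    delta = ((p1 - l1) ** 2 + (u2 - p2) ** 2) ** 0.5  # square-and-add, lower side
    eps = ((u1 - p1) ** 2 + (p2 - l2) ** 2) ** 0.5  # square-and-add, upper side
    return j_hat - delta, j_hat + eps
```

With `adjusted=False` on the prostate cancer data of Section 3.1, this sketch gives approximately (0.253, 0.698), which agrees with the NP interval reported there. No resampling is involved, so each interval costs a single pass over the candidate cut points.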

Unlike a Wald-type confidence interval, which plugs an estimate of the proportion into the variance, the Wilson score method treats the proportion in the variance as an unknown parameter. The Wilson confidence limits are then obtained as the roots of two quadratic equations. This approach may improve the coverage probability of the confidence interval [14, 15], and it has been successfully applied in many important statistical research areas [15, 11].

## Simulation study

We compare the performance of the bootstrap BAC interval, the NP interval, and the NPAC interval with regard to coverage probability and width through extensive Monte Carlo simulations. The nominal coverage level is set at 95% (*α* = 0.05). Twelve sample size combinations are considered: (*m*, *n*) = (20, 20), (20, 40), (20, 60), (40, 20), (40, 40), (40, 80), (60, 30), (60, 60), (60, 90), (80, 60), (80, 80), and (80, 120). For each configuration, 5000 samples are simulated from the non-diseased population and the diseased population. For each sample, *B* = 500 bootstrap samples are generated to calculate the bootstrap mean and variance in the BAC method. The proposed NP and NPAC intervals do not require bootstrap sampling, so they are computationally cheaper than the BAC interval.

We first compare the three methods when the underlying distributions of the non-diseased group and the diseased group are of the same type. The normal distribution is the most commonly used distribution in data analysis. The non-diseased group is assumed to follow a standard normal distribution, and the diseased group follows a normal distribution $N(\mu_d, \sigma_d^2)$, where $\sigma_d^2$ takes four values, …, and 5. For each variance in the diseased group, the associated *μ*_{d} values are computed to attain the pre-defined Youden Index values *J* = 0.4, 0.6, 0.8, and 0.9. This gives a total of 16 parameter combinations, and the detailed values of each setting can be found in the first column of Table 1. The density functions of the non-diseased group and the diseased group under these 16 parameter combinations are plotted in Fig 1. For a given standard deviation, the overlap between the two distributions decreases as the difference in location between the two groups increases.
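To set up a scenario of this kind, $\mu_d$ has to be solved for a target $J$. A minimal sketch, assuming $X \sim N(0,1)$ and $Y \sim N(\mu_d, \sigma_d^2)$ with larger values indicating disease: the optimal cut point satisfies $\phi(c) = \phi\{(c-\mu_d)/\sigma_d\}/\sigma_d$, which reduces to the quadratic $(\sigma_d^2 - 1)c^2 + 2\mu_d c - (\mu_d^2 + 2\sigma_d^2 \log\sigma_d) = 0$, and $J$ is nondecreasing in $\mu_d$, so bisection finds $\mu_d$. The function names are hypothetical, and the search is restricted to $\mu_d \ge 0$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def youden_normal(mu, sigma):
    """Youden Index for X ~ N(0, 1) (non-diseased) and Y ~ N(mu, sigma^2) (diseased)."""
    # Stationary points of Phi(c) - Phi((c - mu) / sigma) solve
    # (sigma^2 - 1) c^2 + 2 mu c - (mu^2 + 2 sigma^2 log sigma) = 0.
    a = sigma ** 2 - 1
    b = 2 * mu
    c0 = -(mu ** 2 + 2 * sigma ** 2 * math.log(sigma))
    if abs(a) < 1e-12:
        roots = [-c0 / b] if b != 0 else []  # equal variances: c* = mu / 2
    else:
        disc = max(b * b - 4 * a * c0, 0.0)
        roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
    # The objective tends to 0 as c -> +/- infinity, so 0 bounds the maximum below.
    return max([0.0] + [STD.cdf(c) - STD.cdf((c - mu) / sigma) for c in roots])


def mu_for_target_j(target, sigma, tol=1e-10):
    """Bisection for the mean mu >= 0 of the diseased group that attains J = target."""
    lo, hi = 0.0, 1.0
    if youden_normal(lo, sigma) >= target:
        raise ValueError("target J is already reached at mu = 0")
    while youden_normal(hi, sigma) < target:
        hi *= 2  # J -> 1 as mu grows, so this terminates for target < 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if youden_normal(mid, sigma) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $\sigma_d = 1$ the optimal cut point is $\mu_d/2$ and $J = 2\Phi(\mu_d/2) - 1$, so `mu_for_target_j(0.4, 1.0)` returns $2\Phi^{-1}(0.7) \approx 1.049$.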

Tables 1–4 present the average coverage probabilities and average widths of the three methods for *m* = 20, 40, 60, and 80, respectively. The coverage probability is defined as the proportion of simulated samples for which the computed confidence interval contains the pre-defined Youden Index value. For a given parameter setting and non-diseased sample size, the average width of each method decreases as the diseased sample size increases. The coverage probability of the BAC interval is often below the nominal level, and it is very poor when the sample sizes are small and the pre-defined *J* is large. The two newly proposed methods, NP and NPAC, generally have better coverage than the BAC method. The NPAC method does not perform as well as the NP method when the pre-defined *J* is large, such as 0.9; in such cases, its coverage probability can be as low as 57.3%. The NPAC method can have higher coverage probability than the NP method when *J* is small, but its interval is then much wider. Overall, the NP method performs satisfactorily compared with the other two methods.
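The coverage probability defined above can be estimated by straightforward Monte Carlo. The sketch below does this for the NP interval in the simplest normal setting, $X \sim N(0,1)$ and $Y \sim N(\mu, 1)$, where the true index is $J = 2\Phi(\mu/2) - 1$. It is an illustration only: it uses far fewer replications than the 5000 of the study (`reps` defaults to 400 here, our choice for speed), the function names are ours, and its output is not meant to reproduce the entries of Tables 1–4.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def wilson(p, size, z):
    """Wilson score limits: roots of (p_hat - p)^2 = z^2 p (1 - p) / size."""
    center, denom = 2 * size * p + z * z, 2 * (size + z * z)
    half = z * (z * z + 4 * size * p * (1 - p)) ** 0.5
    return (center - half) / denom, (center + half) / denom


def np_interval(x, y, z):
    """NP interval: empirical proportions (Eq (1)), Wilson limits, square-and-add."""
    m, n = len(x), len(y)
    # Proportions at the smallest cut point that maximizes Eq (1).
    p1, p2 = max(
        ((sum(xi <= c for xi in x) / m, sum(yj < c for yj in y) / n)
         for c in sorted(set(x) | set(y))),
        key=lambda pair: pair[0] - pair[1],
    )
    l1, u1 = wilson(p1, m, z)
    l2, u2 = wilson(p2, n, z)
    j_hat = p1 - p2
    return (j_hat - ((p1 - l1) ** 2 + (u2 - p2) ** 2) ** 0.5,
            j_hat + ((u1 - p1) ** 2 + (p2 - l2) ** 2) ** 0.5)


def coverage_np(m, n, target_j, reps=400, alpha=0.05, seed=2015):
    """Share of simulated samples whose NP interval contains the true J."""
    z = STD.inv_cdf(1 - alpha / 2)
    # With equal unit variances, J = 2 * Phi(mu / 2) - 1.
    mu = 2 * STD.inv_cdf((1 + target_j) / 2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(mu, 1.0) for _ in range(n)]
        lo, hi = np_interval(x, y, z)
        hits += lo <= target_j <= hi
    return hits / reps
```

Recording `hi - lo` inside the loop alongside the hit count would give the average width as well.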

The three methods are also compared under gamma distributions, Γ(*κ*, *θ*), with probability density function

$$f(x; \kappa, \theta) = \frac{\theta^{\kappa}}{\Gamma(\kappa)}\, x^{\kappa-1} e^{-\theta x}, \qquad x > 0.$$

The expectation of this gamma distribution is *κ*/*θ*. The non-diseased group follows a gamma distribution with *κ*_{X} = 1.5 and *θ*_{X} = 1, and the diseased group follows a gamma distribution with *κ*_{Y} = 1.5, 2, 2.5, or 3 and *θ*_{Y}, where *θ*_{Y} is chosen for each *κ*_{Y} to achieve the Youden Index *J* = 0.4, 0.6, 0.8, and 0.9. The 16 parameter settings are given in the first column of Table 5. Tables 5–8 show the coverage probabilities and average widths under the gamma distributions for *m* = 20, 40, 60, and 80, respectively. The results are similar to those under the normal distributions.

In practice, the two groups may not follow the same type of distribution. For this reason, the three methods are also compared with different distributions for the non-diseased group and the diseased group. In the first case, the non-diseased group follows a *t* distribution with *df* = 5, and the diseased group follows a normal distribution $N(\mu_Y, \sigma_Y^2)$ with *σ*_{Y} = 1. The means *μ*_{Y} of the diseased group are 1.08, 1.75, 2.74, and 3.62, chosen to attain *J* = 0.4, 0.6, 0.8, and 0.9, respectively. The coverage probabilities and average widths for various sample sizes are presented in Table 9. The coverage probabilities of the BAC method are much smaller than the nominal level for small *J*, such as 0.4, even with medium to large sample sizes. The NPAC method is very conservative for small *J*, and its average widths are then longer than those of the other two methods. In addition, the NPAC method generally has low coverage when *J* is large, e.g., 0.9. Overall, the NP method is robust to the value of *J* and has much better coverage than the other two methods.

We also consider the case in which the non-diseased group follows a normal distribution with *μ*_{X} = 1 and variance $\sigma_X^2 = \ldots$, and the diseased group follows a gamma distribution with *κ*_{Y} = 2 and *θ*_{Y} = 0.749, 0.481, 0.259, and 0.153, chosen to achieve *J* = 0.4, 0.6, 0.8, and 0.9, respectively. The results, shown in Table 10, are similar to those in the previous case. Once again, the NP method is preferable because of its satisfactory coverage.

The existing confidence interval methods all rely on the accuracy of the Youden Index estimate and its variance estimate. The variance can be estimated either from the data or from bootstrap samples. Because the variance of the Youden Index estimate is a function of the Youden Index itself, the accuracy of the variance estimate depends heavily on the estimate of the Youden Index. Other researchers have observed [7] that the coverage of such intervals is often unsatisfactory. The Wilson score method aims to improve the coverage probability by treating the parameter in the variance as unknown; the confidence interval is then obtained by solving a quadratic inequality. This method has been shown to improve the coverage probability in many statistical problems [15, 11].

### 3.1 An example

We consider an example from a prostate cancer study [16] to compare the three methods for constructing the confidence interval of the Youden Index. In clinical practice, it is very important to determine whether the neighboring lymph nodes have been spared. The gold standard is surgery, which is accurate but expensive and may not be efficient. In addition, surgery may cause complications and expose the patient to unnecessary risk. For this reason, it is of interest to predict nodal involvement from the level of acid phosphatase in blood serum. A total of 53 patients were confirmed to have prostate cancer, 20 with nodal involvement and the remaining 33 without. For each patient, the level of acid phosphatase in blood serum was measured; the data, reported in Le [16], are as follows:

Patients without nodal involvement (non-diseased group): 40, 40, 46, 47, 48, 48, 49, 49, 50, 50, 50, 50, 50, 52, 52, 55, 55, 56, 59, 62, 62, 63, 65, 66, 71, 75, 76, 78, 83, 95, 98, 102, 187.

Patients with nodal involvement (diseased group): 48, 49, 51, 56, 67, 67, 67, 70, 70, 72, 76, 78, 81, 82, 82, 84, 89, 99, 126, 136.

We apply the three methods to the prostate cancer data. The 95% confidence intervals are (0.263, 0.653), (0.253, 0.698), and (0.179, 0.647) for the BAC method, the NP method, and the NPAC method, respectively. The BAC interval is the shortest (width 0.390), followed by the NP interval (0.445) and the NPAC interval (0.468). The short BAC interval is consistent with its coverage probability being generally below the nominal level. The program for this confidence interval calculation is written in R and is available from the author's website at: https://faculty.unlv.edu/gshan/.

## Conclusion

In this article, we propose two confidence intervals for the Youden Index using the square-and-add limits based on the Wilson score method, in which the parameter in the variance is treated as unknown rather than estimated as in the existing Wald-type confidence intervals. Through extensive Monte Carlo simulation studies, we show that the two new intervals generally have better coverage probabilities than the BAC interval. The BAC interval can perform very poorly for small sample sizes and low *J* values. The performance of the proposed NPAC interval depends on the *J* value, whereas the NP interval is robust. The coverage of the NP interval is much closer to the nominal level than that of the other two intervals. In addition, the NP interval is easier to compute than the BAC interval because no bootstrap is involved. The NP interval is recommended for use in practice.

## Acknowledgments

The author would like to thank the Editor and three referees for their valuable comments and suggestions, which helped to improve this manuscript. Shan's research is supported by grant 5U54GM104944 from the National Institute of General Medical Sciences at the National Institutes of Health.

## Author Contributions

Conceived and designed the experiments: GS. Performed the experiments: GS. Analyzed the data: GS. Contributed reagents/materials/analysis tools: GS. Wrote the paper: GS.

## References

- 1. Pepe MS (2004) The Statistical Evaluation of Medical Tests for Classification and Prediction (Oxford Statistical Science Series). Oxford University Press, 1 edition. URL http://www.worldcat.org/isbn/0198565828.
- 2. Zhou XH, Obuchowski NA, McClish DK (2011) Statistical Methods in Diagnostic Medicine. Wiley, 2 edition. URL http://www.worldcat.org/isbn/0470183144.
- 3. Schisterman EF, Perkins N (2007) Confidence Intervals for the Youden Index and Corresponding Optimal Cut-Point. Communications in Statistics – Simulation and Computation 36: 549–563.
- 4. Youden WJ (1950) Index for rating diagnostic tests. Cancer 3: 32–35. pmid:15405679
- 5. Fluss R, Faraggi D, Reiser B (2005) Estimation of the Youden Index and its associated cutoff point. Biometrical Journal 47: 458–472.
- 6. Miller RG (1998) Survival Analysis. Wiley-Interscience, 2 edition. URL http://www.worldcat.org/isbn/0471255483.
- 7. Zhou H, Qin G (2012) New nonparametric confidence intervals for the Youden index. Journal of Biopharmaceutical Statistics 22: 1244–1257. pmid:23075020
- 8. Agresti A, Coull BA (1998) Approximate is Better than Exact for Interval Estimation of Binomial Proportions. The American Statistician 52: 119–126.
- 9. Wilson EB (1927) Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association 22: 209–212.
- 10. Newcombe RG (2012) Confidence Intervals for Proportions and Related Measures of Effect Size (Chapman & Hall/CRC Biostatistics Series). CRC Press. URL http://www.worldcat.org/isbn/1439812780.
- 11. Shan G (2014) A better confidence interval for the sensitivity at a fixed level of specificity for diagnostic tests with continuous endpoints. Statistical Methods in Medical Research.
- 12. Shan G, Wang W (2013) ExactCIdiff: An R Package for Computing Exact Confidence Intervals for the Difference of Two Proportions. The R Journal 5: 62–71.
- 13. Newcombe RG (1998) Interval estimation for the difference between independent proportions: comparison of eleven methods. Statist Med 17: 873–890.
- 14. Bickel PJ, Doksum KA (1977) Mathematical Statistics. Holden-Day, Inc.
- 15. Lee JJ, Tu ZN (1994) A Better Confidence Interval for Kappa on Measuring Agreement between Two Raters with Binary Outcomes. Journal of Computational and Graphical Statistics 3: 301–321.
- 16. Le CT (2006) A solution for the most basic optimization problem associated with an ROC curve. Statistical Methods in Medical Research 15: 571–584. pmid:17260924