Risk-adjusted monitoring of surgical performance

We propose a nonparametric risk-adjusted cumulative sum chart to monitor surgical outcomes for patients with different risks of post-operative mortality due to risk factors that exist before the surgery. Using varying-coefficient logistic regression models, we accomplish the risk adjustment. Unknown coefficient functions are estimated by global polynomial spline approximation based on the maximum likelihood principle. We suggest a bisection minimization approach and a bootstrap method to determine the chart testing limit value. Compared with the previous (parametric) risk-adjusted cumulative sum chart, a major advantage of our method is that the morality rate can be modeled more flexibly by related covariates, which significantly enhances the monitoring efficiency. Simulations demonstrate nice performance of our proposed procedure. An application to a UK cardiac surgery dataset illustrates the use of our methodology.


Introduction
Monitoring surgical outcomes of clinical trials is a critical task for doctors to timely detect patient deterioration. However, this task becomes complicated for assessing and online monitoring surgical performance. It is known that the surgical performance changes as patient characteristics, such as age, weight, blood pressure, and pulmonary status, varies. Therefore, assessing and monitoring surgical performance should be adjusted according to the patient's characteristics existing prior to the surgery. This process is the so-called risk adjustment.
There are several methods for monitoring outcomes of surgery. Examples include the Shewhart chart, the sequential probability ratio test, the exponential weighted moving average (EWMA), and the cumulative sum control chart (CUSUM). Recent studies have suggested to use such schemes to monitor the performance of clinical practitioners, including surgical and general practitioners. Among all the control charts, CUSUM has received much attention because of its simple formulation, intuitive representation, and capability to detect small persistent changes, since it was originally proposed [1]. CUSUM was first applied [2] for surgical PLOS  performance monitoring. Then it was used [3,4] for monitoring pediatric cardiac surgeries, among others. For an overview, see references [5][6][7]. Consider a cardiac surgery example. During 1992-1998, a UK center for cardiac surgery facilitated 6994 cardiac operations including 5212 males and recorded information such as date, surgeon, and Parsonnet score formed by age, gender, hypertension and diabetic status, renal function and left ventricular mass [8]. The age varies between 11 and 99 with mean 62.5, median 64 and standard deviation 11. According to the records, 461 patients died within 30 days of surgery, corresponding to a mortality rate of 6.6%. Hence, it is demanded to assess and monitor the surgical performance in order to help doctors reduce the mortality rate and adjust the surgical plan in subsequent operations. One can use the CUSUM charts in [2][3][4] for this task. However, there is no patients' preoperative risk considered in these works. Such straightforward applications might lead to a biased assessment of surgical performance because of heterogeneity of patients. As indicated in [9], without risk adjustment for heterogeneity among patients, the control chart shows outcomes confounding with the preexisting risk factors. Hence, it is necessary to develop a risk-adjusted (RA) CUSUM for this example.
There are various works on the RA CUSUM in the literature using two classes of models, respectively for discrete and continous outcomes. For discrete outcomes, examples include the RA CUSUM charts for Down's syndrome which adjusted the risk of the age of mother using logistic regression [10], for shoulder surgery which adjusted for patients' rehabilitation conditions [11], and for binary cardiac surgical outcomes which adjusted the risk using a likelihoodbased scoring method [12]. In a discussion paper [13], advantages and disadvantages of various control charts including the RA CUSUM were discussed for monitoring health-care and public-health surveillance. The incremental advantage of RA CUSUM was further assessed for coronary bypass outcomes [14] using the procedure in [8]. The RA CUSUM was also investigated in [15] for binary outcomes using the logistic regression and the Bayesian method. For continuous outcomes, examples include the RA CUSUMs for survival times using the Cox model [16] and the accelerated failure model [9], among others. However, these cusum charts are all risk adjusted based on parametric models.
To adjust the risk factors in the UK cardiac surgery example, a linear logistic regression was used to model the relationship between the surgical outcome and the Parsonnet score [8]. Let t = 1, 2, . . . be the indexes for the patients undergoing surgery in time order. The RA CUSUM chart is used to monitor their outcomes. Let Y t = 1 if patient t dies and Y t = 0 otherwise. Given the t-th patient's outcome Y t and Parsonnet score P t , the model takes the following form: where p t is the mortality rate defined as P(Y t = 1|P t ), and logit(t) = log(t/(1 − t)) is the logit function. Then the conditional probability mass function of Y t given P t is f ðy; yÞ ¼ p t ðyÞ Y t ½1 À p t ðyÞ 1À Y t ; and the mortality odds of failure for patient t is p t /(1 − p t ). Since different patients have different baseline risk levels, one needs to monitor the change of odds ratio of patients. Let R t be the mortality odds ratio of patient t. It is interesting to use the RA CUSUM to sequentially test [8]: where R 0 is typically the mortality odds determined by the current process performance, and R 1 corresponds to an inferior performance. Mathematically, it can be verified that, the probability of failure P(Y t = 1|P t ) equals R 0 p t /(1 − p t + R 0 p t ) and R A p t /(1 − p t + R A p t ) under H 0 and H A , respectively. Hence, the log likelihood ratio for patient t is [8] Then the RA CUSUM statistics are defined as where Z 0 = 0. When the value of Z t exceeds a certain threshold value h, a change in value has been found, and an alarm is signaled. The model parameter θ and hence p t are estimated by maximizing the likelihood of the incontrol dataset fY t ; P t g n 1 t¼1 : Specifically, θ is estimated by maximizing the likelihood: The threshold h is usually decided by the average run length (ARL). For detail see the monitoring algorithm to be introduced later. Since {Z t } is decided by the underlying process of p t in model (1), success of the RA CUSUM depends on if the model of p t is appropriate. If the null hypothesis is rejected, it indicates a significant increase in the mortality rate. By using this method, the Parsonnet score was found to significantly affect the mortality rate, and it was also claimed [8] that this procedure could detect changes in surgical performance earlier than the non-adjusted CUSUM. However, this approach is based on the assumption in model (1) that the log odds ratio of mortality rate is a linear function of the Parsonnet score. This may create modeling bias if the underlying relationship is nonlinear. In particular, the modeling bias becomes more serious when there are interaction effects among the patient characteristics.
To alleviate the modeling bias problem and to cope with possible interaction effect among the patient characteristics, we propose the following varying-coefficient logistic regression (VCLR) model: , μ is the intercept term, U t is a random variable, for example, any entry of X t , and β(u) = (β 1 (u), β 2 (u), Á Á Á, β q (u)) 0 with β i (u) being unknown functions. When β(u) is a constant vector, model (4) reduces to logistic linear regression which includes model (1). Model (4) allows us to model nonlinear relationship between p t and X t . If U t is one entry of X t , it captures nonlinear interaction among X t . For the UK cardiac surgery example, we take the t-th patient's age as U t and Parsonet score P t as X t . It is believed that the effect of a patient's Parsonet score depends on the age, and it is interesting to investigate if there is a nonlinear interaction effect between the Parsonet score and age. Therefore, model (4) can be used to fit the UK cardiac surgery dataset. We estimate β(Á) by the maximum likelihood principle with a polynomial splines approximation in the next section. Then we adjust CUSUM statistics Z t in Eq (3) for monitoring the change of odds ratio of patients. Since we use nonparametric model (4) to adjust the risk, our proposed RA CUSUM is a nonparametric RA method. We propose a bisection search algorithm and a bootstrap method to determine the nonparametric risk-adjusted CUSUM chart limit value. Through simulations and a real data example, we illustrate nice performance and the use of the proposed methodology.

Polynomial spline approximation
Global polynomial spline approximation has become a popular tool in nonparametric smoothing. It has advantages of nice finite sample performance and fast implementation. For function β(Á), it can be approximated by a linear combination of the basis splines (B-splines).
Assume that random variable U t has finite support [a, b]. Let r be the degree of B spline polynomial, and let ξ 1 with d = r + N the inner knot points. The number of inner knots, N, is a tuning parameter and can be chosen by cross validation [17] or generalized cross validation (GCV) [18,19]. We denote by fB j ðuÞg d j¼1 the Bspline basis functions based on the knot set fx i g 2rþN i¼1 . Then the B-spline basis functions enjoy the following properties [20]: Consequently, for any 1 j d and any real u, where B(u) = (B 1 (u), B 2 (u), Á Á Á, B d (u)) 0 and θ i = (θ i1 , θ i2 , Á Á Á, θ id ) 0 . Let V t = (B(U t ) 0 X 1t , . . ., B(U t ) 0 X qt ,) 0 and y ¼ ðy 0 1 ; . . . ; y 0 q Þ 0 . Then model (4) reduces to Therefore, the maximum likelihood estimation method can be directly used to fit model (6) with the in-control dataset. This estimation method is standard in nonparametric smoothing [19] and can be implemented via some existing programs, for example glm and bs in the R software.

Cusum monitoring
Based on the in-control observations fðY t ; X t ; U t Þg the maximum likelihood estimate of the B-spline coefficient θ and the general mean μ in model (6). Then, the ith functional coefficient β i (u) can be estimated byb i ðuÞ ¼ BðuÞ 0ŷ i . This leads to the estimate of mortality rate p t : wherebðuÞ ¼ ðb 1 ðuÞ 0 ;b 2 ðuÞ 0 ; Á Á Á ;b q ðuÞ 0 Þ 0 : Using Eq (2) we obtain the estimate of log-likelihood W t , denoted byŴ t . By Eq (3), we calculate the estimate of Z t , denoted byẐ t . Note the random properties of the in-control statisticẐ t . Let h be the limit value of this testing procedure. IfẐ t > h, then we conclude our RA CUSUM chart triggers a signal, which indicates that the mortality rate increases.
The limit value h plays a critical role in the CUSUM chart monitoring. It is usually determined by virtue of average run length (ARL) in control, denoted by ARL 0 , the expected run steps from the start of process to the time that the signal is triggered. Optimality of CUSUM in terms of ARL was studied in [21,22]. A better chart has longer ARL 0 and shorter ARL 1 (ARL when the process is out-of-control). Hence, given a large enough ARL 0 , the optimal limit value h can be determined. Usually, the optimal limit value h has no closed form. Thus, we use a numerical search method, bisection, to decide it. This approach needs predetermined upper and lower bounds for h.
Let L be the run length for the monitoring process from start until a signal is triggered. Then L is a random variable, and ARL is its expectation, namely, ARL = E(L). The calculation of ARL is critical in determining h. ARL is a theoretical value, so it should be estimated during monitoring. It is usually not a good idea to get a large in-control sample for estimating the ARL. In the next section we propose a bootstrap resampling method to achieve the goal.
For a given ARL 0 , we can obtain the limit value h through the above procedure. Thus, the CUSUM chart in Eq (3) can be used to monitor the surgical process in the follow up. That is, for the t-th patient, Z t > h indicates an abnormal observation, i.e., the process is out of control. In such a case, surgeons should check where and why the process becomes out of control. In addition, the monitoring efficiency can be assessed by calculating ARL 1 , similar to that of ARL 0 .

Monitoring algorithm
Suppose h 2 [a, b] with a > 0. Given ARL 0 , we propose the following the chart monitoring algorithm: • Phase I. Determination of the optimal limit value h opt .  (v). Repeat steps (i)-(iv) until j d ARL 0m À ARL 0 j M: Then the optimal value can be taken as h opt = h.
With the optimal value h opt , we calculate the RA CUSUM in Eq (3) based on the estimated varying coefficient logistic regression model (4). LetẐ t be the estimate of CUSUM statistic. WhenẐ t > h opt , a signal is triggered and monitoring is stopped.
In our experience, the bootstrapping-based bisection numerical method for the determination of h performs well and stably. Other numerical approaches can be used. For example, one can search proper value on a given grid of points. One can also use a theoretical distribution of ARL to determine h. However, one cannot expect that the theoretical method performs stably, since it depends on the assumption of a theoretical distribution. Hence, we use the above method in our numerical study.
The proposed monitoring method can be modified to detect a decrease in the mortality rate odds ratio by the following RA CUSUM chart Z t ¼ min f0; Z tÀ 1 À W t g: The above algorithm can be updated to this decrease monitoring with a signal triggered aŝ Z t < À h opt .

Numerical studies
In this section, we conduct simulations to demonstrate nice performance of the proposed approach and to use the UK cardiac surgery data to illustrate the use of our RA CUSUM chart.

Simulation
The objective of our simulations is to compare our approach with that of Steiner et al. in [8]. We conduct 500 simulations. For each simulation, we set sample size n = 3000, where the first one-third observations are used as the in-control process with n1 = 1000 and ARL 0 = 900, and the remaining two-third observations are used for the out-of-control process with sample size n 2 = 2000.
We use cubic B splines with N inner knots to approximate β(u). The inner knots are set as equally spaced sample quantiles of fU i g n 1 i¼1 . We regard N as tuning parameter, and it is chosen by minimizing the value of ARL 1 . We evaluate the estimator of β(Á) by its mean squared errors (RMSE): . Example 1 (Varying-coefficient models) We generate fðY t ; X t ; U t Þg n t¼1 from the following varying coefficient logistic regression model: where U t * U(−0.5, 0.5), X t is uniformly distributed on the set {0, 1, Á Á Á, 20}, and β(u) = 0.5 cos(πu). For the in-control process, μ = −3, and for the out-of-control process, μ = −3 + log R A , where R A equals to 0.3, 0.5, 0.8, 1.5, 2, 2.5, 3, 3.5, or 4. These values of R A are used to monitor the decrease (R A < 1) and increase (R A > 1) in the mortality rate odds ratio. , where the typical estimate corresponds to that with median performance in terms of RMSEs in 500 simulations. It indicates that our estimate is quite close to the true curve. Table 1 summarizes the monitoring results of [8] and ours, where "V-C logistic" stands for the results from the RA CUSUM chart based on the varying coefficient logistic regression model (4), and "Linear Logistic" represents  the results from the standard logistic model (1). It is seen that both the mean and variance of ARL 1 based on the varying coefficient logistic model (4) are significantly smaller than those of ARL 1 based on the linear logistic model (1). This indicates that our proposed RA CUSUM chart outperforms the RA CUSUM chart of [8].  conclude from the chart fluctuation that our procedure performs more stably than that of Steiner et al.
Example 2 (Constant coefficient models) Same as in Example 1, but with β(u) = 0.5. In this example, since the underlying process has form of model (1), the CUSUM method of Steiner et al. in [8] should work. Since our model (4) contains model (1), as expected, the two RA CUSUM charts perform similarly. The simulated results are reported in Table 2. Since the two

Real data analysis
We here monitor the performance of the UK cardiac surgery using the RA CUSUM charts of [8] and ours. Some details of this dataset were described in the introduction section. For our procedure, we employ the varying-coefficient logistic model which is an extension to the models previously used [8,14]. Like [8], we treat the data during 1992-1993 as the in-control process and begin monitoring in 1994. We use model (5) to approximate coefficient function β(Á), and the optimal number of knots is calculated as 5. Fig 4 displays the fitted curve of β(Á). It seems that the effect of Parsonnet score nonlinearly depends on Age. In particular, the Parsonnet score strongly correlates with the mortality rate in people aged less than 20 years.
To assess the actual performance of the proposed procedure, we compare it with the method of [8] for detecting 1.5 times the odds of death (R 0 = 1, R A = 1.5) and half of the odds of death (R 0 = 1, R A = 0.5). By bisection and bootstrapping methods, the values of h opt for the RA CUSUM charts based on models (4) and (1) are 3.43, 3.50 for the increasing detection and −4.16, −4.22 for the decreasing detection, respectively. Figs 5 and 6 plot the RA CUSUM charts. As shown in the figures, the step number triggering a signal for the increase monitoring is 1572 for our RA CUSUM and 1584 for that of [8]. For the decrease monitoring, the corresponding step numbers are 2921 and 2998, respectively. In all cases, the resulting values of ARL 0 for both method are 2000. These results show that our proposed procedure can detect an abnormal signal much earlier in the decrease monitoring. Risk-adjusted monitoring

Conclusion
In this paper, we have proposed the nonparametric RA CUSUM chart based on the varying coefficient logistic regression model for monitoring the surgical outcomes. The maximum likelihood and cubic B spline approximation has been used for estimation. The bisection and bootstrap methods have been incorporated to determine the testing limit values. Numerical studies show the advantages of our method over that of [8].
The relationship between the covariates and the mortality rate is usually unknown in applications. Thus, other nonparametric or semiparametric regression may be employed to capture this relationship. The proposed RA monitoring procedure can be extended to other control charts and (or) a mixture of several control charts, such as the charts based on Bayesian approaches [7] and a combination of EWMA and CUSUM charts [23][24][25][26][27][28][29].