The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis

Ordinary least squares (OLS) regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in an economic evaluation based on OLS estimation may be biased by the presence of outliers. Robust estimation, by contrast, provides results that are resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution based on robust estimation, namely Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and least trimmed squares (LTS) estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and the five robust estimations. Empirical size and empirical power of OLS and the robust estimations were then compared in the context of hypothesis testing. Simulations showed that, compared with OLS estimation, the five robust approaches led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. In a real example of antiplatelet therapy, the incremental net-benefit estimated by OLS was lower than those estimated by the robust approaches because of outliers in the cost data. Robust estimations demonstrated a higher probability of cost-effectiveness than OLS estimation. The presence of outliers can bias the results of NBR and their interpretation. The use of robust estimation in NBR is recommended as an appropriate way to avoid such biased decision making.


Introduction
Regression techniques have been widely used in cost-effectiveness analysis (CEA) to control for confounding variables when modelling patient-level data [1][2][3][4]. Ordinary least squares (OLS) estimation, which minimizes the sum of squared errors, is the most common approach for finding a best-fitting line of predicted values because it provides the best linear unbiased estimator (BLUE) among the class of linear estimators [5]. However, OLS estimation can be affected by the presence of outliers, that is, observations which deviate far from the linear relation between the response variable and the explanatory variables [6]. Outliers pull the OLS predictions towards themselves, yet they are commonly present in empirical data.
In general, outliers can be roughly classified into two types: man-made and random [7]. Man-made outliers may arise from typographical errors, misreporting of information on private matters such as salary or drug abuse, incorrect distributional assumptions and sampling error; random outliers may arise by chance when a sample is drawn from a population [8]. The presence of man-made outliers, random outliers, or both can seriously influence the results of statistical analyses, including point and interval estimates and type I and type II errors [8,9]. Some man-made outliers can be avoided by strict data entry and rechecking before the statistical analysis. Data transformation is another way to reduce the influence of outliers. However, it may not be appropriate for hypothesis testing, and straightforward interpretation becomes difficult with transformed data [8]. Removing outliers from the database directly is another simple practice. However, arbitrarily removing data may lead to sample selection bias, which can be regarded as a specification error in linear regression [10] and potentially threatens internal validity [11]. In most cases, outliers are hard to identify, particularly when data are multi-dimensional. In addition, some outliers are hard to detect because they are masked by other outliers, which is referred to as the masking effect [12]. Instead of transforming or removing data, robust methods provide an alternative approach that deals with outliers without deleting them.
The OLS estimator is extremely sensitive to multiple outliers in linear regression analysis. It can be biased by even a single outlier because of its low breakdown point [6], defined as the percentage of outliers that a dataset can contain while the estimator remains unaffected [13]. The breakdown point of the OLS estimator equals the inverse of the sample size and therefore tends to zero as the sample size grows [6]. Unlike the OLS estimator, robust regression provides stable estimates even in the presence of multiple outliers. It minimizes the impact of outliers by giving them smaller weights in the estimation procedure [14]. Several robust regression estimators have been proposed. The simplest robust approach is M-estimation, together with its variant, general M-estimation [15][16][17]. Least trimmed squares (LTS) estimation is a robust method with a high breakdown point, which can withstand a high proportion of outliers and still maintain its robustness [18]. MM-estimation combines a high breakdown point with high statistical efficiency [19].
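The breakdown-point argument can be illustrated on the simplest possible regression, an intercept-only model, where the OLS fit is the sample mean and the median serves as a high-breakdown alternative. The Python sketch below uses made-up cost values (an assumption for illustration, not data from this study) and the hypothetical helper `with_outlier`; it replaces one observation by an increasingly extreme value and prints the resulting mean and median.

```python
from statistics import mean, median

# Hypothetical cost values (illustrative only, not study data).
costs = [48.0, 50.0, 51.0, 49.0, 52.0, 50.0, 47.0, 53.0, 50.0]


def with_outlier(values, outlier):
    """Return a copy of `values` with the last observation replaced by `outlier`."""
    return values[:-1] + [outlier]


for outlier in (50.0, 500.0, 5000.0):
    sample = with_outlier(costs, outlier)
    # The mean (intercept-only OLS fit) follows the single outlier without bound,
    # whereas the median stays with the bulk of the data.
    print(outlier, mean(sample), median(sample))
```

With nine observations, a single contaminated value is enough to move the mean arbitrarily far, which is the 1/n breakdown behaviour described above.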
In CEA studies, outliers are observed more frequently in cost data than in effectiveness data. The conventional strategy for dealing with outliers is to estimate the incremental cost-effectiveness ratio (ICER) both including and excluding outliers in order to see how they affect the ICER [20][21][22][23][24]. In those studies, the ICER estimates including outliers were larger than those excluding them. Analyzing cost-effectiveness data with and without outliers can therefore lead to different CEA results. Furthermore, the proportions of outliers were reported to be less than 10%, which could be an underestimate because of the masking effect. When the proportion of outliers is relatively small, their influence on the ICER may be minor, and they may then be excluded directly without much concern. However, informing decision makers by simply presenting cost-effectiveness results with and without outliers is questionable. To date, only one study has investigated how outliers in cost data (assumed to make up 3%, 5% and 10% of the data) affect the precision of the confidence interval for the ICER estimated by both bootstrapping and Fieller's theorem [25]. The results showed that the presence of outliers affects the coverage probability of the confidence interval of the ICER. However, the impact of outliers on regression-based CEA and ways to tackle the problem have not been addressed.
The objective of this study is to evaluate the impact of outliers on net-benefit regression (NBR), a kind of regression-based CEA, using a number of simulated scenarios in which cost outliers were generated, as well as a real dataset. In the simulation, outliers were assumed to occur randomly in the cost variable and to be larger than its usual values. The simulation scenarios are described in the following section. An empirical example of antiplatelet therapy in the management of cardiovascular diseases is presented to demonstrate the impact on the probability and critical value of cost-effectiveness, especially on the cost-effectiveness acceptability curve (CEAC), which summarizes the acceptability of cost-effectiveness over a range of willingness-to-pay (WTP) values [26].

Methods
Consider a cost-effectiveness study that compares two arms (Arm 1 vs. Arm 0), in which the effect ($E_i$), the cost ($C_i$) and the corresponding covariates ($x_{ji}$, $j = 1, 2, \ldots, s$) are collected for each subject $i$, $i = 1, 2, \ldots, n$. The net-benefit value ($NB_i$) for subject $i$ can then be expressed as

$$NB_i = \lambda E_i - C_i,$$

given a maximum acceptable WTP per unit of effectiveness, $\lambda$.

The NBR Framework
The relationship between $NB_i$ and $x_{ji}$ can be expressed as a linear regression:

$$NB_i = \beta_0 + \sum_{j=1}^{s} \beta_j x_{ji} + \delta z_i + \varepsilon_i,$$

where $z_i$ is an indicator variable (0 for Arm 0 and 1 for Arm 1), $\beta_0, \beta_1, \ldots, \beta_s$ and $\delta$ are regression parameters and $\varepsilon_i$ is the error term. Compared with Arm 0, the incremental net-benefit of Arm 1 is the regression parameter $\delta$ on the treatment indicator. This model is usually referred to as an NBR [4]. In this model, Arm 1 is considered cost-effective if the incremental net-benefit, $\delta$, is positive and not cost-effective if $\delta$ is non-positive. To account for sampling uncertainty, the following statistical hypothesis can be tested for the cost-effectiveness of Arm 1:

$$H_0: \delta \le 0 \quad \text{versus} \quad H_1: \delta > 0.$$

The computation of a p-value for this one-sided test and the point estimates and inferences for the NBR are well documented. The probability of cost-effectiveness is calculated as 1 minus the p-value of the above test [4,27], and the CEAC can be plotted with $\lambda$ (the WTP), varied from 0 to a large value, on the horizontal axis and the corresponding probabilities of cost-effectiveness on the vertical axis.
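As a hedged illustration of the framework above, the following Python sketch fits an NBR by OLS and converts the one-sided test of the treatment coefficient into a probability of cost-effectiveness. The function names `solve` and `nbr_ols`, the use of the normal equations, and the normal approximation to the reference distribution of the test statistic are assumptions made for this sketch, not details taken from the study; the six data points in the usage lines are invented.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def nbr_ols(effects, costs, arms, covariates, wtp):
    """Fit NB_i = b0 + sum_j b_j x_ji + delta z_i + e_i by OLS.

    Returns (delta_hat, one-sided p-value for H0: delta <= 0,
    probability of cost-effectiveness = 1 - p-value).
    """
    y = [wtp * e - c for e, c in zip(effects, costs)]          # net benefit per subject
    X = [[1.0] + list(x) + [float(z)] for x, z in zip(covariates, arms)]
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * xi for b, xi in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)
    # Last diagonal entry of (X'X)^{-1}, obtained by solving X'X v = e_k.
    unit = [0.0] * (k - 1) + [1.0]
    var_delta = sigma2 * solve(xtx, unit)[-1]
    z_stat = beta[-1] / var_delta ** 0.5
    p_value = 1.0 - NormalDist().cdf(z_stat)   # assumption: normal approximation
    return beta[-1], p_value, 1.0 - p_value


# Invented example: three subjects per arm, no covariates, WTP of 20.
effects = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
costs = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
arms = [0, 0, 0, 1, 1, 1]
delta_hat, p_value, prob_ce = nbr_ols(effects, costs, arms, [[] for _ in arms], wtp=20.0)
print(delta_hat, p_value, prob_ce)
```

Evaluating `nbr_ols` over a grid of WTP values and plotting the third returned value against the WTP would trace out a CEAC.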

Robust Estimations for the NBR
A large number of estimation approaches can provide robust estimates for a linear regression, including the M-estimator and its variants [15][16][17][28,29], the least-median estimator [18], the LTS estimator [18], the MM-estimator [19], least-absolute estimators [30], the S-estimator [31], the two-stage estimator [32] and so on. Comparative studies of robust estimators and the OLS estimator based on Monte Carlo simulation or real examples have been published, but their results in terms of bias, efficiency, tests of the null hypothesis and forecasting ability were inconsistent [9][33][34][35][36][37][38]. The MM-estimator was better than the OLS estimator and the other robust estimators in relative efficiency, bias and statistical tests [9]; the M-estimator and LTS estimator outperformed the OLS estimator on predicted values of the dependent variable [33,34]; the M-estimator performed better than the LTS estimator and MM-estimator on R-square [35]; the MM-estimator and LTS estimator provided a higher R-square than the OLS estimator, and their estimates were very close [36]; Tukey's bisquare M-estimator performed better on effect estimates than the Huber M-estimator and OLS estimator for experimental design data [37]; and robust estimates showed better predictive ability [38]. The inconsistent results were possibly caused by differences in data structures or simulation scenarios. A previous study suggested that the choice of robust estimation depends on the structure of the data and the user's discretion [39]. In this study, five robust estimations (Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and LTS estimation) were compared with OLS estimation to illustrate the impact of outliers on NBR in CEA. These five estimations were discussed extensively in the comparative studies above and are supported in standard statistical packages.
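For parsimony, the NBR above can be re-expressed as

$$y_i = \mathbf{x}_i^{\top} \boldsymbol{\beta} + e_i,$$

where $y_i = NB_i$, $\mathbf{x}_i = (1, x_{1i}, \ldots, x_{si}, z_i)^{\top}$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_s, \delta)^{\top}$ and $e_i$ is the corresponding residual. This study uses one classical and five robust approaches to estimate the regression parameter $\boldsymbol{\beta}$: OLS estimation, three types of M-estimation [15][16][17], MM-estimation [19] and LTS estimation [18]. The most common robust estimation of a linear regression model is M-estimation [15]. The general M-estimator $\hat{\boldsymbol{\beta}}$ minimizes the finite sum

$$\sum_{i=1}^{n} \rho(e_i),$$

where $\rho$ is a symmetric function that determines the contribution of each residual. In this paper, four types of the function $\rho$ are included, written here in their standard forms:

$$\text{(E1)} \quad \rho(e) = e^2,$$

$$\text{(E2)} \quad \rho(e) = \begin{cases} e^2/2, & |e| \le k \\ k|e| - k^2/2, & |e| > k \end{cases} \quad \text{where } k = 1.345,$$

$$\text{(E3)} \quad \rho(e) = \begin{cases} e^2/2, & |e| \le a \\ a|e| - a^2/2, & a < |e| \le b \\ ab - a^2/2 + \dfrac{a(c-b)}{2}\left[1 - \left(\dfrac{c - |e|}{c - b}\right)^2\right], & b < |e| \le c \\ a(b + c - a)/2, & |e| > c \end{cases} \quad \text{with tuning constants } 0 < a < b < c,$$

$$\text{(E4)} \quad \rho(e) = \begin{cases} \dfrac{k^2}{6}\left\{1 - \left[1 - (e/k)^2\right]^3\right\}, & |e| \le k \\ k^2/6, & |e| > k \end{cases} \quad \text{where } k = 4.685.$$

The function $\rho$ in E1 is for OLS estimation, E2 is for Huber M-estimation [15], E3 is for Hampel M-estimation [16] and E4 is for Tukey's bisquare M-estimation [17]. MM-estimation is an M-estimation that starts at the coefficients given by an S-estimator and uses the fixed scale given by that S-estimator [40]. Least trimmed squares (LTS) estimation is based on minimizing

$$\sum_{i=1}^{h} (e^2)_{(i)},$$

where $(e^2)_{(1)} \le (e^2)_{(2)} \le \cdots \le (e^2)_{(n)}$ are the ordered squared residuals and $h \le n$ is the number of observations retained [18].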
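To make the role of the loss function concrete, the sketch below codes the textbook forms of the four $\rho$ functions and the LTS objective in Python. The function names are hypothetical, the Huber and Tukey constants are those stated above, and the Hampel constants (a = 2, b = 4, c = 8) are an illustrative assumption, since the values used in this study are not stated here. Residuals are assumed to be already standardized by a scale estimate.

```python
def rho_ols(e):
    """Squared-error loss: grows without bound, so large residuals dominate."""
    return e * e


def rho_huber(e, k=1.345):
    """Huber loss: quadratic near zero, linear beyond k."""
    x = abs(e)
    return 0.5 * x * x if x <= k else k * x - 0.5 * k * k


def rho_hampel(e, a=2.0, b=4.0, c=8.0):
    """Hampel three-part loss; a, b, c are illustrative assumptions."""
    x = abs(e)
    if x <= a:
        return 0.5 * x * x
    if x <= b:
        return a * x - 0.5 * a * a
    if x <= c:
        return a * b - 0.5 * a * a + 0.5 * a * (c - b) * (1.0 - ((c - x) / (c - b)) ** 2)
    return 0.5 * a * (b + c - a)          # constant: extreme residuals add no more loss


def rho_tukey(e, k=4.685):
    """Tukey's bisquare loss: bounded by k^2 / 6."""
    if abs(e) <= k:
        return (k * k / 6.0) * (1.0 - (1.0 - (e / k) ** 2) ** 3)
    return k * k / 6.0


def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals (the LTS criterion)."""
    return sum(sorted(r * r for r in residuals)[:h])


for e in (0.5, 3.0, 50.0):
    print(e, rho_ols(e), rho_huber(e), rho_hampel(e), rho_tukey(e))
```

The Hampel and Tukey losses level off for extreme residuals, so a gross cost outlier contributes a bounded amount to the criterion, whereas its contribution under squared-error loss grows quadratically.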

Simulation Analysis
We designed a simulation study to illustrate the potential impact of outliers in CEA using NBR on determining cost-effectiveness of Arm 1 based on the comparison of six estimation procedures, i.e. OLS estimation, Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and LTS estimation as detailed in the following section.
Simulation Design. The effect ($E_i$) and cost ($C_i$) of subject $i$ were generated randomly from a bivariate normal distribution,

$$\begin{pmatrix} E_i \\ C_i \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} \\ \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} \end{pmatrix}, \Sigma \right),$$

where $x_{1i}$ is a dummy regressor generated from a Bernoulli distribution with probability 1/2, indicating that the subject belongs to Arm 0 ($x_{1i} = 0$) or Arm 1 ($x_{1i} = 1$), and $x_{2i}$ is a continuous regressor generated from a normal distribution with mean 2 and standard deviation 0.5. The parameters $\alpha_0$, $\alpha_1$ and $\alpha_2$ were all set to 1; $\beta_0$ was set to 50, $\beta_1$ to 10 and $\beta_2$ to 1. Thus, compared with subjects in Arm 0, subjects in Arm 1 gain one unit of effect ($\alpha_1 = 1$) but cost 10 more dollars ($\beta_1 = 10$). The covariance matrix $\Sigma$ was set to a fixed matrix. The first $n(1-p)$ simulated samples were considered regular cases (non-outliers), where $p$ is the proportion of outliers among the $n$ samples. Outliers were assumed to occur only in the cost variable ($C_i$), and the last $np$ observations were designated as outliers. Based on the previous literature and the potential masking effect, the proportion of outliers $p$ was set to 0.05, 0.1, 0.2 and 0.3. Outlier samples were generated under three scenarios:
I. $C_i$ was randomly drawn from a normal distribution with mean 150 and variance 1 for $i = n(1-p)+1, \ldots, n$.
II. $C_i$ was randomly drawn from a normal distribution with mean 200 and variance 1 for $i = n(1-p)+1, \ldots, n$.
III. $C_i$ was randomly drawn from a normal distribution with mean 150 and variance 1 for $i = n(1-p)+1, \ldots, n(1-p/2)+1$, and from a normal distribution with mean 200 and variance 1 for $i = n(1-p/2)+2, \ldots, n$.
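The design above can be sketched in Python as follows. This is a minimal illustration, not the study's simulation code: the names `simulate` and `true_inb` are hypothetical, and because the entries of the covariance matrix are not reproduced here, the sketch assumes independent effect and cost errors with unit standard deviations (parameters `sd_effect` and `sd_cost`). The helper `true_inb` records that, under this design, the true incremental net benefit at WTP $\lambda$ is $\lambda \alpha_1 - \beta_1 = \lambda - 10$.

```python
import random

ALPHA = (1.0, 1.0, 1.0)     # alpha_0, alpha_1, alpha_2 (effect equation)
BETA = (50.0, 10.0, 1.0)    # beta_0, beta_1, beta_2 (cost equation)


def true_inb(wtp):
    """True incremental net benefit of Arm 1: wtp * alpha_1 - beta_1."""
    return wtp * ALPHA[1] - BETA[1]


def simulate(n, p, scenario, rng, sd_effect=1.0, sd_cost=1.0):
    """Return n tuples (effect, cost, arm, x2, is_outlier).

    Assumption: independent errors with the given standard deviations stand in
    for the bivariate normal covariance matrix of the original design.
    """
    n_out = round(n * p)
    n_reg = n - n_out
    rows = []
    for i in range(n):                      # i is 0-based; the text uses 1-based indices
        arm = 1 if rng.random() < 0.5 else 0
        x2 = rng.gauss(2.0, 0.5)
        effect = ALPHA[0] + ALPHA[1] * arm + ALPHA[2] * x2 + rng.gauss(0.0, sd_effect)
        if i < n_reg:
            cost = BETA[0] + BETA[1] * arm + BETA[2] * x2 + rng.gauss(0.0, sd_cost)
            rows.append((effect, cost, arm, x2, False))
            continue
        if scenario == "I":
            centre = 150.0
        elif scenario == "II":
            centre = 200.0
        else:
            # Scenario III: 1-based indices up to n(1 - p/2) + 1 are centred at 150.
            centre = 150.0 if i <= n_reg + n_out / 2 else 200.0
        rows.append((effect, rng.gauss(centre, 1.0), arm, x2, True))
    return rows


data = simulate(n=100, p=0.2, scenario="III", rng=random.Random(0))
print(sum(row[4] for row in data), true_inb(7), true_inb(13))
```

Because outliers replace only the cost, contaminated subjects keep an ordinary effect but carry a cost far above the regular range of roughly 50 to 65.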
Performance Comparison. For each parameter setting, sample size (n = 100, 500 and 1000) and WTP ($\lambda$ = 7, 8, 12 and 13), 500 independent data sets were created and the six estimation procedures were applied to each data set. After 500 repetitions, one quantity was calculated for each estimation procedure:

$$Q = \frac{\#\{\text{rejections of } H_0: \delta \le 0\}}{500},$$

where $\delta$ is the incremental net-benefit and $Q$ is the empirical size for $\lambda$ = 7 and 8 (i.e. $H_0: \delta \le 0$ is true but rejected) and the empirical power for $\lambda$ = 12 and 13 (i.e. $H_0: \delta \le 0$ is false and rejected). The empirical size and empirical power were used to illustrate the type I error and the power (1 − type II error), respectively, of the different estimation procedures over the 500 repetitions.
Simulation Results. The empirical sizes and empirical powers are shown in Table 1 and Table 2, respectively. In Table 1, most empirical sizes were below the significance level of 0.05, except for some cases with 20% and 30% of outliers. Table 2 shows that the three M-estimations, MM-estimation and LTS estimation had higher empirical powers than OLS estimation. However, two robust procedures, Huber M-estimation and Hampel M-estimation, had lower empirical power than OLS estimation when the proportion of outliers reached 30%. With a small sample size (n = 100), all empirical powers were below 0.5 for all estimations; in contrast, with large sample sizes (n = 500 or 1000), most empirical powers were above 0.5 except for OLS estimation. In short, as the sample size increased, the empirical sizes decreased while the empirical powers increased. Among the robust estimations, the empirical powers of the three M-estimations decreased dramatically as the proportion of outliers increased, while those of MM-estimation and LTS estimation decreased only slightly. A larger WTP led to a smaller empirical size and a larger empirical power.
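The quantity $Q$ can be sketched as a small Monte Carlo loop in Python. This is a simplified stand-in rather than the study's procedure: net benefits are drawn directly for two equal-sized arms with no covariates and no outliers, the test is a large-sample z-test on the difference in means, and the names `one_sided_p` and `rejection_rate` as well as the standard deviation of 10 are assumptions. The values $\delta = -3$ and $\delta = 3$ correspond to $\lambda = 7$ and $\lambda = 13$ when $\alpha_1 = 1$ and $\beta_1 = 10$.

```python
import random
from statistics import NormalDist, mean, variance


def one_sided_p(nb0, nb1):
    """P-value for H0: delta <= 0 from a z-test on the difference in mean net benefit."""
    delta_hat = mean(nb1) - mean(nb0)
    se = (variance(nb0) / len(nb0) + variance(nb1) / len(nb1)) ** 0.5
    return 1.0 - NormalDist().cdf(delta_hat / se)


def rejection_rate(delta, n=100, sd=10.0, reps=500, alpha=0.05, seed=1):
    """Q = (number of data sets in which H0: delta <= 0 is rejected) / reps."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        nb0 = [rng.gauss(0.0, sd) for _ in range(n // 2)]      # Arm 0 net benefits
        nb1 = [rng.gauss(delta, sd) for _ in range(n // 2)]    # Arm 1, shifted by delta
        if one_sided_p(nb0, nb1) < alpha:
            rejections += 1
    return rejections / reps


empirical_size = rejection_rate(delta=-3.0)    # H0 true: rejections are type I errors
empirical_power = rejection_rate(delta=3.0)    # H0 false: rejections are correct
print(empirical_size, empirical_power)
```

Replacing the z-test with a fitted regression (OLS or robust) and the clean draws with contaminated data would give the size and power comparison described above.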

Empirical Example: Antiplatelet Therapy
In this section, we used a real example of antiplatelet therapy for the prevention of cardiovascular diseases (CVD) to demonstrate the NBR under the different estimation procedures.
Background. Antiplatelet therapy, which includes the administration of low-dose aspirin (75-150 mg) and clopidogrel, is effective as secondary prevention for some CVD. Patients treated with aspirin may experience some degree of gastrointestinal (GI) bleeding, and clopidogrel is intended to reduce the occurrence of GI bleeding. A previous CEA showed that aspirin plus proton-pump inhibitors (PPIs) was more cost-effective than clopidogrel with respect to hospitalization for GI complications [41]. This study focused on patients with a medical history of GI bleeding and compared the cost-effectiveness of aspirin plus PPIs and clopidogrel with respect to outpatient visits. The study was conducted from the Taiwanese healthcare payer perspective.
Data Source. The data were drawn from the Taiwan National Health Insurance Research Database (NHIRD) between 2001 and 2006. Study subjects, followed for one year from the discharge date, were classified into two groups according to the antiplatelet regimen they received during the 90 days following hospital discharge for major GI complications: (1) the clopidogrel group, those prescribed clopidogrel alone, and (2) the aspirin plus PPIs group, those prescribed aspirin plus PPIs.
The Impact of Outliers on NBR Model PLOS ONE | www.plosone.org
Effect, Cost and Covariates. The effect variable was the number of days between the discharge date and the first outpatient visit for GI illness, including bleeding and perforation, after discharge; the unit of effect was days, with a maximum of 365 days. The cost variable was defined as the accumulated medical cost during the observation period (time to event for GI cases and 365 days for non-GI cases), including all medical expenses for inpatient and outpatient visits for CVD events and inpatient visits for GI events. The unit of cost was New Taiwan Dollars (NTD). Subjects' age, gender, medicine use (defined daily dose (DDD) of clopidogrel and of aspirin plus PPIs) and medical history prior to follow-up (diabetes mellitus, cardiovascular diseases and lung-related diseases) were included as control variables in the NBR.
Cost-effectiveness Analysis. An NBR analysis based on OLS estimation was initially used to compare the cost-effectiveness of the aspirin plus PPIs group and the clopidogrel group given a set of values of $\lambda$ (the maximum acceptable WTP per unit of effectiveness). The preliminary analysis showed some potential outliers in the cost variable. Because of those outliers, robust estimation procedures were then used to analyse the data. For each $\lambda$ (50, 100, 150 and 200), the estimates of the treatment effect, the corresponding one-sided p-values and the probabilities of cost-effectiveness (calculated by both regression and bootstrapping) were summarized for comparison. CEACs were also constructed for the OLS and robust estimation procedures.
Empirical Results. Table 3 shows the baseline characteristics of the total sample of 649 subjects. Among them, 564 (87%) subjects used aspirin plus PPIs and 85 (13%) subjects used clopidogrel. In terms of effect, the aspirin plus PPIs group had a longer delay before seeking outpatient care for GI illness than the clopidogrel group (270.78 days, SD = 117.00 vs. 250.84 days, SD = 121.75). Regarding cost, the mean cost was 27210 NTD (SD = 99648) for the aspirin plus PPIs group and 22384 NTD (SD = 46918) for the clopidogrel group. Over 60% of the total sample were male, and the mean age was about 72 years. Overall, about 15% of study subjects had a medical history of diabetes mellitus, cardiovascular disease or lung-related diseases during the year prior to study entry. Table 4 shows the NBR estimates for four values of $\lambda$ (50, 100, 150 and 200). The proportions of outliers ranged from around 4% to 16% depending on the value of WTP. The probability of cost-effectiveness (aspirin plus PPIs vs. clopidogrel) was calculated by the regression and bootstrapping approaches for each value of $\lambda$.
The estimates of incremental net-benefit (the regression coefficient on the treatment indicator) by OLS estimation were lower than those by the robust estimations. The OLS results also had the lowest probabilities of being cost-effective (based on both regression and bootstrapping). All probabilities of cost-effectiveness under OLS estimation were below 0.75, much lower than those of the five robust regressions. Figure 1 shows the CEACs estimated by the different estimation procedures. Except at extremely low WTP, the CEAC by OLS estimation lay below the CEACs by the robust approaches. If the required probability of cost-effectiveness is set at 0.8, the critical value for aspirin plus PPIs being cost-effective compared with clopidogrel is about 200 NTD by the robust methods and higher than 200 NTD by OLS estimation.
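The bootstrapping route to the probability of cost-effectiveness mentioned above can be sketched as follows. This Python fragment is a simplified assumption-laden illustration, not the procedure used in this study: it resamples net benefits within each arm and uses the unadjusted difference in means in place of the covariate-adjusted regression coefficient. The function name `bootstrap_prob_ce`, the number of replicates and the toy inputs are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_prob_ce(nb_control, nb_treated, reps=2000, seed=0):
    """Share of bootstrap replicates with a positive incremental net benefit.

    Assumption: the incremental net benefit is the unadjusted difference in
    mean net benefit; a covariate-adjusted analysis would refit the
    regression on every resample instead.
    """
    rng = random.Random(seed)
    positive = 0
    for _ in range(reps):
        b0 = rng.choices(nb_control, k=len(nb_control))   # resample with replacement
        b1 = rng.choices(nb_treated, k=len(nb_treated))
        if mean(b1) - mean(b0) > 0:
            positive += 1
    return positive / reps


# Toy net-benefit values (invented): the treated arm is clearly better.
print(bootstrap_prob_ce([1.0, 2.0, 3.0], [10.0, 11.0, 12.0]))
```

Repeating the calculation over a grid of WTP values, with net benefits recomputed at each value, would give a bootstrap-based CEAC to set beside the regression-based one.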

Discussion
In this study, we presented simulations under different parameter settings to demonstrate the influence of outliers when estimating an NBR for CEA. The presence of outliers in cost data was shown to lead to lower empirical powers under various outlier scenarios and to higher empirical sizes in some scenarios with 20% and 30% of outliers, and hence to incorrect decision making.
There were two important features in the simulation. The first was the consideration of the outlier mechanism. Large outliers in the cost variable were assumed to occur randomly. In practice, outliers can have many causes, such as measurement error or data entry mistakes. In such circumstances, the errors or mistakes can be corrected ad hoc or prevented beforehand. However, when outliers are not man-made and cannot be excluded from the analysis directly, CEA should be conducted with caution in some specific patient populations. In this context, cost or effect outliers are not attributable to the illness or treatment of interest. Instead, they occur primarily because of patients' complicated conditions, other severe medical history or old age, which can incur higher medical costs. In such cases, the reasons causing the outliers become confounding factors, and direct deletion of outliers from the data may bias CEA results. To circumvent this, subgroup (subpopulation) analysis may be an alternative if one can distinguish between outliers and usual cases [3]. However, outliers and usual cases are often not directly distinguishable in empirical studies. In view of this, robust estimation provides a procedure to avoid the possible influence of outliers.
The second feature was that hypothesis testing on cost-effectiveness was considered. When the true net-benefit is non-positive, OLS and robust estimations performed almost equally well, in that the chance of making a wrong decision (type I error) was below the significance level of 0.05 except when the proportion of outliers in the sample was high. However, the focus of this study was to point out the better performance of robust estimations over OLS estimation in terms of empirical power, i.e. declaring a positive net-benefit when the true net-benefit is positive. Specifically, it is worth noting that when the proportion of outliers in the data is as large as 30%, M-estimations such as Huber M-estimation, Hampel M-estimation and Tukey's bisquare M-estimation performed no better than OLS estimation. LTS estimation and MM-estimation still produced robust results with a high likelihood of making correct decisions, remaining largely uninfluenced by the proportion of outliers because of their high breakdown points.
Table 3. Patients' baseline characteristics, medical history, and medication use during the follow-up.
In the empirical example of antiplatelet therapy, robust estimations led to a higher probability of declaring aspirin plus PPIs cost-effective relative to clopidogrel across the set of WTP values. In Figure 1, the CEAC of OLS estimation was well below those of the robust estimations. As the WTP increased, the CEAC of OLS estimation rose only slightly, from 50% to around 60%, while those of the five robust estimations rose above 80%. Compared with robust estimations, OLS estimation would require a larger critical value to conclude that aspirin plus PPIs is cost-effective. This indicates that, at an appropriate WTP, aspirin plus PPIs would be judged cost-effective relative to clopidogrel under robust estimation but not under OLS estimation.
One concern in using robust estimation for net-benefit data is sample size. The simulation showed that the empirical power of all robust estimations improved as the sample size increased. A relatively large sample size is therefore required to ensure the reliability of CEA results in NBR. In summary, sample size, outlier distribution and outlier proportion all played a major role in testing cost-effectiveness in NBR. A smaller sample size, a serious departure of the outlier distribution from the target population and a large outlier proportion would lead to erroneous results. Either increasing the sample size or using robust approaches would reduce the impact of outliers. However, if the proportion of outliers was over 20%, the performance of the three types of M-estimation was almost equivalent to, or sometimes worse than, that of OLS estimation. MM-estimation was especially suitable for dealing with outliers from an extreme distribution, and LTS estimation was almost dominant over the other estimations in our simulated results. Caution is strongly advised when handling cases with a small sample size, a large proportion of outliers and extreme outliers.
In a nutshell, the five robust estimations outperformed OLS estimation on hypothesis tests of cost-effectiveness. Among them, LTS estimation provided better results in testing cost-effectiveness and a higher probability of declaring an intervention cost-effective when it is actually cost-effective at a given WTP. Tukey's bisquare M-estimation and MM-estimation performed almost as well as LTS estimation when the proportion of outliers was less than 30%. For more extreme outliers, MM-estimation performed as well as LTS estimation. In summary, LTS estimation is recommended in practice when an NBR is applied for CEA.
Table 4. Results of net benefit estimates and the probability of cost-effectiveness for six estimation procedures.