Targeting Uric Acid and the Inhibition of Progression to End-Stage Renal Disease—A Propensity Score Analysis

Background The role of uric acid (UA) in the progression of chronic kidney disease (CKD) remains controversial because cause and effect are difficult to separate. This study aimed to clarify the independent impact of UA on the subsequent risk of end-stage renal disease (ESRD) by a propensity score analysis. Methods A retrospective CKD cohort was used (n = 803). Twenty-three baseline covariates were entered into a multivariate binary logistic regression with the targeted time-averaged UA of 6.0, 6.5 or 7.0 mg/dL. After trimming 2.5 percentiles from the extreme ends of the cohort, the participants underwent propensity score analyses consisting of matching, stratification on quintiles and covariate adjustment. Covariate balance after 1:1 matching without replacement was tested by paired analysis and standardized differences. A stratified Cox regression and a Cox regression adjusted for the logit of the propensity scores were examined. Results After propensity score matching, the higher UA showed elevated hazard ratios (HRs) by Kaplan-Meier analysis (≥6.0 mg/dL, HR 4.53, 95%CI 1.79–11.43; ≥6.5 mg/dL, HR 3.39, 95%CI 1.55–7.42; ≥7.0 mg/dL, HR 2.19, 95%CI 1.28–3.75). The number needed to treat was 8 to 9 over 5 years. A stratified Cox regression likewise showed significant crude HRs (≥6.0 mg/dL, HR 3.63, 95%CI 1.25–10.58; ≥6.5 mg/dL, HR 3.46, 95%CI 1.56–7.68; ≥7.0 mg/dL, HR 2.05, 95%CI 1.21–3.48). The adjusted HR lost its significance at 6.0 mg/dL. Adjustment for the logit of the propensity scores showed similar results but with worse model fit than the stratification method. Upon further adjustment for other covariates, significance was attained at 6.5 mg/dL. Conclusions Three different methods of propensity score analysis showed consistent results that the higher UA accelerates the progression to subsequent ESRD. A stratified Cox regression outperforms the other methods in generalizability and in adjusting for residual bias. Serum UA should be targeted below 6.5 mg/dL.


Introduction
The impact of uric acid (UA) on the progression of chronic kidney disease (CKD) remains controversial because observational studies have given inconsistent results [1][2][3][4][5][6][7][8]. The inconsistency may be attributed to test cohorts that differed in CKD stage, in the presence or absence of other CKD risk factors, and in additional comorbidities such as diabetes. In addition, the choice of renal outcome varied from a slight increase in serum creatinine to entry into end-stage renal disease (ESRD). Generally, the time-dependent nature of serum UA over the clinical course of CKD progression has not been taken into consideration; in the early stage of CKD the value is not necessarily high, whereas in the later stage it increases significantly. The magnitude of change in serum UA was large, similar to that of other time-varying parameters such as hemoglobin, albumin-corrected calcium, phosphorus and proteinuria [9]. As long as only the baseline serum UA is used, risk analysis for subsequent ESRD may overlook the impact of serum UA [7,8]. Consequently, the relationship between UA and CKD is a typical example of a "chicken and egg problem" [10][11][12]. To help overcome this problem, we attempted to utilize time-averaged values that may represent the continued impact of time-varying parameters on the progression of CKD [9,13]. However, our recent study failed to show that serum UA, either at baseline or time-averaged over 2 years, was significant in the time-to-event analysis, probably due to confounding with other stronger covariates such as baseline estimated GFR, serum albumin, hemoglobin and proteinuria [13].
Propensity score analysis is increasingly being used to estimate causal effects in observational studies because it can approximate a prospective randomized controlled trial by minimizing baseline confounding as much as possible [14]. To disentangle the cause-and-effect relationship between serum UA and the risk of CKD progression, we were prompted to utilize this attractive modality. Modern propensity score analysis includes matching, stratification, covariate adjustment and inverse probability weighting [14]. In the present study the first three methods, originally established by Rosenbaum and Rubin [15], were examined to test for the independent significance of serum UA, and we compared the advantages and disadvantages of the three methods from the clinical point of view.

Study protocol and ethical statement
We used a retrospective CKD cohort already reported [13] but with observation periods 1 year. Inclusion criteria consisted of CKD stage 3 and 4 and age 20 to 84 years. Patients with nephrotic syndrome, malignancy, obstructive nephropathy, acute kidney injury and gout were excluded. All the patients (n = 803) were followed up to 6 years until censoring or reaching the initiation of dialysis. The present study was approved by the institutional review board (IRB) in the Teikyo University Review Board #14-115 and was executed in accordance with the principles of the Helsinki Declaration. Written informed consent was waived after approval of the IRB, and the patient records and information were anonymized and de-identified prior to analysis.
Blood was tested using a hematology autoanalyzer (Sysmex XE-5000, Kobe, Japan) and blood chemistry parameters were measured by routine measurements using an autoanalyzer (LABOS-PECT 008, Hitachi High-Technologies Corporation, Tokyo, Japan). Creatinine concentration in serum and urine was measured by an enzymatic method. Serum UA was measured by the uricase method and urinary protein concentration by a pyrocatechol violet-metal complex assay method. Serum UA measured at every visit until censoring or until estimated GFR reached 5 mL/min/1.73 m² was averaged to give the time-averaged UA in the follow-up. Estimated GFR was calculated using the Modification of Diet in Renal Disease (MDRD) study equation for the Japanese population [16]. The grade of CKD was classified based on the Kidney Disease Outcomes Quality Initiative (K/DOQI) practice guidelines [17].
Use of antihypertensives including angiotensin converting enzyme inhibitor or angiotensin II receptor blocker (combined as RASi), diuretics, and UA-lowering drugs was recorded as yes (coded as 1) or no (coded as 0). The baseline covariates, including the information on drug use, totaled 23 and were used for the propensity score estimate modeling.

An end point of renal outcome
A primary end point was defined as the incidence of ESRD (initiation of hemodialysis or peritoneal dialysis). Death was treated as censoring because the present study focused on the effect of UA on the subsequent ESRD rather than the risk of mortality [13]. Before a propensity score analysis, a standard multivariate Cox proportional hazards model was performed using all the 23 baseline covariates to obtain the independent risk factors in our CKD cohort.

A propensity score analysis
The target thresholds of time-averaged UA were set at 6.0, 6.5 and 7.0 mg/dL based on their clinical implication. The probability of exceeding the threshold was determined by a multivariate binary logistic regression using the aforementioned 23 baseline covariates. To address the overlap problem, 2.5 percentiles were trimmed from the extreme ends of the cohort, and the resulting subsample of 763 patients was re-stratified on the quintiles of the propensity scores [18,19]. Sex difference was not pursued in the present study because of the limited sample size.
Matching. Participants above or below the threshold of time-averaged UA (6.0, 6.5 and 7.0 mg/dL) were matched 1:1 using a greedy method. The caliper size was set at 0.20 times the standard deviation of the logit of the propensity scores [14]. The discrimination of the assignment model was assessed by c-statistics, and the balance between the two groups was checked by paired comparison tests and standardized differences of the 23 baseline covariates [20]. A time-to-event survival analysis was performed by a Kaplan-Meier analysis with a stratified log-rank test [14,21]. Moreover, hazard ratio, absolute risk reduction and number needed to treat were computed [22][23][24].
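The trimming and quintile stratification described above can be sketched as follows. This is a minimal illustration, not the authors' SPSS/STATA procedure: it assumes the propensity scores have already been estimated by the logistic regression, that trimming is applied to the propensity score distribution, and that quintiles are assigned by rank. The function names `percentile` and `trim_and_stratify` are hypothetical.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def trim_and_stratify(scores, trim_pct=2.5, n_strata=5):
    """Trim both tails of the propensity scores, then assign rank-based strata.

    scores   : list of estimated propensity scores, one per participant
    trim_pct : percentile removed from each extreme end (assumed 2.5)
    n_strata : number of strata (5 gives quintiles)
    Returns a dict {original index: stratum number 1..n_strata}
    containing only the participants kept after trimming.
    """
    ordered = sorted(scores)
    low = percentile(ordered, trim_pct)
    high = percentile(ordered, 100.0 - trim_pct)
    # Keep participants whose score lies inside the trimmed range.
    kept = sorted((s, i) for i, s in enumerate(scores) if low <= s <= high)
    strata = {}
    for rank, (_, idx) in enumerate(kept):
        # Equal-sized strata by rank within the trimmed subsample.
        strata[idx] = rank * n_strata // len(kept) + 1
    return strata


if __name__ == "__main__":
    # Hypothetical scores, evenly spread for illustration only.
    demo_scores = [(i + 0.5) / 200 for i in range(200)]
    demo_strata = trim_and_stratify(demo_scores)
    print(len(demo_strata), "participants kept after trimming")
```

Strata are recomputed on the trimmed subsample, which corresponds to the re-stratification on quintiles mentioned in the text.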
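As an illustration of greedy 1:1 matching without replacement within a caliper, a minimal sketch is given below. Several details are assumptions rather than the authors' implementation: participants in the higher-UA group are processed in input order, the caliper uses the standard deviation of the logit across both groups combined, and the function name `greedy_match` and all identifiers and scores in the usage example are hypothetical.

```python
import math
import statistics


def greedy_match(high_group, low_group, caliper_sd=0.20):
    """Greedy 1:1 nearest-neighbour matching on the logit of the propensity score.

    high_group, low_group : dict {participant id: propensity score in (0, 1)}
    caliper_sd            : caliper width in units of the SD of the logit
    Returns a list of (high_id, low_id) pairs; each low_id is used at most once.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    logit_high = {k: logit(v) for k, v in high_group.items()}
    logit_low = {k: logit(v) for k, v in low_group.items()}

    # Assumption: SD is taken over the logits of both groups combined.
    all_logits = list(logit_high.values()) + list(logit_low.values())
    caliper = caliper_sd * statistics.stdev(all_logits)

    available = dict(logit_low)  # controls not yet matched
    pairs = []
    for high_id, value in logit_high.items():
        if not available:
            break
        # Nearest remaining control on the logit scale.
        low_id = min(available, key=lambda c: abs(available[c] - value))
        if abs(available[low_id] - value) <= caliper:
            pairs.append((high_id, low_id))
            del available[low_id]  # matching without replacement
    return pairs


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    high = {"t1": 0.50, "t2": 0.90}
    low = {"c1": 0.52, "c2": 0.55, "c3": 0.10}
    print(greedy_match(high, low))
```

A participant with no control inside the caliper stays unmatched, which is why the matched sample is smaller than the trimmed subsample.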
Stratification of Cox proportional hazards model. A stratified Cox proportional hazards model was conducted in the substrata on the quintiles of the propensity scores [14]. Then, a pooled hazard ratio of the higher group of the time-averaged UA was obtained as a crude hazard ratio (Model 1). Survival analysis was adjusted for the baseline covariates including age, sex, diabetic nephropathy, baseline estimated GFR and proteinuria (Model 2) and further adjusted for all the covariates which affected the subsequent ESRD extracted by a standard multivariate Cox proportional hazards model (Model 3).
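For reference, the general form of the partial likelihood maximized by a stratified Cox model is shown below. The notation is introduced here for illustration and is not taken from the original text: $s$ indexes the propensity score quintiles, $D_s$ is the set of participants reaching ESRD in stratum $s$, $R_s(t_i)$ is the set still at risk in stratum $s$ at event time $t_i$, and $x_i$ is the covariate vector (time-averaged UA group alone in Model 1). Tied event times are ignored for simplicity.

```latex
% Each stratum s has its own unspecified baseline hazard h_{0s}(t):
%   h_s(t \mid x) = h_{0s}(t) \exp(\beta^\top x)
% The coefficients \beta are shared across strata, giving a pooled hazard ratio.
L(\beta) = \prod_{s=1}^{5} \; \prod_{i \in D_s}
  \frac{\exp(\beta^\top x_i)}{\sum_{j \in R_s(t_i)} \exp(\beta^\top x_j)},
\qquad
\mathrm{HR}_{\mathrm{UA}} = \exp(\beta_{\mathrm{UA}})
```

Because risk sets are formed within each quintile, participants are compared only with others of similar propensity score, while the single shared coefficient yields the pooled hazard ratio.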
Covariate adjustment. Covariate adjustment was done by adjusting for the logit of the propensity scores (Model 1) and other covariates similar to the stratified multivariate Cox regression models as stated earlier (Models 2 and 3).

Statistical analyses
Values for categorical variables are given as number (percentage) and values for continuous variables are given as mean ± standard deviation or median [interquartile range]. The propensity score model was tested for its accuracy by c-statistics according to the area under the Receiver Operating Characteristic (ROC) curve for the threshold [25] and for its goodness-of-fit by the Hosmer-Lemeshow test. Differences between two groups were examined by unpaired t test and chi-squared test before matching, whereas the data after matching were compared by paired t test and McNemar test or Cochran Q test as appropriate [21]. Standardized differences between two groups before and after matching were calculated for each covariate, and small absolute values (< 0.1) were regarded as supporting the balance between the groups [25,26]. For a Cox proportional hazards model, each covariate was tested for the proportional hazards assumption using both a time-dependent Cox regression and a Schoenfeld residual plot. A standard multivariate Cox regression without the propensity scores was conducted in a stepwise manner with inclusion at p < 0.05 and exclusion at p > 0.10, but the propensity score-based analyses were not conducted in a stepwise manner. Goodness-of-fit of the proposed model was measured by the Akaike information criterion (AIC) [27]. Statistical analyses were performed using SPSS version 22 (IBM, Tokyo) and STATA version 14 (Lightstone, Tokyo). A p value less than 0.05 was considered statistically significant.
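The standardized difference used for balance checking can be computed as in the sketch below. It uses the commonly cited definitions (difference in means or proportions divided by the pooled standard deviation); the exact formula used by the authors is not stated in the text, so this is an assumption, and the function names are hypothetical.

```python
import math
import statistics


def std_diff_continuous(x_high, x_low):
    """Standardized difference for a continuous covariate.

    (mean_high - mean_low) / sqrt((var_high + var_low) / 2),
    using sample variances of the two groups.
    """
    mean_diff = statistics.mean(x_high) - statistics.mean(x_low)
    pooled_sd = math.sqrt(
        (statistics.variance(x_high) + statistics.variance(x_low)) / 2.0
    )
    return mean_diff / pooled_sd


def std_diff_binary(p_high, p_low):
    """Standardized difference for a binary covariate given group proportions."""
    pooled_sd = math.sqrt(
        (p_high * (1.0 - p_high) + p_low * (1.0 - p_low)) / 2.0
    )
    return (p_high - p_low) / pooled_sd


def is_balanced(std_diff, threshold=0.1):
    """Balance criterion from the text: absolute standardized difference < 0.1."""
    return abs(std_diff) < threshold
```

Unlike a p value from a paired test, this quantity does not shrink merely because the matched sample is small, so it serves as a complementary balance check.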

Baseline characteristics and time-averaged uric acid in the follow-up
During the follow-up period of 4.0 ± 1.6 years, 110 out of 803 patients progressed to ESRD. The demographic characteristics and baseline values are shown in Table A in S1 File. A standard multivariate Cox regression analysis was performed in a stepwise manner; baseline estimated GFR, proteinuria, albumin, Na-Cl, sex (male), age, phosphorus, LDL cholesterol and diabetic nephropathy were extracted as independent predictors of ESRD (Table B in S1 File). For these parameters the proportional hazards assumption was not violated and multicollinearity was not observed.
The values of baseline UA and time-averaged UA are normally distributed as depicted in   that the regression line between time-averaged UA vs. baseline UA was less than a unity in the slope and converged at UA of 7.0 mg/dL, indicating that the patients with time-averaged UA > 7.0 mg/dL received more chance of using UA-lowering drugs (61% vs. 39%). On the other hand, the patients with baseline UA < 7.0 mg/dL showed the increase in time-averaged UA in the follow-up due to the advancement of CKD.
A propensity score analysis
Propensity score matching. The subsample after trimming consisted of 763 participants, among whom 95 or 96 subsequently entered ESRD. Baseline covariates before and after matching are shown for the thresholds of time-averaged UA of 6.0, 6.5 and 7.0 mg/dL in Tables 1-3, respectively. Following matching, all the baseline covariates were well balanced by paired analysis, although some covariates showed standardized differences > 0.1. C-statistics estimated by the area under the ROC curve were all greater than 0.8 (Table 4), suggesting high discrimination accuracy [28]. The Hosmer-Lemeshow test validated the model fit because p values were all greater than 0.05. Then, the two groups divided by the threshold of time-averaged UA were subjected to a Kaplan-Meier analysis (Table 4), and the survival curves are plotted in Fig 3A-3C. The patients with the higher time-averaged UA showed significantly higher hazard ratios for ESRD irrespective of the threshold (6.0 mg/dL, HR 4.53, 95%CI 1.79-11.43; 6.5 mg/dL, HR 3.39, 95%CI 1.55-7.42; 7.0 mg/dL, HR 2.19, 95%CI 1.28-3.75). Numbers needed to treat at the three thresholds of time-averaged UA were 8 to 9 (Table 4), regarded as very small numbers [22,29].
Stratified Cox proportional hazards model. The subsample after trimming 5 percentiles was re-stratified on the quintiles of the propensity scores, and the distributions of the propensity scores are depicted side by side in box-plots, demonstrating well-overlapped distributions between lower and higher time-averaged UA (Fig 4A-4C). Survival analysis was performed using a stratified multivariate Cox proportional hazards model. The proportional hazards assumption was not violated for the baseline covariates tested, and multicollinearity among the covariates was not observed. The result showed significantly higher crude hazard ratios (6.0 mg/dL, HR 3.63, 95%CI 1.25-10.58; 6.5 mg/dL, HR 3.46, 95%CI 1.56-7.68; 7.0 mg/dL, HR 2.05, 95%CI 1.21-3.48) (all p < 0.05; Table 5, Model 1). Hazard ratios decreased after adjusting for sex, age, diabetic nephropathy, baseline estimated GFR and proteinuria (Table 5, Model 2), whereas the Akaike information criterion decreased dramatically, suggesting better model fit. Since the standard multivariate Cox regression showed that albumin, Na-Cl, phosphorus and LDL cholesterol were independently significant predictors of ESRD (Table B in S1 File), the final model was chosen by adjusting for these covariates as well (Table 5, Model 3). Although the hazard ratios of time-averaged UA were virtually the same, the Akaike information criterion decreased further, indicating even better model fit. However, the time-averaged UA of 6.0 mg/dL did not reach statistical significance after adjustment for other covariates (Table 5, Model 2 or 3).
Covariate adjustment using a Cox proportional hazards model. Covariate adjustment was performed by adjusting for the logit of the propensity scores in a Cox regression model. The results were virtually the same as those of the stratified Cox regression analysis, except that the time-averaged UA of 7.0 mg/dL adjusted for other covariates lost its statistical significance (Table 6, Model 3). Of note, the Akaike information criterion in every corresponding model was higher than that of the stratified Cox regression, indicating poorer model fit with this method (Tables 5 and 6). Moreover, some of the baseline covariates and the logit of the propensity scores showed multicollinearity, which prevented further adjustment for the baseline covariates when necessary.
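The relation between absolute risk reduction and number needed to treat can be illustrated with a short sketch. It assumes the simple definition NNT = 1 / ARR, with the cumulative incidence of ESRD at 5 years in each matched group as the risk; the authors' exact computation follows their cited references and may differ. The function name and the risk values in the usage example are hypothetical and are not taken from Table 4.

```python
def number_needed_to_treat(risk_high, risk_low):
    """NNT from two cumulative risks at a fixed time point.

    risk_high : cumulative incidence of the event in the higher-UA group
    risk_low  : cumulative incidence of the event in the lower-UA group
    Returns 1 / (risk_high - risk_low), the absolute risk reduction inverted.
    """
    arr = risk_high - risk_low  # absolute risk reduction
    if arr <= 0:
        raise ValueError("NNT is undefined when the risk is not reduced")
    return 1.0 / arr


if __name__ == "__main__":
    # Hypothetical 5-year risks chosen only to illustrate the arithmetic.
    print(number_needed_to_treat(0.25, 0.125))
```

Under this definition, a number needed to treat of 8 to 9 corresponds to an absolute risk reduction of roughly 11 to 13 percentage points over 5 years.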

Discussion
In the present study we showed a significant impact of higher UA in the follow-up on the subsequent incidence of ESRD by applying propensity score analysis. Three different approaches originally established by Rosenbaum and Rubin were implemented: matching, stratification and covariate adjustment [15]. Crude hazard ratios of the higher time-averaged UA were significantly high and ranged between 2 and 4 depending on the thresholds and the methods, whereas adjustment for other covariates confirmed the result only at UA 6.5 mg/dL. This was probably because the balance between the 2 groups after propensity score matching was not perfect and residual imbalance existed, as suggested by standardized differences > 0.1 for some covariates [25,30], which could be adjusted for by multivariate Cox regression-based methods. To the best of our knowledge, the present study is the first to apply causal inference to the question of whether a higher serum UA level is linked to the incidence of ESRD. The target UA in the follow-up seems to be less than 6.5 mg/dL, which should be confirmed in the future by an ongoing randomized controlled trial in a double-blind manner [31]. The application of propensity score analysis is rapidly increasing in the literature because it can approximate randomized controlled trials using retrospective observational cohorts [14,32,33]. The method also enables one to investigate causal effects that cannot otherwise be studied in a randomized controlled manner, such as the effect of smoking on mortality and on the progression of IgA nephropathy [34,35]. The rationale of a propensity score analysis resides in balancing the many baseline confounders between two groups of a certain test parameter by binary logistic regression analysis [15].
Table 4. Kaplan-Meier analysis before and after the propensity score matching and hazard ratio, absolute risk reduction and number needed to treat at 5 years of follow-up.
The propensity score analysis in the present study gave robust results, but slight differences among the three methods are worth mentioning. The matching method is the most intuitive of the three for researchers and readers because it approximates a randomized controlled trial. In addition, one can estimate not only relative risk but also absolute risk and number needed to treat [34], providing more useful information for clinical decision making. In this study, the numbers needed to treat for targeting UA in the follow-up were computed at 7.3 to 8.7 for the three thresholds, which implies that one additional patient can be spared ESRD if serum UA is adequately controlled in 8 to 9 patients over 5 years. The number seems remarkably low and thus encourages physicians to intervene on serum UA in the follow-up [36,37]. Nevertheless, the disadvantages are 2-fold. First, the number of participants decreases appreciably and the generalizability of the cohort is lost after matching. Second, residual imbalance may be difficult to remove even if paired analyses do not show statistical significance. In contrast, the standardized difference, which is not influenced by sample size, is well suited to detecting imbalance between 2 groups [20]. In fact, statistical significance was lost in our results when a multivariate Cox regression was employed to control for other covariates, suggesting the presence of some residual imbalance.
Other two methods using a Cox proportional hazards model can be more ideal in which residual bias can undergo further adjustment. There was still some difference between Cox proportional hazards model-based methods; the model fitting was better in a stratified analysis than adjusting for the logit of the propensity scores. It may be due to the fact that the stratification on quintiles is able to remove 90% bias as indicated by the original work of Rosenbaum and Rubin [15]. Moreover, adjusting for the logit of the propensity scores lost the independent significance of time-averaged UA 7 mg/dL. It was also found that multicollinearity existed between the logit of the propensity scores and other baseline covariates in some situation. Taken all together, we found that a stratified multivariate Cox proportional hazards model may be most robust among three methods in an attempt to address the causal effect using observational data.
Uric acid ranks as a candidate risk factor for CKD progression according to many observational studies [4,10,38]. Since UA and CKD constitute a typical "chicken and egg problem," one should be prudent in drawing conclusions about cause and effect [10][11][12]. To overcome this issue, prospective interventional studies are indispensable, and several such studies have indeed been carried out. However, consensus remains to be reached because of the relatively small sample sizes and the lack of a double-blind placebo arm, and a time-to-event analysis could not show a positive result [39,40]. Most recently, however, Goicoechea and her associates clearly showed that, although the earlier analysis over 2 years failed to show a significant result, the renal survival rate increased significantly in allopurinol-treated patients in the extension study over 7 years [40,41]. Of interest is that the difference in serum UA between control (mean 7.2 mg/dL) and allopurinol-treated patients (mean 6.5 mg/dL) was relatively small [41]. Responding to their result, Bellomo proposed the very interesting hypothesis that, to slow kidney damage, UA concentration does not need to be reduced as low as possible but simply maintained below the saturation point of 6.8 mg/dL [42]. Our present results agree with this notion because the threshold UA of 6.0 mg/dL did not reveal a significant effect after adjustment for residual bias.
Some limitations of the present study must be mentioned. The biggest problem is the potential presence of unmeasured confounding, which cannot be avoided in any observational study. Secondly, there is a possibility of misspecification of the propensity score model, which cannot be ruled out by any means, either. We believe the latter problem could be mitigated by employing a stratified multivariate Cox proportional hazards model as demonstrated herein. Thirdly, the number of participants was small, so that sex difference was not examined. Despite these limitations, propensity score analysis has the considerable strength of allowing test conditions such as the target threshold to be examined freely. Needless to say, randomized controlled trials remain the gold standard for building evidence, while propensity score analysis may serve as a complementary approach in clinical research on causal effects.

Conclusion
We have demonstrated that serum UA in the follow-up is independently associated with the risk of ESRD by the propensity score analysis. A stratified multivariate Cox proportional hazards model is deemed superior to other methods such as matching and adjustment for the logit of the propensity scores in generalizability of the cohort and visualization of the residual bias. Target range of serum UA in the follow-up may be less than 6.5 mg/dL to abrogate the progression of CKD to ESRD.
Supporting Information
S1 File. Table A. Baseline characteristics of the CKD cohort (n = 803).