A Panel of Novel Biomarkers Representing Different Disease Pathways Improves Prediction of Renal Function Decline in Type 2 Diabetes

Objective We aimed to identify a novel panel of biomarkers predicting renal function decline in type 2 diabetes, using biomarkers representing different disease pathways speculated to contribute to the progression of diabetic nephropathy. Research Design and Methods A systematic data integration approach was used to select biomarkers representing different disease pathways. Twenty-eight biomarkers were measured in 82 patients seen at an outpatient diabetes center in The Netherlands. Median follow-up was 4.0 years. We compared the cross-validated explained variation (R2) of two models to predict eGFR decline, one including only established risk markers, the other adding a novel panel of biomarkers. Least absolute shrinkage and selection operator (LASSO) was used for model estimation. The C-index was calculated to assess improvement in prediction of accelerated eGFR decline defined as <-3.0 mL/min/1.73m2/year. Results Patients’ average age was 63.5 years and baseline eGFR was 77.9 mL/min/1.73m2. The average rate of eGFR decline was -2.0 ± 4.7 mL/min/1.73m2/year. When modeled on top of established risk markers, the biomarker panel including matrix metallopeptidases, tyrosine kinase, podocin, CTGF, TNF-receptor-1, sclerostin, CCL2, YKL-40, and NT-proCNP improved the explained variability of eGFR decline (R2 increase from 37.7% to 54.6%; p=0.018) and improved prediction of accelerated eGFR decline (C-index increase from 0.835 to 0.896; p=0.008). Conclusions A novel panel of biomarkers representing different pathways of renal disease progression including inflammation, fibrosis, angiogenesis, and endothelial function improved prediction of eGFR decline on top of established risk markers in type 2 diabetes. These results need to be confirmed in a large prospective cohort.


Introduction
The growing prevalence of type 2 diabetes is a great global health problem. Type 2 diabetes is the leading cause of chronic kidney disease (CKD) in the United States and is associated with high cardiovascular risk [1,2]. Optimizing treatment has been shown to improve life expectancy, reduce costs, and lower the risk of death in patients with type 2 diabetes [3,4]. Despite important progress in improving therapy, many patients are still at risk for renal disease.
Early identification of patients with type 2 diabetes at risk for progressive renal function loss during the early stages of disease may lead to better patient outcomes. In clinical practice, estimated glomerular filtration rate (eGFR) and albuminuria are used to assess renal function when gold-standard measured GFR is not feasible or practical. The search for novel biomarkers that improve risk prediction models on top of established risk markers has been a priority of many researchers for many years. Various studies have assessed the performance of single biomarkers representing a single, disease-associated pathway to predict progression of renal function loss in type 2 diabetes [5,6]. However, because type 2 diabetes is a multifactorial disease, several pathways involving pro-inflammatory, pro-fibrotic, and angiogenic processes, among others, are activated during the course of the disease [7]. Given the complexity of the multiple pathophysiological processes involved in progression of type 2 diabetes together with the intra-individual variability of biomarkers, it is questionable if a single biomarker may possess useful diagnostic and prognostic power. Alternatively, a combination of biomarkers that capture different pathways of renal damage may provide a more realistic picture of a patient's actual pathophysiological status and hence may yield better assessment of disease prognosis performance. Therefore, we aimed to identify a novel panel of biomarkers representing different disease pathways that are speculated to contribute to the progression of renal disease in type 2 diabetes, and to evaluate their combined predictive performance of accelerated renal function decline.
Organisation for Scientific Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: D de Zeeuw has consultancy agreements with the following companies: Abbvie, Astellas, Bristol-Meyers Squibb, Hemocue, Johnson & Johnson, Merck Sharpe & Dohme, Novartis, Reata Pharmaceuticals and Vitae. All honoraria are paid to his institution. HJ Lambers Heerspink has consultancy agreements with the following companies: Abbvie, Astellas, Johnson & Johnson, Reata Pharmaceuticals and Vitae. All honoraria are paid to his institution. R Goldschmeding has been employed by and has received research support from FibroGen. B Mayer is managing partner, and P Perco and A Heinzel are employees of emergentec biodevelopment GmbH, and have declared that no competing interests exist. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.
albumin:creatinine ratio (UACR), serum creatinine, cholesterol, and glycated hemoglobin (HbA1c) was obtained from electronic patient files from visits to the outpatient diabetes clinic during their annual visit to the diabetes specialist.

Ethics Statement
The PREDICTIONS study was approved by the ethical review boards of the medical ethics committees of the Isala Clinics in Zwolle and of the University Medical Center in Groningen, The Netherlands and was conducted in accordance with the guidelines of the Declaration of Helsinki. All patients gave written, informed consent.

Selection of biomarkers, sample collection, preparation, and measurement
Twenty-eight biomarkers were selected for testing using three distinct approaches, namely a literature review [10], identification of molecular processes and pathways [7], and ranking of consolidated Omics signatures [11]. A complete list of biomarkers is presented in Table 1, and the biomarker selection procedure is described in S1 Appendix.
Fasting serum and plasma samples were stored at -80°C. All samples were stored for 4-5 years and did not undergo any freeze-thaw cycles. Biomarkers were assayed on baseline samples by enzyme-linked immunosorbent assay (ELISA) or multiplex assay by Biomarker Design Forschungs GmbH (BDF), in Vienna, Austria, except for connective tissue growth factor (CTGF). CTGF was measured using specific antibodies (FibroGen Inc., San Francisco, USA) directed against distinct epitopes in the amino-terminal fragment of CTGF, as described previously [12]. All assays were used according to manufacturer's instructions. A complete list of assays, and information on stability and determination of limits of detection are available in S2 Appendix. All biomarker analyses were performed blinded, and the results were then reported back to the study center for analysis.

Statistical analysis
Analyses were performed with SAS software (version 9.2; SAS Institute, Cary, NC) and R version 3.0.2 [13] using the packages mice and glmnet [14,15]. Data are presented as mean (standard deviation) or median [1st, 3rd quartile] for skewed variables. Graphical techniques were used to detect outliers. The natural logarithm of UACR and the binary logarithm of all biomarkers were used to normalize their distributions. Log-transformed variables were used in all regression analyses. Values below the detection limit were set to the detection limit. Variables with missing values were multiply imputed using chained equations [16]. Five of the twenty-eight biomarkers had >10% of values missing or >25% of values below the detection limit and were not used in the analysis. Details on our implementation of multiple imputation can be found in S3 Appendix. All p-values were two-tailed, and values < 0.05 were considered statistically significant.
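The preprocessing rules described above can be illustrated with a minimal Python sketch. It is not the authors' SAS or R code. The function names are hypothetical, and computing both exclusion fractions over all patients (rather than over observed values only) is an assumption, since the text does not state the denominator.

```python
import math


def preprocess_biomarker(values, detection_limit):
    """Set values below the detection limit to the limit, then take log2."""
    return [math.log2(max(v, detection_limit)) for v in values]


def preprocess_uacr(values):
    """Natural-log transform of UACR (values must be positive)."""
    return [math.log(v) for v in values]


def is_usable(values, detection_limit, max_missing=0.10, max_below=0.25):
    """Apply the exclusion rule: drop a biomarker with >10% missing values
    or >25% of values below the detection limit.

    Missing values are represented as None. Both fractions use the total
    number of patients as denominator (an assumption of this sketch).
    """
    n = len(values)
    missing = sum(v is None for v in values)
    below = sum(v is not None and v < detection_limit for v in values)
    return missing / n <= max_missing and below / n <= max_below
```

For example, a biomarker with readings 0.5, 4.0, and 8.0 and a detection limit of 1.0 is first clamped to 1.0, 4.0, and 8.0 and then log2-transformed.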
The outcome of interest was eGFR decline, defined as the within-patient annual eGFR slope. The eGFR decline was calculated using a minimum of 3 serum creatinine measurements during follow-up by fitting a straight line through the eGFR values using linear regression. The eGFR value at each time-point was estimated using the 4-variable Modification of Diet in Renal Disease (MDRD) Study Equation [17].
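As an illustration of this outcome definition, the sketch below computes eGFR with the 4-variable MDRD equation and then the per-patient slope by ordinary least squares. The function names are hypothetical. The default constant of 175 (with race factor 1.212) corresponds to the re-expressed equation for standardized creatinine assays; the text does not say whether this version or the original constant of 186 was used, so the default is an assumption.

```python
def mdrd_egfr(scr_mg_dl, age_years, female, black=False, constant=175.0):
    """4-variable MDRD estimate of GFR in mL/min/1.73m2.

    scr_mg_dl: serum creatinine in mg/dL. The constant (175 vs. 186)
    depends on creatinine calibration and is an assumption here.
    """
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def egfr_slope(times_years, egfr_values):
    """Least-squares slope of eGFR over time (mL/min/1.73m2 per year).

    Requires at least 3 measurements, as in the study definition.
    """
    n = len(times_years)
    if n < 3 or n != len(egfr_values):
        raise ValueError("need at least 3 paired measurements")
    mean_t = sum(times_years) / n
    mean_y = sum(egfr_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(times_years, egfr_values))
    return sxy / sxx
```

A patient with eGFR values of 80, 78, 76, and 74 at years 0 to 3 has a slope of -2 mL/min/1.73m2/year under this definition.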
Statistical modeling consisted of several steps. First, established risk markers were selected as best predictors of eGFR decline using least absolute shrinkage and selection operator (LASSO) selection [18]. The LASSO is advantageous for small sample sizes because it places restrictions on the absolute sizes of the regression coefficients in the model while optimally selecting the subset of variables that best predicts the outcome. This restriction also controls for multicollinearity. LASSO involves the estimation of a tuning parameter controlling the amount of restriction, which was optimized by minimizing the leave-one-out cross-validated mean squared error of prediction. The established risk markers listed in Table 2 were considered as potential predictors of eGFR decline. All established risk markers were first included in a multivariable model using LASSO regression. The best predictors of eGFR decline were then identified from the multivariable model and are reported in the results section. Second, univariate linear models were fit for each of the novel biomarkers to assess the association of each single biomarker with eGFR decline. Third, multivariable models were fit by linear regression with single novel biomarkers, adjusting for the selected established risk markers. Fourth, a multivariable model including the selected established risk markers and all biomarkers was fit using LASSO selection in order to find the best subset of predictors. Bootstrap validation was performed to assess the validity of the model and the ability of the biomarker panel to predict renal function decline. The bootstrap (N = 1000) was used to evaluate selection probabilities of each biomarker, and to construct 95% confidence intervals and two-sided p-values for the regression coefficients by the percentile method.
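The LASSO step with a leave-one-out cross-validated tuning parameter can be sketched in plain Python as follows. This is an illustrative coordinate-descent implementation, not the glmnet routine used in the study. All function names are hypothetical, the penalty grid is supplied by the caller, and predictors are centered but not rescaled, so they are assumed to be on comparable scales already.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimize (1/2n)*sum(residual^2) + lam*sum(|beta|) by coordinate descent.

    X is a list of rows. Returns (intercept, list of coefficients).
    """
    n, p = len(y), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    col_ss = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the partial residual.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(model, x):
    """Predicted outcome for one row x from an (intercept, beta) model."""
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, x))


def loocv_mse(X, y, lam):
    """Leave-one-out cross-validated mean squared error for one penalty."""
    errors = []
    for i in range(len(y)):
        model = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errors.append((y[i] - predict(model, X[i])) ** 2)
    return sum(errors) / len(errors)


def choose_lambda(X, y, grid):
    """Pick the penalty from grid that minimizes the leave-one-out MSE."""
    return min(grid, key=lambda lam: loocv_mse(X, y, lam))
```

Coefficients shrunk exactly to zero correspond to variables dropped from the model, which is how the LASSO performs selection and estimation in one step.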
A global p-value testing the global null hypothesis of no added value of the biomarkers was constructed by counting the number of bootstrap resamples in which the multivariable biomarker model led to a smaller cross-validated mean squared error (MSE) than a model based on the established risk markers alone. In a simple bootstrap validation, LASSO models were fit to the 1000 bootstrap resamples, each time optimizing the cross-validated MSE as described above. These models were then applied to the original data without modification. The resulting MSE was calculated by averaging, over all patients, the squared difference between the original outcome and the predicted outcome. This was done for models only considering the established risk markers, and for models considering clinical and biomarker predictors. From the MSEs, R2 measures were finally derived in order to determine whether the biomarkers significantly improved prediction.
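The bookkeeping around this bootstrap validation can be sketched as below. The exact formulas are given in S3 Appendix rather than in the main text, so two points are assumptions of this sketch: R2 is taken as one minus the ratio of MSE to the outcome variance, and the global p-value is taken as the fraction of resamples in which the biomarker model did not achieve a smaller MSE than the reference model. All function names are hypothetical.

```python
import random


def bootstrap_indices(n_patients, n_resamples=1000, seed=0):
    """Draw row indices with replacement for each bootstrap resample."""
    rng = random.Random(seed)
    return [[rng.randrange(n_patients) for _ in range(n_patients)]
            for _ in range(n_resamples)]


def mse(observed, predicted):
    """Mean over patients of the squared observed-minus-predicted difference."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared_from_mse(mse_value, observed):
    """Explained variation, assumed here to be 1 - MSE / variance of outcome."""
    n = len(observed)
    mean = sum(observed) / n
    variance = sum((o - mean) ** 2 for o in observed) / n
    return 1.0 - mse_value / variance


def global_p_value(mse_reference, mse_with_biomarkers):
    """Fraction of resamples in which adding biomarkers did NOT lower the MSE.

    Both arguments hold one MSE per bootstrap resample, in the same order.
    """
    not_better = sum(b >= r for r, b in zip(mse_reference, mse_with_biomarkers))
    return not_better / len(mse_reference)
```

In use, each resample from `bootstrap_indices` would be used to refit both models, whose predictions on the original data feed `mse`, yielding the paired MSE lists passed to `global_p_value`.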
The added value of the biomarker panel was also evaluated using the discriminative index (C-index) by dichotomizing the observed outcome variables into accelerated or non-accelerated renal function decline (eGFR decline < -3 or > -3 mL/min/1.73m2/year, respectively) and comparing this with predicted probabilities of eGFR decline (see S3 Appendix). The C-index was also calculated using the simple bootstrap validation scheme, and the differences in the C-index between a model of only established risk markers and a model of established risk markers plus biomarkers were assessed. The threshold of -3 mL/min/1.73m2/year was based on prior studies and its concurrence with the high quartile of eGFR decline [19,20].
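For a dichotomized outcome, the C-index is the fraction of (accelerated, non-accelerated) patient pairs in which the accelerated patient received the higher predicted risk, with ties counted as one half. The sketch below illustrates that definition. The function names are hypothetical, and how the predicted probabilities were derived from the linear model is described in S3 Appendix and is not reproduced here.

```python
def is_accelerated(slope, threshold=-3.0):
    """Accelerated decline: eGFR slope below -3 mL/min/1.73m2/year."""
    return slope < threshold


def c_index(predicted_risk, accelerated):
    """Concordance between predicted risk and the binary observed outcome.

    predicted_risk: one score per patient (higher means higher risk).
    accelerated: one boolean per patient.
    """
    cases = [r for r, a in zip(predicted_risk, accelerated) if a]
    controls = [r for r, a in zip(predicted_risk, accelerated) if not a]
    if not cases or not controls:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for risk_case in cases:
        for risk_control in controls:
            if risk_case > risk_control:
                concordant += 1.0
            elif risk_case == risk_control:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.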

Baseline characteristics and association with eGFR decline
Baseline characteristics are presented in Table 2. The average age of the cohort was 63.5 (SD 9.4) years and 53.7% were male. Type 2 diabetes was well established in the study population with an average diabetes duration of 15.7 (SD 7.3) years. Renal function was relatively preserved in the cohort with an average eGFR of 77.9 (SD 22.6) mL/min/1.73m2. The following best predictors of eGFR decline were selected from the LASSO selection: baseline UACR, current vs. never smoker, sex, systolic and diastolic blood pressure, use of oral diabetic medication, and baseline eGFR (S1 Fig).
When these three biomarkers were modeled on top of the established risk markers, they did not improve the explained variability (R2) of eGFR decline (35.7% compared to 37.7% of the reference model; p = 0.988). The three biomarkers also did not increase the C-index for prediction of accelerated renal function decline (0.860 compared to 0.835 of the reference model; p = 0.262).

Selection of optimal combination of established risk markers and biomarkers
Although most individual biomarkers were not found to be independently associated with eGFR decline, we hypothesized that the combination of biomarkers representing different disease pathways may improve prediction of eGFR decline. In a multivariable LASSO selection, the optimal model for prediction of eGFR decline was achieved after inclusion of 19 variables (Fig 1). The model included a subset of 13 novel biomarkers representing fibrosis, angiogenesis, inflammation, mineral metabolism, and endothelial function that, when added to the established risk markers, more accurately predicted the rate of eGFR decline (Table 3). The explained variability of the model (R2) markedly increased from 37.7% to 54.6% (p = 0.018) and predicted a higher probability of accelerated renal function decline (Fig 2). There was also a significant improvement in the C-index of the optimal model for prediction of accelerated renal function decline (0.896 compared to 0.835 of the reference model; p = 0.008) (Fig 3).
To investigate the importance of each of the predictors in the optimal model, we omitted, one by one, variables from the full model. If a variable was omitted from the model, the other predictor variables could be selected instead. Only the omission of UACR or systolic blood pressure resulted in relevant inclusions of other novel biomarkers (S1 Table).

Discussion
In this study, we established that a combination of different biomarkers representing different pathways that are speculated to be involved in the progression of renal disease improves prediction of eGFR decline. Although some biomarkers were not independently associated with eGFR decline, when combined into a multi-biomarker model, the combination of biomarkers improved renal risk stratification, suggesting that these biomarkers may possess synergistic effects in predicting renal function loss.
Diabetic kidney disease is characterized by the functional impairment and structural remodeling of the kidney and is linked to the changes in the kidney. Diabetic nephropathy is well characterized by glomerular hypertrophy and hyperfiltration, inflammation of glomeruli and tubulointerstitial regions, and reduction of cell number by apoptosis and accumulation of extracellular matrix (ECM). Each of the biomarkers selected in the optimal model has been associated with one of these pathophysiological processes involved in diabetic nephropathy.
First, chronic inflammation has long been identified in the pathogenesis of type 2 diabetes and progression of diabetic nephropathy, and inflammation is well represented by the biomarkers included in the optimal model. Tumor necrosis factor alpha is a key mediator of inflammation and plays a role in apoptosis. It mediates its signal via two distinct receptors, TNFR1 and TNFR2. Circulating forms of both TNF receptors were recently shown to predict ESRD in type 2 diabetes [6]. Monocyte chemoattractant protein-1 (CCL2), another marker of inflammation, is a potent C-C chemokine for monocyte/macrophages and T cells. Increased amounts of CCL2 have been detected in renal biopsies and urine from patients with diabetic nephropathy [21], and CCL2 has been shown to be a marker of late stage diabetic nephropathy [22]. Two ongoing clinical trials target the CCL2 receptor as a means to delay progression of diabetic nephropathy (www.clinicaltrials.gov identifiers NCT01712061 and NCT01752985). Results of these studies will provide more insight into whether CCL2 is a causal factor or a consequence of renal function loss. Additionally, YKL-40, a proinflammatory marker, has been identified as an independent factor associated with albuminuria in early stage nephropathy in type 2 diabetes and might have a useful role as a noninvasive marker for the early detection of diabetic nephropathy [5,23]. High YKL-40 levels have been shown to predict mortality in patients with type 2 diabetes [24]. Future mechanistic studies exploring the interplay between different inflammatory markers will help determine which markers are causal factors or consequences in the progression of diabetic kidney disease. Second, the optimal model included several biomarkers linked to pro-fibrotic processes. Fibrosis, resulting from expansion and change in composition of ECM in the kidney, is a well-known pathologic feature of diabetic complications.
Altered expression of matrix metalloproteinases (MMPs) has been implicated in the progression of diabetic nephropathy by affecting the breakdown and turnover of ECM. In mice, the overexpression of MMP-9 has been shown to induce podocyte dedifferentiation, interrupt podocyte cell integrity, and promote podocyte monolayer permeability to albumin and extracellular matrix protein synthesis [25]. In humans, serum MMP7 has been shown to be increased in diabetic renal disease and diabetic diastolic dysfunction [26]. In support of this, our study showed that higher concentrations of MMP7 were independently associated with eGFR decline. CTGF is another well-investigated pro-fibrotic biomarker that was included in the optimal model. CTGF, which is upregulated in diabetic nephropathy and contributes to extracellular matrix accumulation, has been associated with both early and late stage diabetic nephropathy [12,22]. Down-regulation of CTGF and vascular endothelial growth factor-A (VEGF-A) in diabetic nephropathy is speculated to be a result of podocyte loss [27]. Our data, in conjunction with data from the literature, support the importance of fibrotic pathways in the initiation and progression of diabetic kidney disease. Third, we included a marker representing angiogenesis. Angiogenesis is the formation of new blood vessels from pre-existing vasculature. Neovascularization has been implicated in the genesis of diverse diabetic complications such as retinopathy, impaired wound healing, neuropathy, and diabetic nephropathy. In both physiological and pathological angiogenesis, tyrosine kinase (TEK) plays a key role. TEK is principally expressed in endothelial cells and inhibits vascular permeability and tightens preexisting vessels [28]. Additionally, TEK plays a critical role in the angiogenesis of endothelial cells via binding to angiopoietin [29].
Finally, the model included a marker representing endothelial function. Endothelial dysfunction is considered an initial step of the atherosclerotic process because diabetes substantially impairs vasodilating properties of the endothelium, which leads to impaired vasodilation and ultimately endothelial dysfunction [30]. C-type natriuretic peptide (CNP), a member of the natriuretic peptide family, is produced in vascular endothelium. Our study implies that NT-proCNP, the N-terminal fragment of the C-type natriuretic peptide precursor, contributes to prediction of eGFR decline. NT-proCNP has been shown to be associated with arterial stiffness, endothelial dysfunction, and early atherosclerosis [31]; however, the link of NT-proCNP to type 2 diabetes and nephropathy is still under investigation.
In our study, most biomarkers were not able to individually predict eGFR decline after adjustment for established risk markers, and the model of 3 biomarkers did not significantly improve prediction. Rather, the optimal model of 13 biomarkers yielded the largest, and statistically significant, improvement in the C-index. Laboratory techniques that allow simultaneous measurement of many biomarkers are becoming increasingly realistic in clinical practice. Whether the biomarkers identified are involved in the causal pathway contributing to CKD progression, are markers of its risk, or are merely the end-product of existing pathological processes remains an important and unresolved question that requires further exploration. A future study on etiology to examine the causal relationship between these biomarkers as risk factors of renal disease would be appropriate, and issues of confounding could then be addressed. Testing for confounding was beyond the scope of this prediction study; however, we were able to investigate the importance of each of the predictors in the optimal model. Baseline UACR was found to have the largest impact on eGFR decline, and only the omission of baseline UACR or systolic blood pressure allowed inclusions of other novel biomarkers into the model. The combination of multiple biomarkers in the final, optimal model appears to be more accurate in risk stratification for accelerated renal function decline in patients with type 2 diabetes.
There are some studies in the literature that use a multi-biomarker approach for risk prediction in CKD. A recent study showed that the combination of a panel of biomarkers including inflammation, fibrosis, and cardiac stretch and injury improved prediction of death in a Canadian CKD cohort; however, this study was conducted in a cohort with different CKD etiology [32]. Additionally, in another study of multiple protein biomarkers, 17 urinary and 7 plasma biomarkers were evaluated to predict progression. C-terminal FGF-23 and VEGF-A were found to be associated with the end point independent of urine albumin/creatinine. In that study many biomarkers were tested one by one, but the authors did not use a combined biomarker approach to predict renal disease progression [33]. Furthermore, a panel of multiple urinary cytokines was found to predict rapid renal function decline in overt diabetic nephropathy [34]. However, that study included a heterogeneous population of patients with both type 1 and type 2 diabetes. Finally, a post-hoc study from the IRMA-2 trial showed that multiple biomarkers of endothelial dysfunction and possibly inflammation were predictors of progression to diabetic nephropathy in patients with type 2 diabetes and microalbuminuria independent of traditional risk markers [35].
Advances in high-throughput analytical methods have fueled novel biomarker discovery. Two such platforms, namely proteomics and metabolomics, have shown promise in multi-biomarker discovery for diabetic CKD. A urinary peptide classifier, consisting of 273 defined urinary peptides, was recently discovered as a good classifier in patients with CKD [36] and validated in an independent cohort as a predictor of albuminuria progression in patients with type 2 diabetes [37]. Furthermore, a panel of 13 metabolites linked with mitochondrial metabolism was significantly reduced in CKD patients with diabetes compared to healthy controls [38], and the combination of plasma metabolites butenoylcarnitine and histidine, and urine metabolites glutamine, tyrosine, and hexoses was able to predict the progression from micro- to macroalbuminuria in patients with type 2 diabetes [39].
Interestingly, in our study, HbA1c and duration of diabetes were not strong predictors of eGFR decline, whereas albuminuria was identified as the strongest predictor. The exclusion of HbA1c and duration of diabetes from the reference model may be due to small variations in these parameters within this population. Regarding albuminuria, there is evidence that demonstrates albuminuria as a strong risk predictor of renal function loss in patients with type 2 diabetes [40-43]. Moreover, experimental data show that increased albumin exposure to the tubuli causes tubulo-interstitial damage through activation of pro-inflammatory mediators, which leads to a progressive decline in glomerular and tubular function, ultimately culminating in end-stage renal disease [44,45]. Our data on albuminuria as a strong predictor of eGFR decline are in line with this and highlight the importance of screening for high albuminuria to identify individuals at risk of progressive renal function loss. At the same time, it may be interesting to explore the predictive ability of urine biomarkers alongside albuminuria for renal disease progression as urine is considered quite a suitable substrate to measure biomarkers linked to kidney disease due to the practical advantages of collecting urine compared to blood samples. Since our study measured biomarkers in blood, we are unable to speculate if urine biomarkers, or the combination of both blood and urine markers, would yield similar predictive capabilities.
There are strengths and limitations to this study. A clear strength is the use of a multi-marker, multi-pathway approach for identifying and testing biomarkers in a population of patients with type 2 diabetes over approximately 4 years of follow-up. The clear limitation is the measurement of multiple biomarkers in a small sample size. However, as advancing laboratory techniques generate larger amounts of data, methods of data analysis to accommodate "big data" with smaller sample sizes are needed. The rigorous statistical method of the LASSO regression allowed for modeling many biomarkers in the small sample size, and multiple imputation was used to avoid truncating observations due to missing data. The true predictive capacity of the model could have been overestimated due to the prediction model being developed and tested in the same sample, and we do agree that external validation is necessary. In the absence of external validation, we performed internal bootstrap validation in an attempt to minimize this limitation [46]. GFR was estimated using a serum creatinine-based equation instead of by direct measurement, which may have contributed to misclassification bias. However, this could have only resulted in an underestimation of the strength of the reported associations. We chose to omit five biomarkers from our analysis due to many missing or below LOD values. While the exclusion of these biomarkers from our analysis may have resulted in an underrepresentation of pathways, the omission of biomarkers could have only underestimated the predictive ability of the biomarker panel. Additional limitations include the lack of information concerning insulin use, diet, and renin-angiotensin-aldosterone system medication type and dose, which clearly represent unmeasured confounders in our study.
In conclusion, novel biomarkers may provide deeper insight into the pathophysiology of CKD or diabetic nephropathy, and identification of progression-associated molecular pathways via biomarkers as proxy may also help to identify novel therapeutic targets. We identified a novel panel of biomarkers representing different pathways of renal damage, including inflammation, fibrosis, angiogenesis, and endothelial function. This combined panel improved prediction of accelerated renal function decline in patients with type 2 diabetes on top of established risk markers. The results of this study need to be validated in a large, prospective cohort to assess their applicability in a broad type 2 diabetes population.
Supporting Information S1 Appendix. Selection of biomarkers.