Prediction models used in the progression of chronic kidney disease: A scoping review

  • David K. E. Lim ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    David.K.Lim@curtin.edu.au

    Affiliation Curtin School of Population Health, Curtin University, Perth, WA, Australia

  • James H. Boyd,

    Roles Supervision, Writing – review & editing

    Affiliations Curtin School of Population Health, Curtin University, Perth, WA, Australia, La Trobe University, Melbourne, Bundoora, VIC, Australia

  • Elizabeth Thomas,

    Roles Conceptualization, Investigation, Supervision, Writing – review & editing

    Affiliations Curtin School of Population Health, Curtin University, Perth, WA, Australia, Medical School, The University of Western Australia, Perth, WA, Australia

  • Aron Chakera,

    Roles Investigation, Supervision, Writing – review & editing

    Affiliations Medical School, The University of Western Australia, Perth, WA, Australia, Renal Unit, Sir Charles Gairdner Hospital, Perth, WA, Australia

  • Sawitchaya Tippaya,

    Roles Investigation, Writing – review & editing

    Affiliation Curtin Institute for Computation, Curtin University, Perth, WA, Australia

  • Ashley Irish,

    Roles Writing – review & editing

    Affiliation WA Country Health Service, Perth, WA, Australia

  • Justin Manuel,

    Roles Writing – review & editing

    Affiliation WA Country Health Service, Perth, WA, Australia

  • Kim Betts,

    Roles Writing – review & editing

    Affiliation Curtin School of Population Health, Curtin University, Perth, WA, Australia

  • Suzanne Robinson

    Roles Conceptualization, Funding acquisition, Investigation, Supervision, Writing – review & editing

    Affiliations Curtin School of Population Health, Curtin University, Perth, WA, Australia, Deakin Health Economics, Deakin University, Burwood, VIC, Australia

Abstract

Objective

To provide a review of prediction models that have been used to measure clinical or pathological progression of chronic kidney disease (CKD).

Design

Scoping review.

Data sources

Medline, EMBASE, CINAHL and Scopus from the year 2011 to 17th February 2022.

Study selection

All English-language studies published in peer-reviewed journals in any country that developed at least one statistical or computational model to predict the risk of CKD progression.

Data extraction

Eligible studies for full text review were assessed on the methods that were used to predict the progression of CKD. The type of information extracted included: the author(s), title of article, year of publication, study dates, study location, number of participants, study design, predicted outcomes, type of prediction model, prediction variables used, validation assessment, limitations and implications.

Results

From 516 studies, 33 were included for full-text review. The articles were compared qualitatively using the extracted information. The study populations across the studies were heterogeneous and data acquired by the studies were sourced from different levels and locations of healthcare systems. 31 studies implemented supervised models, and 2 studies included unsupervised models. Regardless of the model used, the predicted outcome included measurement of risk of progression towards end-stage kidney disease (ESKD), or related definitions, over given time intervals. However, the studies lacked consistency in reporting how their prediction models were developed.

Conclusions

Researchers are working towards producing an effective model to provide key insights into the progression of CKD. This review found that Cox regression modelling was predominantly used among the small number of studies in the review. This made it difficult to perform a comparison between ML algorithms, more so when different validation methods were used in different cohort types. There needs to be increased investment in a more consistent and reproducible approach for future studies looking to develop risk prediction models for CKD progression.

Introduction

Chronic Kidney Disease (CKD) is a global health burden with an estimated 5 to 10 million annual deaths worldwide due to kidney disease [1, 2]. Current data predict CKD will be the fifth leading cause of death worldwide by the year 2040 [3]. CKD is characterised by a gradual loss of the kidney’s ability to remove wastes from the blood, and the severity of the disease is determined by the individual’s estimated glomerular filtration rate (eGFR) [4]. CKD is arbitrarily categorised into five progressive stages, with stage five often referred to as end-stage kidney disease (ESKD), and its progression often leads to multiple overlapping complications [5, 6]. There is a spectrum of pathological, hereditary, and sociodemographic factors known to contribute to a decline in kidney function [5–11]. These factors include age (≥60 years), smoking, low socioeconomic status, diabetes, hypertension, cardiovascular disease, body mass index (≥30 kg/m²), family history of kidney disease and use of pain-relieving medications [9–11].

The global nephrology community recognises that current models of care are insufficient to curb the growing CKD burden and that new care models are required to improve patient outcomes [12–14]. It has been suggested that the management framework for CKD needs to consider the disease across the entire life course of each individual [13]. New care models also need to consider improvements in areas such as disease surveillance, mitigation of risk factors, expanding research knowledge, and developing novel clinical interventions to slow the progression of CKD [13]. Despite having identified a number of risk factors associated with the onset of CKD, gaps remain in the methods for predicting the risk of CKD progression and interventions to slow CKD progression [13, 15, 16]. In addition, a large number of patients with CKD remain undetected through health systems [16] and clinicians have the challenge of managing the growing number of cases with limited tools for triaging patients.

Predictive modelling techniques

Predictive modelling techniques applied to the growing number of clinical datasets have shown promise in accurately predicting the progression of chronic disease in the population [17–23]. Previous attempts have employed a wide range of prediction models, from well-established generalised linear models to more recent Machine Learning (ML) techniques [17–23]. Renal clinicians and researchers recognise the significant potential in developing risk prediction models that can improve our ability to identify individuals at risk, in addition to potentially improving our understanding of the natural history of disease progression and contributing to the clinical management of CKD [22, 24, 25]. The application of ML models provides capacity to tap into the information contained in large and complex datasets and exploit the complex non-linear dependencies [18, 21, 23, 26–28]. The application of these analytical techniques promises to improve our understanding of CKD progression and inform key interventions to help slow progression and reduce the burden of CKD [11, 29–31]. Moreover, it can help inform clinicians with regards to treatment options by increasing confidence in the patient’s likely prognostic course [32, 33].

Whilst the use of predictive modelling is gaining traction in CKD research, efforts are beset by the lack of a uniform approach to the reporting of important methodological advancements and developments of prediction models for CKD progression [23–25, 34]. This lack of consistent reporting of key characteristics and the evaluation of model performance has likely impeded uptake and support of prediction models by clinicians, while undermining reproducibility of research and clinical utility [24]. An example of a standardised reporting guideline is the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, published by the Equator Network, which consists of a checklist considered vital by healthcare professionals, methodologists and journal editors for the transparent reporting of multivariable prediction model studies [35, 36]. By implementing such a checklist, reporting can be standardised and reproducibility improved while facilitating progress towards cross-validation between different health settings and populations globally.

Given the inconsistency in the advancement of predictive modelling used in CKD progression analysis, this paper provides timely evidence from a scoping review about prediction models used in the progression of CKD. The review aims to 1) identify and outline existing models used in predicting CKD progression; and 2) understand which measured outcome(s) and significant variables were chosen when building a prediction model for CKD progression. Its results will help inform clinical and scientific developments in this area and provide a better understanding of CKD progression.

Classification of predictive models

Predictive modelling techniques can be generally classified into four broad categories: supervised, unsupervised, semi-supervised and reinforcement learning, with supervised and unsupervised being the most commonly applied in the medical field [18, 22, 25, 33]. This was also reflected in this scoping review, where only supervised and unsupervised techniques were found in the studies that were assessed at full text; these are discussed in later sections.

Supervised techniques can be further divided by the type of outcome they predict, with the two major groupings including continuous outcomes and categorical outcomes [37]. The regression technique is utilised when output variables are continuous data, such as values for weight or height [37]. On the other hand, classification techniques are commonly used for simpler data such as nominal or categorical data, where a simple binary outcome or a few predetermined categorical responses are required [37]. Supervised techniques have their own challenges and require sufficiently large volumes of correctly labelled data initially to perform accurately [26]. Some examples of commonly used supervised machine learning algorithms are linear or logistic regression, artificial neural networks, decision trees, k-nearest neighbours (KNN), random forest for classification, gradient boosting and support vector machines (SVM) [37, 38].

Unsupervised techniques can be further grouped into two types, clustering or association. Clustering is the process of segregating data into groups according to similar characteristics, whereas association is the process of identifying new relationships within datasets based on certain selected attributes of the data. Additionally, unsupervised algorithms do not need manual labelling of datasets, as they can group data into clusters or identify associations by themselves [26, 38]. The end result of these methods is to provide a simplified interpretation of a complex dataset, and often to sort observations into groups [38]. These groups can then be inspected for their ability to predict the outcome of interest. Some common examples of unsupervised ML algorithms include K-means clustering, mixture models, distribution models, dimensionality reduction, independent component analysis and principal component analysis.
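As an illustration of clustering without labels, the following is a minimal sketch of K-means (Lloyd's algorithm) on one-dimensional data. The eGFR values, the starting centroids and the function name `kmeans_1d` are invented for this example and do not come from any of the reviewed studies.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical eGFR values (mL/min/1.73 m^2), invented for illustration.
egfr = [12, 14, 18, 41, 44, 47, 88, 92, 95]
centroids, clusters = kmeans_1d(egfr, centroids=[10, 50, 90])
print(clusters)
```

No outcome label is supplied at any point; the groups emerge from the values alone and would then need to be inspected for their relationship to the outcome of interest.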

Methods

A scoping review was selected as it allows identification and mapping of existing evidence, and investigation of the knowledge gaps surrounding the topic [39]. This method is suitable for examining emerging evidence across a broad field of study and was guided by the PRISMA extension for Scoping Reviews (PRISMA-ScR), following a standardised approach to search, screen, and report articles [40].

Data sources and searches

This scoping review was performed in the context of a larger study that investigates improving chronic kidney disease outcomes using linked routine records. With this context in mind, an initial concept grid was developed to address the objectives of the scoping review; the grid and the subsequent search histories can be found in S1 Appendix. The review included studies in the past 10 years that developed or utilised any type of predictive modelling to predict the progression of CKD towards a more severe stage of the disease. Articles included were published in peer-reviewed journals from any country, in the English language, between 1st January 2011 and 17th February 2022 inclusive. Four electronic databases, Medline, EMBASE, CINAHL, and Scopus, were chosen for their bibliographic peer-reviewed publications that cover a broad range of medical life sciences, allied health, nursing and healthcare. Fig 1 illustrates the overall flow diagram of the literature review.

Study selection and search

Four main overarching concepts, as described in the concept grid, were selected for the development of the search strategy: kidney disease, disease progression, techniques, and outcomes. The initial search strategy was developed for use in Medline and subsequently adapted for the other databases; keywords and subject headings were amended to reflect search terms used in each respective database. The steps used in Medline are as follows:

  1. (chronic kidney disease* or chronic renal disease* or CKD or kidney disease* or kidney failure).ti,ab.
  2. Renal Insufficiency, Chronic/ or Kidney Failure, Chronic/ or Diabetic Nephropathies/
  3. 1 or 2
  4. (progress* adj7 (CKD or disease)).ti,ab.
  5. Disease Progression/
  6. 4 or 5
  7. (deep learning or machine learning or artificial intelligence or algorithms or prediction model* or statistic* model*).ti,ab.
  8. Artificial Intelligence/ or Big data/ or machine learning/ or algorithms/ or models, statistical/
  9. 7 or 8
  10. (End stage renal disease or ESRD or Transplant* or Hemodialysis or Hospitali?ation or Mortality or Morbidity or Heart failure or Stroke).ti,ab.
  11. Dialysis/ or Peritoneal Dialysis/ or Renal Dialysis/ or Kidney Transplantation/ or Cardiovascular Diseases/ or Hypertension/ or Coronary Artery Disease/ or Coronary Disease/ or Hospitalization/ or Heart failure/ or Stroke/
  12. 10 or 11
  13. 3 and 6 and 9 and 12
  14. limit 13 to (english language and yr = "2011 -Current")
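The Boolean logic of the strategy above can be sketched with sets: terms within a concept are combined with OR (set union), and the four concepts are combined with AND (set intersection). The record IDs below are hypothetical placeholders, not results from any database.

```python
# Hypothetical record IDs matched by each search line (illustration only).
step_1 = {1, 2, 3}            # free-text kidney disease terms
step_2 = {3, 4, 5, 6}         # kidney disease subject headings
step_3 = step_1 | step_2      # "1 or 2": either kind of match counts

step_6 = {2, 3, 4, 6, 7}      # disease progression concept ("4 or 5")
step_9 = {3, 4, 6, 8}         # techniques concept ("7 or 8")
step_12 = {1, 3, 6, 9}        # outcomes concept ("10 or 11")

# Step 13: a record must match all four concepts to be retrieved.
step_13 = step_3 & step_6 & step_9 & step_12
print(sorted(step_13))
```

Using OR within a concept widens recall, while AND across concepts narrows the result to records that address kidney disease, progression, a modelling technique and a clinical outcome together.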

The first key concept for kidney disease included keywords and MeSH terms used in steps 1 and 2, to capture different types of chronic kidney diseases, such as diabetic nephropathies or similar diseases, since CKD is a chronic disease with multiple overlapping manifestations and associated comorbidities and risk factors [39]. The type of model used was not limited and included either statistical or ML algorithms used to predict CKD progression towards a wide range of clinical outcomes. A clear distinction was made that the study should examine prediction models for CKD progression, rather than models that predicted the onset of CKD.

Title and abstract screening

All articles were exported into EndNote and duplicate articles were removed. Two independent reviewers performed title and abstract screening by applying inclusion and exclusion criteria. To be included, a study had to implement a predictive model that was developed through analysis of health records, and had to report an outcome on the progression of CKD. The list of exclusion criteria can be found in Table 1.

Table 1. Exclusion criteria during title and abstract screening.

https://doi.org/10.1371/journal.pone.0271619.t001

The authors recognised that CKD is a very broad topic and did not place restrictions on the type of predictive model that was developed, the population of interest, the source of health data records, the predictive variables that were used, or a specific outcome. Any disagreements about the exclusion of articles were resolved through discussion between the two reviewers and, if required, adjudication by a third reviewer.

Data extraction and quality assessment

The researchers wanted to better understand the significant considerations taken into account when developing a prediction model for CKD progression, and to explore how these studies measured CKD progression [35]. The information extracted followed the items listed on the TRIPOD statement such as the article’s title, author(s), publication year, year of study period, study locations and population size, study design (retrospective or prospective), predicted outcome(s), type of prediction model, predictors in the model, validation assessment, limitations, implications, eGFR formula and data balancing. Corresponding authors were contacted by email if full text was not available and were excluded if unobtainable.

Results

The initial search had a combined total of 516 articles across Medline, EMBASE, CINAHL and Scopus, of which 188 duplicates were removed. 328 articles were then screened for their title and abstract, of which 245 articles were excluded based on exclusion criteria. 83 articles were then assessed for full-text eligibility by inclusion criteria, and subsequently 33 articles remained and were included in final qualitative review. Table 2 summarises the final articles that were included for full-text review.
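The counts in this flow can be checked with simple arithmetic. The sketch below uses only the figures reported above; the number excluded at full-text assessment is not stated in the text and is derived here by subtraction.

```python
# Counts reported in the Results text.
identified = 516
duplicates_removed = 188
excluded_at_screening = 245
included = 33

# Each stage follows from the previous one by subtraction.
screened = identified - duplicates_removed
full_text_assessed = screened - excluded_at_screening
excluded_at_full_text = full_text_assessed - included  # derived, not reported

print(screened, full_text_assessed, excluded_at_full_text)
```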

Predicted outcomes

It was generally observed that, regardless of the model used, the predicted outcome included measurement of risk towards ESKD, which was defined as [41, 43, 44, 48–50, 53, 56–60, 64, 69, 70, 72]:

  1. when the eGFR value is <15 mL/min/1.73 m² and / or
  2. the initiation of kidney replacement therapies (KRTs) such as dialysis or kidney transplantation.

This risk of ESKD was generally predicted for specified time intervals of 1, 2, 3, and 5 years for supervised models, and shorter time intervals of 3, 6, 12 and 18 months for unsupervised models. Very few studies predicted outcomes such as progression from an earlier stage to a more severe stage of CKD, for example from stage 1 to stages 3 or 4 [54, 56], or other endpoints such as a stated percentage decline in eGFR levels [9, 41]. Depending on the quality of the available dataset [59, 60], the predicted outcome could also be combined with other variables such as death, comorbidities, the type of dialysis and the time of diagnosis [46, 59, 68]. Some examples of outcomes that integrated these additional variables include predicting the chances of future KRT at the time of CKD diagnosis [70]; a ≥50% decline in the eGFR from baseline [50] or an eGFR decline ≥30% from baseline [41]; the 5-year risk of KRT in CKD stage 3 and 4 [56]; and the mortality and progression to ESKD over five years [65].
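The outcome definitions described above can be expressed as simple rules. The following is a minimal sketch; the function names and the example values are hypothetical, and individual studies may have applied additional conditions (for example, requiring the eGFR threshold to be sustained over repeated measurements).

```python
def is_eskd(egfr, on_krt):
    """ESKD as summarised above: eGFR < 15 mL/min/1.73 m^2 and/or KRT started."""
    return egfr < 15 or on_krt


def percent_decline(baseline_egfr, current_egfr):
    """Percentage fall in eGFR relative to baseline (negative if eGFR rose)."""
    return 100.0 * (baseline_egfr - current_egfr) / baseline_egfr


def reached_decline_endpoint(baseline_egfr, current_egfr, threshold_pct):
    """True when the decline from baseline meets or exceeds the threshold."""
    return percent_decline(baseline_egfr, current_egfr) >= threshold_pct


# Hypothetical patient: eGFR falls from 60 to 40 without KRT.
print(is_eskd(egfr=40, on_krt=False))
print(reached_decline_endpoint(60, 40, threshold_pct=30))
print(reached_decline_endpoint(60, 40, threshold_pct=50))
```

The same patient meets a 30% decline endpoint but not a 50% one, which shows why studies using different endpoint definitions are hard to compare directly.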

Type of predictive model

Fig 1 shows that 31 studies implemented supervised models, and only 2 studies included unsupervised models with 1 of these 2 studies being a comparison study between supervised and unsupervised models. Of the studies that used supervised models, 21 studies implemented Cox proportional hazards regression [41–61]. Seven studies used machine learning (ML) methods [9, 67–71], and one compared the performance among a number of ML techniques [70]. One study developed a model using Random Forest regression [68], and another study implemented a disease2disease model by first learning the International Classification of Diseases and then clustering the data into groups by considering the variables within the dataset [69]. A multistate marginal structural model (MS-MSM) was also developed in one study that considers an estimated effect of time-dependent variables towards the predicted outcome [72]. Other ML algorithms that were tested include neural networks, decision tree, random forest, XGBoost, Gaussian Naïve Bayes and logistic regression [57, 70, 71].
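For readers less familiar with the predominant method, a fitted Cox proportional hazards model converts a patient's covariates into an absolute risk over a time horizon as 1 − S₀(t)^exp(β·x), where S₀(t) is the baseline survival at time t and β·x is the linear predictor. The sketch below illustrates that calculation only; the baseline survival, coefficients and covariates are invented and do not correspond to any model in the review.

```python
import math


def cox_risk(baseline_survival, coefficients, covariates):
    """Absolute risk by the horizon to which baseline_survival refers.

    Covariates are assumed to be centred the same way as when the
    (hypothetical) model was fitted.
    """
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Invented values for illustration: 5-year baseline survival of 0.90 and
# two covariates with positive coefficients (log hazard ratios).
betas = [0.5, 0.3]
reference_patient = [0.0, 0.0]   # covariates at their reference values
higher_risk_patient = [1.0, 2.0]

print(cox_risk(0.90, betas, reference_patient))
print(cox_risk(0.90, betas, higher_risk_patient))
```

Each coefficient is a log hazard ratio, so a positive coefficient raises the predicted risk as the covariate increases, while the baseline survival anchors the prediction to a specific cohort and time horizon.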

Three studies [49, 64, 65] performed an evaluation of the Kidney Failure Risk Equation (KFRE), and three other studies developed their own unique scoring algorithm that predicted ESKD [55, 63, 66].

Significant variables in the model

Common predictors used in studies included age, sex, eGFR, urinary albumin to creatinine ratio (ACR), serum creatinine (SCr), diabetes, cardiovascular disease, body mass index (BMI), and high blood pressure. Each predictive model was unique and incorporated different combinations of variables, and slightly different definitions of variables, such as high blood pressure. A recent paper by Xu et al. [61], published in 2021, highlighted that there are currently no robust biomarkers to predict progressive CKD, and that prediction instead relies on multiple longitudinal kidney measurements, such as eGFR and proteinuria.

The eGFR formula was also not consistent across studies: 13 studies used the CKD Epidemiology Collaboration (CKD-EPI) equation [9, 41, 42, 44, 46, 51, 54–56, 58, 63, 65, 66] and 9 studies used the Modification of Diet in Renal Disease (MDRD) equation [43, 45, 47, 49, 53, 60, 62, 64, 71]. Two studies used unique equations customised for their specific cohort [48, 69]. There were also 9 studies that did not specify the formula that they used to calculate the eGFR.
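To show why the choice of formula matters, the sketch below computes eGFR for the same hypothetical patient with two published creatinine-based equations. It assumes the 2009 CKD-EPI creatinine equation and the four-variable, IDMS-traceable MDRD equation; the review does not state which versions the individual studies used, and the coefficients should be checked against the primary publications before any real use.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age, female, black=False):
    """2009 CKD-EPI creatinine equation (assumed version), mL/min/1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def egfr_mdrd(scr_mg_dl, age, female, black=False):
    """Four-variable IDMS-traceable MDRD equation (assumed version)."""
    egfr = 175 * scr_mg_dl ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


# Hypothetical patient: 60-year-old man with serum creatinine 1.2 mg/dL.
ckd_epi_value = egfr_ckd_epi_2009(1.2, 60, female=False)
mdrd_value = egfr_mdrd(1.2, 60, female=False)
print(round(ckd_epi_value, 1), round(mdrd_value, 1))
```

The two equations return different estimates from identical inputs, so a model's eGFR-based predictors and outcomes cannot be reproduced unless the formula is reported.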

Study population

The smallest study [50] had 43 participants and the largest study included over 300,000 patient records [54]. Included study populations were from the United States [45, 52–56, 60, 68, 69, 72], Canada [41, 46, 58, 59], Taiwan [9, 42, 70], Germany [64, 66], Japan [48, 50, 61, 67], France [43], Croatia [44], Korea [49, 51], United Kingdom [62], Iran [71], Romania [65], Spain [63], Netherlands [47] and China [57].

The studies that investigated CKD progression used data records that were sourced from all levels of healthcare. Data records ranged from single medical facilities at a local level [58], to tertiary hospitals [50, 58, 59, 62, 67, 71], and to databases that were linked nationwide [42, 60, 69, 72]. The populations were also selected based on a particular comorbidity of interest, for example, polycystic kidney disease [41, 43, 63], ANCA associated vasculitis (AAV) [44], diabetes [46] or other cardiovascular conditions [53, 56, 67].

Validation assessment

30 papers reported on the performance of their respective predictive models (regardless of the type of prediction model used) with 25 studies assessing the performance of their model by measuring the Area Under the Curve (AUC) [9, 41, 43, 45–54, 56–62, 64, 66–68, 70]. Both supervised and unsupervised techniques were shown to have used AUC to validate their prediction model, each having a relatively high value that indicated that their model was reliable in predicting their defined outcome. Relative performance of the prediction model was indicated using a variety of methods including sensitivity analysis, specificity, discrimination index and a goodness of fit analysis. However, only three studies validated their model on an external population dataset [46, 56, 64].
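As a reminder of what the AUC measures, it equals the probability that a randomly chosen patient who experienced the outcome received a higher predicted risk than a randomly chosen patient who did not, with ties counted as half. The sketch below computes it directly from that definition; the scores and labels are invented for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # event ranked above non-event
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Invented predicted risks and observed outcomes (1 = progressed).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Because the measure depends only on ranking within one cohort, a high AUC on the development data does not by itself show that a model transfers to other populations.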

Four studies explored the KFRE [41, 49, 64, 65] as a variable to try to improve the performance of their prediction model. Only one study reported using the F-score with confidence intervals [67], and a range of alternative measures was used, including the mean square error, mean absolute error, normalised mean square error, positive predictive values, negative predictive values, Harrell bootstrap resampling method, D-statistic and various confusion matrices [56, 68, 71].
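Several of the measures named above derive from the same confusion matrix. The following sketch shows how sensitivity, specificity, positive and negative predictive values and the F-score (here the F1 variant) relate to one another; the counts are invented for illustration.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Common classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # events correctly flagged
    specificity = tn / (tn + fp)   # non-events correctly cleared
    ppv = tp / (tp + fp)           # flagged patients who had the event
    npv = tn / (tn + fn)           # cleared patients who did not
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Invented counts: 60 patients progressed, 140 did not.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because these measures depend on the chosen risk threshold and on outcome prevalence, studies that report different subsets of them cannot be compared without the underlying counts.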

Missing data & imbalanced data

The most common limitation reported was missing or limited data, potentially due to the quality and availability of the data collected. Studies tried to overcome this issue by filling in the missing data using imputation techniques and internal validation techniques to help justify the dataset [42, 60]. There were also studies that reported having unbalanced data and outlined the methods applied to re-balance the data before initiating model development [9, 42, 45, 62, 70]. Knowing these limitations, studies recognised that their prediction models would only be applicable to their own given study population and would require external validation to allow generalisation of their model to other populations [41, 44–46, 49, 50, 53, 56, 59, 61, 64–66, 69, 73, 74].

Discussion

The arrival of big data and data science techniques has supported better analytics using data from a variety of sources. However, many healthcare systems around the world are yet to fully utilise healthcare data for research purposes. Many of the data challenges within health relate to missing data, inconsistencies in recorded data and privacy concerns for linking data across organisations [75]. Despite these challenges, the application of health data is critical to support clinical decision making [31, 76, 77].

The success of disease management for conditions like CKD is dependent upon a clinician’s ability to identify the risk of disease progression and poor outcomes. By utilising big data analytics, healthcare professionals may be able to predict disease progression in a timely manner, allowing the potential for better treatment for patients and reduced health costs.

Our review identified studies that had developed models to predict patient outcomes for CKD that measured the risk of progression towards ESKD over given time intervals. There was no single gold standard model identified, with each study producing its own unique prediction model, dependent on the cohort’s characteristics and the quality of the available data. While Cox regression modelling was the predominant method, there is burgeoning research on the use of ML techniques to improve the prediction of CKD progressing towards ESKD [23]. However, the decision to use a particular modelling technique should depend on finding the most suitable model based on the type of data available, its size and its dimensionality [19].

The application of both traditional and ML techniques has been explored as a way of determining the most significant variables or features for inclusion in the model [56, 70]. Studies that combined regression and ML techniques first identified significant variables through regression before including them in the development of a risk prediction model [56, 70, 78]. However, the practicality of determining significant features can be highly dependent on the availability and the quality of data. It is clear that the performance of a model is degraded if there is a lack of significant variables or if it includes irrelevant features [78–80]. Therefore, it is also recommended that future studies attempt to obtain whole population datasets that can help reduce the risk of missing data within the dataset and overcome the limitation of small study populations that are not generalisable to whole populations.

The study by Norouzi et al. demonstrated that an unsupervised adaptive neuro-fuzzy inference system (ANFIS), a type of neural network, was able to accurately predict GFR at sequential 6, 12 and 18-month intervals [71]. Other supervised non-ML models such as the KFRE and the ERBP algorithms, also produced results with high accuracy [63–65].

The comparison study by Dovgan et al. also demonstrated that features which correlated with a time approach produced the best results [70]. While the study did not include pathology results when developing their model, it produced the highest AUC via logistic regression, with XGBoost and Simple Gradient Descendent as a close second.

Stephens-Shields et al. [72] developed MS-MSMs that account for varying windows of time associated with different states while describing the effect that different exposures have on transitions between states or endpoints. This is particularly applicable to the slow progression of CKD in patients who enter the health system at different points in time and at various stages of the disease. In addition, each patient will have acquired different comorbidities and medical histories at different stages of their life.

Since the application of unsupervised and ML models is still in its exploratory stages, further research is required to investigate how these less explainable models handle very large and complex datasets that contain multi-dimensional and continuous variables [37, 78], and how they can be applied to predict CKD progression.

The review revealed a lack of consistent reporting of the methodology used for development and validation of prediction models. This often led to under reporting of model development, which hinders the ability of researchers to do a true comparison and externally validate their predictive models against existing models. This was emphasised when almost one third of studies reviewed did not report on the eGFR formula used, and is a significant limitation towards the development of this area of research. The development of a standardised reporting statement has yet to be widely implemented among CKD progression research which may be due to its relative novelty in the area of predictive modelling and statistical research [35].

Few studies explained how they attempted to re-balance their data, and methods differed for each study including log transformations, data resampling techniques, running simulation studies, and applying inversely proportional weights to class frequencies [9, 42, 62, 70]. The predictive models that have been developed are often difficult to implement locally as they lack information that allows clinicians to validate them. Limitations on data linkage within and between health organisations also contribute to the challenge of implementing this research, where siloed datasets are unlikely to be representative of whole populations. It is also recommended that future studies should include clear reporting of model development including any balancing of skewed datasets, steps to validate the model, and a description of how significant variables were chosen, which should theoretically at least include age, sex, eGFR (using a formula that provides reliable estimates for the study population), details on the population’s characteristics, ACR, BMI and time-related variables if available.
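One of the re-balancing approaches mentioned above, weighting classes in inverse proportion to their frequency, can be sketched as follows. The normalisation used here (number of samples divided by the number of classes times the class count) is one common convention and is an assumption; the reviewed studies may have used a different scheme. The labels are invented.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented cohort: 90 patients did not progress (0), 10 progressed (1).
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

With these weights each class contributes equally to a weighted loss in total, so errors on the rare progressing class are no longer outweighed by the majority class. Reporting the scheme and the resulting weights is part of what makes a model reproducible.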

A reliable risk prediction model for CKD progression would not only provide clinicians with earlier identification of CKD patients at greatest risk of progression, it would also enhance consultations and help clinicians determine suitable treatment options to improve patient outcomes [81, 82].

Conclusions

Nephrology researchers are working towards producing an effective model to assist the detection of the risk of chronic kidney disease progression. The review highlights that supervised techniques, and more specifically Cox regression, are the predominant models used to predict the progression of CKD. There were only a small number of studies in the review that used unsupervised and ML models, with the limited numbers making it very difficult to perform a comparison between these models. A more consistent and reproducible approach is required for future studies looking to develop risk prediction models for CKD progression. This would improve international collaborations and build upon the existing research to overcome the challenges to improve the effectiveness and reliability of these prediction models. Subsequently, this would also translate into enhanced health system planning, allocation of resources and improved health outcomes for CKD patients.

Supporting information

S1 Checklist. Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist.

https://doi.org/10.1371/journal.pone.0271619.s001

(DOCX)

S1 Appendix. Detailed search strategy per database.

Concept grids and search histories for Medline, EMBASE, CINAHL and Scopus.

https://doi.org/10.1371/journal.pone.0271619.s002

(DOCX)

Acknowledgments

This project is part of a larger 4-year collaborative partnership between Curtin University, La Trobe University, WA Department of Health, WA Country Health Service, WA Primary Health Alliance, and the DHCRC. All authors declare no conflict of interest and received no additional funding towards this manuscript. All authors also consent to publication, and the supplementary data that support the findings of this review will be available upon submission and publication.

References

  1. 1. Bikbov B, Purcell CA, Levey AS, Smith M, Abdoli A, Abebe M, et al. Global, regional, and national burden of chronic kidney disease, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet. 2020;395(10225):709–33. pmid:32061315
  2. 2. Luyckx VA, Tonelli M, Stanifer JW. The global burden of kidney disease and the sustainable development goals. Bull World Health Organ. 2018;96(6):414–22D. Epub 2018/04/20. pmid:29904224.
  3. 3. Foreman KJ, Marquez N, Dolgert A, Fukutaki K, Fullman N, McGaughey M, et al. Forecasting life expectancy, years of life lost, and all-cause and cause-specific mortality for 250 causes of death: reference and alternative scenarios for 2016–40 for 195 countries and territories. Lancet. 2018;392(10159):2052–90. Epub 2018/10/21. pmid:30340847; PubMed Central PMCID: PMC6227505.
  4. 4. Levey AS, Coresh J, Bolton K, Culleton B, Harvey KS, Ikizler TA, et al. K/DOQI clinical practice guidelines for chronic kidney disease: Evaluation, classification, and stratification. American Journal of Kidney Diseases. 2002;39(2 SUPPL. 1):i-ii+S1–S266. PubMed Central PMCID: PMC11904577.
  5. 5. Obrador GT, Schultheiss UT, Kretzler M, Langham RG, Nangaku M, Pecoits-Filho R, et al. Genetic and environmental risk factors for chronic kidney disease. Kidney international supplements. 2017;7(2):88–106. Epub 2017/09/20. pmid:30675423.
  6. 6. Chartier MJ, Tangri N, Komenda P, Walld R, Koseva I, Burchill C, et al. Prevalence, socio-demographic characteristics, and comorbid health conditions in pre-dialysis chronic kidney disease: Results from the Manitoba chronic kidney disease cohort. BMC Nephrology. 2018;19(1). pmid:30305038
  7. 7. Cisek K, Krochmal M, Klein J, Mischak H. The application of multi-omics and systems biology to identify therapeutic targets in chronic kidney disease. Nephrol Dialysis Transplantation. 2016;31(12):2003–11. Epub 2015/10/22. pmid:26487673.
  8. 8. Silva Junior GBD, Oliveira JGR, Oliveira MRB, Vieira L, Dias ER. Global costs attributed to chronic kidney disease: a systematic review. Rev Assoc Med Bras (1992). 2018;64(12):1108–16. Epub 2018/12/21. pmid:30569987.
  9. 9. Cheng LC, Hu YH, Chiou SH. Applying the Temporal Abstraction Technique to the Prediction of Chronic Kidney Disease Progression. Journal of Medical Systems. 2017;41(5):85. pmid:28401396
  10. 10. Haroun MK, Jaar BG, Hoffman SC, Comstock GW, Klag MJ, Coresh J. Risk Factors for Chronic Kidney Disease: A Prospective Study of 23,534 Men and Women in Washington County, Maryland. Journal of the American Society of Nephrology. 2003;14(11):2934. pmid:14569104
  11. 11. Jha V, Garcia-Garcia G, Iseki K, Li Z, Naicker S, Plattner B, et al. Chronic kidney disease: global dimension and perspectives. The Lancet. 2013;382(9888):260–72. pmid:23727169
  12. 12. Liyanage T, Ninomiya T, Jha V, Neal B, Patrice HM, Okpechi I, et al. Worldwide access to treatment for end-stage kidney disease: a systematic review. The Lancet. 2015;385(9981):1975–82. pmid:25777665
  13. 13. Levin A, Tonelli M, Bonventre J, Coresh J, Donner J-A, Fogo AB, et al. Global kidney health 2017 and beyond: a roadmap for closing gaps in care, research, and policy. The Lancet. 2017;390(10105):1888–917. https://doi.org/10.1016/S0140-6736(17)30788-2.
  14. 14. Wong LY, Liew AST, Weng WT, Lim CK, Vathsala A, Toh MPHS. Projecting the Burden of Chronic Kidney Disease in a Developed Country and Its Implications on Public Health. International Journal of Nephrology. 2018:1–9. pmid:30112209
  15. 15. Lopez-Vargas PA, Tong A, Phoon RKS, Chadban SJ, Shen Y, Craig JC. Knowledge deficit of patients with stage 1–4 CKD: A focus group study. Nephrology. 2014;19(4):234–43. pmid:24428274
  16. 16. Echouffo-Tcheugui JB, Kengne AP. Risk models to predict chronic kidney disease and its progression: a systematic review. PLoS medicine. 2012;9(11):e1001344. pmid:23185136
  17. 17. Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. The use of machine learning in rare diseases: a scoping review. Orphanet Journal of Rare Diseases. 2020;15(1):145. pmid:32517778
  18. 18. Du AX, Emam S, Gniadecki R. Review of Machine Learning in Predicting Dermatological Outcomes. Frontiers in Medicine. 2020;7(266). pmid:32596246
  19. 19. Stafford IS, Kellermann M, Mossotto E, Beattie RM, MacArthur BD, Ennis S. A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases. npj Digital Medicine. 2020;3(1):30. pmid:32195365
  20. 20. McDougall RJ. Computer knows best? The need for value-flexibility in medical AI. Journal of Medical Ethics. 2019;45(3):156. pmid:30467198
  21. 21. Buch VH, Ahmed I, Maruthappu M. Artificial intelligence in medicine: current trends and future possibilities. Br J Gen Pract. 2018;68(668):143–4. pmid:29472224.
  22. 22. Saberi-Karimian M, Khorasanchi Z, Ghazizadeh H, Tayefi M, Saffar S, Ferns GA, et al. Potential value and impact of data mining and machine learning in clinical diagnostics. Critical Reviews in Clinical Laboratory Sciences. 2021;58(4):275–96. pmid:33739235.
  23. 23. Chaudhuri S, Long A, Zhang H, Monaghan C, Larkin JW, Kotanko P, et al. Artificial intelligence enabled applications in kidney disease. Seminars in dialysis. 2021;34(1):5–16. pmid:32924202.
  24. 24. Collins GS, Omar O, Shanyinde M, Yu LM. A systematic review finds prediction models for chronic kidney disease were poorly reported and often developed using inappropriate methods. Journal of Clinical Epidemiology. 2013;66(3):268–77. pmid:23116690
  25. 25. Muse ED, Topol EJ. A brighter future for kidney disease? The Lancet. 2020;395(10219):179. pmid:31954451
  26. 26. Myszczynska MA, Ojamies PN, Lacoste AMB, Neil D, Saffari A, Mead R, et al. Applications of machine learning to diagnosis and treatment of neurodegenerative diseases. Nat Rev Neurol. 2020;16(8):440–56. Epub 2020/07/17. pmid:32669685.
  27. 27. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE Journal of Biomedical and Health Informatics. 2018;22(5):1589–604. pmid:29989977
  28. 28. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. Journal of the American Medical Informatics Association. 2017;24(1):198–208. pmid:27189013
  29. 29. Inaguma D, Kitagawa A, Yanagiya R, Koseki A, Iwamori T, Kudo M, et al. Increasing tendency of urine protein is a risk factor for rapid eGFR decline in patients with CKD: A machine learning-based prediction model by using a big database. PLoS ONE. 2020;15(9). pmid:32941535
  30. 30. Nichols GA, Ustyugova A, Déruaz-Luyet A, O’Keeffe-Rosetti M, Brodovicz KG. Health Care Costs by Type of Expenditure across eGFR Stages among Patients with and without Diabetes, Cardiovascular Disease, and Heart Failure. Journal of the American Society of Nephrology. 2020;31(7):1594. pmid:32487562
  31. 31. Zeng X-X, Liu J, Ma L, Fu P. Big Data Research in Chronic Kidney Disease. Chinese Medical Journal. 2018;131(22):2647–50. 00029330-201811200-00001.
  32. 32. Xie G, Chen T, Li Y, Chen T, Li X, Liu Z. Artificial Intelligence in Nephrology: How Can Artificial Intelligence Augment Nephrologists’ Intelligence? Kidney Diseases. 2020;6(1):1–6. pmid:32021868
  33. 33. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–8. Epub 2017/01/25. pmid:28117445.
  34. 34. Cao J, Singh K. Integrating risk prediction models into chronic kidney disease care. Current opinion in nephrology and hypertension. 2020;29(3):339–45. pmid:32205582
  35. 35. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55–63. Epub 2015/01/07. pmid:25560714.
  36. 36. Heus P, Damen J, Pajouheshnia R, Scholten R, Reitsma JB, Collins GS, et al. Uniformity in measuring adherence to reporting guidelines: the example of TRIPOD for assessing completeness of reporting of prediction model studies. BMJ Open. 2019;9(4):e025611. Epub 2019/04/27. pmid:31023756; PubMed Central PMCID: PMC6501951.
  37. 37. Osisanwo F, Akinsola J, Awodele O, Hinmikaiye J, Olakanmi O, Akinjobi J. Supervised machine learning algorithms: classification and comparison. International Journal of Computer Trends and Technology (IJCTT). 2017;48(3):128–38.
  38. 38. Mahesh B. Machine Learning Algorithms-A Review. International Journal of Science and Research (IJSR)[Internet]. 2020;9:381–6.
  39. 39. Munn Z, Peters MDJ, Stern C, Tufanaru C, McArthur A, Aromataris E. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Medical Research Methodology. 2018;18(1):143. pmid:30453902
  40. 40. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Annals of Internal Medicine. 2018;169(7):467–73. pmid:30178033
  41. 41. Akbari A, Tangri N, Brown PA, Biyani M, Rhodes E, Kumar T, et al. Prediction of Progression in Polycystic Kidney Disease Using the Kidney Failure Risk Equation and Ultrasound Parameters. Canadian Journal of Kidney Health and Disease. 2020;7. pmid:32215214
  42. 42. Chang HL, Wu CC, Lee SP, Chen YK, Su W, Su SL. A predictive model for progression of CKD. Medicine (United States). 2019;98(26):e16186. pmid:31261555
  43. 43. Cornec-Le Gall E, Audrezet M-P, Rousseau A, Hourmant M, Renaudineau E, Charasse C, et al. The PROPKD Score: A New Algorithm to Predict Renal Survival in Autosomal Dominant Polycystic Kidney Disease. Journal of the American Society of Nephrology: JASN. 2016;27(3):942–51. pmid:26150605
  44. 44. Crnogorac M, Horvatic I, Toric L, Galesic Ljubanovic D, Tisljar M, Galesic K. Clinical, serological and histological determinants of patient and renal outcome in ANCA-associated vasculitis with renal involvement: an analysis from a referral centre. International urology and nephrology. 2017;49(8):1419–31. pmid:28646482
  45. 45. Dai D, Alvarez PJ, Woods SD. A predictive model for progression of chronic kidney disease to kidney failure using a large administrative claims database. ClinicoEconomics and Outcomes Research. 2021;13:475–86. pmid:34113139.
  46. 46. Dunkler D, Gao P, Lee SF, Heinze G, Clase CM, Tobe S, et al. Risk Prediction for Early CKD in Type 2 Diabetes. Clinical journal of the American Society of Nephrology: CJASN. 2015;10(8):1371–9. pmid:26175542
  47. 47. Halbesma N, Jansen DF, Heymans MW, Stolk RP, de Jong PE, Gansevoort RT, et al. Development and validation of a general population renal risk score. Clinical journal of the American Society of Nephrology: CJASN. 2011;6(7):1731–8. pmid:21734089
  48. 48. Hasegawa T, Sakamaki K, Koiwa F, Akizawa T. Clinical prediction models for progression of chronic kidney disease to end stage kidney failure under predialysis nephrology care: Results from the chronic kidney disease Japan cohort study. Nephrology Dialysis Transplantation. 2016;31(SUPPL. 1). http://dx.doi.org/10.1093/ndt/gfw189.2.
  49. 49. Kang MW, Tangri N, Kim YC, An JN, Lee J, Li L, et al. An independent validation of the kidney failure risk equation in an Asian population. Scientific reports. 2020;10(1):12920. pmid:32737361
  50. 50. Kataoka H, Ohara M, Suzuki T, Inoue T, Akanuma T, Kawachi K, et al. Time series changes in pseudo-R2 values regarding maximum glomerular diameter and the Oxford MEST-C score in patients with IgA nephropathy: A long-term follow-up study. PloS one. 2020;15(5):e0232885. pmid:32379841
  51. 51. Kim HW, Park JT, Joo YS, Kang SC, Lee JY, Lee S, et al. Systolic blood pressure and chronic kidney disease progression in patients with primary glomerular disease. Journal of Nephrology. 2021;34(4):1057–67. pmid:33555575.
  52. 52. Li L, Luo S, Hu B, Greene T. Dynamic Prediction of Renal Failure Using Longitudinal Biomarkers in a Cohort Study of Chronic Kidney Disease. Statistics in Biosciences. 2017;9(2):357–78. pmid:29250207
  53. 53. Maziarz M, Chertow GM, Himmelfarb J, Hall YN. Homelessness and Risk of End-stage Renal Disease. Journal of Health Care for the Poor & Underserved. 2014;25(3):1231–44. pmid:25130236
  54. 54. Palant C, Chawla L, Faselis C, Li P, Kimmel PL, Amdur R. The association of serum creatinine variability and progression to ckd. Nephrology Dialysis Transplantation. 2015;30(SUPPL. 3). http://dx.doi.org/10.1093/ndt/gfv175.4.
  55. 55. Park KJ, Benuzillo JG, Keast E, Thorp ML, Mosen DM, Johnson ES. Predicted risk of renal replacement therapy at arteriovenous fistula referral in chronic kidney disease. The Journal of Vascular Access. 2020:1129729820947868. pmid:32772799
  56. 56. Schroeder EB, Yang X, Thorp ML, Arnold BM, Tabano DC, Petrik AF, et al. Predicting 5-year risk of RRT in stage 3 or 4 CKD: Development and external validation. Clinical Journal of the American Society of Nephrology. 2017;12(1):87–94. pmid:28028051
  57. 57. Sun L, Shang J, Xiao J, Zhao Z. Development and validation of a predictive model for end-stage renal disease risk in patients with diabetic nephropathy confirmed by renal biopsy. PeerJ. 2020;2020(2):e8499. pmid:32095345
  58. 58. Tangri N, Inker LA, Hiebert B, Wong J, Naimark D, Kent D, et al. A Dynamic Predictive Model for Progression of CKD. American journal of kidney diseases: the official journal of the National Kidney Foundation. 2016;69(4):514–20. pmid:27693260
  59. 59. Tangri N, Stevens LA, Griffith J, Tighiouart H, Djurdjev O, Naimark D, et al. A predictive model for progression of chronic kidney disease to kidney failure. JAMA. 2011;305(15):1553–9. pmid:21482743
  60. 60. Xie Y, Maziarz M, Tuot DS, Chertow GM, Himmelfarb J, Hall YN. Risk prediction to inform surveillance of chronic kidney disease in the US Healthcare Safety Net: a cohort study. BMC nephrology. 2016;17(1):57. pmid:27276913
  61. 61. Xu Q, Wang Y, Fang Y, Feng S, Chen C, Jiang Y. An easy-to-operate web-based calculator for predicting the progression of chronic kidney disease. Journal of Translational Medicine. 2021;19(1). pmid:34217324
  62. 62. Diggle PJ, Sousa I, Asar O. Real-time monitoring of progression towards renal failure in primary care patients. Biostatistics (Oxford, England). 2015;16(3):522–36. pmid:25519432
  63. 63. Furlano M, Loscos I, Martí T, Bullich G, Ayasreh N, Rius A, et al. Autosomal Dominant Polycystic Kidney Disease: Clinical Assessment of Rapid Progression. American Journal of Nephrology. 2018;48(4):308–17. pmid:30347391
  64. 64. Lennartz CS, Pickering JW, Seiler-Musler S, Bauer L, Untersteller K, Emrich IE, et al. External Validation of the Kidney Failure Risk Equation and Re-Calibration with Addition of Ultrasound Parameters. Clinical journal of the American Society of Nephrology: CJASN. 2016;11(4):609–15. pmid:26787778
  65. 65. Nastasa A, Apetrii M, Onofriescu M, Nistor I, Hussien H, Popa C, et al. Risk prediction for death and end-stage renal disease does not parallel real-life trajectory of older patients with advanced chronic kidney disease-a Romanian center experience. Nephrology Dialysis Transplantation. 2020;35(SUPPL 3). http://dx.doi.org/10.1093/ndt/gfaa142.P0184.
  66. 66. Zacharias HU, Altenbuchinger M, Schultheiss UT, Samol C, Kotsis F, Poguntke I, et al. A Novel Metabolic Signature To Predict the Requirement of Dialysis or Renal Transplantation in Patients with Chronic Kidney Disease. Journal of proteome research. 2019;18(4):1796–805. pmid:30817158
  67. 67. Makino M, Yoshimoto R, Ono M, Itoko T, Katsuki T, Koseki A, et al. Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning. Scientific reports. 2019;9(1):11862. pmid:31413285
  68. 68. Zhao J, Gu S, McDermaid A. Predicting outcomes of chronic kidney disease from EMR data based on Random Forest Regression. Mathematical Biosciences. 2019;310:24–30. pmid:30768948
  69. 69. Zhou F, Gillespie A, Gligorijevic D, Gligorijevic J, Obradovic Z. Use of disease embedding technique to predict the risk of progression to end-stage renal disease. Journal of Biomedical Informatics. 2020;105. pmid:32304869
  70. 70. Dovgan E, Gradisek A, Lustrek M, Uddin M, Nursetyo AA, Annavarajula SK, et al. Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney disease patients. PloS one. 2020;15(6):e0233976. pmid:32502209
  71. 71. Norouzi J, Yadollahpour A, Mirbagheri SA, Mazdeh MM, Hosseini SA. Predicting Renal Failure Progression in Chronic Kidney Disease Using Integrated Intelligent Fuzzy Expert System. Computational and mathematical methods in medicine. 2016;2016:6080814. pmid:27022406
  72. 72. Stephens-Shields AJ, Spieker AJ, Anderson A, Drawz P, Fischer M, Sozio SM, et al. Blood pressure and the risk of chronic kidney disease progression using multistate marginal structural models in the CRIC Study. Statistics in medicine. 2017;36(26):4167–81. pmid:28791722
  73. 73. Fraccaro P, van der Veer S, Brown B, Prosperi M, O’Donoghue D, Collins GS, et al. An external validation of models to predict the onset of chronic kidney disease using population-based electronic health records from Salford, UK. BMC Medicine. 2016;14(1):104. pmid:27401013
  74. 74. Qin J, Chen L, Liu Y, Liu C, Feng C, Chen B. A machine learning methodology for diagnosing chronic kidney disease. IEEE Access. 2020;8:20991–1002.
  75. 75. Hagar Y, Albers D, Pivovarov R, Chase H, Dukic V, Elhadad N. Survival analysis with electronic health record data: Experiments with chronic kidney disease. Statistical Analysis and Data Mining: The ASA Data Science Journal. 2014;7(5):385–403. pmid:33981381
  76. 76. Dash S, Shakyawar SK, Sharma M, Kaushik S. Big data in healthcare: management, analysis and future prospects. Journal of Big Data. 2019;6(1):54.
  77. 77. Mendu ML, Ahmed S, Maron JK, Rao SK, Chaguturu SK, May MF, et al. Development of an electronic health record-based chronic kidney disease registry to promote population health management. BMC Nephrology. 2019;20(1). pmid:30823871
  78. 78. Krishnamurthy S, Kapeleshh KS, Dovgan E, Luštrek M, Gradišek Piletič B, Srinivasan K, et al. Machine learning prediction models for chronic kidney disease using national health insurance claim data in Taiwan. Healthcare (Basel). 2021;9(5). pmid:34067129
  79. 79. Kotsiantis S. Feature selection for machine learning classification problems: A recent overview. Artificial Intelligence Review—AIR. 2011;42.
  80. 80. Sharma A, Dey S. A comparative study of feature selection and machine learning techniques for sentiment analysis. Proceedings of the 2012 ACM Research in Applied Computation Symposium; San Antonio, Texas: Association for Computing Machinery; 2012. p. 1–7.
  81. 81. Schell JO, Patel UD, Steinhauser KE, Ammarell N, Tulsky JA. Discussions of the kidney disease trajectory by elderly patients and nephrologists: a qualitative study. American journal of kidney diseases: the official journal of the National Kidney Foundation. 2012;59(4):495–503. Epub 2012/01/04. pmid:22221483.
  82. 82. Li L, Astor BC, Lewis J, Hu B, Appel LJ, Lipkowitz MS, et al. Longitudinal progression trajectory of GFR among patients with CKD. American journal of kidney diseases: the official journal of the National Kidney Foundation. 2012;59(4):504–12. Epub 2012/01/26. pmid:22284441.