
Accuracy and clinical effectiveness of risk prediction tools for pressure injury occurrence: An umbrella review

  • Bethany Hillier,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliations Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, United Kingdom, NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, United Kingdom

  • Katie Scandrett,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Writing – original draft, Writing – review & editing

    Affiliation Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, United Kingdom

  • April Coombe,

    Roles Conceptualization, Data curation, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliations Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, United Kingdom, NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, United Kingdom

  • Tina Hernandez-Boussard,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Medicine, Stanford University, Stanford, California, United States of America

  • Ewout Steyerberg,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands

  • Yemisi Takwoingi,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    Affiliations Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, United Kingdom, NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, United Kingdom

  • Vladica M. Veličković,

    Roles Conceptualization, Funding acquisition, Methodology, Writing – review & editing

    Affiliations Evidence Generation Department, HARTMANN GROUP, Heidenheim, Germany, Institute of Public Health, Medical Decision Making and Health Technology Assessment, UMIT, Hall, Tirol, Austria

  • Jacqueline Dinnes

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    j.dinnes@bham.ac.uk

    Affiliations Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, United Kingdom, NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, United Kingdom

Abstract

Background

Pressure injuries (PIs) pose a substantial healthcare burden and incur significant costs worldwide. Several risk prediction tools are available and in use, intended to allow timely implementation of preventive measures and thereby reduce the burden on healthcare systems. However, the ability of these tools to correctly identify those at high risk of PI (prognostic accuracy) and to have a clinically significant impact on patient management and outcomes (clinical effectiveness) is not clear. We aimed to evaluate the prognostic accuracy and clinical effectiveness of risk prediction tools for PI and to identify gaps in the literature.

Methods and findings

The umbrella review was conducted according to Cochrane guidance. Systematic reviews (SRs) evaluating the accuracy or clinical effectiveness of adult PI risk prediction tools in any clinical setting were eligible. Studies on paediatric tools, sensor-only tools, or staging/diagnosis of existing PIs were excluded. MEDLINE, Embase, CINAHL, and EPISTEMONIKOS were searched (inception to June 2024) to identify relevant SRs, as well as Google Scholar (2013 to 2024) and reference lists. Methodological quality was assessed using adapted AMSTAR-2 criteria. Results were described narratively. We identified 26 SRs meeting all eligibility criteria: 19 SRs assessed prognostic accuracy and 11 assessed clinical effectiveness of risk prediction tools for PI (4 SRs assessed both aspects). The 19 SRs of prognostic accuracy evaluated 70 tools (39 scales and 31 machine learning (ML) models), with the Braden, Norton, Waterlow, and Cubbin-Jackson scales (and modifications thereof) the most evaluated tools. Meta-analyses from a focused set of included SRs showed that the scales had sensitivities and specificities ranging from 53% to 97% and 46% to 84%, respectively. Only 2/19 (11%) SRs performed appropriate statistical synthesis and quality assessment. Two SRs assessing ML-based algorithms reported high prognostic accuracy estimates, but some estimates were sourced from the same data within which the models were developed, making them potentially overoptimistic. Two randomised trials assessing the effect of PI risk assessment tools (within the full test-intervention-outcome pathway) on the incidence of PIs were identified from the 11 SRs of clinical effectiveness; both were included in a Cochrane SR and assessed as being at high risk of bias. Neither trial found evidence of an effect on PI incidence. Limitations of this umbrella review include the use of the AMSTAR-2 criteria, which may have overly focused on reporting quality rather than methodological quality (compounded by the poor reporting quality of included SRs), and the decision not to exclude SRs with low AMSTAR-2 ratings, made in order to provide a comprehensive overview. Additionally, included SRs relied heavily on diagnostic test accuracy principles, which do not account for the temporal nature of prediction, rather than prognostic modelling approaches.

Conclusions

Available systematic reviews suggest a lack of high-quality evidence for the accuracy of risk prediction tools for PI and provide limited reliable evidence that their use leads to a reduction in PI incidence. Further research is needed to establish the clinical effectiveness of appropriately developed and validated risk prediction tools for PI.

Author summary

Why was this study done?

  • Pressure injuries (PIs) are injuries to and below the skin caused by prolonged pressure, especially on bony areas, and people who spend extensive periods in a bed or chair are particularly vulnerable.
  • The majority of pressure injuries are preventable if appropriate preventive measures are put in place, but risk stratification of individuals is crucial to allocate those measures appropriately.
  • Numerous tools exist that give patients a score (or probability) to signify their risk of developing a PI. However, there is a lack of clarity on how accurate these risk scores are and how effective they are at improving patient outcomes (clinical effectiveness) when the management of patients classified as high risk is subsequently changed.

What did the researchers do and find?

  • We conducted an umbrella review (an overview of existing systematic reviews), identifying 26 systematic reviews which included 70 risk prediction tools.
  • Of these 70 risk prediction tools, 31 were developed using machine learning (ML) methods, while the remainder were derived from statistical modelling and/or clinical expertise.
  • Risk prediction tools demonstrated moderate to high accuracy, as measured by a variety of metrics. However, there were concerns regarding the quality of both the systematic reviews and, as reported by the systematic review authors, the primary studies included in these reviews.
  • There were only 2 randomised controlled trials that investigated the clinical effectiveness of risk prediction tools and subsequent changes in PI management, and neither trial found that use of the tools had an impact on the incidence of PIs.

What do these findings mean?

  • While an abundance of risk prediction tools exists, it is unclear how accurate they are due to poor-quality evidence and poor reporting, so it is difficult to recommend a particular tool/tools.
  • Even if the tools are shown to be accurate, they are not useful unless they lead to improvement in patient outcomes. There is very limited evidence to determine whether the tools are clinically effective and the evidence that does exist suggests that the tools did not lead to improved patient outcomes.
  • More research into the clinical effectiveness of appropriately developed and evaluated tools, when they are adopted within the clinical pathway, is needed.
  • The main limitations of this study are the choice of quality assessment tool, the poor reporting quality of the included reviews, and the reliance on statistical methods that assess the accuracy of the risk prediction tools without considering the time interval between use of the tool (prediction) and the occurrence of a PI (outcome).

Introduction

Pressure injuries (PIs), also known as pressure ulcers or decubitus ulcers, have an estimated global prevalence of 12.8% among hospitalised adults [1] and place a significant burden on healthcare systems (estimated at $26.8 billion per year in the United States (US) alone [2]). PIs are most common in individuals with reduced mobility, limited sensation, poor circulation, or compromised skin integrity, and can affect those in community settings and long-term care as well as hospital settings. Effective prevention of PI requires multicomponent preventive strategies such as mattresses, overlays and other support systems, nutritional supplementation, repositioning, dressings, creams, lotions, and cleansers [3,4]. Health economic models have suggested that providing baseline preventive interventions for all patients, with daily risk assessments, is more cost-effective than either a less standardised prevention protocol or a targeted risk-stratified prevention strategy [5]. Nevertheless, the stratification of patients by risk could further improve outcomes by allowing timely and targeted implementation of additional or more intensive preventive measures in those most at risk, to reduce harm and consequently the burden on healthcare systems [6].

Numerous clinical assessment scales and statistical risk prediction models for assessing the risk of PI are available. However, the methodology underlying their development is not always explicit, with scales in routine clinical usage apparently based on epidemiological evidence and clinical judgement about predictors that may not meet accepted principles for the development and reporting of risk prediction models [7]. The Braden [8,9], Norton [10], and Waterlow [11] scales are recommended by the National Institute for Health and Care Excellence (NICE) guidelines [12] in the United Kingdom (UK) and referenced in international guidelines for PI prevention [13]. In some hospitals and long-term care settings in the US, healthcare professionals must conduct mandatory PI risk assessments for all patients for the purposes of risk stratification and clinical triage. The Braden scale, developed in 1987 using a sample of 102 older hospital patients in the US, includes sensory perception, moisture, activity, mobility, nutrition, and friction and shear as predictors [8,9]. The Norton scale, based on a sample of 250 older hospital patients in the UK and published in 1962, includes physical condition, mental status, activity, mobility, and continence domains [10]. The Waterlow scale was published in 1985 for use by Waterlow’s nursing students in the UK [14], and assesses body mass index (BMI), skin condition, sex, age, malnutrition, incontinence, mobility, tissue malnutrition, neurological deficits, major surgery or trauma, and medication [11].

There is a considerable body of evidence evaluating the psychometric properties and clinical utility of available risk prediction tools, much of which has been synthesised in systematic reviews and meta-analyses [7]. However, there is an apparent lack of reporting of now standard methods for the development and validation of risk prediction tools. Clinical utility includes both prognostic accuracy and clinical effectiveness. Prognostic accuracy is estimated by applying a numeric threshold above (or below) which there is a greater risk of PI, with study results presented using accuracy metrics such as sensitivity, specificity, or the area under the receiver operating characteristic (ROC) curve [15]. Resulting accuracy is driven not only by the nominated threshold for defining participants as at low or high risk of PI but also by other study factors, including population and setting [16]. Clinical effectiveness, the ability of a tool to ultimately impact on health outcomes such as the incidence or severity of PI, is related to the accuracy of the tool (its ability to correctly identify those most likely to develop PI), to the uptake and implementation of the tool in practice, and to the consequent changes in PI management based on tool predictions. Demonstrating a change in health outcomes as a result of use of a risk prediction tool is vital to encourage implementation [17].
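To make these accuracy metrics concrete, the sketch below shows how sensitivity and specificity are derived once a risk scale is dichotomised at a cut-off. This is illustrative only: the cohort, scores, and cut-off are invented for the example and do not come from any included study. Note that for scales such as the Braden, lower scores indicate higher risk, so a "positive" (high-risk) result is a score at or below the threshold.

```python
def accuracy_at_threshold(scores, outcomes, threshold):
    """Classify score <= threshold as 'high risk' and compare with
    observed PI occurrence (1 = developed a PI, 0 = did not)."""
    tp = sum(1 for s, y in zip(scores, outcomes) if s <= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, outcomes) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, outcomes) if s <= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, outcomes) if s > threshold and y == 0)
    sensitivity = tp / (tp + fn)  # proportion of PI cases flagged as high risk
    specificity = tn / (tn + fp)  # proportion of non-cases flagged as low risk
    return sensitivity, specificity

# Hypothetical cohort: Braden-style scores with PI outcome after follow-up.
scores = [12, 18, 15, 21, 13, 18, 16, 22, 11, 20]
outcomes = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
sens, spec = accuracy_at_threshold(scores, outcomes, threshold=16)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # 0.80, 0.80
```

Sweeping the threshold from the lowest to the highest score and plotting the resulting (1 - specificity, sensitivity) pairs traces the ROC curve whose area (AUC) is the other metric commonly reported.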

Using an umbrella review approach, we aimed to provide a comprehensive overview of available systematic reviews that consider the prognostic accuracy and clinical effectiveness of PI risk prediction tools.

Methods

Protocol registration and reporting of findings

We followed Cochrane guidance for conducting umbrella reviews [18], and “Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies” (PRISMA-DTA) reporting guidelines [19] (see Appendix A in S1 Supporting Information). The protocol was registered on Open Science Framework (https://osf.io/tepyk).

Literature search

Electronic searches of MEDLINE, Embase via Ovid, and CINAHL Plus EBSCO from inception to June 2024 were developed and conducted by an experienced information specialist (AC), employing well-established systematic review and prognostic search filters [20–22], combined with appropriate keywords related to PIs. Simplified supplementary searches in EPISTEMONIKOS and Google Scholar were also undertaken, with the latter covering the years 2013 to June 2024 (see Appendix B in S1 Supporting Information for further details). Screening of search results and full texts was conducted independently and in duplicate by any 2 of a group of 4 reviewers (BH, JD, YT, and KS), with arbitration by a third reviewer where necessary (any one of the 4 reviewers not involved in the independent screening).

Eligibility criteria for this umbrella review

Published English-language systematic reviews of risk prediction tools developed for adult patients at risk of PI in any setting were included. We understand the term “adult” to refer to individuals aged 18 and over, but accepted individual study definitions of adult and also included studies in which “adult” was not defined. Studies focused on tools developed for paediatric populations, as defined by tool developers, were excluded. Clinical risk assessment scales and models developed using statistical or machine learning (ML) methods were eligible (models exclusively using pressure sensor data were not considered). Risk prediction tools could be applied by any healthcare professional using any threshold for classifying patients as high or low risk and using any PI classification system [13,23–25] as a reference standard. For prognostic accuracy, we required accuracy metrics, such as sensitivity and specificity, to be presented but did not require full 2 × 2 classification tables to be reported. Reviews on diagnosing or staging suspected or existing PIs were excluded.

To be considered “systematic,” reviews were required to report a thorough search of at least 2 electronic databases and at least one other indication of systematic methods (e.g., explicit eligibility criteria, formal quality assessment of included studies, adequate data presentation for reproducibility of results, or review stages (e.g., search screening) conducted independently in duplicate).

Data extraction and quality assessment

Data extraction forms (Appendix C in S1 Supporting Information) were informed by the CHARMS checklist (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) and Cochrane Prognosis group template [26,27]. Data extraction items included review characteristics, number of studies and participants, study quality, and results.

The methodological quality of included systematic reviews was assessed using an adapted version of AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews) [28]. For example, for reviews evaluating the prognostic accuracy of risk prediction tools, we assessed eligibility criteria using the PIRT framework (Population, Index test, Reference standard, Target condition) [29] and POII framework (Population, Outcome to be predicted, Intended use of model, Intended moment in time) [30], and required methodological quality assessment to be conducted using validated and appropriate tools such as QUADAS (Quality Assessment of Diagnostic Accuracy Studies) [31], QUADAS-2 [32], or PROBAST (Prediction model Risk Of Bias ASsessment Tool) [33]. We omitted the AMSTAR-2 item relating to publication bias (Item 15) because of the lack of empirical evidence for the effect of publication bias on test accuracy estimates and limitations in statistical methods for identifying publication bias [19,34]. Our adapted AMSTAR-2 contains 6 critical items, and limitations in any of these items reduce the overall validity of a review [28]. Full details can be found in Appendix D in S1 Supporting Information. Quality assessment and data extraction were conducted by one reviewer and checked by a second (BH, JD, KS), with disagreements resolved by consensus.

Synthesis methods

Reviews about prognostic accuracy and clinical effectiveness of risk prediction tools were considered separately. Review methods and results were tabulated, and a narrative synthesis provided. Prognostic accuracy results from reviews including a statistical synthesis were tabulated according to risk prediction tool.

Considerable overlap in risk prediction tools and included primary studies was noted between reviews. For risk prediction tools that were included in multiple meta-analyses, we focused our synthesis on the review(s) with the most recent search date or the most comprehensive coverage (based on the number of included studies), and with the most robust estimates of prognostic accuracy (judged according to the appropriateness of the meta-analytic method used, e.g., use of recommended hierarchical approaches for test accuracy data [35]). The prognostic accuracy of risk prediction tools that were included in 3 or fewer reviews was reported only if an appropriate method of statistical synthesis [18] was used.

For clinical effectiveness results, reviews that assessed PI incidence outcomes and at least partially met more of the AMSTAR-2 criteria [28], with the most recent search date or the most comprehensive overview of available studies, were prioritised for narrative synthesis.

Results

Characteristics of included reviews

A total of 118 records were selected for full-text assessment from 7,200 unique records. We could obtain the full text of 111 publications, of which 26 reviews met all eligibility criteria (Fig 1): 19 reported accuracy data [36–54] and 11 reported clinical effectiveness data [38,42,43,49,55–61] (4 reported both accuracy and effectiveness data [38,42,43,49]). Table 1 and Fig 2 provide an overview of the characteristics, methods, and methodological quality of all 26 reviews (see Appendix E in S1 Supporting Information for full details).

Fig 1. PRISMA flowchart: Identification, screening, and selection process.

List of full-text articles excluded, with reasons, is given in Appendix E in S1 Supporting Information.

https://doi.org/10.1371/journal.pmed.1004518.g001

Fig 2. Summary of AMSTAR-2 assessment results.

Item 1 – Adequate research question/inclusion criteria?; Item 2 – Protocol and justifications for deviations?; Item 3 – Reasons for study design inclusions?; Item 4 – Comprehensive search strategy?; Item 5 – Study selection in duplicate?; Item 6 – Data extraction in duplicate?; Item 7 – Excluded studies list (with justifications)?; Item 8 – Included studies description adequate?; Item 9 – Assessment of RoB/quality satisfactory?; Item 10 – Studies’ sources of funding reported?; Item 11 – Appropriate statistical synthesis method?; Item 12 – Assessment of impact of RoB on synthesised results?; Item 13 – Assessment of impact of RoB on review results?; Item 14 – Discussion/investigation of heterogeneity?; Item 15 – Conflicts of interest reported?; N/A – Not Applicable; RoB – Risk of Bias. Further details on AMSTAR items are given in Appendix D in S1 Supporting Information, and results per review are given in Appendix E in S1 Supporting Information.

https://doi.org/10.1371/journal.pmed.1004518.g002

Table 1. Summary of included systematic review characteristics.

https://doi.org/10.1371/journal.pmed.1004518.t001

Reviews were published between 2006 and 2024. Over half (15/26, 58%) restricted inclusion to adult populations (Table 1), 2 (2/26, 8%) included any age group, and 9 (9/26, 35%) did not report any age restrictions. Six reviews (6/26, 23%) only included study populations with no PI at baseline. Acute care was the most frequent setting across both review questions (7/19 (37%) accuracy reviews and 3/11 (27%) effectiveness reviews). More than half of the accuracy reviews (10/19, 53%) used QUADAS-2 (n = 8) or QUADAS (n = 2) as quality assessment tools. One review [47] utilised and reported PROBAST assessments for risk of bias. Another review [48] reported using both QUADAS-2 and PROBAST in its methods, but only reported QUADAS-2 results.

Reviews of accuracy either included studies evaluating any tool (5/19, 26%) or prespecified tools (10/19, 53%); 2 [47,48] included only ML-based prediction models, while the remaining 2 [49,50] did not specify the tools to be included. A total of 70 risk prediction tools were reported across the reviews (from one [37,40,41,46,51,52] to 28 [39] tools per review), including 31 ML models. Only 2 reviews (2/19, 11%) reported eligibility criteria related to the development or validation of the risk prediction tools. One [43] excluded evaluation studies that used the same data used to develop the tool, and the other [38] included only “validated risk assessment instruments” with no further definition (yet included studies reporting original tool development).

The majority (15/19, 79%) of accuracy reviews conducted a meta-analysis, but only 2 utilised currently recommended hierarchical approaches for the meta-analysis of test accuracy data [41,53]. Eight reviews conducted univariate meta-analyses of individual accuracy measures (e.g., sensitivity and specificity separately, area under the curve (AUC) [50], risk ratios (RRs) [39], or odds ratios [43]), and 5 did not clearly report the type of analysis approach used.
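To illustrate the distinction, the sketch below shows the kind of univariate random-effects pooling (here DerSimonian-Laird) that most of these reviews applied to each accuracy measure separately. It is a minimal illustration with invented study counts, not code from any included review; pooling sensitivity in isolation like this ignores both its correlation with specificity and any variation in thresholds between studies, which is why hierarchical approaches are recommended instead [35].

```python
import math

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study effects
    (here logit-sensitivities) with their within-study variances."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

# Hypothetical per-study counts of (true positives, false negatives).
studies = [(45, 12), (30, 10), (60, 25), (18, 4)]
logit_sens = [math.log(tp / fn) for tp, fn in studies]  # logit(sensitivity)
variances = [1 / tp + 1 / fn for tp, fn in studies]     # delta-method variance
pooled = dersimonian_laird(logit_sens, variances)
print(f"pooled sensitivity = {1 / (1 + math.exp(-pooled)):.2f}")
```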

Of the 11 systematic reviews evaluating clinical effectiveness, 2 only considered the reliability of risk assessment scales [49,58], 1 considered reliability and other “psychometric” properties [42], and 8 considered effects on patient outcomes (one of which also considered tool reliability [55]). More than half of the reviews (6/11, 55%) compared use of PI risk assessment scales to clinical judgement alone or “standard care.” The number of included studies ranged from 1 [56] to 20 [60], with sample sizes ranging from 1 (1 subject and 110 raters, in an inter-rater reliability study [62]) to 4,137 patients. Reported outcomes included the incidence of PIs (7/11, 64%), preventive interventions prescribed (5/11, 45%), inter-rater reliability (4/11, 36%), and internal consistency, measurement error, and convergent validity (1/11, 9%) (the latter 4 properties are reported in Appendix E in S1 Supporting Information). One review [61] used the Cochrane risk of bias (RoB) tool for quality assessment of included studies, 2 reviews [59,60] used Joanna Briggs Institute (JBI) tools, and another [43] used the Critical Appraisal Skills Programme (CASP) checklist. Due to heterogeneity in study design, risk prediction tools, and outcomes evaluated, none of the included reviews provided any form of statistical synthesis of study results.

Methodological quality of included reviews

The quality of included reviews was generally poor (Table 2; Appendix E in S1 Supporting Information). The AMSTAR-2 items that were most consistently met (yes or partial yes) were: comprehensiveness of the search (21/26, 81%), study selection independently in duplicate (17/26, 65%), data extraction independently in duplicate (15/26, 58%), and conflicts of interest reported (20/26, 77%).

Table 2. Findings related to prognostic accuracy, by model: Characteristics and quality of studies included within reviews.

https://doi.org/10.1371/journal.pmed.1004518.t002

Six (6/19, 32%) accuracy reviews [36,40,41,47,48,53] and 2 (2/11, 18%) effectiveness reviews used an appropriate method of quality assessment of included studies and also presented judgements per study. For accuracy reviews, appropriate methods were QUADAS or QUADAS-2 (depending on publication year) or PROBAST; for effectiveness reviews, they were the Cochrane tool for assessing risk of bias [64] and criteria consistent with the AHRQ (Agency for Healthcare Research and Quality) Methods Guide for Effectiveness and Comparative Effectiveness Reviews [65]. Five reviews either reported quality assessment results per study (n = 4 [42,58–60]) or were considered to use an appropriate quality assessment tool (n = 1 [43]) (AMSTAR-2 criterion partially met).

Of the accuracy reviews that included a statistical synthesis, 25% (4/16) [39,41,50,53] used an appropriate meta-analytic method and investigated sources of heterogeneity. Two reviews [41,53] used recommended hierarchical approaches to meta-analysis of test accuracy data (the bivariate model [41] and hierarchical summary ROC (HSROC) model [53]) and 2 reviews calculated summary estimates of individual measures, using random effects meta-analyses (AUC [50] or RR [66]).
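For context, the recommended bivariate approach can be sketched as follows; this is a standard formulation (often attributed to Reitsma and colleagues, with an exact binomial within-study likelihood), given here for illustration rather than taken from any included review. For study $i$, with $n_{1i}$ patients who developed a PI and $n_{0i}$ who did not:

```latex
% Within-study level: exact binomial likelihoods
TP_i \sim \mathrm{Binomial}(n_{1i},\, se_i), \qquad
TN_i \sim \mathrm{Binomial}(n_{0i},\, sp_i)

% Between-study level: correlated random effects on the logit scale
\begin{pmatrix} \operatorname{logit}(se_i) \\ \operatorname{logit}(sp_i) \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} \mu_{se} \\ \mu_{sp} \end{pmatrix},
  \begin{pmatrix} \sigma_{se}^{2} & \sigma_{se,sp} \\ \sigma_{se,sp} & \sigma_{sp}^{2} \end{pmatrix}
\right)
```

Summary sensitivity and specificity are recovered by back-transforming $\mu_{se}$ and $\mu_{sp}$; the covariance term $\sigma_{se,sp}$ captures the trade-off between sensitivity and specificity that separate univariate pooling discards, and the HSROC model is a reparameterisation of the same structure when no covariates are included.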

Compared to the reviews of accuracy, reviews of effectiveness more commonly provided adequate descriptions of primary studies (8/11, 73% versus 1/19, 5%) and adequately defined their inclusion criteria (5/11, 45% versus 1/19, 5%) (Fig 2). No other major differences across review questions were noted.

Results from reviews evaluating the prognostic accuracy of risk prediction tools

Seven of the 19 accuracy reviews were prioritised for narrative synthesis (Tables 2 and 3) and are reported below according to risk prediction tool. Five of the 7 reviews did not include development study estimates within their meta-analyses, one review of ML models did not report this information [48], and one [47] restricted inclusion to model development studies. The latter review was the only one to consider the effect of study quality in its statistical syntheses.

Table 3. Summary estimates of accuracy parameters (main results from statistical syntheses), by prediction tool.

https://doi.org/10.1371/journal.pmed.1004518.t003

Braden and modified Braden scales.

The most recent and largest review [41] of the Braden scale (60 studies, including 49,326 patients), which used hierarchical bivariate meta-analysis, reported an overall summary sensitivity of 0.78 (95% confidence interval (CI) [0.74, 0.82]; 15,241 patients) and specificity of 0.72 (95% CI [0.66, 0.78]; 34,085 patients) across all reported thresholds (range ≤10 to ≤20). Summary sensitivities and specificities ranged from 0.79 (95% CI [0.76, 0.82]) and 0.66 (95% CI [0.55, 0.75]) at the lowest cut-offs for identification of high-risk patients (≤15 in 15 studies) to 0.82 (95% CI [0.73, 0.89]) and 0.70 (95% CI [0.62, 0.77]) using a cut-off of 18 (15 studies), respectively. Heterogeneity investigations suggested higher accuracy for predicting PI risk in patients with a mean age of 60 years or less, in hospitalised patients (compared to long-term care facility residents) and in Caucasian populations (compared to Asian populations) [41]. The review noted a high risk of bias for the “index test” section of the QUADAS-2 assessment in approximately a third of included studies, but failed to provide further details.

Two modified versions of the Braden scale [67,68] were included in another review [44]. Summary sensitivities were 0.97 (95% CI [0.92, 0.99]; 125 patients from 4 studies) [67] and 0.89 (95% CI [0.71, 0.98]; 27 patients from 2 studies) [68], and summary specificities were 0.70 (95% CI [0.66, 0.73]; 563 patients) [67] and 0.71 (95% CI [0.67, 0.75]; 599 patients) [68]. The review was rated critically low on the AMSTAR-2 assessment, with only 1/15 (7%) criteria fulfilled. QUADAS-2 was reportedly used, but results were not reported in any detail, other than to indicate that none of the included studies was considered at high risk of bias.

Cubbin and Jackson scale.

The most recent and comprehensive review [36] of the Cubbin and Jackson scale (9 studies, including 7,684 patients) reported a summary sensitivity of 0.81 (95% CI [0.51, 0.95]; 1,558 patients) and specificity of 0.76 (95% CI [0.58, 0.88]; 6,126 patients). However, this review scored critically low on AMSTAR-2 (3/15, 20%, criteria fulfilled), with the authors utilising inappropriate methods for statistical synthesis, failing to investigate causes of heterogeneity, and reporting results poorly throughout. The meta-analysis approach was also not clearly reported, but it appears that univariate meta-analyses were conducted separately for sensitivity and specificity, across studies with different Cubbin and Jackson thresholds.

Zhang and colleagues [53] included 6 studies evaluating the original Cubbin and Jackson scale [69] (800 patients). Summary sensitivity and specificity were both reported as 0.84 (95% CIs [0.59, 0.95] and [0.66, 0.93], respectively) [53], suggesting that this represents the point on the HSROC curve where sensitivity equals specificity, particularly as reported thresholds ranged from 24 to 34. The review authors concluded that although the accuracy of the Cubbin and Jackson scale was higher than that of the EVARUCI and Braden scales, the low quality of evidence and significant heterogeneity limit the strength of conclusions that can be drawn.

Norton scale.

Park and colleagues [44] synthesised data from 7 studies (2,899 participants) evaluating the Norton scale, across thresholds ranging from <14 to <16. They reported a summary sensitivity of 0.75 (95% CI [0.70, 0.79]) and specificity of 0.57 (95% CI [0.55, 0.59]). A further 4 reviews presented statistically synthesised results for the Norton scale (Appendix E in S1 Supporting Information), including one by Chou and colleagues [38], which included 9 studies (5,444 participants) but only reported median values for accuracy parameters.

Waterlow scale.

Although Zhang and colleagues [53] included the fewest participants (4 studies; 1,000 participants) of all 6 reviews that conducted a statistical synthesis of the accuracy of the Waterlow scale [11], they provided the most recent review. It was rated highest on AMSTAR-2 criteria and appropriately used the HSROC model for meta-analysis across thresholds ranging from 12 to 25. Summary sensitivity was 0.63 (95% CI [0.48, 0.76]) and summary specificity 0.46 (95% CI [0.22, 0.71]) (Table 3). A second review [44] reported contrasting results, with a summary sensitivity of 0.55 (95% CI [0.49, 0.62]) and specificity of 0.82 (95% CI [0.80, 0.85]) (6 studies; 1,268 participants); however, the authors synthesised data across multiple thresholds without utilising hierarchical methods.

Machine learning algorithms.

Pei and colleagues [47] included 18 ML models, 7 of which were not covered by any other included review. Accuracy measures were combined across all models that provided 2 × 2 data (n = 14 models). The summary AUC across the 14 models was 0.94, summary sensitivity was 0.79 (95% CI [0.78, 0.80]), and summary specificity was 0.87 (95% CI [0.87, 0.88]) (Table 3). Meta-regression found no significant effect by ML algorithm or data type. Clinical heterogeneity was not investigated. The majority of studies (16/18, 89%) were considered at high risk of bias based on PROBAST. Our confidence in the review was critically low, with only 6/15 (40%) AMSTAR-2 criteria fulfilled. One critical flaw was the use of inappropriate meta-analysis methods (failing to use a hierarchical model for synthesising sensitivity and specificity).

Qu and colleagues [48] conducted separate meta-analyses of 25 studies by ML algorithm type using Bayesian hierarchical methods (Table 2). The review was rated critically low on AMSTAR-2, with only 6/15 (40%) criteria fulfilled. The review did not restrict inclusion to external evaluations of the models, and the authors did not report which estimates were sourced from development data and which from external data. The summary AUC for the 5 algorithms ranged from 0.82 (95% CI [0.79, 0.85]; 9 studies with 97,815 participants) for neural network-based models to 0.95 (95% CI [0.93, 0.97]; 7 studies with 161,334 participants) for random forest models (Table 3).

The latter approach also had the highest summary specificity, 0.96 (95% CI [0.80, 0.99]), with a sensitivity of 0.72 (95% CI [0.26, 0.95]). The highest summary sensitivity was observed for support vector machine models (0.81, 95% CI [0.69, 0.90]), with a summary specificity of 0.81 (95% CI [0.59, 0.93]) (9 studies, 152,068 participants). The remaining algorithms had summary sensitivities ranging from 0.66 (decision tree models) to 0.73 (neural network models) (Table 3). Two additional ML algorithms evaluated in the included studies (Bayesian networks and LOS (abbreviation not explained)) had too few studies to allow meta-analysis (Appendix E in S1 Supporting Information).

Other scales.

In addition to the risk prediction tools reported above, Zhang and colleagues [53] reported on the EVARUCI scale [70], presenting summary sensitivity and specificity of 0.84 (95% CI [0.79, 0.89]) and 0.68 (95% CI [0.66, 0.70]), respectively (3 studies; 3,063 participants). These results were synthesised across thresholds of 11 and 11.5 (with one study's threshold not reported).

Additional statistical syntheses covering 3 further modifications of the Braden scale (Braden modified by Kwong [71], the 4-factor model [72], and “extended Braden” [72]), 2 modified versions of the Norton scale (by Ek [73] and by Bienstein [74]), a revised “Jackson & Cubbin” scale [75], and the EMINA [76] and PSPS [77] tools were also identified [38,39,49]. These analyses showed variable performance, often with high uncertainty. Full details can be found in Table D in S1 Supporting Information.

Table E in S1 Supporting Information reports data for another 17 risk prediction tools, each associated with a single primary study (and therefore not covered in detail above), and another 2 tools, Sunderland [78] and RAPS [79], each assessed in 2 primary studies.

Results from reviews evaluating the clinical effectiveness of risk prediction tools

The 11 reviews reporting clinical effectiveness used a range of eligibility criteria and a number of different quality assessment tools, leading to varying conclusions about the methodological quality of the same studies across reviews. Given the overlap in study inclusion between reviews, Table 4 provides an overview of results from 4 [38,57,59,61] of the 11 reviews, and a summary of the included comparative studies is provided below.

Table 4. Systematic reviews evaluating clinical effectiveness.

https://doi.org/10.1371/journal.pmed.1004518.t004

Two randomised controlled trials (RCTs) of risk prediction tools [83,84] were identified, both of which were considered at high risk of bias in the Cochrane review (assessed using the Cochrane RoB tool [64]). One of the trials (an individually randomised study [83]) was included in a further 3 reviews, which considered it to be “good quality” [38], “valid” [56], or “high quality” [59]. The trial was conducted in 1,231 hospital inpatients; the only intervention was that staff were required to use the tool allocated to them, with no other protocol-prescribed changes made to routine care. No evidence of a difference in PI incidence was found between patients assessed with either the Waterlow scale or the Ramstadius tool compared with clinical judgement alone (RR 1.10 (95% CI [0.68, 1.81]) and RR 0.79 (95% CI [0.46, 1.35]), respectively). The trial further showed no evidence of a difference in patient management or in PI severity when using a risk assessment tool compared to clinical judgement.

A further cluster randomised trial [84] was considered to be of poor methodological quality in both the Cochrane review [61] and one other review [38]. The trial included 521 patients at a military hospital and compared nurse training plus mandatory use of the Braden scale, nurse training plus optional use of the Braden scale, and no training. No evidence of a difference in PI incidence was observed between the 3 groups: incidence rates were 22%, 22%, and 15% (p = 0.38), respectively.

Two reviews by Lovegrove and colleagues [59,60] included an uncontrolled comparison study [85] rated as high quality [59]. The study compared the clinical effectiveness of the Maelor scale [86] used in an Irish hospital (121 patients) with nurses’ clinical judgement at a Norwegian hospital (59 patients). A higher rate of preventive strategies, as well as a lower PI prevalence (12% versus 54%), was reported for the Irish hospital. However, these results are likely to be highly confounded by inherent differences in population and setting.

A non-randomised study by Gunningberg and colleagues [87] included in 2 reviews [43,57] was considered by review authors to be of relatively high quality. The study was conducted in 124 patients in emergency and orthopaedic units and compared the use of a PI risk alarm sticker for patients with a modified Norton Score of <21 (indicating high-risk patients) to standard care. No significant difference in the incidence of PIs between the Norton scale and standard care groups was observed.

A non-randomised study [88] conducted in 233 hospice inpatients was included in 3 reviews [38,43,57], one of which is reported in Table 4 [57]. The study met 6 of 8 quality criteria used by Health Quality Ontario [57]. Use of a modified version of the Norton scale (Norton modified by Bale), in conjunction with standardised use of preventive interventions based on risk score, was found to be associated with a lower risk of PIs when compared with nurses' clinical judgement alone (RR 0.11, 95% CI [0.03, 0.46]). The lack of randomisation limits the reliability of this result, and the review authors report that the modified Norton scale had not been validated.

Finally, a “before-and-after” study [89] of 181 patients in various hospital settings was included in 2 reviews [43,57], one of which considered the study to meet all quality criteria [57]. Use of the Norton scale with additional training for staff was associated with significant differences in the number of preventive interventions prescribed compared to standard care (18.96 versus 10.75, respectively). Preventive interventions were also introduced earlier in the intervention group (on day 1, 61% versus 50%, p < 0.002 for Norton and usual care, respectively). However, no significant difference in the incidence of PIs was detected between the groups.

Discussion

This umbrella review summarises data from 26 systematic reviews of studies evaluating the prognostic accuracy and clinical effectiveness of a total of 70 PI risk prediction tools. Despite the large number of available reviews, quality assessment using an adaptation of AMSTAR-2 suggested that the majority were conducted to a relatively poor standard or did not meet reporting standards for systematic reviews [19,90]. Of the 15 AMSTAR-2 items assessed, only 2 (for accuracy reviews) and 4 (for effectiveness reviews) criteria were more consistently met (more than 60% of reviews scoring “Yes”). While AMSTAR-2 Item 6 (data extraction independently in duplicate) was fulfilled by over half of all reviews (15/26, 58%), and Item 14 (adequate heterogeneity investigation) was fulfilled by around half of the accuracy reviews (10/19, 53%), all other criteria were fully met by less than half of the reviews. The primary studies included in the reviews were particularly poorly described in the accuracy reviews, making it difficult to determine exactly what was evaluated and in whom. The extent to which we could reliably describe and comment on the content of the reviews is therefore limited, and high-quality evidence for the accuracy and clinical effectiveness of PI risk prediction tools may be lacking.

Of the 19 reviews reporting the accuracy of included tools, only 2 used appropriate methods for both quality assessment and statistical synthesis of accuracy data [41,53], one of which [41] evaluated only the Braden scale. Only 2 reviews [42,43] prespecified the exclusion of studies reporting accuracy data from tool development studies, one review restricted inclusion to “validated risk assessment instruments” only [38], and one review [47] was limited to development studies only. The latter [47] was the only review to discuss the importance of appropriate validation of prediction tools. Only 2 reviews conducted meta-analyses at different cut-offs for determination of high risk [38,41]; the remaining reviews combined data regardless of the threshold used. Combining data across different thresholds to estimate summary sensitivity and specificity yields clinically uninterpretable and non-generalisable estimates that do not relate to a particular threshold [35]. Only 1 review [38] considered timing in its inclusion criteria or in the description of primary studies. It is important to interpret the findings below with these limitations in mind.

The included meta-analyses suggested that risk prediction scales have moderate sensitivities and somewhat lower specificities, typically in the range of around 60% to 85% for sensitivity and 50% to 80% for specificity. Although these ranges would be considered on the lower end of acceptable within a diagnostic test accuracy (DTA) paradigm, the tools may have greater utility in a prognostic context. Without a detailed review of the primary study publications for these tools, it is not possible to assess which, if any, of these risk assessment scales might outperform the others, and few comparative studies directly comparing the accuracy of different tools appear to be available.

For the ML-based models, one review [47] combined multiple ML models into 1 meta-analysis and another [48] meta-analysed accuracy data by algorithm type. The results of the latter meta-analyses are not informative for clinical practice but may be a useful way of identifying which ML algorithms may be more suited to the data. Results suggested that specificities for random forest or decision tree models could reach 90% or above with associated sensitivities in the range of 66% to 72%; however, relatively wide confidence intervals around these summary estimates reflect considerable variation in model performance. Moreover, some of these estimates came from internal validations within model development studies and may not be transferable to other settings [91]. Authors should make it clear where accuracy estimates are derived from to avoid overinterpretation of results.

Diagnostic accuracy studies are typically cross-sectional in the sense that there should be no, or only minimal, delay between application of the test and the reference standard [92,93]. For prognostic accuracy, however, there is a time delay between the application of the test and the outcome that the tool aims to predict. If the use of an accurate PI risk prediction tool is combined with effective and appropriate preventive measures in those identified as most at risk, the incidence of PI will decline, reducing the positive predictive value of the original risk assessment and potentially the sensitivity of the tool [94]. Sensitivity and specificity can be optimised by methods that directly consider the cost of misclassification, including both the harms associated with applying more intensive prevention in those with a false positive result and the benefits of preventive measures in those with a true positive result. One solution for determining the preventive treatment threshold risk is net benefit calculation [95,96], which can be visualised in decision curves and is common in prognostic research. These calculations can assist in providing a balanced use of resources while maximising positive health outcomes, such as lowering the incidence of PI.
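As a concrete illustration of the net benefit idea, the sketch below weighs true positives against harm-weighted false positives at a chosen threshold probability p_t, using NB(p_t) = TP/N - (FP/N) x p_t/(1 - p_t). The cohort size, prevalence, and classification counts are assumed for the example (with sensitivity and specificity loosely echoing the Braden summary estimates reported above) and do not come from any included study.

```python
def net_benefit(tp, fp, n, p_t):
    """Net benefit of applying preventive measures to all tool-positive
    patients at preventive treatment threshold probability p_t:
    benefit of true positives minus harm-weighted false positives."""
    return tp / n - (fp / n) * (p_t / (1 - p_t))

# Hypothetical cohort of 1,000 patients, 100 of whom develop a PI.
# A tool with sensitivity 0.78 and specificity 0.72 flags
# tp = 78 of the 100 cases and fp = 252 of the 900 non-cases.
tp, fp, n = 78, 252, 1000
for p_t in (0.05, 0.10, 0.20):
    print(f"p_t={p_t:.2f}: tool NB={net_benefit(tp, fp, n, p_t):+.3f}, "
          f"treat-all NB={net_benefit(100, 900, n, p_t):+.3f}")
```

Plotting net benefit for the tool, for "treat all," and for "treat none" (net benefit 0) across a range of threshold probabilities yields the decision curve; the tool is only useful over the range of thresholds where its curve lies above both defaults.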

It is also important to consider that not all predictors have a causal relationship with the outcome; therefore, not every predictor will be a clinical risk modifier. Risk assessment tools that allow a more personalised-risk approach, i.e., that identify and flag to end-users those predictors that are risk modifiers, would make predictions more interpretable and actionable. Some such developments exist [97,98], but future validation of these methods is needed. Where risk assessment tools are developed for enriching study design (for example, as a means of recruiting only high-risk patients to studies evaluating preventive measures), a different approach and optimisation of performance metrics would be needed. Risk prediction models should therefore prespecify their intended application before development to allow their clinical utility for a given context to be addressed [99].

Prediction models, like any test used for diagnostic or prognostic purposes, require evaluation in the care pathway to identify the extent to which their use can impact on health outcomes [100]. Of the 11 reviews assessing the clinical effectiveness of PI risk prediction tools, the only primary studies suggesting potential patient benefits from the use of risk prediction tools [85,88,89] were non-randomised and are likely to be at high risk of bias. In contrast, 2 randomised trials [83,84] (both considered at high risk of bias by the Cochrane review [61]) suggest that use of structured risk assessment tools does not ultimately lead to a reduction in the incidence of PIs. We should recognise that effectiveness outcomes from using a risk prediction tool depend on the timely implementation of effective preventive measures, a step that is frequently poorly described in studies evaluating the effectiveness of risk assessment tools, restricting the conclusions that can be drawn from the limited evidence available. One possible explanation for the lack of differences in PI incidence is the implementation of preventive measures that have not been proven effective in preventing PIs, such as alternating air mattresses [4]. All reviews included studies that assessed the use of risk assessment scales developed by clinical experts, and no evidence is available evaluating the clinical effectiveness of empirically derived prediction models or ML algorithms.

We have separately reviewed [7] available evidence for the development and validation of risk prediction tools for PI occurrence. Almost half (60/124, 48%) of available tools were developed using ML methods (as defined by review authors), 37% (46/124) were based on clinical expertise or unclear methods, and only 15% (18/124) were identified as having used statistical modelling methods. The reviews varied in methodological quality and reporting; however, the reporting of prediction model development in the original primary studies appears to be poor. For example, across all prediction tools identified, the internal validation approach was unclear and unidentifiable for 72% (89/124) of tools, and only 2 reviews [47,101] identified and included external validation studies (n = 9 studies).

ML-based models may have potential for identifying those at risk of PI, as suggested by 2 reviews [47,48] included in this umbrella review. However, it is important to consider the lack of transparency in reporting of model development methods and model performance, and the concerning lack of model validation in populations outside of the original model development sample [7].

We have conducted the first umbrella review that summarises the prognostic accuracy and clinical effectiveness of prediction tools for risk of PI. We followed Cochrane guidance [18], with a highly sensitive search strategy designed by an experienced information specialist. Although we excluded non-English publications due to time and resource constraints, where possible these publications were used to identify additional eligible risk prediction tools.

To some extent, our review is limited by the use of AMSTAR-2 for quality assessment of included reviews. AMSTAR-2 was not designed for assessing systematic reviews of diagnostic or prognostic studies. Although we made some adaptations, many of the existing and amended criteria relate to the quality of reporting of the reviews as opposed to their methodological quality. There is scope for further work to establish criteria for assessing systematic reviews of prediction tools. Additionally, we chose not to exclude reviews based on low AMSTAR-2 ratings, in order to provide a comprehensive overview of all available evidence. However, by doing so, we acknowledge that many included reviews are of poor quality (with critically low confidence in 81% (21/26) of reviews), reducing the reliability of the evidence presented and the ability to make conclusions or recommendations based on this evidence.

The primary limitation of our study lies in the limited detail available on risk prediction tools and their performance within the included systematic reviews. To ensure comprehensive identification of models, we adopted a broad definition of “systematic,” which may have influenced the depth of information provided in the included reviews; the reporting quality of many primary studies contributing to these reviews may also be suboptimal.

Although standards for reporting of test accuracy studies have been available since the year 2000 [92], standards for reporting risk prediction models were not published until 2015 [102]. Similarly, quality assessment tools highlighting important areas for consideration in primary studies have been available for DTA studies since 2003, with an adaptation to prognostic accuracy published in 2022 [103] and PROBAST for prediction model studies in 2019 [33]. This lag in methodological developments for studies and systematic reviews of risk prediction tools has likely contributed to the observed emphasis on the application of DTA principles in our set of reviews, without sufficient consideration of the prognostic context and the effect on accuracy of intervening and effective preventive interventions.

While 95% (18/19) of accuracy reviews aimed to evaluate the “predictive” validity of PI risk assessment tools, the majority (16/19, 84%) relied on DTA principles without any consideration of the time interval between the test and the outcome, i.e., the occurrence of PI. This approach does not account for the prognostic nature of these tools or address longitudinal questions, such as censoring and competing events [103]. Another fundamental flaw in these accuracy assessments is that risk scales may actually appear to perform worse in settings where risk prediction and preventive care are most effective, as accurate risk prediction combined with effective preventive measures may prevent patients classified as “high-risk” from developing PIs [94].

In conclusion, this umbrella review comprehensively summarises the prognostic accuracy and clinical effectiveness of risk prediction tools for developing PIs. The included systematic reviews used poor methodology and reporting, limiting our ability to reliably describe and evaluate their content. ML-based models demonstrated potential, with high specificity reported for some models; however, wide confidence intervals highlight the variability in current evaluations, and external validation of ML tools may be lacking. The prognostic accuracy of clinical scales and statistically derived prediction models spans a substantial range of sensitivities and specificities, motivating further model development with high-quality data and appropriate statistical methods.

Regarding clinical effectiveness, whether use of these tools reduces PI incidence remains unclear due to the overall uncertainty and potential biases in available studies. This underscores the need for further research in this critical area, once promising prediction tools have been developed and appropriately validated. In particular, the clinical impact of newer ML-based models currently remains largely unexplored. Despite these limitations, our umbrella review provides valuable insights into the current state of PI risk prediction tools, emphasising the need for robust research methods to be used in future evaluations.

Acknowledgments

We would like to thank Mrs. Rosie Boodell (University of Birmingham, UK) for her help in acquiring the publications necessary to complete this piece of work.

Paul Hartmann AG is part of HARTMANN GROUP. The contract with the University of Birmingham was agreed on the legal understanding that the authors had the freedom to publish results regardless of the findings.

The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

References

  1. Li Z, Lin F, Thalib L, Chaboyer W. Global prevalence and incidence of pressure injuries in hospitalised adult patients: A systematic review and meta-analysis. Int J Nurs Stud. 2020;105:103546. pmid:32113142
  2. Padula WV, Delarmente BA. The national cost of hospital-acquired pressure injuries in the United States. Int Wound J. 2019;16(3):634–640. pmid:30693644
  3. Sullivan N, Schoelles K. Preventing In-Facility Pressure Ulcers as a Patient Safety Strategy. Ann Intern Med. 2013;158(5 Pt 2):410–416. pmid:23460098
  4. Qaseem A, Mir TP, Starkey M, Denberg TD. Risk Assessment and Prevention of Pressure Ulcers: A Clinical Practice Guideline From the American College of Physicians. Ann Intern Med. 2015;162(5):359–369. pmid:25732278
  5. Padula WV, Pronovost PJ, Makic MBF, Wald HL, Moran D, Mishra MK, et al. Value of hospital resources for effective pressure injury prevention: a cost-effectiveness analysis. BMJ Qual Saf. 2019;28(2):132. pmid:30097490
  6. Institute for Quality and Efficiency in Health Care (IQWiG). Preventing pressure ulcers. Cologne, Germany; 2006 [updated 2022 Aug 19; accessed 2023 Feb]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK326430/?report=classic.
  7. Hillier B, Scandrett K, Coombe A, Hernandez-Boussard T, Steyerberg E, Takwoingi Y, et al. Risk prediction tools for pressure injury occurrence: an umbrella review of systematic reviews reporting model development and validation methods. Diagn Progn Res. 2025;9(1):2. pmid:39806510
  8. Braden B, Bergstrom N. A Conceptual Schema for the Study of the Etiology of Pressure Sores. Rehabil Nurs. 1987;12(1):8–16. pmid:3643620
  9. Bergstrom N, Braden BJ, Laguzza A, Holman V. The Braden Scale for Predicting Pressure Sore Risk. Nurs Res. 1987;36(4):205–210. pmid:3299278
  10. Norton D. Geriatric nursing problems. Int Nurs Rev. 1962;9:39–41. pmid:14480428
  11. Waterlow J. Pressure sores: a risk assessment card. Nurs Times. 1985;81:49–55. pmid:3853163
  12. NICE. Pressure ulcers: prevention and management. Clinical guideline [CG179]. 2014 [accessed 2024 Aug]. Available from: https://www.nice.org.uk/guidance/cg179.
  13. Haesler E. European Pressure Ulcer Advisory Panel, National Pressure Injury Advisory Panel and Pan Pacific Pressure Injury Alliance. Prevention and Treatment of Pressure Ulcers/Injuries: Clinical Practice Guideline. 2019 [accessed 2023 Feb]. Available from: https://internationalguideline.com/2019.
  14. Scott K, Longstaffe S. Judy Waterlow. 2020 [accessed 2024 Aug]. Available from: https://litfl.com/judy-waterlow/.
  15. Šimundić AM. Measures of Diagnostic Accuracy: Basic Definitions. EJIFCC. 2009;19(4):203–211. pmid:27683318
  16. Leeflang MM, Rutjes AW, Reitsma JB, Hooft L, Bossuyt PM. Variation of a test’s sensitivity and specificity with disease prevalence. CMAJ. 2013;185(11):E537–E544. pmid:23798453
  17. Maiga A, Farjah F, Blume J, Deppen S, Welty VF, D’Agostino RS, et al. Risk Prediction in Clinical Practice: A Practical Guide for Cardiothoracic Surgeons. Ann Thorac Surg. 2019;108(5):1573–1582. pmid:31255609
  18. Pollock M, Fernandes RM, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of Reviews. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane; 2022 [accessed 2023 Feb]. Available from: https://training.cochrane.org/handbook/archive/v6.3/chapter-v.
  19. McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM; the PRISMA-DTA Group. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319(4):388–396. pmid:29362800
  20. Ingui BJ, Rogers MA. Searching for clinical prediction rules in MEDLINE. J Am Med Inform Assoc. 2001;8(4):391–397. pmid:11418546
  21. Wilczynski NL, Haynes RB. Optimal Search Strategies for Detecting Clinically Sound Prognostic Studies in EMBASE: An Analytic Survey. J Am Med Inform Assoc. 2005;12(4):481–485. pmid:15802476
  22. Geersing G-J, Bouwmeester W, Zuithoff P, Spijker R, Leeflang M, Moons K. Search Filters for Finding Prognostic and Diagnostic Prediction Studies in Medline to Enhance Systematic Reviews. PLoS ONE. 2012;7(2):e32844. pmid:22393453
  23. NHS. Pressure ulcers: revised definition and measurement. Summary and recommendations. 2018 [accessed 2023 Feb]. Available from: https://www.england.nhs.uk/wp-content/uploads/2021/09/NSTPP-summary-recommendations.pdf.
  24. Agency for Health Care Policy and Research (AHCPR). Pressure ulcer treatment. Clin Pract Guidel Quick Ref Guide Clin. 1994;15(15):1–25.
  25. Harker J. Pressure ulcer classification: the Torrance system. J Wound Care. 2000;9(6):275–277. pmid:11933341
  26. Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist. PLoS Med. 2014;11(10):e1001744. pmid:25314315
  27. Cochrane Prognosis Methods Group. Example data extraction form for scoping reviews of prognostic models. [accessed 2023 Feb]. Available from: https://methods.cochrane.org/prognosis/tools.
  28. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008. pmid:28935701
  29. World Health Organization. WHO handbook for guideline development: Chapter 17: developing guideline recommendations for tests or diagnostic tools. 2nd ed. Geneva: World Health Organization; 2014.
  30. Whiting P, Savović J, Higgins JP, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–234. pmid:26092286
  31. Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25. pmid:14606960
  32. 32. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–536. pmid:22007046
  33. 33. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med. 2019;170(1):51–58. pmid:30596875
  34. 34. Macaskill P, Gatsonis C, Deeks J, Harbord R, Takwoingi Y. Chapter 10: Analysing and Presenting Results. In: Deeks J, Bossuyt , Gatsonis , editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. 1st ed. Available from: http://srdta.cochrane.org/. The Cochrane Collaboration, 2010 [accessed 2023 Feb].
  35. 35. Macaskill P, Takwoingi Y, Deeks J, Gatsonis C. Chapter 9: Understanding meta-analysis (updated July 2023). In: Deeks J, Bossuyt P, Leeflang M, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. 2nd ed. Cochrane, 2023.
  36. 36. Chen X, Diao D, Ye L. Predictive validity of the Jackson–Cubbin scale for pressure ulcers in intensive care unit patients: A meta-analysis. Nurs Crit Care. 2023;28(3):370–378.
  37. 37. Chen HL, Shen WQ, Liu P. A Meta-analysis to Evaluate the Predictive Validity of the Braden Scale for Pressure Ulcer Risk Assessment in Long-term Care. Ostomy Wound Manage. 2016;62(9):20–28. pmid:27668477
  38. 38. Chou R, Dana T, Bougatsos C, Blazina I, Starmer AJ, Reitel K, et al. Pressure ulcer risk assessment and prevention: a systematic comparative effectiveness review. Ann Intern Med. 2013;159(1):28–38. pmid:23817702
  39. 39. García-Fernández FP, Pancorbo-Hidalgo PL, Agreda JJS. Predictive Capacity of Risk Assessment Scales and Clinical Judgment for Pressure Ulcers: A Meta-analysis. J Wound Ostomy Continence Nurs. 2014;41(1):24–34. pmid:24280770
  40. 40. He W, Liu P, Chen HL. The Braden Scale cannot be used alone for assessing pressure ulcer risk in surgical patients: a meta-analysis. Ostomy Wound Manage. 2012;58:34–40.
  41. 41. Huang C, Ma Y, Wang C, Jiang M, Yuet Foon L, Lv L, et al. Predictive validity of the braden scale for pressure injury risk assessment in adults: A systematic review and meta-analysis. Nurs Open. 2021;8:2194–2207. pmid:33630407
  42. 42. Mehicic A, Burston A, Fulbrook P. Psychometric properties of the Braden scale to assess pressure injury risk in intensive care: A systematic review. Intensive Crit Care Nurs. 2024;83:103686. pmid:38518454
  43. 43. Pancorbo-Hidalgo PL, Garcia-Fernandez FP, Lopez-Medina IM, Alvarez-Nieto C. Risk assessment scales for pressure ulcer prevention: a systematic review. J Adv Nurs. 2006;54(1):94–110. pmid:16553695
  44. 44. Park SH, Lee HS. Assessing Predictive Validity of Pressure Ulcer Risk Scales- A Systematic Review and Meta-Analysis. Iran J Public Health. 2016;45(2):122–133. pmid:27114977
  45. 45. Park SH, Lee YS, Kwon YM. Predictive Validity of Pressure Ulcer Risk Assessment Tools for Elderly: A Meta-Analysis. West J Nurs Res. 2016;38:459–483. pmid:26337859
  46. 46. Park SH, Choi YK, Kang CB. Predictive validity of the Braden Scale for pressure ulcer risk in hospitalized patients. J Tissue Viability. 2015;24:102–113. pmid:26050532
  47. 47. Pei J, Guo X, Tao H, Wei Y, Zhang H, Ma Y, et al. Machine learning-based prediction models for pressure injury: A systematic review and meta-analysis. Int Wound J. 2023. pmid:37340520
  48. 48. Qu C, Luo W, Zeng Z, Lin X, Gong X, Wang X, et al. The predictive effect of different machine learning algorithms for pressure injuries in hospitalized patients: A network meta-analyses. Heliyon. 2022;8(11):e11361. pmid:36387440
  49. 49. Tayyib NAH, Coyer F, Lewis P. Pressure ulcers in the adult intensive care unit: a literature review of patient risk factors and risk assessment scales. J Nurs Educ Pract. 2013;3(11):28–42.
  50. 50. Wang N, Lv L, Yan F, Ma Y, Miao L, Foon Chung LY, et al. Biomarkers for the early detection of pressure injury: A systematic review and meta-analysis. J Tissue Viability. 2022;31:259–267. pmid:35227559
  51. 51. Wei M, Wu L, Chen Y, Fu Q, Chen W, Yang D. Predictive Validity of the Braden Scale for Pressure Ulcer Risk in Critical Care: A Meta-Analysis. Nurs Crit Care. 2020;25:165–170. pmid:31985893
  52. 52. Wilchesky M, Lungu O. Predictive and concurrent validity of the Braden scale in long-term care: A meta-analysis. Wound Repair Regen. 2015;23:44–56. pmid:25682792
  53. 53. Zhang Y, Zhuang Y, Shen J, Chen X, Wen Q, Jiang Q, et al. Value of pressure injury assessment scales for patients in the intensive care unit: Systematic review and diagnostic test accuracy meta-analysis. Intensive Crit Care Nurs. 2021;64:103009. pmid:33640238
  54. 54. Zimmermann GS, Cremasco MF, Zanei SSV, Takahashi SM, Cohrs CR, Whitaker IY. Pressure injury risk prediction in critical care patients: an integrative review. Texto & Contexto—Enfermagem. 2018;27(3).
  55. 55. Baris N, Karabacak BG, Alpar SE. The Use of the Braden Scale in Assessing Pressure Ulcers in Turkey: A Systematic Review. Adv Skin Wound Care. 2015;28:349–357. pmid:26181859
  56. 56. Gaspar S, Peralta M, Marques A, Budri A, Gaspar de Matos M. Effectiveness on hospital-acquired pressure ulcers prevention: a systematic review. Int Wound J. 2019;16(5):1087–1102. pmid:31264345
  57. 57. Health Quality Ontario. Medical Advisory Secretariat. Pressure ulcer prevention: an evidence-based analysis. Ont Health Technol Assess Ser. 2009;9(2):1–104.
  58. 58. Kottner J, Dassen T, Tannen A. Inter- and intrarater reliability of the Waterlow pressure sore risk scale: A systematic review. Int J Nurs Stud. 2009;46:369–379. pmid:18986650
  59. 59. Lovegrove J, Ven S, Miles SJ, Fulbrook P. Comparison of pressure injury risk assessment outcomes using a structured assessment tool versus clinical judgement: A systematic review. J Clin Nurs. 2021. pmid:34854158
  60. 60. Lovegrove J, Miles S, Fulbrook P. The relationship between pressure ulcer risk assessment and preventative interventions: a systematic review. J Wound Care. 2018;27(12):862–875. pmid:30557105
  61. 61. Moore ZEH, Patton D. Risk assessment tools for the prevention of pressure ulcers. Cochrane Database Syst Rev. 2019. pmid:30702158
  62. 62. Kelly J. Inter-rater reliability and Waterlow’s pressure ulcer risk assessment tool. Nurs Stand. 2005;19(32):86–7, 90–2. pmid:15875591
  63. 63. Munoz N, Posthauer ME. Nutrition strategies for pressure injury management: Implementing the 2019 International Clinical Practice Guideline. Nutr Clin Pract. 2022;37(3):567–582. pmid:34462964
  64. 64. Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928. pmid:22008217
  65. 65. AHRQ Methods for Effective Health Care. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Rockville (MD): Agency for Healthcare Research and Quality (US); 2008.
  66. 66. Zahia S, Garcia Zapirain MB, Sevillano X, González A, Kim PJ, Elmaghraby A. Pressure injury image analysis with machine learning techniques: A systematic review on previous and possible future methods. Artif Intell Med. 2020;102:101742. pmid:31980110
  67. 67. Song M, Choi KS. Factors predicting development of decubitus ulcers among patients admitted for neurological problems. Kanho Hakhoe Chi (The Journal of Nurses Academic Society). 1991;21(1):16–26. pmid:1812306
  68. 68. Pang SM, Wong TK. Predicting pressure sore risk with the Norton, Braden, and Waterlow scales in a Hong Kong rehabilitation hospital. Nurs Res. 1998;47(3):147–153. pmid:9610648
  69. 69. Cubbin B, Jackson C. Trial of a pressure area risk calculator for intensive therapy patients. Intensive Care Nurs. 1991;7(1):40–44. pmid:2019734
  70. 70. González-Ruiz J, Carrero AG, Blázquez MH, Vera RDV, Ortiz BG, Pulido M, et al. Factores de riesgo de las úlceras por presión en pacientes críticos. Enferm Clin. 2001;11(5):184–190.
  71. 71. Kwong E, Pang S, Wong T, Ho J, Shao-ling X, Li-jun T. Predicting pressure ulcer risk with the modified Braden, Braden, and Norton scales in acute care hospitals in Mainland China. Appl Nurs Res. 2005;18(2):122–128. pmid:15991112
  72. 72. Halfens R, Van Achterberg T, Bal R. Validity and reliability of the Braden scale and the influence of other risk factors: a multi-centre prospective study. Int J Nurs Stud. 2000;37(4):313–319. pmid:10760538
  73. 73. Ek AC. Prediction of pressure sore development. Scand J Caring Sci. 1987;1(2):77–84. pmid:3134685
  74. 74. Bienstein C. Risikopatienten erkennen mit der erweiterten Nortonskala [Risk patients detected with the extended Norton scale]. Pflege Aktuell (Verlag Krankenpflege). 1991;2(Dekubitus—Prophylaxe und Therapie).
  75. 75. Jackson C. The revised Jackson/Cubbin Pressure Area Risk Calculator. Intensive Crit Care Nurs. 1999;15(3):169–175. pmid:10595057
  76. 76. Fuentelsaz C. Validation of the EMINA scale: tool for the evaluation of risk of developing pressure ulcers in hospitalized patients. Enferm Clin. 2001;11(3):97–103.
  77. 77. Lowthian P. The practical assessment of pressure sore risk. Care: science and practice. 1987;5(4):3–7.
  78. 78. Lowery MT. A pressure sore risk calculator for intensive care patients: ’the Sunderland experience’. Intensive Crit Care Nurs. 1995;11(6):344–353. pmid:8574087
  79. 79. Lindgren M, Unosson M, Krantz AM, Ek AC. A risk assessment scale for the prediction of pressure sore development: reliability and validity. J Adv Nurs. 2002;38(2):190–199. pmid:11940132
  80. 80. Walter SD. Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med. 2002;21(9):1237–1256. pmid:12111876
  81. 81. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993;12(14):1293–1316. pmid:8210827
  82. 82. Littenberg B, Moses LE. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Decis Making. 1993;13(4):313–321. pmid:8246704
  83. 83. Webster J, Coleman K, Mudge A, Marquart L, Gardner G, Stankiewicz M, et al. Pressure ulcers: effectiveness of risk-assessment tools. A randomised controlled trial (the ULCER trial). BMJ Qual Saf. 2011;20(4):297. pmid:21262791
  84. 84. Saleh M, Anthony D, Parboteeah S. The impact of pressure ulcer risk assessment on patient outcomes among hospitalised patients. J Clin Nurs. 2009;18(13):1923–1929. pmid:19374691
  85. 85. Moore Z, Johansen E, Mv E, Strapp H, Solbakken T, Smith BE, et al. Pressure ulcer prevalence and prevention practices: a cross-sectional comparative survey in Norway and Ireland. J Wound Care. 2015;24(8):333–339. pmid:26562375
  86. 86. Moore Z, Pitman S. Towards establishing a pressure sore prevention and management policy in an acute hospital setting. The all Ireland journal of nursing & midwifery. 2000;1(1):7–11.
  87. 87. Gunningberg L, Lindholm C, Carlsson M, Sjödén PO. Implementation of risk assessment and classification of pressure ulcers as quality indicators for patients with hip fractures. J Clin Nurs. 1999;8(4):396–406. pmid:10624256
  88. 88. Bale S, Finlay I, Harding KG. Pressure sore prevention in a hospice. J Wound Care. 1995;4(10):465–468. pmid:8548573
  89. 89. Hodge J, Mounter J, Gardner G, Rowley G. Clinical trial of the Norton Scale in acute care settings. Aust J Adv Nurs. 1990;8(1):39–46. pmid:2091682
  90. 90. Moher D, Liberati A, Tetzlaff J, Altman DG, The PG. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009;6(7):e1000097. pmid:19621072
  91. 91. Steyerberg EW, Harrell FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69:245–247. pmid:25981519
  92. 92. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med. 2003;138(1):W1–W12. pmid:12513067
  93. 93. Reitsma J, Rutjes A, Whiting P, Yang B, Leeflang M, Bossuyt P, et al. Chapter 8: Assessing risk of bias and applicability (updated July 2023). In: Deeks J, Bossuyt P, Leeflang M, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. 2nd ed. Cochrane, 2023.
  94. 94. Deeks JJ, Dealey C. Pressure sore prevention: using and evaluating risk assessment tools. Br J Nurs. 1996;5(5):313–320. pmid:8715749
  95. 95. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016;352:i6. pmid:26810254
  96. 96. Trikalinos TA, Siebert U, Lau J. Decision-Analytic Modeling to Evaluate Benefits and Harms of Medical Tests: Uses and Limitations. Med Decis Making. 2009;29(5):E22–E29. pmid:19734441
  97. 97. Dweekat OY, Lam SS, McGrath L. Machine Learning Techniques, Applications, and Potential Future Opportunities in Pressure Injuries (Bedsores) Management: A Systematic Review. Int J Environ Res Public Health. 2023;20(1). pmid:36613118
  98. 98. Berlowitz DR, VanDeusen Lukas C, Parker V, Niederhauser A, Silver J, Logan C. 3F: Care Plan. 2014 [accessed 2024 Aug]. In: Preventing pressure ulcers in hospitals: A toolkit for improving quality of care [Internet]. Agency for Healthcare Research and Quality; [140–2]. Available from: https://www.ahrq.gov/sites/default/files/publications/files/putoolkit.pdf.
  99. 99. Hingorani AD, Windt DA, Riley RD, Abrams K, Moons KGM, Steyerberg EW, et al. Prognosis research strategy (PROGRESS) 4: Stratified medicine research. BMJ. 2013;346:e5793. pmid:23386361
  100. 100. Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98(9):691–698. pmid:22397946
  101. 101. Shi C, Dumville JC, Cullum N. Evaluating the development and validation of empirically-derived prognostic models for pressure ulcer risk assessment: A systematic review. Int J Nurs Stud. 2019;89:88–103. pmid:30352322
  102. 102. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–W73. pmid:25560730
  103. 103. Lee J, Mulder F, Leeflang M, Wolff R, Whiting P, Bossuyt PM. QUAPAS: An Adaptation of the QUADAS-2 Tool to Assess Prognostic Accuracy Studies. Ann Intern Med. 2022;175(7):1010–1018. pmid:35696685