Variation in detected adverse events using trigger tools: A systematic review and meta-analysis

  Luisa C. Eggenschwiler

    Contributed equally to this work with: Luisa C. Eggenschwiler, Anne W. S. Rutjes

    Roles Conceptualization, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Institute of Nursing Science (INS), Department Public Health (DPH), Faculty of Medicine, University of Basel, Basel, Switzerland

  Anne W. S. Rutjes

    Contributed equally to this work with: Luisa C. Eggenschwiler, Anne W. S. Rutjes

    Roles Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland

  Sarah N. Musy

    Roles Conceptualization, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Institute of Nursing Science (INS), Department Public Health (DPH), Faculty of Medicine, University of Basel, Basel, Switzerland

  Dietmar Ausserhofer

    Roles Investigation, Writing – original draft, Writing – review & editing

    Affiliations Institute of Nursing Science (INS), Department Public Health (DPH), Faculty of Medicine, University of Basel, Basel, Switzerland, College of Health Care-Professions Claudiana, Bozen-Bolzano, Italy

  Natascha M. Nielen

    Roles Investigation, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Institute of Nursing Science (INS), Department Public Health (DPH), Faculty of Medicine, University of Basel, Basel, Switzerland

  René Schwendimann

    Roles Investigation, Writing – original draft, Writing – review & editing

    Affiliations Institute of Nursing Science (INS), Department Public Health (DPH), Faculty of Medicine, University of Basel, Basel, Switzerland, Patient Safety Office, University Hospital Basel, Basel, Switzerland

  Maria Unbeck

    Roles Conceptualization, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliations School of Health and Welfare, Dalarna University, Falun, Sweden, Department of Clinical Sciences, Danderyd Hospital, Karolinska Institutet, Stockholm, Sweden

  Michael Simon

    Roles Conceptualization, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Institute of Nursing Science (INS), Department Public Health (DPH), Faculty of Medicine, University of Basel, Basel, Switzerland



Adverse event (AE) detection is a major patient safety priority. However, despite extensive research on AEs, reported incidence rates vary widely.


This study aimed: (1) to synthesize available evidence on AE incidence in acute care inpatient settings using Trigger Tool methodology; and (2) to explore whether study characteristics and study quality explain variations in reported AE incidence.


Systematic review and meta-analysis.


To identify relevant studies, we queried PubMed, EMBASE, CINAHL, Cochrane Library and three journals in the patient safety field (last update search 25.05.2022). Eligible publications fulfilled the following criteria: adult inpatient samples; acute care hospital settings; Trigger Tool methodology; focus on internal medicine, surgery or oncology; published in English, French, German, Italian or Spanish. Systematic reviews and studies addressing only adverse drug events or exclusively deceased patients were excluded. Risk of bias was assessed using an adapted version of the Quality Assessment Tool for Diagnostic Accuracy Studies 2. Our main outcome of interest was AEs per 100 admissions. We assessed nine study characteristics plus study quality as potential sources of variation using random-effects regression models. We received no funding and did not register this review.


Screening 6,685 publications yielded 54 eligible studies covering 194,470 admissions. The pooled AE incidence was 30.0 per 100 admissions (95% CI 23.9–37.5; I² = 99.7%), and between-study heterogeneity was high, with a prediction interval of 5.4–164.7. Overall, the studies' risk of bias and applicability-related concerns were rated as low. Eight of nine methodological study characteristics, such as patient age and type of hospital, explained some of the variation in reported AE rates, as did study quality.


AE estimates from studies using trigger tool methodology vary widely, and explaining this variation is seriously hampered by low reporting standards, for example regarding the timeframe of AE detection. Specific reporting guidelines for studies using retrospective medical record review methodology are necessary to strengthen the current evidence base and to help explain between-study variation.


Over the last two decades, patient safety has become and remained a key issue for health care systems globally [1]. A major driver of patient harm in acute care hospitals is adverse events (AEs), defined as “unintended physical injury resulting from or contributed to by medical care that requires additional monitoring, treatment or hospitalization, or that results in death” [2]. Reported AE rates vary between 7% and 40% [3], and AEs increase health care costs by roughly 10,000 Euros per index admission [4]. Considering that approximately 40% of admissions can be associated with AEs, the consequences for both health care costs and patient suffering are likely underestimated [4, 5]. Some AEs are hardly avoidable, but others can be prevented: studies indicate that 6%–83% of AEs are deemed preventable [6, 7].

Retrospective medical record reviews are commonly used to collect data on patient safety outcomes such as AEs. Medical record review, which uses already available data [8], has been found to identify more AEs than other methods [9, 10]; it can be repeated over time and can target either specific AE types or the overall AE rate [11].

There are several medical record review methods; the most widely used are the Harvard Medical Practice Study (HMPS) methodology [12], with subsequent modifications [13], and the Global Trigger Tool (GTT) [2]. The GTT, popularised by the Institute for Healthcare Improvement (IHI) in the US, was primarily designed as a measurement tool for clinical practice: to estimate and track AE rates over time beyond what traditional incident reports capture, and to measure the effect of safety interventions [14, 15]. The GTT involves a two-step medical record review process. In the first step, knowledgeable hospital staff, mainly nurses, conduct primary reviews to identify potential AEs using predefined triggers as outlined in the GTT guidance. In the second step, physicians verify the first-step reviews and authenticate their consensus. A "trigger" (or clue) is a specific term or event in a medical record that could indicate the occurrence of an AE, e.g., a readmission within 30 days or a pressure ulcer [2]. The GTT's main methodological advantage is that it is an open, inductive process that is sensitive to various types of AEs [2]. GTT-based studies typically report inter-rater reliability coefficients indicating satisfactory reliability (kappa 0.34 to 0.89; mean: 0.65) [16].
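As an illustration of the reliability statistic mentioned above, the following minimal Python sketch computes Cohen's kappa for two reviewers who each classify the same set of records as AE present (1) or absent (0). The primary studies do not necessarily all use this variant; the sketch assumes the unweighted two-rater Cohen's kappa, and the function name `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items (sketch)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of records on which both reviewers agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each reviewer's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    p_expected = sum(freq_a[label] * freq_b[label] for label in labels) / n**2
    if p_expected == 1:
        # Kappa is undefined when both reviewers use one identical label throughout.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical AE present (1) / absent (0) judgements of two reviewers on 10 records.
reviewer_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reviewer_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.58
```

Here the reviewers agree on 8 of 10 records (0.80), but 0.52 agreement would be expected by chance alone, which is why kappa (about 0.58) is noticeably lower than raw agreement.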

GTT’s triggers are grouped into six modules (e.g., Care Module, Medication Module). Some researchers use all six [17, 18], while most use only those relevant to their setting [19, 20]. Others create additional modules (e.g., Oncology Module [21, 22]) or develop modified versions tailored specifically to their patients and care settings [3, 23]. Although these modified versions diverge too far from the original GTT to be labelled GTT, they are still considered trigger tools (TTs).

When the GTT is used outside the USA, triggers need to be adapted to reflect local norms (e.g., blood level limits), even where translation is unnecessary. Additionally, medication labels need to be adjusted as appropriate [24, 25]. Although the GTT was developed as a manual method, the rise of electronic health records means that the GTT process can now be semi- or fully automated [26].

Recent systematic reviews focussing on AEs detected via GTT or TT showed high detection rate variability [3, 6, 26]. Some of this variability may reflect differences in the studies’ methodological features. Adaptations in triggers, review processes or patient record selection protocols might influence detection rates, thereby impacting the comparability of detected AEs. Such differences in medical record review methodology have not yet been systematically addressed. Therefore, this study has two aims: (1) to synthesize the evidence identified by the TT methodology regarding AE incidence in acute care inpatient settings; and (2) to explore whether between study variation in the incidence of AEs can be explained by study characteristics and study quality.



This systematic review and meta-analysis adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [27, 28].

Search strategy and information sources

Our search strategy was developed and validated using the methods suggested by Hausner et al. [29, 30], which involve generating a test set, developing and validating a search strategy, and documenting the strategy using a standardized approach [30]. The medical subject headings (MeSH) and title/abstract keywords in our search string were: (trigger[tiab] OR triggers[tiab]) AND (chart[tiab] OR charts[tiab] OR identif*[tiab] OR record[tiab] OR records[tiab]) AND (adverse[tiab] OR medical error[mh]). We used this string to query four electronic databases: PubMed, EMBASE, CINAHL and Cochrane Library. In addition, we hand-searched the top three journals publishing on GTT/TT (BMJ Quality & Safety; Journal of Patient Safety; International Journal for Quality in Health Care) and screened all authors’ personal libraries. Publication dates were unrestricted in all searches. The detailed search strategy used for this review and further explanations of the chosen journals are published in Musy et al. [26]. The index search was conducted in November 2015 and was followed by five update searches in April 2016, July 2017, January 2020, September 2020 and, most recently, on May 25, 2022.

Eligibility criteria

We included publications fulfilling six criteria: 1. publication in English, French, German, Italian or Spanish; 2. adult inpatient samples; 3. acute care hospital settings; 4. medical record review performed manually via GTT or other TT methods; 5. specialties in internal medicine, surgery (including orthopaedics), oncology, or any combination of these (mixed); and 6. outcome data relevant to our study, e.g., number of detected AEs. Systematic reviews and studies addressing only adverse drug events or exclusively deceased patients were excluded.

Study selection and data extraction

Two researchers independently screened titles and abstracts in two rounds: first for any information on GTT or TT, then against the eligibility criteria. Two researchers then independently assessed the full-text articles for eligibility. To ensure high-quality data entry, data were extracted by one researcher and verified by a second. Information on study characteristics (e.g., number of admissions, setting, patient demographics) and patient outcomes (incidence, preventability) was collected in an online data collection instrument (…). Where studies by authors of this report were considered, a pair of researchers without direct involvement in the primary study extracted and appraised the data. Differences between researchers were discussed in the research group to reach consensus.

Our main outcome of interest was AEs per 100 admissions ((number of AEs / number of admissions) × 100). In addition, we included three secondary outcomes: AEs per 1,000 inpatient days ((number of AEs / number of inpatient days) × 1,000), the percentage of admissions with one or more AEs ((number of admissions with ≥1 AE / number of admissions) × 100) and the percentage of preventable AEs ((number of preventable AEs / number of AEs) × 100). We included nine TT methodology characteristics in our statistical analysis to assess their potential influence on AE detection rates. We grouped these under four headings: setting (type of hospital, type of specialty), patient characteristics (age, length of stay), design (AE definition, timeframe of AE detection, commission/omission) and reviewer (training, experience). Definitions of our variables, our categorisations of the selected characteristics and our rationale for each chosen variable and its categorisation are available in Table 1.
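The four outcome definitions above can be written as a small Python sketch. The `StudyCounts` container, its field names and the example counts are hypothetical; they only illustrate how each measure follows from the extracted counts.

```python
from dataclasses import dataclass


@dataclass
class StudyCounts:
    """Hypothetical counts extracted from one primary study."""
    n_aes: int                 # total number of AEs detected
    n_admissions: int          # number of reviewed admissions
    n_patient_days: float      # total inpatient days of the reviewed admissions
    n_admissions_with_ae: int  # admissions with at least one AE
    n_preventable_aes: int     # AEs rated as preventable


def outcome_measures(s: StudyCounts) -> dict[str, float]:
    """Main and secondary outcomes as defined in the review."""
    return {
        # Main outcome: AEs per 100 admissions
        "aes_per_100_admissions": 100 * s.n_aes / s.n_admissions,
        # Secondary outcomes
        "aes_per_1000_patient_days": 1000 * s.n_aes / s.n_patient_days,
        "pct_admissions_with_ae": 100 * s.n_admissions_with_ae / s.n_admissions,
        "pct_preventable_aes": 100 * s.n_preventable_aes / s.n_aes,
    }


example = StudyCounts(n_aes=180, n_admissions=600, n_patient_days=3600,
                      n_admissions_with_ae=150, n_preventable_aes=112)
print(outcome_measures(example))
```

Because one admission can involve several AEs, AEs per 100 admissions (30.0 in this example) can exceed the percentage of admissions with at least one AE (25.0 here), and in principle can exceed 100.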

Quality assessment

To assess the risk of bias and applicability-related concerns for each included study, we developed and piloted a quality assessment tool (QAT) (see S1 File), inspired by the Quality Assessment Tool for Diagnostic Accuracy Studies 2 (QUADAS-2) and by the QAT developed by Musy et al. [41]. For each included study, we assessed both QUADAS-2 dimensions: risk of bias and applicability-related concerns [41]. We assessed five domains: 1) patient selection; 2) rater or reviewer; 3) trigger tool method; 4) outcomes; and 5) flow and timing. Following the QUADAS-2 structure, each domain included standardised signalling questions to help researchers rate each of the two dimensions. Possible ratings were low, high or unclear. For each study, a QAT was completed by one researcher and reviewed by a second. Differences were discussed between the two and, if necessary, within the research group until consensus was reached.

Statistical analysis

To analyse and plot our results, we used R version 4.1.3 on Linux [42] with the meta [43] and metafor [44] packages. We determined the number of AEs per 100 admissions and the number of AEs per 1,000 patient days from the reported data. If the number of AEs was not explicitly reported, we calculated it from the reported rate of AEs per 100 admissions and the number of admissions. Similarly, the number of patient days could, for example, be calculated from the number of AEs and the reported rate of AEs per 1,000 patient days. For studies published by this study’s co-authors or, in some cases, by their research colleagues, we asked for additional information when samples overlapped, to avoid double counting of admissions and AEs [34, 45, 46]. Pooled estimates for AEs per 100 admissions and AEs per 1,000 patient days were derived with a random-effects Poisson regression approach using the R metarate function [43, 44]. With the R metaprop function, a random-effects logistic regression model was used to obtain summary estimates and confidence intervals (derived by the Wilson method) for the outcomes expressed as percentage of admissions with ≥1 AE and percentage of preventable AEs [43].
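The back-calculations and the Wilson interval mentioned above can be sketched in Python as follows. This is not the code of the R meta package: the helper names are hypothetical, and `wilson_ci` implements the standard Wilson score interval for a single study's proportion (for example, admissions with ≥1 AE), using z ≈ 1.96 for a 95% interval.

```python
import math


def aes_from_rate(aes_per_100: float, n_admissions: int) -> int:
    """Back-calculate the number of AEs from a reported rate per 100 admissions."""
    # round() is needed because published rates are usually rounded.
    return round(aes_per_100 * n_admissions / 100)


def patient_days_from_rate(n_aes: int, aes_per_1000_days: float) -> float:
    """Back-calculate patient days from the AE count and the rate per 1,000 days."""
    return n_aes * 1000 / aes_per_1000_days


def wilson_ci(events: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for the proportion events / n."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


print(aes_from_rate(30.0, 600))           # 180
print(patient_days_from_rate(180, 50.0))  # 3600.0
print(wilson_ci(150, 600))                # approx. (0.217, 0.286)
```

Unlike the simple Wald interval, the Wilson interval stays within 0 and 1 even for proportions close to the boundaries, which matters for small samples with few or many events.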

Subgroup analysis.

Heterogeneity was explored with stratified analyses of the main outcome measure, i.e., AEs per 100 admissions, to evaluate the influence of the nine study characteristics: type of hospital, type of specialty, patient age, length of stay, AE definition, timeframe of AE detection, commission and omission, reviewer training, and reviewer experience. In addition, we analysed the five risk of bias domains and the three applicability-related concern domains. P-values were derived from the likelihood ratio test for model fit (p < 0.05 was considered significant). Furthermore, between-study heterogeneity was evaluated visually and by calculating prediction intervals [47, 48]. To assess the risk of publication bias related to small study size, we created a funnel plot of the logit of the AEs per 100 admissions against its standard error, assessed the symmetry of the distribution and performed the Egger test [49].
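To make the heterogeneity measures concrete, the following Python sketch pools log incidence rates with the DerSimonian–Laird random-effects method and reports I² and a prediction interval. Note that this swaps in a different technique: the review itself fitted a random-effects Poisson regression via the R metarate function, whereas this sketch uses inverse-variance weighting with the approximate variance 1/events of a log Poisson rate. The function name and example data are hypothetical. Because the Python standard library has no t quantile function, the caller supplies `t_crit`, which for the usual prediction interval is the 97.5th percentile of a t distribution with k − 2 degrees of freedom (about 4.303 for k = 4).

```python
import math


def pool_log_rates(events, exposure, t_crit, z=1.959964):
    """DerSimonian-Laird random-effects pooling of log incidence rates (sketch).

    events[i]   : number of AEs in study i (must be > 0)
    exposure[i] : admissions (or patient days) in study i
    t_crit      : critical value used for the prediction interval
    Rates are returned per unit of exposure; multiply by 100 for
    AEs per 100 admissions.
    """
    k = len(events)
    if k < 3 or len(exposure) != k:
        raise ValueError("need at least three studies with matching exposure")
    y = [math.log(e / x) for e, x in zip(events, exposure)]  # log rates
    v = [1 / e for e in events]           # approx. within-study variances
    w = [1 / vi for vi in v]              # fixed-effect weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights, pooled log rate and its standard error
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    ci = (math.exp(mu - z * se_mu), math.exp(mu + z * se_mu))
    # Prediction interval for the rate expected in a new, comparable study
    half_pi = t_crit * math.sqrt(tau2 + se_mu**2)
    pi = (math.exp(mu - half_pi), math.exp(mu + half_pi))
    return {"rate": math.exp(mu), "ci": ci, "i2": i2, "tau2": tau2, "pi": pi}


# Hypothetical AE counts and admissions for four studies
result = pool_log_rates([20, 90, 300, 45], [400, 300, 600, 100], t_crit=4.303)
print(f"{100 * result['rate']:.1f} AEs per 100 admissions, I2 = {result['i2']:.1f}%")
```

The prediction interval adds the between-study variance tau² to the uncertainty of the pooled mean, which is why it is much wider than the confidence interval whenever heterogeneity is high.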


The index search and update searches produced 9,780 records. After removing duplicates, 6,685 unique records remained. Detailed screening left 54 studies, published in 72 publications [5, 9, 10, 14, 15, 17–22, 24, 34, 37–40, 45, 46, 50–102]. Fig 1 depicts the complete review procedure.

Fig 1. Flow diagram of literature search and included studies.

From [27] (GTT, Global Trigger Tool, TT, Trigger Tool).

Study characteristics

The 54 included studies were published between 2009 and 2022, with study periods ranging from one month to six years (Table 2). They were conducted in 26 countries, most in Europe (34 studies, 63%), followed by the US (12 studies, 22%) and other regions (8 studies, 15%).

Table 2. Characteristics of the 54 included studies.

Sorted by continent; within continent alphabetically by country code, and within the country by year.

Four studies (7%) did not report their clinical specialties [10, 17, 71, 77]. For those remaining, almost half (24 studies, 44%) involved mixed specialties. One study included no information on the number of included records [40]. The numbers of included records ranged from 50 to 56,447. Overall, we included 194,470 index admissions in our report.

Table 3 summarizes the key characteristics of the AE rates. For seven studies, we could not retrieve the main outcome measure, AEs per 100 admissions [14, 24, 40, 55, 70, 80, 94]; for the remaining 47, rates ranged from 2.5 to 140 per 100 admissions. The 36 studies (67%) with sufficient data yielded rates ranging from 12.4 to 139.6 AEs per 1,000 patient days. In the 48 studies whose data allowed us to calculate the percentage of admissions with one or more AEs, this percentage ranged from 7% to 69%. AE preventability percentages, reported by 37 studies (69%), ranged from 7% to 93%; however, four of these studies provided no relevant raw data [21, 45, 55, 56].

Table 3. Main characteristics of adverse events (AE) rates.

Quality assessment

Our quality assessment results (Fig 2) indicate that low risk of bias ratings predominated in most domains (range: 48%–93%). However, the “patient selection” and “reviewer” domains received 15% and 13% high ratings, respectively, considerably more than the other domains (range: 2%–6%). In two domains, risk of bias was largely unclear: “reviewer” and “trigger tool method” received this rating in 39% and 30% of cases, respectively.

Fig 2. Quality assessment of all included studies.

Assessments are presented in risk of bias and applicability-related concerns. (TT method, Trigger Tool method).

Overall, applicability-related concerns were predominantly low (range across domains: 65%–87%). High ratings were most prevalent (17%) in the “patient selection” domain; unclear ratings were most common (28%) for “reviewer”. Quality assessment results at study level are provided in S1 Table.

Summary estimates from meta-analyses

The forest plot in Fig 3 presents AEs per 100 admissions by sample size. Forty-five samples from single countries contributed, as well as two multi-country samples (n = 10) [61, 71]. The summary estimate was 30.0 AEs per 100 admissions (95% CI 23.9–37.5). Visual inspection of the forest plot indicated a high level of between-study heterogeneity, which was confirmed by an I² of 99.7% (95% CI 99.7–99.7). The prediction interval ranged from 5.4 to 164.7 AEs per 100 admissions. Four studies had exceptionally high detection rates [19, 20, 38, 87]. At the other extreme, seven study samples reported fewer than ten AEs per 100 admissions [17, 56, 71].

Fig 3. Forest plot of adverse events per 100 admissions.

Ordered by sample size [5, 10, 15, 17–22, 34, 37–39, 45, 46, 50–54, 56–69, 71–79, 82–91, 93, 95–102]. In Wilson et al. 2012, countries were not further specified. (AEs, Adverse events; * pooled estimate; • mean estimate; ‡ calculated total number of AEs).

S1–S3 Figs present additional forest plots for the three secondary outcomes: AEs per 1,000 patient days (n = 36 studies), percentage of admissions with AEs (n = 48 studies) and percentage of preventable AEs (n = 33 studies). The summary estimate was 48.3 AEs per 1,000 patient days (95% CI 40.4–57.8), with a high level of between-study heterogeneity (prediction interval 15.9–147.0). Twenty-six percent of admissions involved one or more AEs (95% CI 22.0–29.5, prediction interval 7.8–58.3). Within the studies that rated preventability, 62.6% of AEs were classified as preventable (95% CI 54.0–70.5, prediction interval 16.8–93.3). For these outcomes, too, visual inspection indicated high between-study heterogeneity. Funnel plot exploration did not suggest publication bias or other biases related to small study size (P from Egger test = 0.3, S4 Fig).
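Egger's test, reported above, can be sketched as an ordinary least squares regression of the standardized effect (effect / SE) on precision (1 / SE); an intercept far from zero suggests funnel plot asymmetry. This minimal Python sketch returns the intercept, its standard error and the t statistic with k − 2 degrees of freedom; converting t to a P value requires a t distribution (for example from SciPy), which is left out to keep the sketch dependency-free. The function name and example values are hypothetical, and the effects are assumed to be on the transformed scale used for pooling.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel plot asymmetry (sketch)."""
    k = len(effects)
    if k < 3 or len(std_errors) != k:
        raise ValueError("need at least three studies with matching standard errors")
    x = [1 / se for se in std_errors]                   # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the OLS intercept
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = k - 2
    sigma2 = sum(r**2 for r in residuals) / df
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar**2 / sxx))
    return {"intercept": intercept, "se": se_intercept,
            "t": intercept / se_intercept, "df": df}


# Hypothetical transformed study estimates and their standard errors
print(egger_test([1.05, 0.725, 0.68, 0.61], [0.5, 0.25, 0.2, 0.1]))
```

In this toy example, the smallest studies (largest SE) have the largest estimates, so the intercept is clearly positive; with symmetric data it would be close to zero.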

Effect of study characteristics.

Eight of nine analysed study characteristics explained part of the heterogeneity between studies (Fig 4).

Fig 4. Forest plot with stratified analysis of the nine selected study characteristics.

(AE, adverse event; CI, confidence interval; GTT, Global Trigger Tool; IHI, Institute for Healthcare Improvement; N Studies, number of studies).

As for type of hospital, academic medical centres (n = 25, 45%) detected more AEs per 100 admissions than non-academic hospitals (n = 6, 11%) (47.1, 95% CI 36.6–60.5 vs. 35.8, 95% CI 30.8–41.7). However, because the summary estimate for mixed hospital types (n = 21, 38%; 17.0, 95% CI 11.7–24.8) was lower than those for both academic and non-academic hospitals, this association is likely confounded by a third factor. For type of clinical specialty, the significant differences between categories were driven by the not reported category (n = 11, 20%), which had fewer AEs per 100 admissions than the others (10.6, 95% CI 6.8–16.7). Internal medicine (n = 7, 13%) had the highest rate of AEs per 100 admissions (56.4, 95% CI 40.5–78.5), followed by surgery/orthopaedics (n = 11, 20%; 41.7, 95% CI 29.5–59.0). Oncology (n = 4, 7%) had rates similar to those of mixed specialties (40.0, 95% CI 26.2–61.3 vs. 33.5, 95% CI 25.0–44.8).

Studies of older patients (mean age > 70 years; n = 8, 15%) reported a higher incidence of AEs than studies of younger patients (mean age ≤ 70 years; n = 38, 69%) (63.7, 95% CI 43.6–93.0 vs. 25.9, 95% CI 19.6–34.2), although only eight studies specifically investigated older patients. As with type of clinical specialty, the differences for length of stay were driven by the not reported category (n = 20, 36%), with a summary estimate of 16.7 AEs per 100 admissions (95% CI 11.6–23.9). Studies with longer stays (mean > 5 days; n = 24, 44%) had slightly higher AE rates than those with shorter stays (mean < 5 days; n = 11, 20%) (42.9, 95% CI 32.7–56.4 vs. 40.8, 95% CI 29.0–57.3).

Most studies reported an IHI-like definition of AEs (n = 45, 82%). The five studies (9%) that did not report such a definition had lower AE rates than those with an IHI-like definition (22.6, 95% CI 13.9–36.8 vs. 29.0, 95% CI 22.4–37.5). The remaining five studies (9%), which applied a wider AE definition than the IHI's, reported clearly higher AE rates (55.3, 95% CI 42.1–72.7).

For two characteristics, timeframe of AE detection and commission and omission, the information was not reported in 69% and 82% of studies, respectively, seriously hampering these analyses. Studies that included a pilot phase in reviewer training (n = 14, 25%) might have had slightly higher detection rates than those with training only (n = 31, 56%) (36.8, 95% CI 26.3–51.5 vs. 24.9, 95% CI 18.0–34.4). Reviewers with no experience in medical record review (n = 11, 20%) detected fewer AEs than experienced reviewers (n = 16, 29%) (12.4, 95% CI 7.3–21.2 vs. 40.9, 95% CI 30.6–54.4). About half of all studies (n = 28, 51%) did not report whether their reviewers had experience in medical record review; the AE rates reported by these studies were comparable to those of studies with experienced reviewers (35.8, 95% CI 27.5–46.5).

Effect of risk of bias.

Our quality assessment explained some of the variation in AE detection rates (S5 Fig). In eight studies (15%), patient selection was rated as high risk of bias because they included a slightly different patient population from the one defined in the inclusion criteria. These studies had higher AE rates than studies with a low risk of bias (61.2 vs. 32.5 AEs per 100 admissions). Where the risk of bias in the trigger tool method, outcome, and flow and timing domains was rated high or unclear, considerably lower AE rates were detected than where it was rated low.

Similarly, for applicability-related concerns about the trigger tool method, unclear ratings were associated with lower AE rates than low ratings (10.7 vs. 38.7 AEs per 100 admissions).


The aim of this systematic review and meta-analysis was to synthesize AE detection rates obtained with TT methodology in acute care inpatient settings, to explore variation in these rates, and to assess study quality. Reporting of study characteristics varied widely, with non-reporting ranging from 5% to 82% across characteristics. The summary estimate was 30 AEs per 100 admissions (95% CI 23.9–37.5). The summary rate of 48 AEs per 1,000 patient days translates into 48 AEs among 200 patients with a length of stay of 5 days each (1,000 patient days). Twenty-six percent of patients experienced at least one AE related to their hospital stay, and 63% of all AEs were deemed preventable. Eight of nine study characteristics explained variation in reported AE rates. Studies conducted in academic medical centres or with older populations reported higher AE rates than those in non-academic centres or with younger adult populations. For several risk of bias domains (e.g., outcome, flow and timing), a higher risk of bias was associated with lower AE rates, which suggests that low-quality studies underestimate AE rates.
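The conversion between rates per patient day and per admission in the paragraph above is simple arithmetic, which the Python sketch below makes explicit. It assumes, for illustration only, that every patient stays exactly the stated number of days; real lengths of stay vary, and the pooled 30 AEs per 100 admissions comes from a different subset of studies than the pooled rate per 1,000 patient days, so the two summary estimates need not agree. The function names are hypothetical.

```python
def expected_aes(rate_per_1000_days: float, n_patients: int, los_days: float) -> float:
    """Expected AE count for n_patients who each stay los_days days."""
    patient_days = n_patients * los_days
    return rate_per_1000_days * patient_days / 1000


def per_1000_days_to_per_100_admissions(rate_per_1000_days: float, los_days: float) -> float:
    """Convert a rate per 1,000 patient days to AEs per 100 admissions."""
    # Each admission contributes los_days patient days, so 100 admissions
    # contribute 100 * los_days patient days.
    return rate_per_1000_days * 100 * los_days / 1000


print(expected_aes(48, 200, 5))                    # 48.0
print(per_1000_days_to_per_100_admissions(48, 5))  # 24.0
```

Under this constant-stay assumption, 48 AEs per 1,000 patient days corresponds to 24 AEs per 100 admissions.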

Analysing 17 studies in general inpatients, Hibbert et al. [3] reported AE rates of 8–51 per 100 admissions, a far smaller range than we found (2.5–140). The larger range in our review could result from our larger study sample (n = 54). Further, their percentages of admissions with AEs ranged from 7% to 40%, with a cluster of nine studies between 20% and 29% [3]. We found a wider range (7%–69%), but our summary estimate (26%) is close to their findings [3].

Schwendimann et al.’s scoping review [32] of multicentre studies reported a median of 10% of admissions with AEs, less than half of what we found. This is, however, congruent with Zanetti et al.’s integrative review, which reported between 5% and 11% [7]. Both reviews, especially Schwendimann et al.’s, concentrated on studies applying the HMPS methodology rather than TT methodology [7, 32]. One possible reason for their lower rates is that TT methodology requires the research team to include all identified AEs (several AEs per patient, if present), not only the most severe one as in HMPS [2, 12].

Interestingly, Panagioti et al.’s meta-analysis [6] found that half of their sample’s AEs were preventable, whereas our meta-analysis indicated an overall preventability of 61%. For an academic hospital with 32,000 annual admissions, a preventable percentage of 61 would mean that roughly 5,000 AEs could be prevented annually, provided that effective prevention strategies could be implemented. Panagioti et al.’s confidence interval and ours largely overlap despite differences in inclusion criteria: they included every study that explored AEs’ preventability, and many of those used the HMPS methodology, i.e., targeted more severe AEs [6].
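The paragraph above does not spell out which inputs yield the figure of roughly 5,000 preventable AEs. The Python sketch below shows the arithmetic under two alternative assumptions: (a) applying the 61% preventability to the expected number of AEs (30.0 per 100 admissions), and (b) applying it to the expected number of admissions with at least one AE (26%). Both readings are assumptions for illustration; reading (b) ignores admissions with more than one AE and is therefore the more conservative. The function name is hypothetical.

```python
def preventable_aes(annual_admissions: int, per_admission: float, preventable_share: float) -> float:
    """Expected number of preventable AEs per year (simple product, no uncertainty)."""
    return annual_admissions * per_admission * preventable_share


# (a) 30.0 AEs per 100 admissions -> 0.30 AEs per admission
print(round(preventable_aes(32_000, 0.30, 0.61)))  # 5856
# (b) 26% of admissions with at least one AE
print(round(preventable_aes(32_000, 0.26, 0.61)))  # 5075
```

Under these assumptions, reading (b) comes closer to the figure of roughly 5,000, while reading (a) gives nearly 5,900; neither accounts for the wide confidence and prediction intervals around the pooled estimates.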

Our meta-analysis explained part of the broad variation in AE detection through the selected study characteristics. One unanticipated finding was that, for many of these characteristics, essential details (e.g., length of stay) were not provided. For those characteristics, the not reported group had a dominant influence on AE detection rates. Although four study characteristics (type of specialty, length of stay, timeframe of AE detection, and commission and omission) showed differences between subgroups, these differences were driven by the not reported category and therefore explain only little of the variation between AE detection rates. For all four characteristics, the samples from the eight countries that Wilson et al. [71] drew on fell within the not reported category, which might explain some of this result.

Compared with other categories, academic hospitals [34], higher patient age [75] and experienced reviewers [39] all corresponded with more AEs per 100 admissions. In line with Sharek et al. [39], we found that experienced reviewers were less likely to miss AEs than inexperienced reviewers. These results are consistent with many published medical record review studies [23, 31–33]. Nevertheless, the findings need to be interpreted with some caution. Regarding type of specialty, the estimates for internal medicine and for surgery including orthopaedics both have wide confidence intervals (95% CI 40.5–78.5 and 29.5–59.0, respectively); their rates of 56.4 and 41.7 AEs per 100 admissions should therefore be questioned, especially as numerous publications have found that surgical patients typically experience more AEs during their hospital stay than medical patients [6, 37, 103].

Addressing the overall quality of the included studies, we rated both their risk of bias and applicability-related concerns as low. This finding is supported by those of two earlier systematic reviews. First, Klein et al.’s [104] assessment of 24 of our 66 included publications indicated reasonable overall quality; second, also using a study sample that overlapped somewhat with ours, Panagioti et al. [6] rated all of the overlapping studies’ risk of bias as low.

Nevertheless, regarding adherence to TT methodology, including data completeness and usability, our meta-analysis clearly showed that the reporting quality of our overall study sample was inadequate. Our QAT explained part of the high variability in AE detection rates: where risk of bias was rated as high or unclear for “outcome”, “trigger tool method” and “flow and timing”, AE rates were lower than where risk of bias was rated as low. This suggests that insufficient reporting resulted in lower estimates, i.e., the actual number of AEs per 100 admissions is likely higher than reported here.

Although patterns of publication bias among single-arm studies measuring AE incidence are not well understood, we performed a funnel plot analysis to evaluate any association between small study size and the magnitude of the estimates of AEs per 100 admissions. When an uncontrolled study evaluates the effects and safety of a therapeutic intervention, publication bias may still be expected, as studies with higher AE estimates may be less likely to be published. If this type of publication bias is associated with small study size, funnel plot exploration may detect it. Because the studies included in our review were mostly health services and delivery research, we did not anticipate obvious signs of publication bias [105], and none were found. The vast majority of studies did not report the occurrence of AEs per patient days. Rather than considering this a potential selective reporting bias, we reason that the field is insufficiently aware of the advantage of person-time incidence rates over incidence proportions, as the former facilitate comparison across studies.

Strengths and limitations

Our systematic review was based on an exhaustive search strategy, so it is unlikely that we missed studies that would have changed our findings. During the review, we included two studies that our search strategy had not identified because they lacked one of its core terms, such as “adverse” [40] or “record” [86]. We did not systematically search the grey literature, so some relevant studies may have remained unidentified.

In the absence of a suitable risk of bias tool for the type of studies included, we adapted an existing QAT to simultaneously address the risk of bias and applicability-related concerns of the included studies. We conducted stratified analyses not only to evaluate the effects of study characteristics but also to evaluate the effects of QAT domains. Compared with previous reviews, our systematic review included a considerably higher number of studies and, correspondingly, of index admissions.

However, our review also has limitations. One was the exclusion of psychiatric, rehabilitation, emergency department and intensive care settings, a criterion we set to maximize comparability across study settings. Similarly, by excluding studies that focused only on adverse drug events, we avoided skewing AE rates with results for a single event type. Despite their benefits, both decisions reduced the final sample size.

In addition, although we consider the identification and labelling of AEs vital, we chose not to address either the types of AEs or their severity. Nor did we examine the influence of reported conflicts of interest or funding in the included studies, which could explain some of the variation. For future reviews, we also acknowledge that registering the review protocol in an open-access repository is necessary.

Still, the most important limitation is the high level of unreported information, which hampered a full appreciation of the findings. The data did not allow us to fit multivariable models meaningfully, so all findings from univariable analyses need to be interpreted with caution: we cannot exclude that some of the observed associations, such as the effect of type of hospital, are confounded. For future studies of AEs via retrospective medical record review, irrespective of the detection method used, the certainty of the evidence base would benefit from the routine use of a dedicated reporting guideline. Such a guideline is currently lacking for the type of studies included.


Based on our analyses of 54 studies using TT methodology, we found an overall incidence of 30.0 AEs per 100 admissions, affecting 26% of patients. We estimated that 63% of these AEs were preventable, indicating a high potential to improve patient safety. However, incomplete reporting and high levels of statistical heterogeneity limit the reliability of these estimates.
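The pooled incidence and the statistical heterogeneity mentioned above can be made concrete with a minimal random-effects sketch in Python. This section does not state the pooling model, so the DerSimonian–Laird estimator, the log transformation of rates and the Poisson variance 1/events used here are assumptions for illustration; all function names and counts are hypothetical.

```python
import math


def log_rate_per_100(n_aes: int, n_admissions: int):
    """Log of AEs per 100 admissions and its approximate (Poisson) variance 1 / n_aes."""
    return math.log(100.0 * n_aes / n_admissions), 1.0 / n_aes


def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling. Requires at least two studies.

    Returns (pooled estimate, its standard error, tau^2, Cochran's Q, I^2 in %).
    """
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    return pooled, se, tau2, q, i2


# Hypothetical studies: (number of AEs, number of admissions)
studies = [(120, 500), (45, 300), (300, 800)]
logs, variances = zip(*(log_rate_per_100(a, n) for a, n in studies))
pooled, se, tau2, q, i2 = dersimonian_laird(list(logs), list(variances))
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"{math.exp(pooled):.1f} AEs per 100 admissions "
      f"(95% CI {low:.1f} to {high:.1f}), I^2 = {i2:.0f}%")
```

A high I^2 means most of the spread between studies exceeds what sampling error alone would produce, which is why a single pooled figure should be read with caution.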

Of the nine TT study characteristics evaluated, our analyses indicate that eight explained part of the wide variation in AE incidence estimates. For four of these (type of specialty, length of stay, time frame of AE detection, and commission and omission), most of the variation was driven by the “not reported” category. For two characteristics (time frame of AE detection, and commission and omission), 69% and 82% of studies, respectively, even failed to report the relevant methodological information.
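A stratified analysis of this kind asks whether pooled estimates differ between the categories of a study characteristic, with “not reported” kept as a category of its own. The Python sketch below shows one simple version, a fixed-effect test of between-subgroup heterogeneity (Q_between). It is an assumption, not necessarily the model used in this review, which may have pooled strata under random effects; the function name, labels and values are hypothetical.

```python
from collections import defaultdict


def subgroup_q_between(estimates, variances, labels):
    """Fixed-effect test for differences between subgroups.

    Pools studies within each subgroup with inverse-variance weights, then
    measures how far the subgroup estimates lie from the overall estimate.
    Q_between is compared with a chi-square distribution on (groups - 1) df.
    Returns (dict of subgroup estimates, Q_between, degrees of freedom).
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # label -> [sum of weights, sum of weight * estimate]
    for y, v, label in zip(estimates, variances, labels):
        w = 1.0 / v
        totals[label][0] += w
        totals[label][1] += w * y
    pooled = {g: swy / sw for g, (sw, swy) in totals.items()}
    total_w = sum(sw for sw, _ in totals.values())
    overall = sum(sw * pooled[g] for g, (sw, _) in totals.items()) / total_w
    q_between = sum(sw * (pooled[g] - overall) ** 2 for g, (sw, _) in totals.items())
    return pooled, q_between, len(totals) - 1


# Hypothetical log incidence estimates stratified by whether length of stay was reported
estimates = [3.2, 3.4, 3.0, 3.9, 4.1]
variances = [0.02, 0.03, 0.02, 0.05, 0.04]
labels = ["reported", "reported", "reported", "not reported", "not reported"]
pooled, q_between, df = subgroup_q_between(estimates, variances, labels)
print(pooled, round(q_between, 2), df)
```

A large Q_between relative to its degrees of freedom suggests that the characteristic explains part of the between-study variation; inspecting the subgroup estimates then shows whether the “not reported” stratum drives the contrast.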

Because the reporting of TT studies clearly needs improvement, and to enhance comparability, we recommend the development and implementation of a reporting checklist, accompanied by a guidance document, specifically for studies using retrospective medical record review methods for AE detection.

Supporting information

S1 File. Quality assessment tool template.


S1 Table. Assessments of risk of bias and applicability-related concerns.


S1 Fig. Forest plot of AEs per 1000 patient days.

* = pooled estimate, • = mean estimate, ‡ = calculated total number of AEs, ~ = calculated total number of patient days [5, 15, 17–22, 34, 37, 39, 40, 45, 46, 50–52, 54, 57, 58, 60, 62–65, 67, 68, 72, 73, 76–79, 82, 84–87, 89–91, 93, 95, 96, 98–100, 102].


S2 Fig. Forest plot of the percentage of admissions with at least one adverse event (AE).

CI, confidence interval; * = pooled estimate, • = mean estimate, + = calculated total number of admissions with ≥ 1 AE [5, 9, 14, 15, 17–22, 24, 34, 37, 39, 45, 46, 50–58, 60–68, 70, 72–87, 89–94, 96–101].


S3 Fig. Forest plot of the percentage of preventable adverse events (AEs).

CI, confidence interval; * = pooled estimate, • = mean estimate, ¢ = calculated number of preventable AEs [15, 17–20, 34, 37–39, 46, 50, 51, 53, 59, 63–67, 71–75, 77, 78, 87, 89–91, 96–98, 100, 101].


S4 Fig. Funnel plot for AEs per 100 admissions [5, 10, 15, 17–22, 34, 37–39, 45, 46, 50–54, 56–69, 71–79, 82–91, 93, 95–102].


S5 Fig. Forest plot with stratified analysis of the risk of bias and applicability-related concerns.

AE, adverse events; N studies, number of studies; CI, confidence interval [5, 10, 15, 17–22, 34, 37–39, 45, 46, 50–54, 56–69, 71–79, 82–91, 93, 95–102].



The authors would like to thank Chris Shultis for the editing of this manuscript.


