Interventions to improve resilience in physicians who have completed training: A systematic review

Background: Resilience is a contextual phenomenon where a complex and dynamic interplay exists between individual, environmental, and socio-cultural factors. With growing interest in enhancing resilience in physicians, given their high risk for experiencing prolonged or intense stress, effective strategies are necessary to improve resilience and reduce negative outcomes including burnout. The objective of this review was to identify effective interventions to improve resilience in physicians who have completed training, working in any setting.
Methods and findings: We included randomized controlled trials (RCTs) and observational studies (including pilot studies) published in English, French, and Spanish that included an intervention to improve resilience in physicians who have completed training. We included studies that implemented interventions to reduce burnout, anxiety, and depression or to improve empathy to ultimately enhance resilience, rather than studies designed solely to reduce stress or trauma-induced stress. We performed a systematic search of Medline, EMBASE, PsycINFO, CINAHL and Cochrane Library with no publication year limit. The last search was conducted on March 29, 2017. We used random-effects models to calculate pooled standardized mean differences. Resilience, measured using validated resilience scales, was the primary outcome measure. Secondary outcome measures included proxy measures of resilience such as burnout, empathy, anxiety and depression. Our search strategy identified 7,579 records; 74 met the criteria for full-text review. Seventeen studies, published between 1998 and 2016, were included in the final review, of which 9 (4 RCTs, 5 observational) had extractable physician data. Interventions varied greatly regarding their approach, duration, and follow-up. Two RCTs measured resilience using validated scales; both found a significant improvement. No meta-analysis for resilience was conducted due to the presence of high clinical and methodological heterogeneity.
Conclusions: Our systematic review demonstrates that there is weak evidence to support one intervention over another to improve resilience in physicians who have completed training. The quality of evidence for the outcomes ranged from very low to low. There is a need for a consensus on the definition of resilience and how it is measured. Longer follow-up is required to ensure any intervention effects are sustained over time.


Introduction
Resilience refers to the act of coping, adapting, or thriving from adverse or challenging events, where a complex and dynamic interplay exists between individual, environmental and sociocultural factors. Thus, interventions to improve resilience should be geared towards the individual, group and organizational levels [1][2][3][4]. Resilience is negatively related to various psychological morbidities ranging from burnout to depression, and frequently overlaps with the concept of wellness [1]. To assess the effectiveness of various resilience interventions, one must understand the factors that impact resilience, as well as the definitions of this construct.
Personal factors (including personality, previous experience of adversity and coping strategies) and organizational factors (including workload and hours) play an important role in predicting resilience [5]. A third factor is the culture of the more immediate social network within which an individual operates, where the stigma associated with a doctor suffering psychological problems reduces resilience [1].
Several definitions for the concept of resilience are described with a wide range of scales used for measuring resilience [6][7][8][9][10][11][12]. In this review, we focus on resilience as a continuous, effective and positive adaptation process to adversity, rather than the absence of burnout alone [9].
Enhancing resilience in stressful occupational settings is of growing interest, given that individuals in these settings may be more likely to experience adversity, increasing their risk of burnout. Physicians, in particular, are at higher risk of stress [13]. Effective strategies are necessary to prevent negative outcomes including anxiety, depression, burnout, relationship problems, suicidal ideation and alcohol abuse [9,14,15]. There are also negative organizational outcomes including impaired work performance and high turnover [16]. Resilience has been identified as an important factor that could help to prevent these consequences and also potentially improve patient clinical outcomes [17][18][19].
With respect to previous reviews of resilience research, a recent systematic review on resilience studied only primary healthcare professionals, was limited to publications from 1994 to 2014, and focused on definitions, measures and associations rather than interventions [9]. One systematic review and meta-analysis on the efficacy of resilience training programs studied randomized trials on diverse adult populations (not specifically healthcare) and persons with chronic diseases [20]. Another systematic review focused on educational interventions to improve resilience in healthcare students and professionals [21]. Finally, a systematic review of interventions to foster physician resilience included physicians still in training [22]. In contrast to these reviews, we focus specifically on resilience-building interventions for physicians who have completed training, working in all settings. Although resilience should be enhanced at all stages of the medical career, we excluded residents and medical students from this review as they represent distinct groups from physicians in independent practice. As they operate within educational and training environments, interventions are likely to affect these groups differently. We conducted a systematic review and meta-analysis of published studies of interventions to improve resilience in physicians who have completed training, working in any setting.

Search strategy and study eligibility
The protocol for this systematic review and meta-analysis was registered in PROSPERO (CRD42017060197). The review is reported in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (S1 Checklist) [23,24]. We included randomized controlled trials (RCTs) and observational studies published in English, French and Spanish that included an intervention to improve resilience in physicians who have completed training, working in any setting. We were primarily interested in studies that implemented interventions to enhance resilience and presented outcome data related to resilience, rather than studies intended only to reduce stress. Studies that aimed to enhance resilience but did not specifically measure it, i.e. those that measured a 'proxy' for resilience such as burnout, depression, anxiety or empathy, were included and contributed to our secondary outcome measures, as they targeted these states with the aim of enhancing resilience. While it is sometimes assumed that burnout and resilience are at opposite ends of the same spectrum, they are distinct entities with different measurement scales rating different concepts. Studies that focused on stress reduction alone (unrelated to resilience) were excluded, i.e. when it was clear that they evaluated short-term or narrowly targeted interventions that did not apply to resilience (e.g. interventions to reduce anxiety or stress specifically when breaking bad news, or distress following patient adverse events, or studies that measured states such as phobic anxiety). We excluded these because stress and burnout are two separate but intertwined entities: stress does not necessarily cause burnout, but it is not possible to be burned out without experiencing stress [25].
Studies that only included interventions to improve resilience in residents and medical students were ineligible. However, any study that included physicians in addition to residents, medical students or other health care providers was eligible for inclusion, but the extraction of data depended on whether there was subgroup analysis for physicians. For the outcome, we included studies that measured resilience based on the definitions above and the outcome measures as stated in our PICO. Commentaries, perspectives, expert opinions, conference proceedings, editorials, book chapters, and theses were all excluded.
Medline, EMBASE, PsycINFO, CINAHL and Cochrane Library were searched with the help of an information specialist trained in conducting systematic reviews (S1 Database). There was no publication year limit, and the last search was conducted on March 29, 2017. We also searched Google Scholar, BMJ Careers and grey literature. Clinical trial registries were searched to identify completed and in-progress studies. We contacted study authors and hand-searched relevant study references. The full electronic search strategy for all databases is available in S1 Database.
Screening, eligibility and inclusion assessments were conducted independently by two reviewers (CL, MN). Search results were exported into Endnote for duplicate removal and referencing and then exported to Covidence for independent screening and additional removal of duplicates. Initial screening involved assessment of abstracts and titles only. Full eligibility was then assessed through full-text screening. Any disagreements were resolved by consensus. When consensus was not achieved, additional reviewers (ES, DF) were consulted. In cases where a study had multiple publications, the most recent one was retained.

Data extraction and quality assessment
Data from included publications was extracted independently by the two reviewers (CL, MN) using a piloted data extraction form. Any disagreement was resolved by consensus, and when consensus was not achieved, the other reviewers (ES, DF) were consulted. Authors of the included studies were contacted when extraction was unclear or when data was missing. Specific data extracted included the study author, publication year, study design, setting, and sample size (number of participants invited, enrolled, randomized and analyzed according to study design). Demographics included the population studied such as physicians' level of care/specialty (primary, secondary or tertiary), mean age and gender. We described the type and details of the interventions and comparators such as duration and frequency. We included outcome measures with their corresponding scales, along with measurement time points and follow-up.
We also extracted data regarding missing data, funding sources and risk of bias. We used the Cochrane Collaboration risk of bias tool for RCTs [26], and the Cochrane Risk of Bias Assessment Tool for Non-Randomized Studies of Interventions (ACROBAT-NRSI) [27] for non-randomized studies. We extracted post-intervention means and standard deviations (SD) for RCTs. For observational studies, we extracted pre-intervention and post-intervention means and SDs. When SDs were not reported, we calculated SDs from confidence intervals (CI) if provided. For outcomes that were reported using different scales, we converted the difference in means (MD) to standardized mean differences (SMD).
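The two conversions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the review's actual computation: it assumes the reported interval is a 95% CI around a single group mean based on a normal approximation (z = 1.96), and it uses Hedges' adjusted g as the SMD. The function names `sd_from_ci` and `hedges_g` are hypothetical.

```python
import math


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover an SD from the CI of a single group mean.

    Assumes a normal-approximation CI: mean +/- z * SD / sqrt(n).
    For small samples a t-distribution critical value would be
    more appropriate than z = 1.96.
    """
    standard_error = (upper - lower) / (2 * z)
    return standard_error * math.sqrt(n)


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference with small-sample correction.

    The difference in means is divided by the pooled SD and then
    multiplied by the correction factor J = 1 - 3 / (4 * df - 1).
    """
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df)
    cohen_d = (mean_1 - mean_2) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    return cohen_d * correction


# Illustrative numbers only (not taken from any included study).
sd = sd_from_ci(lower=8.04, upper=11.96, n=100)
g = hedges_g(mean_1=10, sd_1=2, n_1=20, mean_2=8, sd_2=2, n_2=20)
print(round(sd, 2), round(g, 3))
```

Expressing each study's effect in SD units is what allows outcomes measured on different scales to be placed on a common axis before pooling.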

Data synthesis and analysis
All data was analyzed using RevMan 5.3. Before pooling results, we assessed clinical and methodological heterogeneity. Due to extensive heterogeneity, pooling of data was limited to resilience and burnout outcome measures. RevMan 5.3 was used to conduct random-effects meta-analyses and to calculate the I² statistic, interpreted according to the Cochrane Handbook thresholds, where >50% was considered substantial heterogeneity [26]. Publication bias was not assessed using funnel plots due to the insufficient number of studies. Instead, we critically appraised the evidence, focusing on broadness of the search, selective outcome reporting and other sources of bias. To evaluate the quality of evidence for our outcome measures, we used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach, separately for RCTs and observational studies [28]. We evaluated the quality of evidence based on risk of bias, inconsistency, indirectness, imprecision and publication bias. We reported the quality as very low, low, moderate or high. We conducted pre-specified subgroup analyses separating studies that reported resilience measured by validated resilience scores from studies reporting secondary outcome measures of resilience. We also conducted planned subgroup analysis based on level of care (primary, secondary or tertiary) and study design.
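To make the pooling and heterogeneity steps concrete, the sketch below implements a random-effects meta-analysis with the DerSimonian-Laird between-study variance estimator and the I² statistic. It assumes inverse-variance weighting and is a generic illustration rather than a reproduction of the RevMan analysis; the function name `dersimonian_laird` and the input values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimator.

    effects:   per-study effect estimates (e.g. SMDs)
    variances: per-study sampling variances (squared standard errors)
    Returns the pooled effect, its 95% CI, tau^2 and I^2 (in percent).
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(w**2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each study by its total (within + between) variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, ci, tau2, i2


# Illustrative inputs only (not data from the included studies).
pooled, ci, tau2, i2 = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(f"pooled = {pooled:.2f}, tau2 = {tau2:.2f}, I2 = {i2:.0f}%")
```

When Q does not exceed its degrees of freedom, both the between-study variance and I² are truncated to zero and the random-effects estimate coincides with the fixed-effect estimate.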

Study characteristics
We identified 7,579 records in our search strategy; 74 met the criteria for full-text review. The PRISMA flow diagram is presented in Fig 1. Seventeen studies were included in the final review. Eight studies [26][27][28][29][30][31][32][33][34][35][36] included other health care providers or residents, but did not provide subgroup analysis for physicians, and thus their results were not analyzed (S2 Table). Ten authors of the included studies were contacted (twice if necessary) due to unclear or missing data results, or no physician subgroup analysis. One author provided physician subgroup data. For the others that replied, no subgroup analysis was provided. Thus, 9 eligible physician-specific studies [15,[37][38][39][40][41][42][43][44] were analyzed.
A summary of the clinical characteristics for the 9 eligible studies is presented in Tables 1 and 2. A summary of the methodological characteristics is presented in S1 Table. In general, the studies were published between 1998 and 2016. Four were randomized controlled trials (RCT) [15,37,43,44], 2 of which were pilot RCTs [15,37], one of which [15] was a wait-list controlled trial. Five articles [38][39][40][41][42] were observational studies, one of which [42] could not be analyzed as we were unable to extract the data, as the results were only graphical presentations of the outcome measures and p-values. These five were before-and-after studies with no control groups, except for one [42]. Five studies were in the United States [15,39,40,43,44], 3 of which [15,43,44] were conducted by the Department of Medicine at the Mayo Clinic. Three studies [37,41,42] were conducted in Europe (Norway, UK, and Germany). One study [38] was conducted in Australia.
The physician population included physicians working across all settings and various specialties. Four [39,40,42,44] did not report mean age, and for those that did, age ranged between 31 and 50 years. Two [39,42] did not report gender specifically for physicians; one study [37] had substantially more females, and two studies [43,44] had substantially more males. One [38] only included females. The sample sizes ranged from 40 to 290 participants.

Resilience-building interventions
Interventions for building resilience varied across studies. One RCT undertook a psychosocial skills training and cognitive behavioural approach [37]. Another RCT studied the Stress Management and Resiliency Training (SMART) program, developed at the Mayo Clinic to decrease stress and enhance resilience [15]. One trial facilitated physician discussion groups, mindfulness, reflection and small group learning [44]. Another trial provided an online self-directed micro-tasks intervention specifically for physicians [43]. One observational study conducted an intensive educational program in mindfulness and communication [40]. A prospective cohort study offered two intervention options, both of which were based on an integrative approach incorporating psychodynamic, cognitive and educational theories [41]. Another observational study involved a course in adaptation practice to learn how to cope with stress, anxiety, and depression [42]. Another study offered a paid Mindfulness Based Stress Reduction (MBSR) course for healthcare providers [39]. The intensity of interventions also varied, ranging from a single 90-minute session to repeated sessions across several months. Follow-up ranged from a minimum of 8 weeks to a maximum of 3 years. Three studies [40,43,44] offered monetary incentives, while another [40] offered Continuing Medical Education credits. Two studies reported our primary outcome, resilience; one used the Brief Resilient Coping Scale [37] and the other the Connor-Davidson Resilience Scale [15]. Both are validated resilience scales. Seven reported secondary outcome measures [15,[39][40][41][42][43][44] (burnout, anxiety, empathy and/or depression) but not specifically resilience. Burnout was measured in four studies [40,41,43,44] with the Maslach Burnout Inventory (MBI), which has three subscales: emotional exhaustion (EE), depersonalization (DP) and personal accomplishment (PA). However, one of these only measured the emotional exhaustion subscale [41].
Two pilot RCTs [15,37] reported our primary outcome measure of resilience measured by validated resilience scales (Tables 1 and 2). Both pilot trials [15,37] reported significant improvement in resilience. Six studies [38][39][40][41]43,44] reported burnout scores using the 3 subscales of the MBI, except Isaksson [41], who used the 5-item emotional exhaustion subscale of the MBI. Of these 6 studies, the two RCTs [43,44] reported no statistically significant differences in the three subscales of burnout, except DP, which was reported to be significant in one of the studies [44] but for which data could not be extracted for meta-analysis. The 4 observational studies [38][39][40][41] reported significant improvements in the EE subscale for burnout. Two studies [39,40] reported significant improvements for the DP and PA burnout subscales. No statistical significance was reported for depression in the two RCTs [43,44]. One observational study [42] reported significant improvements in depression and anxiety. One RCT measured anxiety, reporting a statistically significant improvement [15]. Two studies measured empathy; the RCT [44] reported no statistically significant improvement, while the observational study [40] reported significant changes.

Synthesis of results
We were unable to conduct meta-analyses for most outcomes due to significant methodological heterogeneity in study designs and inconsistency in the outcomes measured across studies. Only the results for burnout could be meta-analyzed. We present a forest plot for resilience measured by validated resilience scales, but without an overall pooled effect measure due to heterogeneity (S1 File). For visual simplicity, we present a general forest plot for burnout in Fig 2, including observational studies and one RCT. The RCT showed no statistically significant differences for any of the three burnout subscales. For the observational studies, 4 studies contributed to random-effects meta-analysis for emotional exhaustion [pooled SMD -0.67 (95% CI -0.84 to -0.5) p = 0.81; I² = 0%]. For the depersonalization subscale, 3 studies contributed to meta-analysis [pooled MD -2.42 (95% CI -3.80 to -1.04) p = 0.76; I² = 0%]. For the personal accomplishment subscale, the same 3 studies contributed to meta-analysis [pooled MD 2.47 (95% CI 1.13 to 3.81) p = 0.55; I² = 0%]. For these observational studies, all burnout subscales showed a statistically significant improvement.
Due to the small number of studies, we were only able to conduct subgroup analysis for burnout among primary care physicians, including only two studies [38,40].
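As a consistency check on the pooled burnout estimates, the test statistic implied by each reported confidence interval can be recovered directly. The sketch below assumes the intervals are symmetric normal-approximation 95% CIs; the helper name `z_and_p_from_ci` is hypothetical. Under that assumption, all three intervals exclude zero, which agrees with the statement that the subscales improved significantly. The p-values printed next to I² in the text are then presumably from the test for heterogeneity rather than the test for overall effect, although that reading is an assumption.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the z statistic and two-sided p-value implied by a CI.

    Assumes a symmetric normal-approximation interval:
    estimate +/- z_crit * SE.
    """
    standard_error = (upper - lower) / (2 * z_crit)
    z = estimate / standard_error
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Pooled estimates and 95% CIs as reported for the observational studies.
reported = {
    "emotional exhaustion (SMD)": (-0.67, -0.84, -0.5),
    "depersonalization (MD)": (-2.42, -3.80, -1.04),
    "personal accomplishment (MD)": (2.47, 1.13, 3.81),
}

for name, (estimate, lower, upper) in reported.items():
    z, p = z_and_p_from_ci(estimate, lower, upper)
    print(f"{name}: z = {z:.2f}, two-sided p = {p:.2g}")
```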

Assessment of publication bias and quality of evidence
We did not construct funnel plots due to the insufficient number of studies. Although we undertook a broad search, publication bias is possible as we only included studies in English, French and Spanish. Most of the studies were not prone to selective reporting bias; however, a few of the studies were unclear. Most studies were funded, with potential conflicts of interest in two studies [41,42].
The quality of evidence for our outcome measures was assessed using GRADE. For resilience and MBI subscales (EE, DP and PA) it was estimated to be very low for RCTs and low for observational studies (S5 Table). Among the remaining outcomes, two studies measured empathy. One was an observational study for which the quality was very low, but for the RCT, it was deemed low. Anxiety was only measured in one RCT and the quality was estimated as low. Depression was measured in 2 RCTs and it was deemed of moderate quality (S6 Table). The risk of bias for all outcome measures was serious across studies. Most observational studies lacked a control group and confounding adjustment. Some studies had substantial loss to follow-up, and for all studies blinding of participants and of the outcome assessment was not possible.
The risk of bias was similar between RCT studies (S3 Table). Three [37,43,44] out of 4 RCTs used computer-generated algorithms for randomization, but only one [44] had allocation concealment. Blinding of participants, study personnel and outcome assessment was deemed not possible for the four studies due to the nature of the intervention. Attrition bias was generally low. Reporting bias ranged from low to unclear. The within-study risk of bias for the 5 observational studies was also relatively similar between studies (S4 Table). Bias due to potential confounding was deemed serious for all 5 studies. Selection bias was deemed moderate in most observational studies [38][39][40][41], as selection into the study may have been related to the intervention and outcome, where physicians in greater need of the intervention may be more likely to participate. One [39] was a paid course, impacting selection of participants into the study. One study [41] had 19% lost to follow-up and reported that those participants had higher levels of distress (emotional exhaustion) and higher levels of emotion-focused coping strategies at baseline. Another study [40] had 20% loss to follow-up at the end of the study, and no information was given regarding characteristics or reasons for being lost to follow-up. Bias in measurement of outcomes was deemed moderate for all studies, as both blinding of participants and of the outcome assessment was not possible. Bias in selection of the reported results was generally low, but two studies [38,42] had serious bias due to incomplete presentation of results or imprecise p-values.

Discussion
Overall, our systematic review provides insights into resilience interventions for physicians who have completed training. Physicians within training programs practice differently (e.g., protected learning time, working hours' regulations, supervision) than physicians who have completed training who face unique challenges and pressures related to the responsibilities of independent practice. As such, we believe that the interventions applicable to building resilience in this group have different outcomes and merit separate study.
Specifically, this review demonstrates there is currently weak evidence to support one intervention over another to improve resilience. Interventions varied greatly in approach, duration, intensity and follow-up. Only two studies measured resilience using validated resilience scales, both of which provided definitions of resilience that matched our criteria. Both were small pilot studies with analyzable data specifically for physicians. Both studies reported statistically significant improvement in resilience scores, but they had small sample sizes, with one reporting substantial loss to follow-up. We cannot draw conclusions about clinical significance with only two small studies in different physician specialties. We were unable to provide a pooled estimate for resilience measured by validated resilience scales due to considerable clinical and statistical heterogeneity (I² = 79%). Overall, we are uncertain of the effectiveness of these interventions to improve resilience.
For our secondary outcome measures, we found modest improvement in burnout. However, most studies were uncontrolled before-and-after studies with no confounding adjustment and relatively small sample sizes. The RCT that provided analyzable data for burnout had no significant improvements in burnout (all subscales). Thus, the slight improvement observed in the observational studies was not enough to support any specific intervention towards improving burnout to enhance resilience. For other secondary outcome measures (empathy, depression and anxiety), data was lacking to conduct a meta-analysis. Only a few studies contributed to these outcomes and their sample sizes were small, limiting our ability to draw confident conclusions about the effect of the implemented interventions.
We conducted this systematic review in accordance with our registered protocol following PRISMA guidelines. Some studies did not present data in a way that could be extracted for meta-analysis, limiting our sample size and overall conclusions, and some studies included physicians from various levels of care limiting our ability to compare the effect of the intervention across these. Overall, there was such heterogeneity in various aspects of research design that drawing general recommendations was not possible. Furthermore, most of the observational studies that contributed data for meta-analysis did not have a control group, and many studies did not provide reasons and characteristics of those lost to follow-up. Overall, the quality of the evidence for all outcome measures was very low or low and risk of bias was serious, further limiting the strength of the evidence.
We only included English, French and Spanish articles, and all were small in sample size. Determining whether a study implemented an intervention to improve resilience by means of our secondary outcome measures was dependent on our interpretation, and on authors' presentation of their interventions and related outcome measures. Specifically, regarding burnout, we had to distinguish studies that aimed to reduce trauma-induced stress from studies that aimed to reduce burnout to ultimately enhance resilience. Additionally, the small number of studies and high statistical heterogeneity limited our data analysis. However, as resilience interventions continue to be implemented in the health professions, this systematic review provides insights into ways that future studies can be designed or defined to ensure that we are able to identify effective resilience interventions.

Conclusions
Resilience interventions have the potential to improve resilience among physicians, yet there is limited information available as to their efficacy and impact across studies. To our knowledge, this is the first systematic review to assess interventions to improve resilience specifically in physicians who have completed training, working in any setting. Our findings for burnout from observational studies showed a modest improvement in burnout, which is consistent with a recent systematic review [45] of interventions to prevent and reduce physician burnout. However, we did not find the same results for RCTs. We recommend that future research include large pragmatic RCTs with sufficient follow-up time. In addition, trials should define what they mean by resilience and whether the outcome measure actually measures resilience or another proxy outcome.