Agreement among Healthcare Professionals in Ten European Countries in Diagnosing Case-Vignettes of Surgical-Site Infections

Objective Although surgical-site infection (SSI) rates are advocated as a major evaluation criterion, the reproducibility of SSI diagnosis is unknown. We assessed agreement in diagnosing SSI among specialists involved in SSI surveillance in Europe. Methods Twenty case-vignettes based on suspected SSI were submitted to 100 infection-control physicians (ICPs) and 86 surgeons in 10 European countries. Each participant scored eight randomly assigned case-vignettes on a secure online relational database. The intra-class correlation coefficient (ICC) was used to assess agreement for SSI diagnosis on a 7-point Likert scale and the kappa coefficient to assess agreement for SSI depth on a three-point scale. Results Intra-specialty agreement for SSI diagnosis ranged across countries and specialties from 0.00 (95%CI, 0.00–0.35) to 0.65 (0.45–0.82). Inter-specialty agreement varied from 0.04 (0.00–0.62) to 0.55 (0.37–0.74), the latter in Germany. For all countries pooled, intra-specialty agreement was poor for surgeons (0.24, 0.14–0.42) and good for ICPs (0.41, 0.28–0.61). Reading SSI definitions improved agreement among ICPs (0.57) but not surgeons (0.09). Intra-specialty agreement for SSI depth ranged across countries and specialties from 0.05 (0.00–0.10) to 0.50 (0.45–0.55) and was not improved by reading the SSI definitions. Conclusion Among ICPs and surgeons evaluating case-vignettes of suspected SSI, considerable disagreement occurred regarding the diagnosis, with variations across specialties and countries.


Introduction
Despite progress in prevention, [1] surgical-site infection (SSI) remains one of the most common adverse events in hospitals, accounting for 11% to 26% of all healthcare-associated infections [2][3]. SSI prevention is therefore receiving considerable attention from surgeons and infection control physicians (ICPs), healthcare authorities, the media, and the public in most European countries. There is a perception among the public that SSIs may reflect poor quality of care.
Several countries require public reporting of hospital-acquired infections, using either process indicators or infection rates, [4] with the goal of improving transparency, patient safety, public information, and performance by benchmarking of surgical units and healthcare facilities. However, there is little evidence that publishing quality indicators improves care [5]. The public reporting of infection rates remains debated at the national and international levels [6]. In any case, if SSI rates are to serve as a quality indicator for healthcare facilities and the public, they must be determined in a reliable way that produces robust infection rates [7]. SSI rates vary according to co-morbidities and to the contamination class and conditions of the surgical procedure. The need for adjustment has been demonstrated, and most surveillance networks use the National Nosocomial Infection Surveillance (NNIS) index for risk stratification [8][9]. Another factor that influences SSI rates is the robustness of SSI diagnosis. The extent to which different healthcare professionals agree about the presence of SSI depends on many factors including the use of a shared SSI definition, training, and experience. In several studies, the diagnosis of SSI varied according to the definitions used [10]. A recent French study documented considerable intra- and inter-specialty disagreement among healthcare professionals regarding the diagnosis of SSI [11]. Furthermore, recent studies from a European network suggested large differences in SSI recognition across countries [12].
We designed a study to assess agreement in SSI diagnosis among ICPs and surgeons involved in SSI surveillance in 10 European countries.

Ethics Committee Approval
Because of the observational and blinded nature of the study, the institutional review board of the Bichat-Claude Bernard Hospital waived the requirement for informed consent. Accordingly, written consent was not collected from patients. The study was approved by the ethics committee of the Bichat-Claude Bernard Hospital group.

Development of Case-vignettes
Case-vignettes allow an assessment of the same cases by ICPs and surgeons involved in diagnosing SSI. We used blinded random assignment of the case-vignettes to ICPs and surgeons to assess agreement regarding SSI diagnosis and depth. In addition, we determined whether providing SSI definitions influenced SSI diagnosis and depth assessment.
The case-vignettes were built from SSI surveillance data collected in six surgical units in four French university hospitals. Surgical procedures were selected based on the following criteria: i) preferentially clean or clean-contaminated surgical procedure; ii) presence of a skin incision allowing standard wound surveillance and SSI diagnosis; iii) surgical procedure usually requiring at least 1 week of in-hospital post-operative surveillance; and iv) sufficiently high SSI incidence to ensure the collection of a large number of suspected cases within a short period.
Consecutive patients with suspected SSI were followed throughout their hospitalisation or re-hospitalisation. Each day, a bedside evaluation was performed; the medical chart and nursing log were reviewed; and laboratory test results, microbiological findings, and imaging study findings were recorded. Photographs of the wound and/or computed tomography (CT) results were obtained. Suspected SSI was defined as wound modification or discharge and/or evidence of infection. We used the Centers for Disease Control SSI definition, which is identical to the European HELICS/IPSE definition [13].
We identified 20 patients with suspected SSI and complete information after heart surgery (n = 5), gastrointestinal surgery (n = 5), orthopaedic surgery (n = 4), obstetric surgery (n = 2), neurosurgery (n = 2), or ENT surgery (n = 2). A single investigator developed standardised case-vignettes, in English, based on these 20 patients. Each vignette described demographic data, past medical history, the surgical procedure, and the postoperative data. Figure S1 shows one of the case-vignettes.

Participants
We asked 10 European leaders in SSI surveillance and prevention in 10 European countries (Finland, France, Germany, Hungary, Italy, Serbia, Switzerland, The Netherlands, Turkey, and the UK) to each recruit 10 ICPs and 10 surgeons for the study, using their personal connections, and to send the list of participants to the study investigators. Because of the observational and blinded nature of the study, the institutional review board of the Bichat-Claude Bernard Hospital waived the requirement for informed consent.

Study Design and Data
The 20 vignettes were assigned at random to allow assessments of agreement among (i) participants in the same specialty in the same country; (ii) participants in the same specialty in different countries; and (iii) participants in different specialties in the same country.
Each of the 20 vignettes was to be scored four times by different ICPs and surgeons in all 10 countries. Then, four ICPs and four surgeons taken at random in each country read the SSI definitions and repeated the scoring of one vignette.
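The balanced design described above (20 vignettes, each scored four times per specialty and country by 10 participants, hence eight vignettes per participant) can be illustrated with a minimal sketch. The source does not describe the randomisation algorithm actually used; the function name `assign_vignettes`, its parameters, and the block-dealing strategy below are assumptions chosen only to show one way such an assignment could be generated.

```python
import random
from collections import Counter


def assign_vignettes(n_vignettes=20, n_raters=10, repeats=4, seed=0):
    """Hypothetical balanced random assignment for one specialty in one country.

    Returns {rater_index: [vignette ids]} such that every vignette is scored
    `repeats` times and every rater receives the same number of distinct vignettes.
    """
    rng = random.Random(seed)
    total = n_vignettes * repeats
    if total % n_raters:
        raise ValueError("scorings cannot be split evenly among raters")
    per_rater = total // n_raters
    if per_rater > n_vignettes:
        raise ValueError("a rater would have to score the same vignette twice")

    # Shuffle the vignettes once, then repeat that order `repeats` times.
    order = list(range(1, n_vignettes + 1))
    rng.shuffle(order)
    slots = [order[i % n_vignettes] for i in range(total)]

    # Deal consecutive blocks to raters taken in random order; a block of
    # per_rater <= n_vignettes consecutive slots never repeats a vignette.
    raters = list(range(n_raters))
    rng.shuffle(raters)
    return {r: slots[k * per_rater:(k + 1) * per_rater] for k, r in enumerate(raters)}


plan = assign_vignettes()
times_scored = Counter(v for vignettes in plan.values() for v in vignettes)
print({rater: sorted(vignettes) for rater, vignettes in sorted(plan.items())})
```

With the default arguments, each of the 10 hypothetical raters receives eight distinct vignettes and each vignette appears four times, which matches the planned 20 x 4 x 2 x 10 = 1600 scorings across both specialties and all countries.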
Scores were assigned using a seven-point Likert scale ranging from "SSI certainly absent" (score 1) to "SSI certainly present" (score 7) [14]. When the score was between 4 and 7, the participant scored SSI depth on a 3-point scale (1, superficial SSI; 2, depth unclear; and 3, deep or organ/space-related SSI). We simplified the depth assessment by classifying deep and organ/space-related SSIs in the same group, as both SSI categories have the same severe consequences in terms of mortality, morbidity, and hospital stay prolongation.
A secure online relational database was established for data collection. Each participant had a personal login and password [15]. Patient data were presented chronologically, and the scores assigned before reading the SSI definition could not be changed. Before scoring the vignettes, each participant provided the following information: age, gender, type of hospital, and time working in the current job.

Statistical Analysis
We estimated the number of vignettes and participants needed to assess agreement within specialties based on the precision of the intra-class correlation coefficient (ICC) [16] and on feasibility considerations (number of participants available in each specialty, maximal time needed for scoring). With 20 vignettes each scored four times and an expected ICC of about 0.60, half the width of the exact 95% confidence interval (95%CI), i.e., the precision, would be 0.29. Data were described as mean ± SD, median (interquartile range), or percentage. Agreement was assessed before and after reading the SSI definition without distinguishing specialties or countries. To evaluate intra- and inter-specialty agreement for SSI diagnosis based on 1-7 Likert scale scores, we computed the ICC with its 95%CI. An ICC of 0 indicates the level of agreement produced by chance alone and an ICC of 1 indicates perfect agreement. We defined poor agreement as ICC values less than 0.4, good agreement as ICC values of 0.4 to 0.7, and very good agreement as ICC values greater than 0.7 [17].
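To make the agreement measure concrete, the following is a minimal sketch of an ICC computation on Likert scores. The source does not state which ICC form was used; a one-way random-effects ICC(1,1) is assumed here because each vignette is scored by a different set of raters. The function names `icc_oneway` and `icc_band`, the truncation of negative estimates to 0, and the toy scores are illustrative assumptions, and the exact confidence interval reported in the study is not reproduced.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (assumed form, not confirmed by the source).

    ratings[i] holds the k scores given to vignette i by k different raters.
    Negative estimates are truncated to 0 (an assumption).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 vignettes, each with the same k >= 2 scores")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]

    # Between-vignette and within-vignette mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("all scores are identical: ICC is undefined")
    return max(0.0, (ms_between - ms_within) / denom)


def icc_band(icc):
    """Agreement bands as defined in the text: <0.4 poor, 0.4-0.7 good, >0.7 very good."""
    if icc < 0.4:
        return "poor"
    if icc <= 0.7:
        return "good"
    return "very good"


# Toy data (not from the study): four vignettes, each scored by four raters on the 1-7 scale.
toy_scores = [[7, 7, 6, 7], [2, 1, 2, 1], [4, 5, 4, 6], [6, 3, 7, 5]]
toy_icc = icc_oneway(toy_scores)
print(round(toy_icc, 2), icc_band(toy_icc))
```

Applied to the pooled estimates reported in the study, `icc_band(0.24)` gives "poor" and `icc_band(0.41)` gives "good", in line with the thresholds stated above.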
To evaluate intra- and inter-specialty agreement regarding SSI depth scored on a 3-point scale, we computed the kappa coefficient (κ) with its 95%CI. We added a fourth category comprising the participants who did not score SSI depth because their SSI diagnosis score was less than 4. Agreement is considered poor when κ is 0.20 or less, fair when κ is 0.21-0.40, moderate when κ is 0.41-0.60, good when κ is 0.61-0.80, and very good when κ is 0.81-1.00 [18]. Analyses were performed using SAS System, Version 9.3 (SAS Institute, Cary, NC, USA).
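The depth analysis can be sketched in the same way. The source does not specify which multi-rater kappa was computed; Fleiss' kappa is assumed below because several raters score each vignette. The names `depth_category`, `fleiss_kappa`, and `kappa_band`, the category labels, and the toy ratings are hypothetical; only the four-category scheme and the agreement bands come from the text.

```python
from collections import Counter


def depth_category(diagnosis_score, depth_score=None):
    """Map a 1-7 diagnosis score and an optional 1-3 depth score to four categories."""
    if diagnosis_score < 4:
        return "not scored"  # fourth category: depth not assessed
    return {1: "superficial", 2: "unclear", 3: "deep/organ-space"}[depth_score]


def fleiss_kappa(ratings):
    """Fleiss' kappa (assumed estimator). ratings[i] lists the categories
    assigned to vignette i; every vignette needs the same number of raters."""
    n = len(ratings)
    m = len(ratings[0]) if n else 0
    if n < 1 or m < 2 or any(len(row) != m for row in ratings):
        raise ValueError("each vignette needs the same number (>= 2) of ratings")

    totals = Counter()
    observed = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this vignette.
        observed += (sum(c * c for c in counts.values()) - m) / (m * (m - 1))
    observed /= n

    expected = sum((t / (n * m)) ** 2 for t in totals.values())
    if expected == 1.0:
        raise ValueError("only one category used: kappa is undefined")
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Bands from the text: <=0.20 poor, 0.21-0.40 fair, 0.41-0.60 moderate,
    0.61-0.80 good, 0.81-1.00 very good."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"


# Toy data (not from the study): (diagnosis score, depth score) pairs from four raters.
toy = [
    [(6, 1), (7, 1), (5, 2), (3, None)],
    [(7, 3), (7, 3), (6, 3), (6, 3)],
    [(2, None), (4, 1), (3, None), (5, 1)],
]
toy_kappa = fleiss_kappa([[depth_category(d, s) for d, s in row] for row in toy])
print(round(toy_kappa, 2), kappa_band(toy_kappa))
```

Treating "not scored" as its own category means that disagreement about whether an SSI is present at all also lowers the depth kappa, which is the consequence of adding the fourth category described above.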

Characteristics of the Participants and Case-vignettes
Overall, 100 ICPs and 86 surgeons agreed to participate; there were 10 surgeons from each of six countries and four to nine surgeons from the remaining four countries. The 186 participants worked in publicly funded (n = 179) or private (n = 7) healthcare facilities in 75 university and 57 non-university hospitals; of these 132 hospitals, 95 (72%) each contributed one participant, 35 (27%) two or three participants, and two (1%) five participants. Median (IQR) age was 47 (40-53) years, 117 (62.9%) participants were men, median time in the current job was 13 (7-20) years, and 142 (76.3%) participants were directly involved in SSI surveillance programmes in their healthcare facility (Table S1). Table S2 reports the characteristics of the 20 patients selected to build the case-vignettes. SSI was suspected before hospital discharge in 11 patients and after hospital discharge in 9 patients who required re-admission. Wound modification was a feature in 12 (60%) patients. Microbiological specimens were obtained from the surgical wound in 15 patients and were positive in 13.
As four countries together contributed 14 fewer surgeons than expected, fewer scorings than expected were obtained. In all, each of the 20 vignettes was scored without the SSI definitions 40 times by ICPs; for scoring by surgeons, 8 vignettes were scored 35 times, 5 were scored 33 times, 3 were scored 34 times, and 4 had miscellaneous numbers of scorings. In all, there were 1488 scorings without the SSI definitions, instead of the expected 1600.
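The scoring totals in this paragraph can be cross-checked from the design. The short sketch below only restates numbers given in the text; the variable names are illustrative, and the figure for the four vignettes with "miscellaneous" numbers of surgeon scorings is derived by subtraction rather than reported in the source.

```python
# Planned design, as described in the study design section.
vignettes, repeats, specialties, countries = 20, 4, 2, 10
planned_participants = countries * specialties * 10  # 10 ICPs and 10 surgeons per country

expected_scorings = vignettes * repeats * specialties * countries
per_participant = expected_scorings // planned_participants

# Shortfall: 14 surgeons fewer than planned, each of whom would have scored 8 vignettes.
missing_surgeons = 14
observed_scorings = expected_scorings - missing_surgeons * per_participant

# Split by specialty: every vignette was scored 40 times by ICPs.
icp_scorings = vignettes * 40
surgeon_scorings = observed_scorings - icp_scorings

# Surgeon scorings itemised in the text, and the remainder for the other four vignettes.
itemised = 8 * 35 + 5 * 33 + 3 * 34
remaining_four_vignettes = surgeon_scorings - itemised  # derived, not reported

print(expected_scorings, per_participant, observed_scorings)
print(icp_scorings, surgeon_scorings, remaining_four_vignettes)
```

The surgeon total also equals 86 participants times eight vignettes each, so the reported figures are internally consistent.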

Case-vignette Scores
In addition to the 1488 scorings without the SSI definitions, 14 vignettes were each scored four times and six vignettes three times with the SSI definitions, for a total of 74 scorings. The median SSI diagnosis score on the 7-point Likert scale obtained without reading the SSI definitions varied across countries from 6 to 7 for ICPs and from 5 to 7 for surgeons (Table 1).

Intra-specialty and Inter-specialty Agreement Regarding SSI Diagnosis in each Country
For ICPs, the ICC based on scores assigned without reading the SSI definitions ranged across countries from 0.26 to 0.65. Agreement was best in Germany (ICC, 0.65; 95%CI, 0.45-0.82) and the UK (0.59, 0.38-0.80), good in four other countries, and poor in the four remaining countries. For surgeons, the ICC ranged across countries from 0.00 to 0.46. Agreement was good in Germany (ICC, 0.46; 95%CI, 0.23-0.69) and poor in the other nine countries (Table 2).
The inter-specialty ICC based on scores assigned without reading the SSI definitions ranged across countries from 0.04 to 0.55. Agreement was best in Germany (ICC, 0.55; 95%CI, 0.37-0.74), good in two other countries, and poor in the remaining seven countries (Table 2).

Intra-specialty and Inter-specialty Agreement Regarding SSI Diagnosis Across Countries
The intra-specialty ICC was computed based on all scores in all countries. Agreement was good for ICPs (0.41, 0.28-0.61) and poor for surgeons (0.24, 0.14-0.42). Scoring after reading the SSI definitions improved agreement among ICPs (0.57, 0.20-0.82) but not among surgeons (0.09, 0.00-0.55). The inter-specialty ICC was estimated based on all 1488 scores obtained without reading the SSI definitions and showed poor agreement among the 186 participants (0.24, 0.14-0.42) (Table 2).
The inter-specialty kappa coefficient for superficial/deep SSI scored without reading the SSI definitions varied across countries from 0.09 to 0.35, being highest in Germany. Reading the SSI definition did not change the scoring for superficial/deep SSIs by ICPs or surgeons (data not shown).

Discussion
In a large panel of ICPs and surgeons involved in SSI surveillance in 10 European countries, agreement regarding the diagnosis and depth of SSI varied across countries and across individuals within both specialties. Reading the SSI definitions did not significantly improve agreement among ICPs or surgeons regarding the diagnosis or depth of SSI.
Although preventing all SSIs may not be feasible, bundles of peri- and postoperative measures have been developed to minimise the risk of SSI [9]. SSI surveillance is a component of these bundles. Reporting of SSI rates combined with active infection control efforts by qualified professionals and data feedback to the clinical staff has been shown to be a key factor in preventing SSIs [19][20]. At the local scale, incidence trends obtained by collecting SSI rates using the same method over several years provide information on the impact of preventive measures. At the global scale, SSI rates can theoretically be compared across units and hospitals. National SSI surveillance programmes have shown decreases in SSI rates, although the underlying mechanisms remain unclear [19,21]. SSI surveillance is now strongly recommended and widely used in industrialised countries. National surveillance networks have issued standard surveillance protocols. However, SSI surveillance faces methodological challenges. If the SSI rate is to serve as a performance indicator, then valid and consistent SSI rates must be obtained. The challenge is both to obtain accurate information about the denominator characterising the study population and to accurately measure the number of SSIs (the numerator). Since 2008, the European Centre for Disease Prevention and Control (ECDC) has been monitoring SSI rates in Europe using a protocol established by the Hospitals in Europe Link for Infection Control through Surveillance (HELICS). A network of European countries that use the same surveillance methods was established, [22] and its results are published every year by the ECDC. However, the reliability and validity of infection reporting must be assessed regularly [23]. The main obstacles in interpreting SSI rates are variations in the diagnosis of SSI and in postoperative follow-up duration [24].
Several studies have documented imperfect agreement across physicians regarding the diagnosis of SSI. In one study, wide differences in SSI diagnosis were noted between ICPs and surgeons, as well as across surgeons [25]. In a recent study, surgeons tended to diagnose only deep and organ-space SSIs, whereas ICPs also diagnosed superficial SSIs, thereby doubling the total SSI rate [26]. A study comparing SSI rates from 11 European countries showed that the proportion of superficial SSIs varied from 20% to 80%, suggesting differences in SSI detection and/or classification across countries [12]. Finally, a study based on the same methodology as the one reported here assessed agreement among a large number of healthcare workers in France [11]. Agreement regarding SSI diagnosis and depth varied across specialties and across individuals within each specialty. Reading the SSI definition produced small improvements in agreement about SSI diagnosis and depth.
Our study further supports the existence of considerable uncertainty regarding SSI detection at the European level. Our results are probably reliable, as we placed the participants in unbiased conditions by asking them to score the same case-vignettes through an Internet database. This method ensured that the participants were not influenced by factors such as perceived SSI risk in a particular unit or patient. Considering such factors would probably have increased disagreement among participants. Disagreement may be higher regarding the diagnosis of postdischarge SSIs or of SSIs in patients with minimal wound discharge and no microbiological results.
We found scoring differences across participants, across countries, and across case-vignette types. Agreement for SSI diagnosis and depth was good in Germany within ICPs, within surgeons, and between both specialties. In Germany, the regular cross-hospital evaluations of diagnostic accuracy through case-vignettes, conducted as part of the KISS surveillance network, probably improve agreement [1]. Several other countries, such as The Netherlands, France, and the UK, have had SSI surveillance networks for many years, which may have improved diagnostic accuracy via the sharing of surveillance methods and SSI rates. Providing the SSI definition did not improve the correlation between scores in our study, in keeping with a previous study demonstrating variable interpretations of the same definition [10]. Our results further support the need for a multidisciplinary approach to SSI surveillance [27]. Our data from 10 European countries consistently showed differences in agreement in each country, suggesting that our results may also be relevant to other countries.
Our study has several limitations. First, the vignettes were scored for the presence or absence of SSI by each participant working alone. SSI is often a difficult diagnosis that is typically made after discussion among surgeons and ICPs. Thus, SSI surveillance aims not only to obtain accurate SSI rates, but also to enhance teamwork between surgical and infection-control teams in order to ensure the implementation of effective preventive strategies. Our results indicate that surveillance should not be performed by individuals in a single specialty [27]. Second, the vignettes were scored via an online database. The vignettes were built from real cases, and the diagnosis of SSI may have been easier for surgeons or ICPs who had had direct contact with the patients. Third, the study was not designed to assess the accuracy of SSI diagnosis. Instead, we focused on agreement among ICPs and surgeons. We were therefore unable to determine which participants made the correct diagnosis, as illustrated in Table 1, which shows SSI diagnosis score differences of up to 6 points between two participants from the same specialty. Fourth, we selected cases of suspected SSI to assess agreement about the diagnosis of SSI. However, SSI is suspected in only a small proportion of patients after surgery. Our data on agreement about SSI diagnosis would not apply to an actual series of surgical patients. Fifth, in some countries we did not reach the expected ten participating surgeons. This lower than expected number of participants could have led to a less precise analysis. Finally, participants in each country were contacted by European leaders in the field of SSI surveillance and prevention. This recruitment method may have led to the selection of participants working in universities or high-level hospitals and, therefore, to overestimation of agreement in diagnosing SSI.

Table 1. Distribution of scores assigned by infection control physicians and surgeons before reading the definitions of surgical site infections, on a 7-point Likert scale, in each of the ten European countries.
In conclusion, among ICPs and surgeons evaluating case-vignettes of possible SSI, considerable disagreement in SSI diagnosis occurred both between and within countries. This finding supports the need for caution when using SSI rates for benchmarking or public reporting. Nevertheless, SSI surveillance and feedback remain critical for SSI prevention, and must be encouraged despite intrinsic limitations. Rather than stopping SSI surveillance because of uncertain reliability, our results support regular evaluations of SSI diagnosis accuracy, with case-vignettes probably constituting a valuable educational tool.

Figure S1. Example of a case-vignette developed for the study.