Agreement among Health Care Professionals in Diagnosing Case Vignette-Based Surgical Site Infections

Objective To assess agreement in diagnosing surgical site infection (SSI) among healthcare professionals involved in SSI surveillance. Methods Case-vignette study conducted in 2009 among 140 healthcare professionals from seven specialties (20 per specialty: anesthesiologists, surgeons, public health specialists, infection control physicians, infection control nurses, infectious diseases specialists, and microbiologists) in 29 university and 36 non-university hospitals in France. We developed 40 case-vignettes based on cardiac and gastrointestinal surgery patients with suspected SSI. Each participant scored six randomly assigned case-vignettes before and after reading the SSI definition, using an online secure relational database. The intraclass correlation coefficient (ICC) was used to assess agreement regarding SSI diagnosis on a seven-point Likert scale, and the kappa coefficient to assess agreement regarding superficial or deep SSI on a three-point scale. Results Based on consensus, SSI was present in 21 of the 40 vignettes (52.5%). Intraspecialty agreement for SSI diagnosis ranged across specialties from 0.15 (95% confidence interval, 0.00–0.59) (anesthesiologists and infection control nurses) to 0.73 (0.32–0.90) (infectious diseases specialists). Reading the SSI definition improved agreement in the specialties with poor initial agreement. Intraspecialty agreement for superficial or deep SSI ranged from 0.10 (−0.19–0.38) to 0.54 (0.25–0.83) (surgeons) and increased after reading the SSI definition only among the infection control nurses, from 0.10 (−0.19–0.38) to 0.41 (−0.09–0.72). Interspecialty agreement for SSI diagnosis was 0.36 (0.22–0.54) and increased to 0.47 (0.31–0.64) after reading the SSI definition. Conclusion Among healthcare professionals evaluating case-vignettes for possible surgical site infection, there was large disagreement in diagnosis that varied both between and within specialties.


Introduction
Surgical site infection (SSI) is receiving considerable interest from healthcare authorities, the media, and the public. Because SSIs are often considered avoidable, SSI rates have been used for performance assessment and benchmarking [1], and several countries require healthcare facilities to publish their SSI rates to improve transparency and, possibly, quality of care and patient safety [2]. However, the evidence that publishing quality indicators improves care is scant [3]. Recent reports indicate a need for improved measurement reliability [4], and mandatory public reporting remains a focus of vigorous debate [5,6].
Methodological issues related to benchmarking and public reporting remain controversial. If the SSI rate is to serve as a performance indicator, then valid and consistent SSI rates must be obtained [2]. SSI rates vary according to patient co-morbidities and to the contamination class and conditions of the surgical procedure. The need for adjustment has been demonstrated, and most surveillance networks use risk stratification [7,8]. Another factor that influences SSI rates is the certainty of the SSI diagnosis. The extent to which different healthcare professionals agree on the diagnosis of SSI depends on many factors, including training, experience, and the use of a common SSI definition. A single-centre study showed variability in the SSI incidence rate according to the SSI definition used [9].
We designed a study to assess agreement among healthcare professionals, within and across specialties, regarding the diagnosis and depth (superficial or deep) of SSI, based on case-vignettes describing real patients. We also evaluated whether providing the NHSN criteria changed the agreement estimates.

Development of the case-vignettes
Case-vignettes allow an assessment of the same cases by healthcare professionals involved in diagnosing and treating SSI. We used blinded random assignment of the case-vignettes to healthcare professionals.
We followed consecutive patients with suspected SSI throughout their hospitalization or re-hospitalization in four surgical units (two digestive surgery units and two cardiac surgery units) in three French university hospitals. Each day, a bedside evaluation was performed; the medical chart and nurses' log were reviewed; and the findings from laboratory tests, microbiology tests, and imaging studies were recorded. Photographs of the wound and/or computed tomography (CT) results were obtained. We identified 40 patients with suspected SSI and complete information, 20 in cardiac surgery and 20 in gastrointestinal surgery (colorectal or bariatric procedures).
Suspected SSI was defined as wound modification or discharge and/or evidence of infection. We used the Centers for Disease Control SSI definition (Table S1) [10], which is identical to the European HELICS/IPSE definition [11,12].

Participants
We identified 20 healthcare professionals from each of seven specialties potentially involved in SSI management: surgeons in any specialty, anesthesiologists, microbiologists, infectious diseases specialists, infection control nurses, infection control physicians, and public health specialists.
To build our study sample, participants were recruited by direct solicitation of close colleagues from other hospitals and through professional networks. In addition, we used the French network for SSI surveillance to identify surgeons, together with several French professional societies for the other specialties, i.e. the Public Health Society, the French Hygiene Society, the French Society for Infectious Diseases, the French Society for Microbiology, and the French Society for Anesthesiology and Intensive Care.
No randomized selection was performed, and the first 20 volunteers in each specialty were included in the study. Most of the participants were healthcare workers, although some of the public health specialists were engineers involved in hospital risk control. All 140 participants worked full time in public or private French hospitals with surgical activity, including university and non-university facilities. None of them had been involved in the management of the patients used to build the vignettes. All 140 participants scored their assigned case-vignettes during a 4-month period. Because of the observational and blinded nature of the study, the institutional review board of the Bichat-Claude Bernard Hospital waived the requirement for informed consent.

Study design and data
Twenty of the 40 vignettes were randomly assigned for assessing intraspecialty agreement. These 20 vignettes were scored twice without the SSI definition by participants within each specialty, and the same 20 vignettes were also scored twice with the SSI definition. All 40 case-vignettes were randomly assigned for assessing the interspecialty reliability of scoring with or without the SSI definition. In total, each participant scored six vignettes: the first three were scored without the SSI definition, and the next three with the SSI definition. Of the six vignettes read by each participant, five were distinct and one was scored twice, first without and then with the SSI definition. Within each specialty, 20 vignettes were read four times and 20 vignettes were read two times. Consequently, across the seven specialties, 20 vignettes were scored 28 times and 20 vignettes were scored 14 times, for a theoretical total of 840 scores.
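The design arithmetic above can be checked with a short sketch (illustrative only; the variable names are ours, not part of the study protocol):

```python
# Scoring design: within each specialty, 20 "intraspecialty" vignettes were
# read four times (twice without and twice with the SSI definition) and the
# 20 remaining vignettes were read two times.
N_SPECIALTIES = 7
PARTICIPANTS_PER_SPECIALTY = 20
VIGNETTES_READ_4_TIMES = 20
VIGNETTES_READ_2_TIMES = 20

reads_per_specialty = VIGNETTES_READ_4_TIMES * 4 + VIGNETTES_READ_2_TIMES * 2
total_scores = reads_per_specialty * N_SPECIALTIES

# Cross-check: 140 participants each scoring six vignettes gives the same total.
per_participant_total = N_SPECIALTIES * PARTICIPANTS_PER_SPECIALTY * 6

print(reads_per_specialty, total_scores, per_participant_total)  # 120 840 840
```

Both routes to the total agree, which is why 840 scores were theoretically scheduled.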
Scores were assigned using a seven-point Likert scale ranging from ''SSI certainly absent'' (score one) to ''SSI certainly present'' (score seven) [13]. When the score was between four and seven, the participant scored superficial/deep SSI on a three-point scale (one, superficial SSI; two, depth unclear; and three, deep or organ/space-related SSI). We simplified the depth assessment by putting deep and organ/space-related SSIs in the same group, as both SSI categories have the same severe consequences in terms of mortality, morbidity, and prolongation of hospital stay.
An online secure relational database was constructed for collecting the study data. Each participant had a personal login and password [14,15]. The patient data were presented chronologically, and the scores assigned before reading the SSI definition could not be changed. Before scoring the vignettes, each participant provided the following information: age, gender, type of hospital, and duration of experience in the current job.

Statistical analysis
We estimated the number of vignettes and participants needed to assess agreement within specialties, according to the precision of the intraclass correlation coefficient [16] and taking into account the feasibility of the study. If 20 vignettes are scored twice and the expected coefficient is close to 0.60, the semi-width of the exact 95% confidence interval (i.e., the precision) is 0.29.
Data were described as mean ± SD, median (interquartile range), or percentage.
Intra- and interspecialty agreement analyses were performed before and after reading the SSI definition. To evaluate intra- and interspecialty agreement on the seven-point Likert scale, we computed the intraclass correlation coefficient (ICC). We used the bias-corrected and accelerated (BCa) bootstrap procedure to estimate 95% confidence intervals (95%CIs). An ICC value of 0 indicates the level of agreement produced by chance alone, and a value of 1 indicates perfect agreement. We defined poor agreement as ICC values lower than 0.4, good agreement as ICC values of 0.4 to 0.7, and very good agreement as ICC values higher than 0.7 [17].
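For illustration, a one-way random-effects ICC with a bootstrap confidence interval can be sketched as follows. This is a simplification: the study used the BCa bootstrap via R's ''boot'' and ''psy'' packages, whereas the sketch below uses a plain percentile bootstrap, and the function names and matrix layout are ours.

```python
import random

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for a subjects-by-raters rating matrix."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # One-way ANOVA mean squares: between subjects and within subjects.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - row_means[i]) ** 2
                    for i, row in enumerate(ratings) for x in row) / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    return (ms_between - ms_within) / denom if denom else 0.0

def percentile_bootstrap_ci(ratings, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap 95%CI, resampling subjects (vignettes) with replacement."""
    rng = random.Random(seed)
    n = len(ratings)
    stats = sorted(icc_oneway([ratings[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]
```

With perfect agreement between raters the ICC is 1; values near 0 indicate agreement no better than chance, matching the interpretation thresholds given above.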
We also dichotomized the Likert scale (scores one to four corresponding to the absence of SSI, and scores five to seven to the presence of SSI). To evaluate intraspecialty agreement, we computed observed agreement (with exact 95% confidence intervals) and the simple kappa coefficient (with 95% confidence intervals). To evaluate interspecialty agreement, we computed kappa for multiple raters with 95%CIs [18]. Agreement assessed by the kappa coefficient is considered poor when kappa is 0.20 or less, fair when kappa is 0.21-0.40, moderate when kappa is 0.41-0.60, good when kappa is 0.61-0.80, and very good when kappa is 0.81-1.00 [19].
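The dichotomization and the simple (two-rater) kappa computation can be illustrated with a minimal sketch. The scores below are made up for illustration and are not study data; the helper names are ours.

```python
from collections import Counter

def dichotomize(score):
    """Map a 1-7 Likert score to SSI absent (0, scores 1-4) or present (1, scores 5-7)."""
    return 1 if score >= 5 else 0

def cohen_kappa(a, b):
    """Simple (unweighted) Cohen's kappa for two raters scoring the same items."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                 # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[c] * cb[c] for c in set(a) | set(b)) / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

likert_a = [7, 6, 2, 1, 5, 3, 7, 2]   # hypothetical rater 1
likert_b = [6, 5, 1, 3, 2, 4, 7, 5]   # hypothetical rater 2
a = [dichotomize(s) for s in likert_a]
b = [dichotomize(s) for s in likert_b]
print(cohen_kappa(a, b))  # 0.5
```

Here the raters agree on 6 of 8 dichotomized items (observed agreement 0.75) while chance agreement is 0.50, giving kappa = 0.5, i.e. moderate agreement on the scale above.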
To evaluate intra- and interspecialty agreement regarding superficial/deep SSI scored on the three-point scale, we computed observed agreement (with exact 95% confidence intervals) and the kappa coefficient (with 95% confidence intervals). We added a fourth category comprising the participants who did not score SSI depth because their score for SSI diagnosis on the seven-point Likert scale was lower than four.
Analyses were performed using SAS software, Version 9.2 (SAS Institute, Cary, NC) for descriptive statistics, kappa statistics, and graphs. R software (version 1.9) with the ''boot'' and ''psy'' libraries was used to compute ICCs.

Characteristics of the participants and case-vignettes
Table S2 reports the main characteristics of the 140 participants. All 140 participants completed the study. They came from 29 university and 36 non-university hospitals in France. Forty hospitals (62%) contributed one participant, 20 hospitals (31%) contributed 2 to 4 participants, and 5 hospitals (7%) contributed 5 or more participants. The median (IQR) age was 48 (29-65) years, and 77 participants (55%) were male. Median time in the current job was 17 (1-36) years, and 98 participants (70%) were directly involved in SSI surveillance programs in their healthcare facility. Of the 140 participants, 104 (74%) worked in publicly funded healthcare facilities, 19 (14%) in private healthcare facilities, and 17 (12%) in other types of centers.

SSI was suspected before hospital discharge in 36 patients and after hospital discharge in 4 patients, who required re-admission. Wound modification was a feature in all 20 cardiac surgery patients and in 11 (55%) of the gastrointestinal surgery patients. Microbiological specimens were obtained from the surgical wound in all 20 cardiac surgery patients and were positive in 11 (55%) of them. Of the 20 gastrointestinal surgery patients, 3 underwent wound sampling for microbiological tests, which were positive in 2 patients. Based on the consensus of the two main investigators (DLP and JCL), agreement was reached on 36 of the 40 vignettes, with SSI present in 21 vignettes (52.5%).

Case-vignette scores
In total, the 40 case-vignettes were scored 822 times rather than the 840 theoretically scheduled. Due to a computer assignment glitch, three surgeons were assigned vignettes that had previously been assigned to other surgeons; therefore, the 18 vignettes that these surgeons were supposed to receive were not scored. The median SSI diagnosis score on the seven-point Likert scale before reading the SSI definition varied across specialties from four (IQR, 2-6) for public health specialists and infection control nurses to seven (IQR, 3.5-7) for anesthesiologists (Table 1).

Agreement regarding SSI depth
Intraspecialty kappa values for superficial/deep SSI scored without the SSI definition varied from 0.10 to 0.54 (Table 4).
The interspecialty kappa value for SSI depth scored before reading the SSI definition was 0.21 (0.16-0.25). Reading the SSI definition increased the interspecialty kappa value to 0.29 (0.27-0.31).

Discussion
In a large panel of healthcare professionals from different specialties involved in SSI surveillance, agreement regarding the diagnosis and depth assessment of SSI varied across specialties and across individuals within each specialty. Scoring with the SSI definition improved agreement regarding the SSI diagnosis and depth assessment only in the specialties where agreement was poor initially.
There is an abundance of studies evaluating SSI risk factors and risk stratification [20]. In addition, many studies have assessed techniques designed to improve measurement of the numerator, i.e., the number of SSIs. The reference standard method for SSI surveillance includes daily bedside surveillance and post-discharge surveillance [21]. Several authors have evaluated the usefulness of surrogate indicators [22,23].
(Table 1. Distribution of scores assigned before reading the definition of surgical site infection, on a 7-point Likert scale, in each of the seven specialties.)
We are aware of a single study evaluating the impact of different SSI definitions on SSI rates [9]. In that study, SSI rates varied by more than 50% when small changes were made to the SSI definition. That study has limitations, however, including the single-centre design and possible observation bias due to the expectation that SSI rates would vary according to the SSI definition. Other studies suggest imperfect agreement across physicians regarding the diagnosis of SSI. In one study, wide differences in the diagnosis of SSI were noted between infection control practitioners and surgeons, as well as across surgeons [24]. A recent study showed that surgeons tended to diagnose only deep and organ/space SSIs, whereas the infection control team doubled the number of SSIs by also detecting superficial SSIs [25]. A study comparing SSI rates from 11 European countries showed substantial differences in SSI distribution, with the proportion of superficial SSIs ranging from 20-30% to 80%, suggesting differences in SSI detection and/or classification across countries [26].
Our study further supports the existence of considerable uncertainty in the detection of SSI. Providing the SSI definition did not change agreement, except in specialties with initially low agreement. Agreement decreased among infection control physicians, without a clear explanation. Our results are probably reliable, as we placed the participants in unbiased conditions by asking them to score the same case-vignettes through an Internet database. This method ensured that the participants were not influenced by factors such as the perceived SSI risk in a particular unit or patient; considering such factors would likely have increased disagreement among participants. Thus, SSI rates may be less than ideal performance indicators. In addition, mandatory surveillance and public reporting may lead to gaming, misinterpretation, and underreporting [5,6]. As recently suggested, there is a need for regular assessments of the reliability and validity of infection reporting [27].
We found scoring differences across participants and across types of case-vignettes. As expected, agreement for diagnosis and superficial/deep SSI assessment were well correlated among surgeons. More surprisingly, the correlation was poor among infection control professionals. Our results further support the need for a multidisciplinary approach to SSI surveillance [28].
Our study has several limitations. First, only one investigator (DLP) selected the suspected SSI cases and standardized the vignettes. In addition, each participant worked alone to determine whether SSI was present in each vignette. SSI is often a difficult diagnosis that requires discussion among surgeons and infection control professionals. The main goal of SSI surveillance is accurate SSI rate determination with feedback of appropriate data to surgeons, but another goal is to strengthen collaboration between surgical and infection control teams in order to implement effective preventive strategies and improve quality of care. Our results indicate that surveillance should not be performed by individuals from a single specialty [28]. Second, the participants scored vignettes via an online database. The vignettes were built from real cases, and the diagnosis of SSI may have been easier for healthcare professionals who had direct contact with the patient. Third, the study was not designed to assess the accuracy of SSI diagnosis; instead, we focused on agreement among healthcare professionals regarding the diagnosis. The two main investigators tentatively classified the vignettes as indicating SSI or no SSI, but their classifications differed for several vignettes. We were therefore unable to determine which participants made the right diagnosis. This is illustrated in Table 1, which shows SSI diagnosis score differences of up to 6 points between two participants from the same specialty. Fourth, we selected suspected cases of SSI to assess agreement on the diagnosis of SSI. SSI is suspected, however, in only a small proportion of patients after surgical procedures. Agreement about the presence of SSI would have been higher if the heterogeneity of the population had been greater, e.g., if the population studied had been an actual series of surgical patients rather than a series of surgical patients with suspected infection.
Fifth, the study was performed in a country where specific SSI surveillance methods and practices are used; results might have been different in another country. Finally, we selected case-vignettes from only two surgical specialties, representing clean and contaminated surgery, respectively. Increasing the spectrum of surgical procedures would probably have increased the degree of disagreement regarding SSI diagnosis and depth assessment. For example, SSI may be particularly difficult to diagnose in the absence of a skin incision, e.g., after vaginal hysterectomy or transurethral resection of the prostate.
In conclusion, among healthcare professionals evaluating case-vignettes for possible surgical site infection, there was large disagreement in diagnosis that varied both between and within specialties. These results support a multidisciplinary approach to SSI diagnosis. Our findings support the need for caution when using SSI rates for benchmarking or when requiring public reporting of SSI rates. Similar concerns have been voiced regarding other publicly reported infection rates, such as rates of catheter-related bloodstream infections [29,30] or ventilator-associated pneumonia [31] in critically ill patients. Nevertheless, SSI surveillance and feedback remain important tools for SSI prevention [32]. Further studies are needed to improve agreement regarding the diagnosis of SSI.

Supporting Information
Table S1 This table presents the definition of surgical site infection from the Centers for Disease Control and Prevention (CDC) that was used in this study [10]. (DOC)