Risk factors for surgical site infections using a data-driven approach

Objective
The objective of this study was to identify risk factors for surgical site infection from digestive, thoracic and orthopaedic system surgeries using clinical and data-driven cut-off values. A second objective was to compare the risk factors identified in this study with risk factors identified in the literature.
Summary background data
Retrospective data of 3 250 surgical procedures performed in a large tertiary care hospital in The Netherlands from January 2013 to June 2014 were used.
Methods
Potential risk factors were identified using a literature scan and univariate analysis. A multivariate forward-step logistic regression model was used to identify risk factors. Standard medical cut-off values were compared with cut-offs determined from the data.
Results
For digestive, orthopaedic and thoracic system surgical procedures, the risk factors identified were a preoperative temperature of ≥38°C and antibiotics used at the time of surgery. C-reactive protein (CRP) and the duration of the surgery were identified as risk factors for digestive surgical procedures. Being an adult (age ≥18) was identified as a protective effect for thoracic surgical procedures. Data-driven cut-off values were identified for temperature, age and CRP, which can explain the SSI outcome up to 19.5% better than generic cut-off values.
Conclusions
This study identified risk factors for digestive, orthopaedic and thoracic system surgical procedures and illustrated how data-driven cut-offs can add value in the process. Future studies should investigate whether data-driven cut-offs can add value in explaining the outcome being modelled, and should not rely solely on standard medical cut-off values to identify risk factors.


Introduction
Surgical site infections (SSI), as defined by the European Centre for Disease Prevention and Control (ECDC) [1], make up 19.6% of the total number of healthcare-associated infections (HAIs) in Europe. With an estimated 81 089 patients in Europe having an HAI on any given day, almost 16 000 people in Europe are suffering from some form of SSI at any given time [2]. The burden of SSI can be measured in terms of increased length of stay in hospital, additional (surgical) procedures required, increased morbidity and mortality, as well as in economic terms [3].
Risk factors relating to the patient, procedure and the environment alter the odds of an SSI occurring. Previous research has sought to identify risk factors for SSI in order to find preventative actions that reduce the incidence rate of SSI [4][5][6][7][8][9][10]. Patient- and procedure-related factors such as obesity, diabetes, surgery duration and the American Society of Anaesthesiologists (ASA) score are risk factors for SSI in digestive system, thoracic and orthopaedic surgical procedures [11][12][13][14][15][16][17][18][19][20][21][22]. Risk factors in low-income countries also include unemployment and level of education due to the disparity in socioeconomic status [14]. Risk factors can be modifiable or nonmodifiable [23]. Modifiable risk factors are the more interesting of the two since they can be changed preoperatively to reduce the risk of SSI.
Segmenting surgical procedures into homogeneous groups makes it possible to find useful and relevant risk factors unique to each segment. Digestive system surgical procedures are more prone to SSI as they are generally clean-contaminated or dirty surgeries, which makes deep space SSI more likely. The occurrence of SSI after thoracic and orthopaedic surgeries is relatively low because both are typically clean surgeries, but the probability of contracting a deep space SSI after thoracic surgery is much higher than after orthopaedic surgery [15]. Because of these differences, we focus on digestive system, thoracic and orthopaedic surgical procedures in this study.
Multivariate logistic regression is the most common statistical model used to identify risk factors in longitudinal study design data [16]. Not all studies report the discriminatory power of the fitted multivariate logistic regression model. Risk factor identification studies do not usually specify how cut-offs for continuous variables are determined. Cut-off values for variables such as age (≥18) or patient temperature (37°C) may seem intuitive or standard for clinical practice, but they may not be the statistically best cut-off values as determined by the data [17].
The objective of this study is to identify risk factors for SSI from digestive, thoracic and orthopaedic system surgeries using clinical and data-driven cut-off values. A second objective is to compare the identified risk factors in this study to risk factors identified in the literature.

Literature search
A literature search was performed to identify known risk factors for SSI associated with digestive system surgical procedures, thoracic surgery and orthopaedic procedures using the corresponding medical subject headings (MeSH) linked data representation and the MEDLINE database.
Search strings used for MEDLINE literature search:
The search results were sorted using the Best Match algorithm [18] developed by PubMed. Search results were deemed relevant using title and abstract screening. Risk factors were extracted if they were significant in a multivariable analysis, until data saturation was achieved [19]. Identified risk factors that were common to all three groups of surgeries were defined as "general risk factors" in this study.

Setting and data collection
The Erasmus MC University Medical Centre in Rotterdam is the largest university medical hospital in the Netherlands with more than 1 300 beds [15]. The data used for this study were anonymised in accordance with the Dutch Personal Data Protection Act (WBP). Approval from the Medical Ethical Research Committee was obtained (MEC-2018-1185).
A weekly prevalence survey was performed by infection control practitioners (ICP) from January 2013 until December 2013 and two-weekly until June 2014 using a semi-automated algorithm proposed by Streefkerk et al. [20,21]. This algorithm was used to calculate a nosocomial infection index (NII), which in case of a positive outcome was verified by an ICP to determine whether an HAI was present or not. An ICP verified all patients with an NII > 7, and a definite SSI outcome was concluded by the ICP using the electronic patient data system. This outcome was used in this study as the SSI outcome variable.
Data were extracted from a centralised database containing cross-departmental data, clinical synopsis reports, infectious disease consultation reports, laboratory results and imaging reports. Data regarding the prescription of antimicrobials in the J01 class of the Anatomical Therapeutic Chemical (ATC) classification system [22] were also included. Surgeries were included if they were part of the three groups of surgeries under investigation in this study and had a point prevalence measurement within 30 days after the surgery took place. If a second surgery took place within 30 days after an included surgery, then the more recent surgery was excluded. All emergency surgeries were excluded to avoid possible confounding effects relating to the urgency and necessity of the surgeries.

Statistical analysis
The differences between the averages of variables for observations with and without missing values were evaluated using t-tests and were found to be statistically significant. These tests, together with Little's MCAR test, convinced us that the values were not missing completely at random and that we could not make use of simpler imputation methods. Therefore, we chose to use conditional Markov chain Monte Carlo (MCMC) with multiple imputations for the imputation process [24,25].
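The missingness check described above can be sketched as a two-sample comparison: the mean of one variable (for example age) is compared between surgeries where another variable (for example CRP) is missing and surgeries where it is observed. The sketch below is illustrative only. The use of Welch's unequal-variance form, the function name `welch_t` and the data values are all assumptions, not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contribution of each group
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical ages (years): CRP missing versus CRP observed
age_crp_missing = [34, 41, 29, 52, 38, 45]
age_crp_observed = [61, 58, 70, 66, 49, 73, 64]

t_stat, df = welch_t(age_crp_missing, age_crp_observed)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute t statistic would suggest that missingness is related to the other variable, which is the kind of evidence that argues against assuming values are missing completely at random.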
Two methods were used to discretise continuous measurement variables: 1) standard medical cut-offs as used by Erasmus MC and 2) recursive partitioning [17]. Recursive partitioning is a data-driven, supervised discretisation method, used to group continuous values with similar outcomes optimally. The data-driven method was used to test and confirm if the standard medical cut-offs were the best way to explain the outcome variable for the groups of surgical procedures considered.
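To illustrate the idea of a supervised, data-driven cut-off, the sketch below searches for the single threshold on a continuous variable that best separates outcomes. It is a simplified stand-in and not the smbinning procedure used in the study: the splitting criterion (reduction in Gini impurity), the function names and the temperature values are assumptions made for illustration.

```python
def gini_impurity(positives, n):
    """Gini impurity of a group with `positives` events out of `n` cases."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2 * p * (1 - p)


def best_cutoff(values, outcomes):
    """Return (cutoff, impurity_reduction) for the best single binary split."""
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    parent = gini_impurity(total_pos, n)
    best_cut, best_gain = None, 0.0
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical values
        left_n, right_n = i, n - i
        weighted = (
            left_n * gini_impurity(left_pos, left_n)
            + right_n * gini_impurity(total_pos - left_pos, right_n)
        ) / n
        gain = parent - weighted
        if gain > best_gain:
            # Place the cut-off midway between the two neighbouring values
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_cut, best_gain


# Hypothetical preoperative temperatures (degrees Celsius) and SSI outcomes
temperatures = [36.4, 36.8, 37.1, 37.5, 38.0, 38.5, 38.9, 39.3]
ssi = [0, 0, 0, 0, 0, 1, 1, 1]

cutoff, gain = best_cutoff(temperatures, ssi)
print(cutoff, gain)
```

A full recursive partitioning method would apply such a split repeatedly and use a statistical stopping rule; this sketch stops after one split to keep the comparison with a fixed clinical cut-off easy to follow.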
To build a prognostic prediction model for SSI, Hosmer et al. suggest fitting a univariate logistic regression model to each variable separately and, if the p-value is less than a specified threshold (0.1 in this case), considering the variable good enough to include in the multivariate logistic regression model [26]. A univariate analysis was performed for each of the three groups of surgeries using the variables identified from the literature search. Significant variables (p<0.1) in the univariate analysis were added to the list of variables associated with each group of surgery, together with the variables identified from the literature search. This resulted in an extended list of general risk factors, as more risk factors were common across the three groups of surgeries.
A multivariate logistic regression model was built using a forward stepwise approach for each of the three groups of surgeries [27]. The general risk factors were added to the model first, followed by the risk factors unique to each surgery group in the order of the Akaike information criterion (AIC), until convergence was reached. In this case, we took convergence of the model to imply that no additional variable could be added that would be statistically significant with a p-value of less than 0.05 or an AIC of 3.8415. Model performance was determined using the Gini coefficient after each step of the multivariate model, and the difference is reported as the marginal contribution of surgery group-specific risk factors for this study [19,28]. Model performance was cross-validated using 5-fold cross-validation to estimate how the model would perform on new data [29]. R [30] was used in this study together with the packages mice (multiple imputation) [31], smbinning (recursive partitioning) [32], dplyr (data wrangling) [33], finalfit (formatting of tables) [34] and scorecard (cross-validation) [35].
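For a single binary predictor, the univariate screening step can be sketched without a regression library, because the fitted logistic regression coefficient then equals the log odds ratio of the 2x2 table and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below applies the p < 0.1 rule under that assumption. The function name `univariate_screen` and the cell counts are hypothetical.

```python
from math import erfc, exp, log, sqrt


def univariate_screen(a, b, c, d, alpha=0.1):
    """Screen one binary predictor against a binary outcome.

    a: exposed with SSI,    b: exposed without SSI,
    c: unexposed with SSI,  d: unexposed without SSI.
    Returns (odds_ratio, two_sided_p_value, keep_for_multivariate_model).
    """
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error of the log odds ratio
    z = log_or / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return exp(log_or), p_value, p_value < alpha


# Hypothetical counts: antibiotic use at the time of surgery versus SSI
odds_ratio, p_value, keep = univariate_screen(a=20, b=80, c=30, d=370)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.4f}, keep = {keep}")
```

Continuous or multi-level predictors would need an actual model fit; this shortcut only covers the binary case, which is what a discretised cut-off variable produces.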
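As a sketch of the performance measure, the code below computes a Gini coefficient from predicted probabilities, assuming the common model-discrimination definition Gini = 2 x AUC - 1 (the text does not spell out the formula, so this is an assumption). It also checks numerically that 3.8415 coincides with the 0.05 critical value of a chi-squared distribution with one degree of freedom, without assuming how that threshold was applied in the stepwise procedure. Function names and data are hypothetical.

```python
from math import erfc, sqrt


def gini_coefficient(scores, outcomes):
    """Gini = 2 * AUC - 1, with AUC computed from all event/non-event pairs."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    # A pair is concordant when the event has the higher score; ties count half
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    auc = concordant / (len(events) * len(non_events))
    return 2 * auc - 1


def chi2_tail_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


# Hypothetical predicted SSI probabilities and observed outcomes
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]

print(gini_coefficient(predicted, observed))
print(round(chi2_tail_1df(3.8415), 4))
```

Comparing the Gini coefficient of two fitted models, one using standard cut-offs and one using data-driven cut-offs, is one way to express how much better a model explains the outcome.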
Approval was obtained from the Medical Ethical Committee of Erasmus MC (MEC-2018-1185) to perform this study. Data were analysed anonymously, and thus no further consent was obtained.

Literature search
The literature search resulted in 1 422 research papers (as at 5 March 2020) using the MeSH headings in the PubMed search engine. We identified 24 research papers, published from 2008 until 2019, which contained statistically significant results from a multivariate analysis. A total of 79 risk factors were identified for the three groups of surgical procedures [11-13, 16, 23, 36-54] (S1 Table). Age, ASA class, body mass index (BMI), preoperative length of stay and diabetes were identified as general risk factors from the literature search. In total, 29 risk factors for digestive system surgical procedures, 31 for orthopaedic procedures and 19 for thoracic surgeries were identified. This amounted to 59 unique risk factors, of which 15 were present in more than one group of surgeries.
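The counts reported above can be cross-checked with a short calculation. It assumes that the five general risk factors are exactly the ones present in all three groups, which follows from the definition of "general risk factors" given earlier, and that the remaining shared factors appear in exactly two groups.

```python
# Risk factors identified per surgery group (from the text)
per_group = {"digestive": 29, "orthopaedic": 31, "thoracic": 19}
total_mentions = sum(per_group.values())

unique_factors = 59      # unique risk factors (from the text)
shared_factors = 15      # present in more than one group (from the text)
in_all_three = 5         # general risk factors: age, ASA class, BMI, preoperative length of stay, diabetes
in_exactly_two = shared_factors - in_all_three

# A factor in two groups is counted one extra time, a factor in three groups two extra times
reconstructed_mentions = unique_factors + in_exactly_two * 1 + in_all_three * 2

print(total_mentions, reconstructed_mentions)
assert reconstructed_mentions == total_mentions
```

Under these assumptions the figures are mutually consistent: 59 unique factors plus 10 and 10 repeat mentions give the 79 group-level risk factors.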

Risk factor identification
A total of 21 of the 59 unique risk factors could be replicated using our own data. The variable describing the type of surgery was used to create three homogeneous groups of surgical procedures. The emergency classification variable was used to exclude emergency surgeries from the study, such that 19 risk factors remained (Table 1). We observed 3 250 surgeries over the study period and excluded 526 (16.2%) emergency surgeries, leaving 2 724 surgical observations. CRP and temperature data were available for 52.55% (60.47% for in-patients) and 96.88% of all surgeries, respectively. The significant univariate results for digestive system, orthopaedic and thoracic surgical procedures are shown in Table 2. Antibiotic use, CRP and temperature were added to the list of general risk factors after being found statistically significant in the univariate analysis, increasing the number of general risk factors to 8. Diabetes was identified as a general risk factor from our literature search but was not found significant in any of the three univariate analyses in our own study. For digestive system and thoracic surgical procedures, the data-driven cut-off for age was 23 years, and both the standard cut-off (18 years) and the data-driven cut-off were statistically significant with p-values of less than 0.001, which resulted in rejecting the null hypothesis that the coefficient associated with the age of the patient is zero. For orthopaedic procedures, the data-driven cut-off for temperature (39°C) was statistically significant, but the standard medical cut-off was not. A data-driven CRP cut-off of 8.1 was identified for orthopaedic surgical procedures as opposed to a standard medical CRP cut-off of 10; both cut-offs were statistically significant.
The multivariate results using standard medical cut-offs and data-driven cut-offs are shown in Tables 3 and 4, respectively. The temperature variable was statistically significant in the multivariate analysis using the data-driven cut-offs for all three groups of surgeries, but not in any of the multivariate analyses using the standard medical cut-offs. The duration of the surgery was the only statistically significant variable in the multivariate analyses which was not [...] (Table 5) shows that 10 of the 19 risk factors identified during the literature search were not statistically significant in the univariate or multivariate analysis for any of the surgery groups. BMI and diabetes were identified across all three groups of surgeries and multiple studies as risk factors for SSI but were not statistically significant in this study. Temperature and the duration of the surgery were confirmed as risk factors for digestive system surgeries, and similarly, antibiotic use and age were confirmed as risk factors for thoracic surgeries. Antibiotic use and CRP, which were identified during the literature search for thoracic and orthopaedic surgeries, respectively, were identified as risk factors for digestive surgeries from the multivariate analysis. Antibiotic use and temperature were statistically significant for all three groups of surgeries and were included because of two studies regarding thoracic and digestive system surgeries, respectively [40,55].
Table 4 notes: Data-driven, cut-off values determined using recursive partitioning; CRP, C-reactive protein; CI, confidence interval; OR, odds ratio. 1 The multivariate analysis was performed using data-driven cut-offs. https://doi.org/10.1371/journal.pone.0240995.t004
Table 5. Statistical significance of risk factors and the source which led them to be considered, by surgical procedure.

Discussion
We identified temperature and antibiotics used at the time of surgery as risk factors for digestive, orthopaedic and thoracic system surgical procedures in this study. The duration of the surgery was identified as a risk factor for digestive surgical procedures. Being an adult (age ≥ 18) was identified as a protective effect for thoracic surgical procedures. Data-driven cut-offs were identified for temperature, CRP and age, which differ from the standard medical cut-offs. Temperature would not have been identified as a risk factor if only standard medical cut-offs had been considered. From our literature search, we identified age, ASA class, BMI, preoperative length of stay and diabetes as general risk factors, while CRP, temperature and antibiotic use were identified as general risk factors because of this study. The identified risk factors may be classified as modifiable or non-modifiable, depending upon the circumstances of the patient, such as the complexity of their condition. For instance, the temperature of a patient may be high because of an existing infection, which is why the surgery is needed in the first place, and may not be modifiable before surgery. Age, on the other hand, may be a modifiable risk factor if the surgery can be postponed for several years, e.g. due to a heart defect.
This study revealed that children are more likely to be diagnosed with an SSI after thoracic surgery than adults. There are studies which identify risk factors for children after thoracic surgeries, but none found that being a child is a risk factor for SSI after undergoing thoracic surgery [42,48]. We segmented the thoracic surgeries between adults and children and obtained multivariate results for children and adults separately. The multivariate model based only on children (age < 18) did not reveal any significant results, contrary to the results of the thoracic study which found age to be a risk factor for children [12]. This absence of significant results could be partly due to the small study population size of 248. Antibiotic usage was the only significant factor in the multivariate analysis of thoracic surgeries based on adults. For the other two groups of surgical procedures, the statistical significance of risk factors based on adults only was consistent with the overall results.
The data-driven cut-offs confirmed the existing standard medical cut-offs. On average, the clinical cut-off for temperature was one degree Celsius lower, while for digestive system surgical procedures, the clinical cut-off for CRP (10) was just less than two units more than the data-driven cut-off of 8.1. This means that there is a greater difference between the occurrence of SSI for patients with a CRP below and above 8.1 than below and above 10. The data-driven cut-offs improved the ability of the statistical model to explain the occurrence of SSI. The performance of the digestive system surgical procedure prediction model increased by 19.5% when data-driven cut-offs were used rather than the standard medical cut-offs. Using data-driven cut-offs, we were able to identify temperature as a risk factor for all three groups of surgical procedures. Had standard clinical cut-offs been used, temperature would not have been significant in the multivariate analysis. This potential oversight illustrates the importance of evaluating the cut-offs used for continuous variables against the data before identifying risk factors.
Antibiotic use, temperature and CRP were added to the list of general risk factors by incorporating the statistically significant results of the univariate analysis. These risk factors might have been overlooked when the focus was on only one type of surgery. Temperature was identified as a risk factor in the multivariate results for all three groups of surgical procedures, whereas the literature search identified it only for digestive surgeries. Antibiotic use was not found during our literature search for digestive or orthopaedic surgical procedures but was found significant for both groups of surgeries in the multivariate analysis of our study.
The Centers for Disease Control and Prevention (CDC), the European Centre for Disease Prevention and Control (ECDC), the World Health Organization (WHO) and the Netherlands National Institute for Public Health and the Environment (RIVM) suggest maintaining normothermia intraoperatively to prevent undesirable hypothermia (during some thoracic and neurosurgeries, hypothermia may be desirable) [56][57][58]. A lower intraoperative bound for temperature of 35.5°C to 36°C is explicitly mentioned, and only the RIVM mentions an upper bound of 38°C, which is consistent with the risk factors identified in our study. An upper limit for preoperative temperature should, therefore, be investigated instead of only the lower limit. The four health organisations refer to the proper administration and timing of surgical antimicrobial prophylaxis, but not to the proper preoperative use of standard prescription antibiotics. Systemic antibiotics are typically prescribed to stabilise patients before undergoing surgery. A possible explanation for the increased occurrence of SSI associated with antimicrobials prescribed before surgery could be that these patients were not completely stabilised before surgery, which increased their risk of SSI. The proper preoperative use of antibiotics should be well defined, and the reason why antibiotic use was identified as a risk factor for SSI should be further investigated.

Limitations
This is a retrospective, single-centre study, and therefore the data were not collected for the purpose of this study. Even though cross-validation was performed to estimate model performance on new data, the models were not externally validated. Surgeries were aggregated into three broad groups of surgical procedures, which serve as a proxy for the reason for surgery, but this leads to the loss of information regarding the exact reasons for the surgery. Some measurements, like temperature and CRP, were not always present, which was partly overcome using imputation. Patient information concerning smoking and drinking habits may be understated due to incomplete medical records. The literature search used for this study was not exhaustive but rather based on the principle of data saturation. A comprehensive list of variables related to the nutritional and immunological alterations of the patients was not included in the analyses as they were not available from the data. We used a 30-day outcome period in which we observed whether an SSI was present or not, but according to the CDC definition, this outcome period should be one year for surgical implantation procedures. Since our data only span 18 months, it was not possible to use a 12-month outcome window for all surgical implantation procedures, which is a limitation of this study. The administration of prophylaxis and the optimal timing thereof is an important risk factor for the occurrence of SSI. However, these data were not available.

Future work
Future work will investigate the modifiability of the risk factors identified in this study in more detail, as the circumstances under which this occurs are hitherto unclear. The exact purpose of the use of antibiotics over the time of surgery was not investigated in depth, which can be done in future studies. Future research can also investigate differences between adults and children, which lead to the occurrence of SSI among children. Another opportunity for future research is to investigate which risk factors are predictive for the occurrence of SSI over different periods. Doing this will enable healthcare workers to identify which risk factors explain the occurrence of SSI soon after surgery, towards the end of the 30 days and even later for implantation surgeries. These insights can help set guidelines to determine the vigilance necessary to mitigate the risk of SSI on a patient level.

Conclusion
This study shows that data-driven cut-offs can be used to identify risk factors which would not have been identified by only using standard medical cut-offs. Preoperative temperature and antibiotic use were identified as risk factors for digestive, orthopaedic and thoracic system surgeries, while the duration of surgery and age were identified as risk factors for digestive and thoracic system surgeries, respectively. In contrast with the literature, this study found that an SSI is more likely to occur in children (age < 18) than in adults after thoracic system surgeries. Statistical modelling has been important to quantify important risk factors and indicate their significance. Clinical studies using retrospective data are important to carry out, despite limitations in the data sets. To this end, future studies should use both standard medical cut-offs and data-driven cut-offs to investigate risk factors.

Supporting information
S1 Table. Risk factors identified from multivariate analysis during literature search.