Clinical correlates of workplace injury occurrence and recurrence in adults

Objectives To examine the morbidities associated with workplace injury and to explore how clinical variables modify the risk of injury recurrence. Methods A case-control study was designed using Florida’s statewide inpatient, outpatient, and emergency visits data obtained from the Healthcare Cost and Utilization Project. We included adults who were admitted for a workplace injury (WPI) or an injury at other places (IOP), and a matched population of random controls without WPI/IOP. The associations between WPI and clinical morbidities were assessed by univariate and multivariable regression, ranking predictors by information gain, area under the receiver operating characteristic (AUROC), and odds ratios. We analyzed WPI recurrence using survival methods (Kaplan-Meier, Cox regression, survival decision trees) and developed prediction models via regularized logistic regression, random forest, and AdTree. Performance was assessed by 10-fold cross-validation comparing AUROC, sensitivity, specificity, and Harrell’s c-index. Results A total of 80,712 WPI, 161,424 IOP, and 161,424 control patients were included; 485 distinct clinical diagnostic codes and 160 procedure codes were analyzed after filtering. Acute bronchitis and bronchiolitis, sprains and strains of shoulder and upper arm, ankle and foot, or other and unspecified parts of back, accidents caused by cutting and piercing instruments or objects, and overexertion and strenuous movements were identified as important consequences of WPI. The prediction models of injury recurrence identified several key factors, such as insurance type and prior injury events, although none of the models exhibited high predictive performance (best AUROC = 0.60, best c-index = 0.62). Conclusions WPI is associated with a diverse and serious physical comorbidity burden. Demographic, social, and clinical comorbidity components are associated with the risk of WPI recurrence, although their predictive value is moderate; this warrants future investigation of other information source domains, e.g. the environmental and societal sphere.

Introduction Workplace injury is a public health concern. According to the United States Bureau of Labor Statistics, in 2016 there were approximately 2.9 million nonfatal workplace injuries and illnesses reported by private industry employers; additionally, there were more than 5,000 fatal injuries [1]. Although the overall rates of workplace injuries have been steadily decreasing in the U.S. over the past decade, they still pose a substantial economic burden. It is estimated that nonfatal workplace injuries cost nearly $60 billion each year in direct compensation [2,3].
The individual consequences of workplace injuries impose a substantial burden on both personal life and public health. Previous studies have shown that occupational injuries are associated with increased medical care encounters and health insurance claims. Workplace injuries often entail a variety of psychological and behavioral responses, including stress, reluctance to return to work, and other personal and social afflictions [4].
Recently, the National Institute for Occupational Safety and Health suggested a framework for assessing workplace injury burden, which includes four main approaches: (1) utilizing multiple information source domains; (2) taking a broader view of injuries and related diseases; (3) assessing the impact of the entire working-life continuum; and (4) applying the comprehensive concept of "well-being" [5]. Following this framework, we aimed to examine the short-term and long-term consequences of workplace injury, collating information from the socio-demographic and clinical domains, using a large state-wide healthcare database of the US State of Florida. The overarching goal was to investigate which clinical consequences are associated with workplace injury and to explore how sociodemographic and clinical variables modify the risk of injury recurrence.

Methods
This study was evaluated by the University of Florida's institutional review board as exempt (protocol no. IRB201701906).
We used the Healthcare Cost and Utilization Project (HCUP) State Inpatient Databases (SID), State Ambulatory Surgery and Services Databases (SASD), and State Emergency Department Databases (SEDD) for the state of Florida, US, between 2005 and 2014 [6]. The SID, SASD and SEDD contain anonymized, longitudinally linked inpatient, outpatient, and emergency room visit data, including patients' demographics, insurance, diagnoses, and procedures for each hospital visit. Between 2005 and 2014, diagnoses and procedures were encoded using the International Classification of Diseases version 9 (ICD-9) ontology.
Our study included patients aged 18 years and older with at least three years of medical records prior to baseline and at least five years of follow-up. Individuals younger than 18 were excluded because US federal and Florida state laws regulate the employability of minors and their maximum number of working hours per week. The lengths of prior medical history and of follow-up were chosen to ensure stability in the areas of residence as well as a detailed characterization of health status before and after injuries. Workplace injury (WPI) was defined by ICD-9 diagnostic codes E846, E849.1, E849.2, and E849.3. There were two comparison groups in our analysis: (1) patients who had injuries at other places (IOP), defined as any with diagnostic codes E849.0 to E849.9 (except E849.1, E849.2, or E849.3), and (2) a group of patients without any injuries who met the inclusion criteria, used as random controls. A patient with at least one WPI was assigned to the WPI group; any IOP that occurred earlier was accounted for. If a subject had more than one WPI, the first one recorded in the database was used to set the baseline date, while the others counted as recurrences (unless they were readmissions, see below). The two comparison groups were extracted in a 2:1 ratio with the WPI sample and were matched on the distribution of diagnostic years.
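The group assignment rule described above can be illustrated with a minimal sketch. The function name `assign_group`, the representation of visits as (date, set of code strings), and the exact string format of the codes are assumptions made for illustration; they are not taken from the HCUP data layout or from the study code.

```python
from datetime import date

# Codes named in the text; the string format is an assumption for this sketch.
WPI_CODES = {"E846", "E849.1", "E849.2", "E849.3"}
# E849.0 to E849.9, excluding the workplace codes above.
IOP_CODES = {f"E849.{d}" for d in range(10)} - WPI_CODES


def assign_group(visits):
    """Assign a patient to WPI, IOP, or control.

    visits: list of (visit_date, set_of_icd9_codes), hypothetical layout.
    Returns (group, baseline_date). Any WPI puts the patient in the WPI
    group, with the first WPI as baseline, even if an IOP came earlier.
    Controls get no baseline here; their matching date is set elsewhere.
    """
    wpi_dates = [d for d, codes in visits if codes & WPI_CODES]
    if wpi_dates:
        return "WPI", min(wpi_dates)
    iop_dates = [d for d, codes in visits if codes & IOP_CODES]
    if iop_dates:
        return "IOP", min(iop_dates)
    return "control", None
```

For example, a patient with an E849.0 visit in 2008 and an E849.3 visit in 2009 would be placed in the WPI group with the 2009 visit as baseline.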
The observation unit of this study was the individual patient. For each patient, we collected the diagnoses and procedures recorded before the baseline or during the follow-up, using three-digit ICD-9 codes. We also calculated the Charlson comorbidity index (CCI) [7,8] before and after WPI/IOP, or before and after the matching date for random controls. Socio-demographic variables included race/ethnicity, insurance status, and the area deprivation index [9] associated with the ZIP code of residence at baseline. Diagnostic and procedure codes recorded in fewer than 2.5% of the WPI group were removed.
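The frequency filter can be sketched as follows. This is an illustration under stated assumptions: codes are plain strings, truncation to three digits is done by dropping everything after the decimal point, and the helper names `three_digit` and `frequent_codes` are hypothetical.

```python
from collections import Counter


def three_digit(code):
    """Truncate an ICD-9 code string to its category, e.g. '847.2' -> '847'."""
    return code.split(".")[0]


def frequent_codes(patient_codes, min_fraction=0.025):
    """Return the three-digit codes seen in at least min_fraction of patients.

    patient_codes: one collection of code strings per WPI patient.
    Each patient counts at most once per three-digit code.
    """
    n = len(patient_codes)
    counts = Counter()
    for codes in patient_codes:
        counts.update({three_digit(c) for c in codes})
    return {code for code, k in counts.items() if k / n >= min_fraction}
```

Counting each patient once per code keeps the threshold a statement about patients rather than about visits.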
To explore the health consequences of WPI, we examined the association between injury status and clinical comorbidities by multivariable logistic regression, with injury status as the independent variable and each ICD-9 code recorded after the occurrence of WPI/IOP as the dependent variable, i.e. up to 999 standard ICD-9 codes and 290 supplemental V and E codes. The models were adjusted for age, gender, race, insurance, CCI, and the status of the corresponding ICD-9 code prior to the occurrence of WPI/IOP. After fitting all multivariable logistic models, we ranked separately: (a) the adjusted odds ratio (OR) of the injury status variable, (b) the information gain, and (c) the area under the receiver operating characteristic (AUROC) of each model. Specifically, the OR quantifies the strength of the association (increased occurrence) between two variables. The information gain measures the reduction in entropy of one variable obtained by knowing another variable; thus, the larger the reduction, the more informative that variable. The AUROC measures the discriminatory power of a variable or model. We then combined these three measurements to create a more robust index of the importance of health consequences [10].
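To make the three measures concrete, the sketch below computes them for a single binary exposure (injury status) and a single binary outcome (presence of a code) from a 2x2 table. It is a simplification: the study used adjusted multivariable models, whereas this example is unadjusted. The rank-averaging step is an assumed stand-in for the combined index, whose exact construction is given in [10] and not in this text. All function names are hypothetical, and all four cells are assumed to be nonzero.

```python
import math


def entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def measures(a, b, c, d):
    """Unadjusted measures from a 2x2 table (all cells assumed > 0).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    n = a + b + c + d
    abs_log_or = abs(math.log((a * d) / (b * c)))
    h_y = entropy((a + c) / n)
    h_y_given_x = ((a + b) / n) * entropy(a / (a + b)) \
        + ((c + d) / n) * entropy(c / (c + d))
    # AUROC of a binary predictor: P(pos > neg) + 0.5 * P(tie)
    auroc = (a * d + 0.5 * (a * b + c * d)) / ((a + c) * (b + d))
    return {
        "abs_log_or": abs_log_or,
        "info_gain": h_y - h_y_given_x,
        "auroc": max(auroc, 1 - auroc),  # direction-free
    }


def combined_ranking(tables):
    """Order codes by mean rank over the three measures (assumed scheme)."""
    scores = {code: measures(*t) for code, t in tables.items()}
    mean_rank = {code: 0.0 for code in tables}
    for key in ("abs_log_or", "info_gain", "auroc"):
        ordered = sorted(tables, key=lambda c: scores[c][key], reverse=True)
        for rank, code in enumerate(ordered, start=1):
            mean_rank[code] += rank / 3
    return sorted(tables, key=lambda c: mean_rank[c])
```

Averaging ranks rather than raw values avoids having to put odds ratios, bits of entropy, and AUROC on a common scale.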
Finally, we built predictive models for injury recurrence in the WPI group using prior information. WPI recurrence was defined as any WPI diagnosis recorded at least 30 days after the first WPI diagnosis. We chose this time window because a WPI diagnosis within it could have been a readmission for the same injury. In fact, the Centers for Medicare & Medicaid Services (CMS) define a hospital readmission as "an admission to an acute care hospital within 30 days of discharge from the same or another acute care hospital." Logistic regression with least absolute shrinkage and selection operator (LASSO) regularization, random forest, and AdTree methods were used to predict whether a patient would ever have an injury recurrence (ignoring time-to-event and censoring). Then, survival models (accounting for time-to-event and right censoring) were fit, namely a Cox regression with stepwise selection and a survival tree with log-rank split. Performance was assessed via 10-fold cross-validation, comparing AUROC, sensitivity, specificity, and Harrell's c-index. All statistical analyses were conducted using R and its packages, including glmnet, survival, party, ggplot, survminer, RWeka and pROC [11].
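The recurrence definition and the two outcome encodings (ever-recurrence versus time-to-event with right censoring) can be sketched as below. The function names and the use of the diagnosis date as the reference point are assumptions for illustration; the study code itself was written in R and is not reproduced here.

```python
from datetime import date


def first_recurrence(wpi_dates, window_days=30):
    """Date of the first WPI diagnosis at least window_days after the first one.

    Diagnoses inside the window are treated as possible readmissions
    and ignored. Returns None if there is no recurrence.
    """
    ordered = sorted(wpi_dates)
    baseline = ordered[0]
    for d in ordered[1:]:
        if (d - baseline).days >= window_days:
            return d
    return None


def recurrence_outcome(wpi_dates, followup_end, window_days=30):
    """Return (days_from_baseline, event_observed) for survival models.

    Patients without a recurrence by followup_end are right-censored.
    """
    baseline = min(wpi_dates)
    rec = first_recurrence(wpi_dates, window_days)
    if rec is not None and rec <= followup_end:
        return (rec - baseline).days, True
    return (followup_end - baseline).days, False
```

The boolean in the returned pair is the label used by the ever-recurrence classifiers, while the full pair is what a Cox model or survival tree would consume.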
Results
A total of 485 unique three-digit ICD-9 diagnostic codes and 160 ICD-9 procedure codes were identified (all above 2.5% frequency in the WPI group). Table 1 displays population characteristics stratified by outcome group. There was a higher proportion of males and of Black African American and Hispanic patients in the WPI group than in the IOP group. The proportion of Hispanics was also higher in the WPI group than in the random controls. The patients in the WPI group had a median age of 37 and a median CCI of 0; they were younger and had fewer comorbidities than the IOP group (median age 56 and median CCI 1), while the random controls had a median age of 57, comparable to the IOP group, and a median CCI of 0. The WPI group had a substantially higher proportion of self-payers (and of insurance types other than federal or private) than the other two groups. The patients in the WPI group also resided in areas with a higher median deprivation index than those of the other groups (106.3 in WPI vs. 105.1 in IOP and 104.6 in controls).
Among the WPI patients, the most common diagnosis prior to baseline was non-dependent drug abuse (ICD-9 code 305), followed by symptoms involving respiratory system and other chest symptoms (786), and essential hypertension (401). The most frequent post-WPI diagnoses were essential hypertension (401), non-dependent drug abuse (305), and general symptoms (780), as shown in Table 2.
When analyzing deaths, the Kaplan-Meier estimate yielded a four-year survival probability of 85.8% for the WPI group, 93.8% for the IOP group, and 98.2% for the control group (Fig 1). Next, we assessed the health consequences of WPI by evaluating the clinical diagnoses made after the injury (or after the corresponding baseline time for the random controls). After Bonferroni adjustment, at a significance level of 0.05, a total of 166 variables were identified when comparing WPI vs. IOP, and 176 variables when comparing WPI vs. random controls. Fig 2 shows the top 20 ICD-9 codes identified by merging the three distinct measurements of information gain, AUROC, and odds ratio (as absolute regression coefficient in the linear scale). Table 3 further displays in detail the ranking values and confidence intervals for the top variables found in each comparison group that were selected by at least two ranking methods, highlighting those that were more frequently associated with WPI. When analyzing injury recurrence, the Kaplan-Meier estimate yielded an injury recurrence probability of 66.6% at the second year, 44.4% at the third year, and 13.4% at the fourth year for the WPI group.
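For readers unfamiliar with the Kaplan-Meier estimate, a minimal product-limit sketch follows. It is an illustration only, not the analysis code (the study used the R survival package); the function name and input layout are assumptions.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: follow-up time per subject.
    events: 1/True if the event was observed, 0/False if right-censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += int(data[i][1])
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve
```

Censored subjects contribute to the denominator up to their last follow-up time and are then removed without lowering the curve, which is how the estimator accommodates unequal follow-up.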
The predictive models for injury recurrence exhibited moderate performance. The 'ever recurrence' models yielded a cross-validated AUROC between 0.57 and 0.60 (Table 4), while the survival models yielded a cross-validated c-index between 0.55 and 0.62 (Table 5). Overall, the Cox regression with stepwise selection exhibited the highest AUROC, and the variables of this model along with their hazard ratios (HRs) are listed in Table 6.
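Harrell's c-index, used above to evaluate the survival models, can be sketched as the fraction of comparable patient pairs in which the patient with the higher predicted risk has the earlier event. The version below is a simplified illustration that ignores pairs tied in time; the function name is hypothetical.

```python
def harrell_c(durations, events, risk_scores):
    """Simplified Harrell's concordance index.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time. It is concordant when
    i also has the higher risk score; ties in score count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative risk scores and 1.0 to perfect ordering, which puts the reported range of 0.55 to 0.62 in context.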

Discussion
In this work, we investigated the health consequences associated with WPI and explored factors that may predict future recurrence of WPI in a large longitudinal statewide data set.
We found that the WPI group included higher proportions of Black African American and Hispanic patients and of males, was younger, and lived in areas with a higher deprivation index. The WPI group also included higher proportions of self-payers and of insurance types other than federal or private. These findings were consistent with previous reports from national surveys in the U.S. [12,13].
The WPI group had the worst survival probability, after adjusting for age, compared with the IOP group and random controls; this confirms prior findings [14,15]. In this population, the three-year survival rate in WPI was 85.8%. We note that patients in the IOP and WPI groups have different age distributions because of the ages of employment. Although we did not match on age in the design, we included only adult individuals; yet, there might still be a difference due to younger adults (not yet employed) and older adults (retired).
In addition to higher mortality, the WPI group also had a higher risk of physical health morbidity. We used a robust framework to determine the importance of clinical consequences by combining three distinct measurements obtained from regression models. Compared with both the IOP and random control groups, patients suffering WPI were more likely to be admitted into care, after injury, for acute bronchitis and bronchiolitis (ICD-9: 466), sprains and strains of other and unspecified parts of back (ICD-9: 847), and overexertion and strenuous movements (ICD-9: E927). In addition to these conditions, when compared with random controls, the WPI group had higher odds of sprains and strains at other body parts, such as shoulder and upper arm (ICD-9: 840) and ankle and foot (ICD-9: 845). Our observations are consistent with prior findings showing that WPI leads to physical injuries [16]. Also, studies have reported that occupational exposure to various substances such as silica dust, gas, and fumes is related to the occurrence of chronic obstructive pulmonary disease (COPD) and related illnesses in the spectrum, such as chronic bronchitis [17-19]. It is notable that the known associations are with chronic rather than acute illness, whereas we found an association with acute illness. There are a number of possible explanations for this: 1) as we analyzed the first WPI and a limited, censored follow-up time, a diagnosis of chronic bronchitis might not have been made yet, but recurring attacks of acute bronchitis may lead to chronic bronchitis; 2) our study population included both routine care and acute care, i.e. emergency rooms and urgent care centers, where an attack of bronchitis could be diagnosed as acute even in the presence of an underlying condition; 3) possible selection bias, i.e. people with chronic bronchitis would have an increased risk of being in care regardless of the injury type, but acute episodes are differently distributed.
It is recognized that the re-occurrence of incidents with a similar cause and circumstance in the workplace environment is a public health concern, with unacceptably high incidence in the U.S. and worldwide [20]. Both pre-injury and post-injury correlates include social (disparity) determinants such as race, and clinical conditions such as mental health disorders and drug dependence (which can be ascertained by prior ICD diagnoses) [21,22]. The Cox regression highlighted a number of conditions that affect the risk of WPI recurrence (Table 6). We identified a number of factors in the sociodemographic and clinical domains (e.g. age, insurance, gender, extant physical injury, chronic pulmonary conditions), but the prediction models did not yield good prediction performance. One possible reason is that prior clinical history and basic sociodemographics may not be the most informative domains for predicting the risk of WPI recurrence. Other predictors explaining a larger portion of variance could include job type, workplace safety, and specific post-WPI work conditions, which are not present in the HCUP database.
This study has some limitations. First, heterogeneity in WPI was not accounted for in our study, as we used a group of ICD-9 codes to define the WPI group, which included all "accidents occurring in industrial places and premises", mine and quarry accidents, farm accidents, and accidents involving powered vehicles used solely within the buildings and premises of industrial or commercial establishments. These codes do not differentiate the types of these industrial places and premises. Different types of industrial places and premises may carry different impacts and risks for future injury occurrence, but such information was not available in the data we used. In addition, mine and quarry accidents and farm accidents were considered as WPI, while accidents occurring in an unspecified place were considered as IOP. These are not 'explicitly' WPI, which could introduce selection bias into our study; of note, the frequency of such codes was very low. Second, we used only ICD-9 codes for clinical diagnoses and procedures. Although results based on this taxonomy can be directly applied to other electronic health record systems or similar databases, clinical interpretation of these codes is still needed to fully understand how the predictors work and how to intervene on them. In addition, short-term and long-term consequences were combined in our analysis, i.e. we did not differentiate whether a diagnosis was made shortly after the injury or years after the injury.
Despite these limitations, we conducted a comprehensive analysis of the health consequences of, and survival from, workplace injuries and of their recurrence. Since demographic and clinical features account for only a small portion of the total recurrence risk, we reiterate the recommendation of the National Institute for Occupational Safety and Health to examine multiple information domains, especially the social and ecological determinants, given the important role we found for race, health insurance, and area deprivation.