Predicting Non Return to Work after Orthopaedic Trauma: The Wallis Occupational Rehabilitation RisK (WORRK) Model

Background: Workers with persistent disabilities after orthopaedic trauma may need occupational rehabilitation. Despite various risk profiles for non-return-to-work (non-RTW), there is no available predictive model. Moreover, injured workers may have various origins (immigrant workers), which may affect either their return to work or their eligibility for research purposes. The aim of this study was to develop and validate a predictive model that estimates the likelihood of non-RTW after occupational rehabilitation using predictors which do not rely on the worker's background.

Methods: Prospective cohort study (3177 participants, native (51%) and immigrant workers (49%)) with two samples: a) development sample with patients from 2004 to 2007 with Full and Reduced Models, b) external validation of the Reduced Model with patients from 2008 to March 2010. We collected patients' data and biopsychosocial complexity with an observer-rated interview (INTERMED). Non-RTW was assessed two years after discharge from rehabilitation. Discrimination was assessed by the area under the receiver operating characteristic curve (AUC) and calibration was evaluated with a calibration plot. The model was reduced with random forests.

Results: At 2 years, the non-RTW status was known for 2462 patients (77.5% of the total sample). The prevalence of non-RTW was 50%. The full model (36 items) and the reduced model (19 items) had acceptable discrimination performance (AUC 0.75, 95% CI 0.72 to 0.78 and 0.74, 95% CI 0.71 to 0.76, respectively) and good calibration. For the validation model, the discrimination performance was acceptable (AUC 0.73; 95% CI 0.70 to 0.77) and calibration was also adequate.

Conclusions: Non-RTW may be predicted with a simple model constructed with variables independent of the patient's education and language fluency. This model is useful for all kinds of trauma in order to adjust for case mix and it is applicable to vulnerable populations like immigrant workers.


Introduction
Injuries are a major public health problem that incurs huge costs [1-5]. Among injuries, non-fatal orthopaedic trauma is a leading cause of persistent pain, poor quality of life, long-lasting sick leave and disabilities [2,6,7]. As in chronic low back pain [8], only a minority of trauma patients have poor outcomes [3,9]. As there is evidence that work has a positive impact on health, helping people return to work is a focal point for public health [9]. Consequently, screening patients at risk of unsuccessful return to work (RTW) after orthopaedic trauma is an important issue.
In 2010, Clay and colleagues published a systematic review of prognostic factors for RTW after acute orthopaedic trauma [10]. Because few factors were included in more than one cohort, the level of evidence for most predictors was weak. There was strong evidence only for the level of education and blue collar work, and moderate evidence for self-efficacy, injury severity and receipt of compensation as prognostic factors for the duration of work disability [10]. Since this review, some prospective studies have suggested additional potential prognostic factors such as age, gender, self-employment, work injury, living in a deprived area, low income, pain intensity, pain attitudes, strong belief in recovery, health status, physical functioning or the presence of symptoms of depression [9,11-13]. From these studies, it appears that broad biopsychosocial knowledge is useful to predict RTW after orthopaedic trauma.
Nevertheless, prognostic research after orthopaedic trauma has received limited attention [3,14-16]. All the available models for screening patients at risk of poor outcomes were built, and are only useful, for the acute phase after trauma. After the acute phase and the usual period of recovery, a large proportion of patients may then be referred to vocational facilities in case of persistent disabilities [17,18]. However, these patients do not all have the same risk of unsuccessful RTW and to date there is no useful predictive model for them. Consequently, such a model would help to better identify patients with different risk profiles and allow the efficiency of risk-adapted interventions to be tested in randomized controlled trials (RCTs) [19,20].
To date, the vocational literature is mostly focused on factors predicting RTW for patients with low back pain or various musculoskeletal disorders [21,22]. Some other recent prospective studies also examined this issue for trauma patients [23-25]. All these studies underline that a biopsychosocial approach is needed. This is most often assessed by means of self-reported questionnaires [24,26]. Nevertheless, modelling RTW prediction based on questionnaires may suffer from selection bias: often, only a subsample of all eligible patients is used [27] because those with poor health literacy or language fluency are excluded [27,28]. For instance, exclusion of non-native workers, a growing segment of the workforce in industrialized countries, may bias a predictive model [27,29]. It is well known that non-native workers are a vulnerable population and may be at risk of being exposed to adverse working conditions [29,30]. Therefore, they may have more difficulties returning to work. Another reason for the higher risk of unsuccessful RTW for this group of patients may be different cultural representations and expectations, which can be a reason for drop-outs from occupational rehabilitation [31]. An elegant strategy to overcome this problem and to include all the eligible patients may be to build a predictive model from a validated generic tool of biopsychosocial complexity not relying on language fluency. This is precisely a key feature of the INTERMED tool [32,33], a well-studied measure of biopsychosocial complexity [34-37]. Moreover, the INTERMED was recently able to predict poor outcomes and unsuccessful RTW after rehabilitation [25,38].
Therefore, the purpose of this study was to develop and validate a predictive model that estimates the likelihood of unsuccessful RTW for trauma patients who need occupational rehabilitation. This model must associate easily available potential predictors, such as gender, age, education, injury severity and pain, with biopsychosocial variables not relying on language fluency, assessed by the INTERMED.

Study Design
The data come from a prospective, monocentric cohort study, with a collection of biopsychosocial predictors that were (a) available at admission to a rehabilitation clinic and (b) assessable independently from the patient's language fluency. Return to work was assessed through a questionnaire sent two years after discharge from the rehabilitation clinic; in case of non-response, two reminders were sent.

Ethics Statement
The protocol was approved by the ethical committee of the local medical association (Commission Cantonale Valaisanne d'Ethique Médicale CCVEM 04107). Patients gave oral informed consent and the study was conducted according to the principles expressed in the "Declaration of Helsinki". Only demographic and usual clinical data were used and anonymously analysed. In case of disagreement, patients signed a refusal letter and were excluded. This consent procedure was approved by the ethics committee.

Setting
This study took place in the Clinique Romande de Réadaptation (CRR) at Sion (Canton of Wallis) in the French-speaking part of Switzerland. Patients, mostly blue collar workers, with orthopaedic trauma of the back, upper or lower limb and multiple trauma were included in the study between January 1st, 2004 and December 31st, 2007 for the development sample and between January 1st, 2008 and April 1st, 2010 for the temporal validation sample. Patients are referred to the clinic from all of the French-speaking cantons of Switzerland, which include urban and industrial city centres like Geneva as well as mountainous and more rural regions like Wallis. Switzerland is also a country with an important proportion of immigrant workers in all sectors of the economy (for details see www.bfs.admin.ch/bfs/portal/fr/index).

Participants
All patients hospitalized for a rehabilitation program after an orthopaedic trauma were eligible for this study if they had no severe traumatic brain injury at the time of the accident (Glasgow Coma Scale ≤8), had no spinal cord injury, were capable of judgment, were not under legal custody and were not older than 62 years of age at the moment of hospitalization (older patients were considered too old to have a reasonable chance of RTW). Most of the patients were blue collar workers and were injured in traffic, work or leisure accidents. Upper limb injuries constituted 33% of all accidents, back injuries 18%, pelvic and lower limb injuries 41% and multiple trauma 8%. Patients were sent to the rehabilitation clinic when they presented persistent pain and functional limitations incompatible with RTW (median: 9 months after the accident). The aim of the therapeutic program was to verify the diagnosis and to take care of patients using an interdisciplinary approach (somatic, psychological, social and occupational) in order to reduce pain and disabilities and improve the chance of returning to work (usual or adapted to impairments). The average duration of stay was 5 weeks.

Sample Size
For assessment of statistical power in studies estimating predictor effects for binary event outcomes, the number of participants in the smallest group (i.e. RTW or non-RTW) determines the effective sample size. The usual rule of thumb is "10 to 20 events needed per candidate predictor" [39]. In our study, we had 36 potential predictors. The proportion of patients not returning to work two years after discharge is about 0.5; therefore it was estimated that we would need 1400 patients, resulting in about 700 cases. This analysis was embedded in an ongoing cohort study with different research questions. In 2010, there were 1505 patients with follow-up data available; therefore we decided at this time-point to develop the model. The development model had 19 variables and we therefore would need 380 cases, i.e. 760 patients with follow-up data. In 2012, 819 patients had follow-up data and it was decided to validate the model.
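The arithmetic behind this rule of thumb can be sketched as follows. This is only an illustration: the function name and its parameters are our own, and the upper bound of 20 events per predictor is assumed. With 36 predictors the strict calculation gives 720 events and 1440 patients, which the text rounds to about 700 and 1400.

```python
import math

def required_sample(n_predictors, events_per_variable, smallest_group_fraction):
    """Return (events needed, total patients needed) under an events-per-variable rule.

    Hypothetical helper for illustration; not part of the published analysis.
    """
    events = n_predictors * events_per_variable
    patients = math.ceil(events / smallest_group_fraction)
    return events, patients

# Full model: 36 candidate predictors, 20 events each, about 50% non-RTW
full = required_sample(36, 20, 0.5)
# Reduced model: 19 predictors
reduced = required_sample(19, 20, 0.5)
print(full, reduced)  # (720, 1440) (380, 760)
```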

Identification of Potential Predictors
In order to avoid selection bias during the development of the prognostic model, the choice of the potential prognostic predictors was made according to the following principles. Firstly, the variables should be obtainable independently from the patient's language fluency and health literacy [27,28]. Secondly, the variables should be clearly defined and reproducible to enhance generalizability, avoiding the use of items that leave room for different interpretations [20].
The INTERMED is an observer-rated, semi-structured interview which assesses the patient's biopsychosocial complexity [32,34,51]. It contains 20 items grouped in 4 domains (biological, psychological, social, health care system), with each one assessed over time (past, present, prognosis). Conducted by a trained nurse, the interview for the INTERMED takes about 20 minutes and has been used in our daily clinical practice since 2003. Each question is rated on a 4-point scale from 0 to 3. A total INTERMED score ranging from 0 to 60 is calculated, whereby a higher score means a higher biopsychosocial complexity. INTERMED has been compared with a variety of other validated instruments, such as the Medical Outcomes Study 36-Item Short-Form Health Survey, the Hospital Anxiety and Depression Scale, the pain VAS, and numerous others [32,34]. It shows high inter-rater reliability and agreement [52]. Predictive validity (for example for health care needs, return to work, risk of persistent disability, and need of psychosocial interventions) was analysed in dozens of studies, using many different populations and settings, from emergency room [37] to rehabilitation [25,38], and the tool also exists in several languages (English, German, Dutch, French, Italian, Spanish and Japanese, for instance) (for details see: http://www.intermedfoundation.org/). The INTERMED may be used as a continuous variable (from 0 to 60 points), but is also available with a cut-off score (≥ 21 points) [53]. For this research, each item of the INTERMED was regarded as a potential prognostic predictor. As this study started in 2004, the 5.1 version (January 2003) was used.

Data Collection
For the present analysis, the potential prognostic predictors were assessed within 3 days after hospitalization. All of these were prospectively recorded from the INTERMED interviews at admission and from the patient's electronic medical chart. In order to minimize selection bias, all eligible patients were included in the study. Data were assessed by a study nurse; predictors did not depend on the mother tongue spoken and were available for all patients in the clinic as these predictors were routinely used. To reduce loss to follow-up, two reminders were sent to the patients. The rate of non-response was similar to other studies [54,55]. To reduce measurement bias, the INTERMED was completed following the recommendations (for details see: http://www.intermedfoundation.org/) and other potential predictors were either administrative data or VAS.

Outcome Measure
RTW was measured by a questionnaire 2 years after discharge. RTW was defined as return to the same or accommodated job, full time or part time, over the survey period [24].

Selection of Model Content (Model Derivation)
The model was developed with all consecutive patients staying in the clinic during the years 2004, 2005, 2006 and 2007. Candidate predictors included in the first development model are shown in Table 2.
Variable selection based on random forest. To select the best subset of predictive variables, we used a random forest classification model for the prediction of non-return to work, using the R package "varSelRF" [56,57]. Random forest is a method that determines a consensus prediction for each observation by averaging the results of many individual recursive partitioning tree models [58,59]. A training set of size N (= total sample size) is drawn from the original data using bootstrap with replacement. A classification tree is computed with this training data. We repeat this a large number of times (50,000) and the final classification is the one that appears most frequently.
When the training set is sampled, about one-third of the original observations are left out. These are used to test the classification of the trees and get an error estimate [60]. We can also get information about the importance of a given predictor by comparing this classification accuracy to what we get by randomly permuting the values of this predictor. Hence a high Mean Decrease Accuracy indicates high importance of the predictor.
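The two ideas in this paragraph, the out-of-bag share of a bootstrap sample and the decrease in accuracy after permuting a predictor, can be illustrated with a minimal sketch. It is not the varSelRF procedure: the data are invented, the "model" is a fixed rule standing in for a fitted classifier, and the importance is computed on the whole toy sample rather than per tree on out-of-bag observations.

```python
import random

def oob_fraction(n, rng):
    """Share of n observations never drawn in one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

def accuracy(model, rows, labels):
    """Proportion of rows for which the model's class equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, col, rng):
    """Decrease in accuracy after randomly permuting the values of one predictor."""
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return accuracy(model, rows, labels) - accuracy(model, permuted, labels)

rng = random.Random(0)

# Out-of-bag share, averaged over 200 bootstrap samples of 1000 observations
mean_oob = sum(oob_fraction(1000, rng) for _ in range(200)) / 200

# Toy data (invented): the outcome depends on predictor 0 only
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]
model = lambda r: int(r[0] > 0.5)  # stand-in for a fitted classifier

imp_informative = permutation_importance(model, rows, labels, 0, rng)
imp_noise = permutation_importance(model, rows, labels, 1, rng)
print(round(mean_oob, 2), imp_informative > imp_noise)
```

The out-of-bag share converges to 1 − 1/e (roughly 0.37), which is the "about one-third" mentioned above; permuting the predictor the rule ignores leaves accuracy unchanged, whereas permuting the informative one lowers it markedly.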
The random forest approach has been shown to provide sets of predictors with good predictive value and to be robust against overfitting, which makes it especially useful for the evaluation of a large number of possible predictors and their potential interactions as well as their association with the outcome [61]. Because the standard random forest method is prone to favour continuous predictors, we used conditional random forest, as proposed by Strobl [62].
Because of the small number of missing values, we decided not to impute them [63].

Model Performance
To evaluate the model performance we presented indices for discrimination and calibration. For discrimination, we calculated the area under the receiver operating characteristic (ROC) curve, as well as sensitivity, specificity, and positive and negative predictive values. For testing the calibration we used the Hosmer-Lemeshow test [64] and plotted the observed proportions of non-return to work against the predicted probabilities for groups defined by ranges (10%) of predicted risk, together with the slopes and intercepts [65].
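As a minimal sketch of these indices, the functions below compute the AUC as a rank statistic, the classification measures at a given cut-off, and the observed versus predicted proportions within 10% risk groups. The function names and the eight-patient toy data are invented for illustration and are not taken from the study; the helper does not guard against empty cells (division by zero).

```python
def auc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def classification_table(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV and NPV when risk >= threshold is called positive."""
    pairs = [(y, int(p >= threshold)) for y, p in zip(y_true, y_prob)]
    tp = sum(1 for y, d in pairs if y == 1 and d == 1)
    fp = sum(1 for y, d in pairs if y == 0 and d == 1)
    fn = sum(1 for y, d in pairs if y == 1 and d == 0)
    tn = sum(1 for y, d in pairs if y == 0 and d == 0)
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}

def calibration_groups(y_true, y_prob, n_groups=10):
    """(mean predicted risk, observed proportion, size) for each non-empty risk range."""
    groups = {}
    for y, p in zip(y_true, y_prob):
        k = min(int(p * n_groups + 1e-9), n_groups - 1)  # small tolerance for float error
        groups.setdefault(k, []).append((y, p))
    return [(sum(p for _, p in g) / len(g), sum(y for y, _ in g) / len(g), len(g))
            for _, g in sorted(groups.items())]

# Toy data (invented): 1 = non-RTW, with predicted risks
y = [1, 1, 1, 0, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

print(auc(y, p))                        # 0.9375
print(classification_table(y, p, 0.5))
print(calibration_groups(y, p))
```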

Temporal External Validation
For temporal validation, we applied the model to all consecutive eligible patients included in the years 2008, 2009, and the beginning of 2010. For this validation, the coefficients and the intercept estimated in the development sample were used to predict the probabilities of not returning to work. We presented ROC curves, calibration plots and decision curves as well as a table with sensitivity, specificity, positive and negative predictive values.

Decision Curve Analysis
We plotted decision curves to show the net benefit of classifying patients based on our models compared to classifying all patients as not returning to work or classifying all patients as returning to work [66]. The y-axis denotes the net benefit in units of true positives. The x-axis indicates the threshold probability at and above which one decides that the patients will not return to work.
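A decision curve is the net benefit evaluated over a range of threshold probabilities. The sketch below uses the usual definition, (proportion of true positives) − (proportion of false positives) × pt/(1 − pt); the function names and the toy data are invented for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling positive every patient with risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_all_positive(y_true, threshold):
    """Net benefit of classifying every patient as positive (here: non-RTW)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (invented): 1 = non-RTW, with predicted risks
y = [1, 1, 1, 0, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

# One point per threshold gives the curve; classifying nobody as positive is the zero line
for pt in (0.2, 0.35, 0.5, 0.65):
    print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_all_positive(y, pt), 3))
```

A model is worth using at a given threshold only if its net benefit exceeds both reference strategies at that threshold.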

Construction of the Prediction Score
Firstly, to obtain the most precise estimation of the coefficients, we recalculated them using both the development and the validation samples for clinical use. This approach is often chosen because it makes full use of the data, resulting in narrower confidence intervals and more stable risk scores [67-69]. Secondly, we used the coefficients from the logistic regression model to build a prediction score, which provides the predicted probability of not returning to work even when treated. The formula is: Probability Risk Score = 1/[1 + exp(−scoring function)], where the scoring function consists of the sum of all products of the coefficients and the values of the predictors. The formula is implemented in an Excel sheet, so that clinicians automatically receive the probability after entering the values of the predictors of a given patient.
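The formula can be sketched in a few lines. The predictor names, coefficients and intercept below are hypothetical placeholders, not the published WORRK coefficients (those are in the supporting information), and treating the intercept as part of the scoring function is our assumption.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic prediction: 1 / (1 + exp(-scoring function))."""
    score = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients and patient values, for illustration only
coefficients = {"age_years": 0.02, "pain_vas": 0.01}
patient = {"age_years": 45, "pain_vas": 60}

risk = predicted_probability(-1.0, coefficients, patient)
print(round(risk, 3))  # scoring function = 0.5, so the risk is about 0.622
```

A scoring function of zero corresponds to a predicted probability of exactly 0.5, and positive values push the predicted risk of non-RTW above 50%.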

Results
For the development and validation periods, from the years 2004 to 2010, a total of 3177 patients with orthopaedic trauma were admitted to the rehabilitation clinic. At 2 years, the non-RTW status was known for 2462 patients (77.5%).
For the development period 2004 to 2007, 2048 patients were eligible. Of these patients, 1505 answered the two-year follow-up questionnaire (73.5%).
For these analyses, 1466 patients were available. Of these 1466 patients, 1395 had complete data for the set of predictors included in the first model. See Figure 1.
For the validation period 2008 to 2010, we had 1129 patients, of whom 957 returned the two-year follow-up questionnaire (84.8%). We had a sample size in the validation sample of 917, of whom 819 had a complete dataset. See Figure 1.
Missing values were below 2.5% for all variables.
The baseline characteristics of the development population (n = 1395) and the validation population (n = 819) are shown in Table 2. Both samples are similar with only small, clinically non-relevant differences. For instance, 50.5% did not return to work in the development sample and 49.9% in the validation sample.

Responders versus Non-responders
In the development sample, patients not responding to the follow-up were on average 3.4 years younger (40 versus 43.4 years, p = 0.003), more often living alone (p<0.001) and had higher values in the social (p = 0.008) and biological domains of the INTERMED (p = 0.019).
In the validation sample the only difference between responders and non-responders was age: patients not responding to the follow-up questionnaire were 2 years younger (41 versus 43 years, p = 0.008).

Model Selection
Univariable and multivariable odds ratios for the predictors in the development sample are shown in Table 3. The random forest variable selection procedure yielded 19 variables, which were then used for the final prediction model in the development sample (shown in the last column in Table 3).

Model Performance
The discrimination of the reduced model after the random forest variable selection procedure was moderate (AUC 0.74; 95% CI 0.71 to 0.76) but nearly as good as that of the full model (AUC 0.75; 95% CI 0.72 to 0.78). In the validation sample, the discrimination of the reduced model was still sufficient with an AUC of 0.73 (95% CI 0.70 to 0.77). The calibration was good for the full model as well as the reduced model in the development and the validation sample, as indicated by the calibration plots, with p-values for the Hosmer-Lemeshow test indicating that there was no significant deviation of the observed from the predicted risk (see lower panel of Figure 2). The calibration can also be evaluated by the vertical confidence intervals in the lower panel of Figure 2: for the prediction of non-return to work, the confidence intervals of the observed probabilities (vertical black lines) covered the line of ideal calibration (diagonal grey line in the lower panel of Figure 2).

Predictive Values
The sensitivity, specificity, and positive and negative predictive values for different cut-off points were similar in the development and the validation sample (see Table 4).
In these samples, all patients received the traditional healthcare (usual occupational rehabilitation), which corresponds to using a cut-off of 1, meaning that everybody is considered as potentially returning to work. Using the predictive model would allow some patients to be classified as non-RTW and those would receive an adapted occupational treatment. The net benefit for patients classified as non-RTW [66] was quite similar in the validation sample compared to the development sample. Net benefit was present for threshold probabilities of around 20% to 75% (see Figure 3). In Figure 3, the net benefit at a threshold probability of 50% is 0.16. At this threshold the weighting factor pt/(1 − pt) equals 1, so the net benefit corresponds to the difference between the proportion of true-positives (those correctly classified as non-RTW) and the proportion of false-positives (those classified as non-RTW that actually would RTW).
Table 5. Proportions of true-positives (TP), false-positives (FP), true-negatives (TN) and false-negatives (FN) given by the Reduced Model in the temporal validation sample, according to a threshold of 0.5 (sample with 100 patients).
In Table 5 we illustrate, with the threshold of 0.5 (our preferred choice), the proportion of patients correctly or wrongly classified. Compared to the current situation (i.e. threshold of 1), by using a threshold of 0.5, in a sample of 100 patients, we correctly withheld usual occupational rehabilitation from 36 patients (i.e. true positives). This comes at a cost: we falsely withheld usual occupational rehabilitation from 19 patients (i.e. false positives). This means that a more comprehensive assessment is needed as a second step. With this comprehensive assessment, most of the false positives will be re-allocated to the usual occupational rehabilitation. In other words, for clinical use, all candidates for occupational rehabilitation should be screened with the predictive model at entry, an inexpensive and fast procedure. Then, candidates who are above the designated cut-off point (i.e. putative true positives) should have a comprehensive assessment over a few days to recover the false positives. These patients would be reallocated to the usual occupational rehabilitation while the others (true positives) would benefit from an adapted occupational approach.
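As a rough check, the counts quoted above (36 true positives and 19 false positives per 100 patients at a threshold of 0.5) can be plugged into the net benefit formula. This is a back-of-the-envelope sketch with our own variable names; it gives 0.17 from the rounded counts, close to the 0.16 read from Figure 3, and the small gap is presumably due to rounding.

```python
# Counts per 100 patients at a threshold of 0.5, as quoted in the text
tp, fp, n, pt = 36, 19, 100, 0.5

net_benefit = tp / n - fp / n * pt / (1 - pt)  # weighting factor is 1 at pt = 0.5
flagged = tp + fp                              # patients sent to the second-step assessment
ppv = tp / flagged                             # share of flagged patients who are true positives

print(round(net_benefit, 2), flagged, round(ppv, 2))  # 0.17 55 0.65
```

Under these figures, 55 of every 100 screened patients would go on to the comprehensive assessment, and roughly two-thirds of them would be true positives.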

Scoring of the Prediction Model
For clinical use, the scoring sheet and the prediction formula are available as supporting information: see Reduced Model S1.

Discussion
Based on a prospective cohort of over 2000 patients, we developed and validated a simple predictive model (19 items) to estimate the probability of non-return to work after orthopaedic trauma. This model, which showed acceptable discriminative ability to assess the likelihood of non-return to work and good calibration (see Figure 2), can be applied to all patients requiring occupational rehabilitation independent of their language fluency and literacy. Consequently, unlike in most studies using questionnaires, this strategy will reduce the selection bias observed in earlier studies [27], i.e. allow the assessment of all eligible patients, including vulnerable patients with different languages and educational backgrounds, such as immigrant workers.
This study has several strengths. To date, this is the first model available for patients suffering from persistent impairments and disabilities after orthopaedic trauma. The previous models were all reserved for the acute phase after orthopaedic trauma [3,14-16].
In the acute and sub-acute phase, most of the patients will recover and those remaining with persistent disabilities will not have a similar risk profile for non-return to work [23-25]. Consequently, our predictive model may improve the decision-making process if occupational rehabilitation is needed. Further strengths of our study are the large sample size, the external temporal validation and the appropriate variable selection based on random forest. For instance, random forest has clear advantages over stepwise selection methods [71]. In addition, our model is constructed on the biopsychosocial framework, which adheres to the current recommendations [40,45]. Furthermore, language fluency was not a barrier to participation in this study.
Nevertheless, our study also has some limitations. Firstly, the potential predictors were selected ten years ago. Hence, our model may miss newer candidate predictors, for instance the patient's subjective appraisal of injury severity, self-perceived disability, pain beliefs, and recovery and job expectations. However, a review of the current literature shows that the predictors chosen in the present study cover most of those cited in the recent literature [9,11-13]. For instance, "living in deprived areas", a predictor found important in the study of Kendrick [9], is close to the "social vulnerability" concept of the INTERMED. Another limitation to keep in mind is that we only performed a temporal validation and not a validation in a different setting with regard to the health system, the culture and the case mix [63]. However, this disadvantage may be reduced by the fact that our patients came from many different areas of Switzerland, with various cultural backgrounds. Moreover, 50% of our sample consists of immigrant workers. Our definition of RTW and the time point of its assessment may also be questionable: firstly, we used a subjective method (questionnaire). Yet there is no clear consensus on the best way to assess RTW [72]. There is a risk of over- or underestimating the RTW rate whichever method is used [73]. Nevertheless, self-report indicators are recommended to capture a fuller extent of workers' experience [74]. In further studies, it would be necessary to define when and for how long people had returned to work. Secondly, the 2-year follow-up may be too long or even too short to evaluate a successful RTW. Too long because within a time frame of two years much can happen independently of the patient's state at prediction. Too short because in this group of patients vocational reintegration and the insurance process may take longer. For instance, data from the Swiss injury insurers suggest that it takes up to four years until the closure of the case (for details, see www.unfallstatistik.ch). In other words, further studies with different time frames to estimate RTW are also needed. Differences between responders and non-responders may also bias the prevalence of RTW (lack of outcome data in 22.5% of our sample). It is hard to interpret these findings because differences may be influential in both directions (over- or underestimation of the non-RTW rate). Nevertheless, this prevalence was quite close in both samples despite different non-responder characteristics. Our model has only moderate discriminative ability (AUC). However, predictive models generally have lower performance (AUC between 0.6 and 0.85) compared to diagnostic or explicative models (AUC >0.8) [19,63]. Finally, our predictive model was developed and validated in a highly selected population, which limits its generalizability. Further studies are needed with different durations of time off work and different access to occupational rehabilitation facilities.
Comparison with other predictive tools is limited. To our knowledge, there is no predictive model for a population with characteristics similar to ours. Predictive models during the acute phase after orthopaedic trauma are predominantly based on injury severity [3,14-16]. However, the importance of psychosocial factors to predict RTW increases as time passes after the accident [9,13,40]. The subjective perception of injury severity may also become more significant [24] than objective severity as measured by clinical tools. This is consistent with the present study, in which the severity of the accident was not included in the final model. When we compare our model with prediction models used in patients with neck or low back pain, a research domain very close to ours, we observe that their theoretical constructs follow the same biopsychosocial framework [75,76]. We notice, however, that in low back pain the most helpful predictors of persistent disability are often derived from self-reported questionnaires [77]. In a multicultural context, this approach requires the translation and cultural adaptation of several questionnaires, which is a costly and time-consuming process [78]. For this reason, we used another strategy, and our predictive model shows discriminative abilities comparable to those of other models for neck and low back pain [76,79,80]. The INTERMED, the essential source of our predictive model (12 items in the final model), is already available in several languages (see Methods section) and this is a worthwhile advantage. Moreover, the remaining 7 items are all easy to translate into any language.
Our study has some implications for practice and research. First, our model provides a short bedside tool useful to estimate the likelihood of non-return to work. Considering that a complete INTERMED interview (20 items) takes no more than 20 minutes [37], we assume that our model can be filled out in a similar time or even less. Only 12 INTERMED items have been retained in our model. The other 7 items are 5 basic medical data, easily accessible from the patient's chart, and 2 VAS evaluated by the patient. Clear instructions on how investigators should score the different items exist and the predictive formula may be easily programmed on electronic devices (see supporting information, Reduced Model S1). Currently, it has become customary in industrialized countries to refer patients with persistent disabilities to an interdisciplinary occupational rehabilitation program [17]. Nevertheless, in our setting, this approach is unsuccessful for 50% of the patients (patients do not return to work despite vocational rehabilitation). Our model may allow a reduction in the number of unsuccessful usual rehabilitations. The benefit may be to save money, but most importantly to try alternative approaches for these patients. In this way, our model may also make it possible to define groups of patients with similar risk profiles for non-RTW. This might help to improve the design of randomized controlled trials to test alternative interventions for patients with high risk of non-RTW. However, our model also needs external validation studies in different settings (case mix, insurance environment etc.) and impact studies on clinical practice [63]. On the other hand, the discriminative ability could probably be improved by introducing simple questions on patients' job expectations [23,72,81].

Conclusion
This validated prediction model allows the estimation of the probability of non-return to work for patients requiring occupational rehabilitation after orthopaedic trauma. This model, the Wallis Occupational Rehabilitation RisK (WORRK) model, comprises only 19 items that are easily assessed in a clinical setting. It has moderate discriminative ability and adequate calibration, is useful for all kinds of trauma, and is applicable to vulnerable populations like immigrant workers. This makes the model informative for physicians and multidisciplinary teams managing such patients, and it may facilitate research in this domain by enabling the study of patients with similar risk profiles.

Figure 2. Receiver Operating Characteristic Curves (upper panel) and calibration plots (lower panel). Receiver operating characteristic curves with areas under the curves (upper panel, A-C) and calibration plots (lower panel, D-F). The left column is from the full model in the development sample, the middle column shows the reduced model in the development sample, and the right column shows the temporal external validation of the reduced model. AUC = area under the curve. N = total number of participants with complete data for the variables in the model. doi:10.1371/journal.pone.0094268.g002

Figure 3. Decision curve analysis. Decision curve analysis of the Full Model (dashed black line) and Reduced Model (solid blue line) in the development sample (Panel A) and of the Reduced Model in the temporal validation sample (Panel B). The y-axis represents the net benefit, which is the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability. The threshold probability (or risk threshold) is the threshold above which a patient is declared at risk of not returning to work at two years. The dashed red curve shows the net benefit of considering all patients as positive (i.e. classified as not returning to work). The benefit of considering all patients as returning to work was set as the reference (solid grey horizontal line). In the left panel (A), the net benefits of both models are quite similar. The Full Model shows advantages if the threshold is set between 15% and 82%. The right panel (B) shows that the net benefit in the temporal validation sample is only slightly lower than in the development sample. Clear benefits are seen for risk thresholds from about 20% to 75%. The net benefit is calculated as (proportion of true positives) - (proportion of false positives) × pt/(1 - pt), where pt is the threshold probability. doi:10.1371/journal.pone.0094268.g003
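The net benefit formula in the Figure 3 caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the example outcomes, and the example predicted probabilities are hypothetical, and a patient is counted as test positive when the predicted probability is equal to or above the threshold, which is an assumption borrowed from the Table 4 definition.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of a model at threshold probability pt (0 < pt < 1).

    y_true: 1 = non-return to work, 0 = return to work.
    y_prob: predicted probabilities of non-return to work.
    Assumption: a patient is test positive when y_prob >= pt.
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    # (proportion of true positives) - (proportion of false positives) * pt / (1 - pt)
    return true_pos / n - false_pos / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of classifying every patient as positive (non-RTW)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical data for illustration only (not taken from the study).
outcomes = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.2, 0.6, 0.4, 0.3, 0.1]

nb_model = net_benefit(outcomes, predicted, pt=0.5)
nb_all = net_benefit_treat_all(outcomes, pt=0.5)
```

Evaluating both functions over a grid of thresholds and plotting the results against pt would give curves of the kind shown in the figure; the reference strategy of considering all patients as returning to work has a net benefit of zero at every threshold.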

Table 1. Summary of the domains assessed with the INTERMED.

Table 2. Characteristics of the development and validation study population overall and by return to work status.

Table 3. Non-return to work: Odds ratios for the univariable, multivariable and the reduced model after random forest selection process.

Table 4. Comparison of predictive values in the development and validation samples. Compares diagnostic properties in the development sample with those in the validation sample. Threshold = chosen cut-off for dichotomizing into test negatives (return to work, below threshold) and test positives (non-return to work, equal to or above threshold). doi:10.1371/journal.pone.0094268.t004