Using the World Health Organization Disability Assessment Schedule 2.0 to assess disability in veterans with posttraumatic stress disorder

The introduction of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) was accompanied by the elimination of the Global Assessment of Functioning (GAF) scale, which was previously used to assess functioning. Although the World Health Organization Disability Assessment Schedule, Version 2.0 (WHODAS 2.0) was offered as a measure for further study, widespread adoption of the WHODAS 2.0 has yet to occur. The lack of a standardized instrument for assessing posttraumatic stress disorder (PTSD)-related disability has important implications for disability compensation. Accordingly, this study was designed to determine and codify the utility of the WHODAS 2.0 for assessing PTSD-related disability. Veterans from several VA medical centers (N = 1109) were included. We examined PTSD using several definitions and modalities and considered results by gender and age. Across definitions and modalities, veterans with PTSD reported significantly greater WHODAS 2.0 total (large effects; all ts > 6.00; all ps < .01; all Cohen’s ds > 1.03) and subscale (medium-to-large effects; all ts > 2.29; all ps < .05; all Cohen’s ds > .39) scores than those without PTSD. WHODAS 2.0 scores did not vary by gender; however, younger veterans reported less disability than older veterans (small effects; all Fs > 4.30; all ps < .05; all η2s < .05). We identified 32 as the optimally efficient cutoff score for discriminating veterans with and without PTSD-related disability, although this varied somewhat by age and gender. Findings support the utility of the WHODAS 2.0 in assessing PTSD-related disability.


Introduction
The introduction of the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders [1] was accompanied by the elimination of the Global Assessment of Functioning (GAF) scale, a clinician rating of global psychological, social, and occupational functioning. Although the DSM-5 Task Force recommended the World Health Organization

Participants
Participants included 1109 veterans from four independent samples (see Table 1 for demographic characteristics). Sample 1 included veterans recruited through the Boston VA Healthcare System for a study designed to validate the PTSD Checklist for DSM-5 [14] and the Clinician Administered PTSD Scale for DSM-5 [15]. Inclusion criteria were veteran status, age 18 or older, literacy in English, and endorsement of at least one traumatic event and at least one PTSD symptom. As reported elsewhere [16], 142 participants completed the study. Data from six participants were dropped from current analyses because they did not complete the WHODAS 2.0, leaving a total sample of 136 participants. Sample 2 included veterans recruited through the Boston VA Healthcare System for one phase of a study to develop and validate the Inventory of Psychosocial Functioning (IPF) [17]. Included participants were aged 18 or older and were literate in English. Of the 285 participants who completed this phase of the larger study, data from 10 were dropped from current analyses because they had not completed the WHODAS 2.0. This left a total of 275 participants.
Sample 3 included veterans recruited from both the Boston VA Healthcare System and the VA Pacific Islands Healthcare System; these veterans were recruited for another phase of a study to develop and validate the IPF [17]. There was no overlap among participants in Samples 2 and 3. As in Sample 2, inclusion criteria included veteran status, age 18 or older, and literacy in English. Of the 394 participants who completed this phase of the larger study, four were dropped from current analyses due to missing data on the WHODAS 2.0. This left a final sample of 390 participants.
Sample 4 included veterans recruited from the Central Texas Veterans Healthcare System for a study examining predictors of post-deployment mental health and functional outcomes. Inclusion criteria included veteran status, having deployed in support of the post-9/11 wars in Iraq and Afghanistan, and absence of a bipolar and/or psychotic disorder diagnosis. In addition, although receiving treatment was not a condition of eligibility, those who were receiving treatment were required to be stable in treatment to ensure that symptoms and impairment were not due to recently starting or stopping treatment. Recruitment involved over-sampling for women veterans and those with probable PTSD. Of the 309 veterans who completed the study, one was excluded because of missing data on the WHODAS 2.0. This left a final sample of 308 participants.

Measures
WHODAS 2.0. The WHODAS 2.0 [2] is a 36-item self-report measure that assesses disability related to health conditions in the past 30 days. The measure assesses disability across six domains: cognition, mobility, self-care, getting along with others, life activities, and participation. Participants respond to each item on a 5-point scale from 0 (No Difficulty) to 4 (Extreme Difficulty/Cannot Do). In the current study, we used the simple scoring method, in which items are summed to total scores for each domain, with higher scores indicating greater disability. A global functional disability score was calculated by summing all 36 items. The WHODAS 2.0 has demonstrated excellent reliability and validity [2,18]. Across study samples, Cronbach's α for the overall score of global functional disability was .97, with the domain scale scores ranging from .83 (self-care) to .94 (life activities).
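The simple scoring method described above can be illustrated with a minimal sketch. The function name, the per-domain item counts, and the assumption that responses arrive in domain order are illustrative choices that are not specified in the text; only the 36 items, the six domains, and the 0 to 4 response scale come from the description above.

```python
from typing import Dict, List

# Assumed number of items per domain for the 36-item version (illustrative;
# the text above names the six domains but does not give item counts).
DOMAIN_SIZES = {
    "cognition": 6,
    "mobility": 5,
    "self_care": 4,
    "getting_along": 5,
    "life_activities": 8,
    "participation": 8,
}


def score_whodas_simple(responses: List[int]) -> Dict[str, int]:
    """Sum 0-4 item responses into six domain scores and a global total.

    Assumes `responses` lists the 36 items in domain order.
    """
    if len(responses) != 36:
        raise ValueError("expected 36 item responses")
    if any(r not in range(5) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")

    scores: Dict[str, int] = {}
    start = 0
    for domain, size in DOMAIN_SIZES.items():
        scores[domain] = sum(responses[start:start + size])
        start += size
    # Global functional disability score: sum of all 36 items.
    scores["total"] = sum(responses)
    return scores
```

Under this scheme the global score can range from 0 to 144, with higher values indicating greater disability.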
Clinician administered PTSD scale for DSM-IV (CAPS-IV). The CAPS-IV [19] is a structured diagnostic interview that assesses DSM-IV PTSD symptom severity and diagnosis. For each symptom, a clinician rates two dimensions, frequency and intensity, on separate 5-point scales. The CAPS-IV consistently demonstrates excellent psychometric properties [20]. In the current study, the CAPS-IV was used to assess DSM-IV PTSD symptoms and PTSD diagnosis in Samples 2 and 4. In Sample 2, the CAPS-IV was administered by doctoral-level clinicians who participated in regular reliability meetings. Interrater reliability was excellent (κ = .78). For Sample 4, interviews were conducted by master's- and doctoral-level clinicians, and weekly diagnostic review meetings supervised by doctoral-level staff were held to reach diagnostic consensus. The Frequency ≥ 1/Intensity ≥ 2 (F1/I2) rule was used to calculate DSM-IV PTSD diagnostic status. According to this rule, a PTSD symptom is present if the Frequency score is ≥ 1 and the Intensity score is ≥ 2; a PTSD diagnosis is derived by first dichotomizing the items, and then following the DSM-IV algorithm for PTSD (one reexperiencing symptom, three avoidance and numbing symptoms, and two hyperarousal symptoms) [21]. In addition, two of the items measuring social and occupational impairment were used to create an index of interviewer-rated PTSD-related psychosocial functioning. Participants with scores of 2 or above on either item were coded as having PTSD-related impairment. In the current study, of the 465 participants who completed the CAPS-IV, 31.2% (n = 145) met criteria for the PTSD diagnosis, and 57.4% (n = 299) met criteria for PTSD-related psychosocial impairment.
CAPS-5. The CAPS-5 [15] is a structured diagnostic interview that assesses DSM-5 PTSD symptom severity and diagnosis. Clinicians rate each symptom on a 5-point severity scale, ranging from 0 (Absent) to 4 (Extreme/incapacitating). Initial examination of the CAPS-5 suggests that it retains the same strong psychometric properties as the CAPS-IV [22]. In the current study, the CAPS-5 was used to assess DSM-5 PTSD symptoms and PTSD diagnosis in Sample 1. The CAPS-5 was administered by master's- and doctoral-level clinicians who participated in regular reliability meetings. Interrater reliability was excellent (κ = .78). We used the CAPS-5 to calculate PTSD diagnostic status according to DSM-5 using the SEV2/26 rule [22]. According to this rule, a PTSD symptom is present if the Severity score is ≥ 2; a PTSD diagnosis is derived by first dichotomizing the items, and then following the DSM-5 algorithm for PTSD (one intrusion symptom, one avoidance symptom, two negative alteration in cognition or mood symptoms, and two arousal and reactivity symptoms). In addition, we used the CAPS-5 to create an index of interview-rated, PTSD-related psychosocial functional impairment. Participants with a score of 2 or above on the item assessing social functional impairment and/or the item assessing occupational functional impairment were coded as having PTSD-related impairment. In the current study, of the 131 participants who completed the CAPS-5, 53.4% (n = 70) met criteria for the PTSD diagnosis, and 68.9% (n = 93) met criteria for PTSD-related psychosocial impairment.
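As an illustration of the two-step logic of the F1/I2 rule (dichotomize each symptom, then count symptoms per cluster), consider the following sketch. The function names and the item layout (items 1-5 reexperiencing, 6-12 avoidance and numbing, 13-17 hyperarousal) are assumptions made for the example. The sketch covers only the symptom-count step described above, not any other diagnostic requirement.

```python
from typing import List, Tuple

# Assumed layout of the 17 DSM-IV symptom items as
# (start index, end index, symptoms required); illustrative only.
CLUSTERS = {
    "reexperiencing": (0, 5, 1),
    "avoidance_numbing": (5, 12, 3),
    "hyperarousal": (12, 17, 2),
}


def symptom_present(frequency: int, intensity: int) -> bool:
    """F1/I2 rule: present if frequency >= 1 and intensity >= 2."""
    return frequency >= 1 and intensity >= 2


def meets_symptom_criteria(ratings: List[Tuple[int, int]]) -> bool:
    """Apply the cluster counts to 17 (frequency, intensity) ratings."""
    if len(ratings) != 17:
        raise ValueError("expected 17 (frequency, intensity) pairs")
    # Step 1: dichotomize every item.
    present = [symptom_present(f, i) for f, i in ratings]
    # Step 2: require the minimum number of symptoms in each cluster.
    return all(
        sum(present[start:end]) >= needed
        for start, end, needed in CLUSTERS.values()
    )
```

A rating of frequency 1 and intensity 1, for example, does not count as a present symptom under this rule, because the intensity threshold is not met.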
PTSD checklist-civilian version (PCL-C). The PCL-C [23] is a 17-item self-report measure designed to assess symptoms of PTSD according to DSM-IV. Respondents rate each item on a 5-point scale ranging from 1 (Not at all) to 5 (Extremely). A PCL-C total score is then calculated by summing each of the 17 items, with higher scores indicating higher levels of PTSD symptom severity. PCL-C scores consistently demonstrate strong psychometric properties [24]. In the current study, the PCL-C was completed by veterans in Samples 1, 3, and 4. Cronbach's α was .96 for the combined samples. We used the PCL-C to determine probable PTSD diagnostic status according to DSM-IV using a cutoff score of 44 [25]. In the current study, of the 786 participants who completed the PCL-C, 53.2% (n = 418) met criteria for probable PTSD.
PCL-5. The PCL-5 [14] is a 20-item self-report measure designed to assess PTSD symptoms according to DSM-5. Respondents rate each item on a 5-point scale ranging from 0 (Not at all) to 4 (Extremely). PCL-5 total scores are calculated by summing the 20 items, with higher scores indicating higher levels of PTSD symptom severity. Like the PCL-C, the PCL-5 has demonstrated excellent psychometric properties [16]. In the current study, Samples 1 and 3 completed the PCL-5. Cronbach's α was .96 for the combined samples. We used the PCL-5 to determine probable PTSD diagnostic status according to DSM-5 using a cutoff score of 33 [16]. In the current study, of the 473 participants who completed the PCL-5, 58.6% (n = 277) met criteria for probable PTSD.
Demographics. All participants provided information on their age, sex, education, ethnicity, race, marital status, and military history via self-report.

Procedure
As reported elsewhere [16], participants in Sample 1 were recruited from the Boston VA Healthcare System. All participants were listed in a large database of veterans who had previously consented to be contacted regarding research participation. Participants completed a battery of self-report questionnaires and, upon completion, were assessed with a clinical interview. For each participant, the WHODAS 2.0 was administered fifth (of 12 self-report measures), the PCL-C was administered seventh, and the PCL-5 was administered eighth. At study completion, participants were debriefed and compensated monetarily. Consistent with APA ethical standards, IRB approval was obtained from the VA Boston Healthcare System IRB for data collection, and all participants provided written informed consent prior to participation.
Participants in Sample 2 were recruited from a VA hospital using both fliers and a large database of veterans who had previously indicated that they would be interested in participating in research. Participants first completed a battery of self-report questionnaires and then participated in several diagnostic interviews. For each participant, the WHODAS 2.0 was administered fifth (of 11 self-report measures). At the end of participation, veterans were debriefed and compensated for their time [17]. Consistent with APA ethical standards, IRB approval was obtained from the VA Boston Healthcare System IRB for data collection, and all participants provided written informed consent prior to participation.
Participants in Sample 3 were recruited from two VA hospitals using both fliers and the research database discussed above. Participants completed a self-report questionnaire battery. For each participant, the WHODAS 2.0 was administered sixth (of 23 self-report measures), the PCL-C was administered tenth, and the PCL-5 was administered eleventh. After participation, they were debriefed and compensated [17]. Consistent with APA ethical standards, IRB approval was obtained from both the VA Boston Healthcare System IRB and the VA Pacific Islands Healthcare System IRB for data collection, and all participants provided written informed consent prior to participation.
Sample 4 participants were recruited through posted announcements and advertisements at a VA medical center, as well as through direct mailings to oversample for female veterans and veterans with mental health diagnoses. Participants completed clinical interviews and then were administered a battery of self-report measures. Five separate packets of the 30 self-report measures were randomized across participants, with each varying the order of measure presentation. The placement of the WHODAS 2.0 ranged from tenth to twenty-sixth, whereas the placement of the PCL-C ranged from fifth to twenty-third. Therefore, in some cases the WHODAS 2.0 was administered prior to the PCL-C; in other cases, this order was reversed. Upon completion of the study, all participants were compensated for participation. Consistent with APA ethical standards, IRB approval was obtained from the Central Texas VA Healthcare System IRB for data collection, and all participants provided written informed consent prior to participation.

Data analysis plan
Except for signal detection analyses, all analyses were conducted using SPSS version 25. First, we calculated means and standard deviations for the WHODAS 2.0 total score and each of the six subscale scores. Next, we conducted t-tests to determine if participants with and without PTSD demonstrated significantly different WHODAS 2.0 scores. We also conducted t-tests and ANOVAs to determine whether WHODAS 2.0 scores differed as a function of gender and age. Age was examined as a categorical variable that represented three levels: younger veterans (aged 18-34), mid-aged veterans (aged 35-59), and older veterans (aged 60 and older). This categorization of age is consistent with other studies that have examined the relation between age and functioning; these studies have found that some impairment differences may be evident beginning at approximately 35 years of age [26] and again at 60 years of age [27]. Effect sizes for t-tests were evaluated using Cohen's d, such that small = 0.2, medium = 0.5, and large = 0.8, and effect sizes for ANOVAs were evaluated using η2, where small = 0.01, medium = 0.06, and large = 0.14 [28]. We used a Holm [29] correction for the t-tests, overall F-tests, and each set of post hoc tests to simultaneously maintain power and protect against Type I error, given the seven omnibus tests (one for the WHODAS 2.0 total score and one for each of the six subscale scores) and, for the ANOVAs, the three post hoc tests conducted in each set of analyses.
Signal detection analyses were conducted to identify cutoff scores for the overall sample and for participant groups stratified by gender and age. We examined diagnostic accuracy for each cutoff score on the WHODAS 2.0 with weighted κ coefficients as measures of test quality, including quality of sensitivity (κ[1]), specificity (κ[0]), and efficiency (κ[.5]). Unlike commonly reported measures of test performance (e.g., sensitivity, specificity, and efficiency), weighted κ coefficients are calibrated for chance agreement between test and diagnosis [30]. Guidelines developed for judging levels of clinical significance suggest that κ ≤ 0.40 is poor, ≥ 0.41 and < 0.60 is fair, ≥ 0.60 and < 0.75 is good, and ≥ 0.75 is excellent [31].
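For readers who wish to reproduce the effect size and multiplicity steps outside SPSS, the following is a minimal sketch. It assumes the pooled-standard-deviation form of Cohen's d and the standard Holm step-down procedure; the function names are illustrative and this is not the authors' analysis code.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down correction: which hypotheses are rejected at alpha."""
    m = len(p_values)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject
```

With seven omnibus tests, for example, the smallest p-value is compared with .05/7, the next with .05/6, and so on, which is less conservative than applying .05/7 to every test.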
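The three weighted κ coefficients can be computed directly from a 2 × 2 table of test result by criterion. The sketch below assumes the standard signal detection formulas, in which κ[1] and κ[0] rescale sensitivity and specificity against the proportion of positive tests and κ[.5] is equivalent to Cohen's κ. Function names are illustrative, and the interpretive bands are taken as poor up to 0.40, fair below 0.60, good below 0.75, and excellent at 0.75 or above.

```python
from typing import Dict


def weighted_kappas(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Quality of sensitivity, specificity, and efficiency from 2x2 counts."""
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n   # P: proportion positive on the criterion
    test_pos = (tp + fp) / n     # Q: proportion positive on the test
    if test_pos in (0.0, 1.0) or prevalence in (0.0, 1.0):
        raise ValueError("kappa is undefined when a margin is 0 or 1")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    kappa_1 = (sensitivity - test_pos) / (1 - test_pos)
    kappa_0 = (specificity - (1 - test_pos)) / test_pos
    # kappa[.5] is a weighted average of the two; it equals Cohen's kappa.
    w1 = prevalence * (1 - test_pos)
    w0 = (1 - prevalence) * test_pos
    kappa_half = (w1 * kappa_1 + w0 * kappa_0) / (w1 + w0)
    return {"kappa_1": kappa_1, "kappa_0": kappa_0, "kappa_half": kappa_half}


def quality_label(kappa: float) -> str:
    """Map a kappa value onto the poor/fair/good/excellent bands."""
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa > 0.40:
        return "fair"
    return "poor"
```

Because all three coefficients subtract the agreement expected by chance, a test that flags everyone as positive yields perfect sensitivity but an undefined or zero κ[1], which is the sense in which these are measures of test quality rather than test performance.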
Our analyses guided us to identify the optimally efficient cutoff (i.e., the WHODAS 2.0 cutoff with the highest κ[.5]) for each group. We opted to examine optimally efficient cutoffs because they minimize diagnostic errors [30]. To do so, we first calculated receiver operating characteristic (ROC) curves using the WHODAS 2.0 as the test variable and the CAPS impairment index as the criterion variable, examined the sensitivity and specificity for each value, and identified 5-10 possible WHODAS 2.0 total cutoff scores that represented the best balance of sensitivity and specificity. Next, we dichotomized the WHODAS 2.0 on each of these potential cutoff scores such that all scores at or above the potential cutoff score were coded as "1" and all values below the potential cutoff score were coded as "0." We then ran crosstabs with the dichotomized WHODAS 2.0 variable as the test variable and the CAPS impairment index as the criterion variable. Because no participants were administered both the CAPS-IV and the CAPS-5, we combined the impairment indices from the two measures into a single variable. This allowed us to conduct signal detection analyses with all participants who had complete data on either CAPS version simultaneously, and to determine a cutoff score that was agnostic to the version of the DSM utilized. Using DAG_STAT [32], we calculated measures of test performance and test quality for each of the cutoff scores.
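The cutoff search described above (dichotomize at each candidate score, cross-tabulate against the criterion, and keep the cutoff with the highest κ[.5]) can be sketched as follows. The sketch assumes that κ[.5] is computed as Cohen's κ. The function names are illustrative, and the sketch is not a reimplementation of DAG_STAT.

```python
from typing import Sequence, Tuple


def cohen_kappa(test: Sequence[int], criterion: Sequence[int]) -> float:
    """Chance-corrected agreement between two 0/1 codings.

    Undefined (division by zero) if both codings are constant.
    """
    n = len(test)
    agree = sum(t == c for t, c in zip(test, criterion)) / n
    p_test = sum(test) / n
    p_crit = sum(criterion) / n
    chance = p_test * p_crit + (1 - p_test) * (1 - p_crit)
    return (agree - chance) / (1 - chance)


def best_cutoff(
    scores: Sequence[float],
    impaired: Sequence[int],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return (cutoff, kappa) for the candidate with the highest kappa."""
    results = []
    for cutoff in candidates:
        # Scores at or above the cutoff are coded 1, all others 0.
        flagged = [1 if s >= cutoff else 0 for s in scores]
        results.append((cutoff, cohen_kappa(flagged, impaired)))
    # On ties, the first candidate listed is kept.
    return max(results, key=lambda r: r[1])
```

In practice, `scores` would hold WHODAS 2.0 total scores, `impaired` the combined CAPS impairment index, and `candidates` the 5-10 cutoffs shortlisted from the ROC curve.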

Results
As can be seen in Table 1, the mean WHODAS 2.0 total score across the four samples was 38.18 (SD = 25.24, range = 0 to 143). Across the four samples, mean WHODAS 2.0 subscale scores ranged from 1.95 (SD = 2.76; self-care) to 10.60 (SD = 7.27; participation). Further, regardless of the definition or the assessment instrument used, participants with PTSD reported significantly more functional disability than those without PTSD on the WHODAS 2.0 total scale and on each of the subscales, with medium to large effect sizes (all ts > 2.20, all ps < .05, all Cohen's ds > .39; see Table 2).

Association between WHODAS scores and both gender and age
Results indicated that men and women did not significantly differ on the WHODAS 2.0 total scale score or on any of the subscale scores (all ts < |1.96|; all ps > .05; see Table 2); however, significant differences were observed between the three age groups on the WHODAS 2.0 total scale score and on the mobility, self-care, and participation subscale scores (all Fs > 7.10; all ps < .01), with younger veterans having less disability. Simple effects tests revealed that for all significant omnibus tests, younger participants were less disabled than older participants, albeit with small effect sizes (all η2s < .05; see Table 2).

Signal detection analyses
We first determined the optimal cutoff scores for identifying PTSD-related disability among all participants who had been administered the CAPS overall (n = 656) and by gender (see Table 4).

Discussion
This is the first study to examine the utility of the WHODAS 2.0 across PTSD definitions and modalities. Regardless of the DSM definition (DSM-IV versus DSM-5) or the assessment modality (interview versus self-report) used, WHODAS 2.0 scores differed significantly between veterans with and without PTSD, with medium to large effects. These findings are consistent with a large body of research documenting that the association between PTSD and impairment is typically characterized by medium to large effects [17,33,34]. We also examined the effect of gender and age on WHODAS 2.0 scores. Consistent with past research [35], men and women did not differ on the WHODAS 2.0 total score or any of the subscale scores; however, as might be anticipated, younger veterans were significantly less impaired than both mid-aged and older veterans on the WHODAS 2.0 total score and four of the six subscale scores. Signal detection analyses indicated that the optimally efficient cutoff score on the WHODAS 2.0 for separating veterans with and without PTSD-related impairment was 32. Our cutoff score of 32 differs substantially from the cutoff score of 40 found by Marx and his colleagues [7]. This is likely because Marx et al. used a small, disability-seeking sample, whereas we employed a much larger, more diverse sample of veterans. Further, whereas Marx et al. used the interview version of the WHODAS 2.0, we used the self-report version. Finally, unlike Marx et al., we calculated measures of test quality, rather than measures of test performance, to determine the optimal cutoff score. Our use of test quality measures ensures that our findings are not unduly influenced by chance.
In response to Konecky and colleagues' [13] concern that the utility of the WHODAS 2.0 might be undercut by varying reliability across demographic groups, we also examined whether different cutoff scores would be appropriate for different demographic subgroups. Findings suggested that the cutoff score of 32 generally appeared to be appropriate across men and women, and across younger, mid-aged, and older veterans (the optimally efficient cutoff score for all groups was 31 or 32, with κ[.5]s ranging from .38-.49). However, the importance of considering group membership was evident when age was considered within the context of gender. Although the best cutoff score for men regardless of age group was 31-32, for women, this was not the case. For younger women, cutoff scores of 28-34 demonstrated the same psychometric properties, and for mid-aged women, 34 was the optimally efficient cutoff score. This means that, although a cutoff score of 32 will generally capture probable PTSD-related disability across age and gender, it may produce additional false negatives among mid-aged men, both false positives and false negatives among younger women, and additional false positives among mid-aged women.
Relatedly, in this study, we chose our cutoff score by identifying the WHODAS 2.0 score with the greatest optimal efficiency (κ[.5]). Optimally efficient cutoff scores are recommended for differential diagnosis, because they minimize diagnostic errors by maximizing the number of agreements between test and diagnosis [30]. However, there may be settings in which optimally sensitive or optimally specific cutoff scores are better suited. Scores with the greatest optimal sensitivity (κ[1]) are more lenient, identifying more individuals as having positive tests (i.e., they produce more false positives). For this reason, optimally sensitive cutoff scores are ideal for screening [30]. In contrast, cutoff scores with the greatest optimal specificity (κ[0]) are more stringent (i.e., they produce more false negatives) and are ideal for confirming a diagnosis [30]. In our analyses, choosing an optimally efficient versus an optimally sensitive cutoff score would generally not change the recommended cutoff score. This was also the case for an optimally specific cutoff score across women and older men. However, if an optimally specific (rather than optimally efficient or sensitive) cutoff score was chosen for the total score and for younger and mid-aged men, the cutoff score would be higher. Similar to considerations discussed previously, this decision further highlights the importance of considering the purpose of the cutoff score to guide usage. The differences in cutoff scores across both certain demographic groups and optimally efficient, sensitive, and specific scores highlight the importance of considering context when choosing a cutoff score.
Taken together, our findings begin to answer APA's call for additional research on the WHODAS 2.0 regarding its clinical utility. Our results suggest that the WHODAS 2.0 does show appropriate sensitivity to PTSD diagnostic status, indicating that it is likely capturing the impairment associated with the disorder. Further, although additional work is needed, the establishment of a cutoff score is the first step towards allowing clinicians and researchers to determine if different interventions cause either reliable or clinically significant change in PTSD-related impairment. Finally, our finding that cutoff scores may differ importantly across demographic subgroups highlights the necessity of a precision-based medicine approach; use of the WHODAS 2.0 may help clinicians and researchers better conceptualize how different groups benefit from different interventions, above and beyond measures of symptom severity.
The current findings also have important clinical implications. The development of a cutoff score that is representative of interviewer-rated PTSD-related disability is essential for establishing clinically significant impairment when the use of a clinical interview is not feasible (e.g., during disability compensation examinations). Importantly, the specification in the PTSD diagnosis requiring that symptoms cause clinically significant distress or impairment was intended to set the threshold for diagnosing the disorder and therefore avoid overpathologizing individuals who are not bothered or impaired by symptoms [3]. Without a cutoff score by which to operationalize this impairment, this specification cannot be assessed. Our findings will therefore allow disability examiners to use the WHODAS 2.0 in concert with a measure of PTSD symptom severity (e.g., the PCL) to establish whether individuals have clinically significant PTSD-related disability in addition to PTSD symptom levels consistent with the PTSD diagnosis. Having an established cutoff score will also be beneficial in clinical settings; it will allow for the identification of individuals with subthreshold PTSD who are experiencing clinically significant impairment so that they can be referred for appropriate care.
Strengths of the study include the use of a large sample collected across several VA medical centers that oversampled for women. Further, generalizability was increased by employing both self-report and clinical interview measures that assessed both DSM-IV and DSM-5 PTSD criteria. Nonetheless, the study is not without limitations. First, due to the absence of an existing stand-alone interview designed to assess for PTSD-related impairment, we used two impairment items on the CAPS that were not created for this purpose. Second, although we oversampled for women, our sample of women (particularly older women) was quite small. This small sample size prohibited us from determining a cutoff score for older women. Relatedly, the κ[.5]s for women were consistently weaker than those for men; whereas κ[.5]s for women fell in the "poor" range, for men, they were in the "fair" range. These lower kappas among women may be due to the smaller sample size. Consistent with this, the largest group of women (mid-aged) had the highest kappas. Therefore, replication with a larger sample of women is needed. Finally, our sample included veterans who were seeking care at a VA. Thus, the degree to which the present findings generalize to other veteran or civilian populations is unknown.
Despite these limitations, this study provides additional evidence that the WHODAS 2.0 can be used to differentiate veterans with and without PTSD-related impairment. By establishing a cutoff score of 32, the study provides a threshold for interpreting WHODAS 2.0 scores in relation to the PTSD criterion of clinically significant impairment. Although this cutoff score can generally be used across groups, more research is needed to ensure appropriate cutoff scores for different groups, particularly women.