Validation of a Computerized Adaptive Test Suicide Scale (CAT-SS) among United States Military Veterans

To validate the Computerized Adaptive Test Suicide Scale (CAT-SS), Veterans completed measures at baseline (n = 305), 6 months (n = 249), and 12 months (n = 185), including the CAT-SS (median of 11 items; median administration time of 107 seconds) and the Columbia-Suicide Severity Rating Scale (C-SSRS). Logistic regression was used to relate baseline CAT-SS scores to C-SSRS assessed outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; all outcomes combined). A mixed-effects logistic regression model was used to evaluate the relationship between the lagged CAT-SS scores and outcomes (6 and 12 months). The baseline CAT-SS demonstrated predictive accuracy for all outcomes at 6 months, and similar results were found for baseline and all outcomes at and through 12 months. Longitudinal analysis revealed that for every 10-point change in the CAT-SS there was a 50–77% increase in the likelihood of suicide-related outcomes. The CAT-SS demonstrated added value when compared to current suicide risk prediction practices.


Introduction
In the United States, rates of suicide have been increasing among military and civilian cohorts [1,2]. According to work by Ahmedani et al. [3], almost 30% of individuals who died by suicide had a healthcare visit in the week prior to their death. Recognizing  Alert, which recommended that universal suicide risk screening be implemented [4]. Ideally, such efforts would facilitate identification of those with occult risk (individuals who may disclose suicidal thoughts and behaviors only if they are directly asked) who may not be engaged in mental health treatment [5]. Nonetheless, options and evidence regarding tools which can be used to facilitate universal risk screening remain limited [4].
As screening for depression frequently occurs in primary care settings, often using the Patient Health Questionnaire-9 (PHQ-9) [6], efforts to evaluate the utility of the PHQ-9 as a suicide risk screener have been undertaken. However, results have been mixed. This is likely because the measure contains only one item (item 9) specifically focused on suicidal ideation ("bothered by thoughts of being dead or of hurting yourself in some way"), and because, for a sizable number of individuals, risk for suicide is related to factors other than depression (e.g., chronic pain, anxiety). Specifically, data regarding psychometric properties (e.g., positive predictive value) have been less than ideal [6][7][8]. Moreover, results from most rapid screeners like the PHQ-9 item 9 [6] often do not provide the clinician with information regarding risk severity or magnitude [7].
In addition, many suicide risk screening measures (e.g., the Columbia-Suicide Severity Rating Scale (C-SSRS) Screener) [9] include items solely focused on suicidal ideation and behavior, thereby limiting the ability to measure "the full spectrum of suicidal symptomatology" [7; p. 1376]. Ideally, suicide risk screening approaches would incorporate personalized items associated with a range of risk factors. Tailoring screening measures while maintaining psychometric properties requires implementation of novel approaches such as computerized adaptive testing (CAT) based on unidimensional or multidimensional item response theory (M/IRT). Traditional mental health measures are based on classical test theory, in which all respondents receive all items and the items are equally weighted in deriving the test score, which is often the sum of the individual item scores, rated either dichotomously or as polytomous Likert-scale items. In contrast, IRT-based CAT uses unidimensional or multidimensional item response theory (MIRT) to pre-calibrate a large "bank" of symptom items, which are then adaptively selected to match the severity of the person's disorder, which is in turn adaptively estimated from the responses to the items already administered [7; p. 1377]. As a result, different items are administered to different respondents, targeted to their level of severity on the underlying construct of interest (in our case, suicide risk). For further information regarding MIRT-based CAT see Gibbons et al., 2008 [10] and Gibbons et al., 2016 [11].
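The adaptive selection loop described above can be illustrated with a minimal sketch. It is not the CAT-SS algorithm: it assumes a unidimensional two-parameter logistic (2PL) model with dichotomous items, a standard normal prior, and a grid-based posterior mean, whereas the CAT-SS is calibrated with a multidimensional model. The item bank, the function names, and the stopping value `se_target` are hypothetical and chosen only for illustration.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item with discrimination a and severity b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at severity theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

# Candidate severity values from -4.0 to 4.0 in steps of 0.1
GRID = [g / 10.0 for g in range(-40, 41)]

def eap_estimate(responses, bank):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for item, y in responses:
            a, b = bank[item]
            p = p_endorse(theta, a, b)
            w *= p if y == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.3, max_items=20):
    """Administer items until the posterior SD falls to se_target or items run out."""
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    remaining = set(bank)
    while remaining and len(responses) < max_items and se > se_target:
        # Pick the unused item that is most informative at the current estimate
        item = max(sorted(remaining),
                   key=lambda i: item_information(theta, *bank[i]))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap_estimate(responses, bank)
    return theta, se, [item for item, _ in responses]

# Hypothetical bank: 21 items, equal discrimination, severities from -2.0 to 2.0
bank = {i: (1.5, -2.0 + 0.2 * i) for i in range(21)}

def simulated_respondent(true_theta):
    """Deterministic stand-in for a respondent: endorses items at or below true_theta."""
    return lambda item: 1 if bank[item][1] <= true_theta else 0

theta_hi, se_hi, items_hi = run_cat(bank, simulated_respondent(1.0))
theta_lo, se_lo, items_lo = run_cat(bank, simulated_respondent(-1.0))
```

In this sketch the two simulated respondents receive different item sequences after the first item, which is the property that distinguishes adaptive testing from a fixed-length scale. The stopping rule plays the role of the precision threshold reported for the CAT-SS, although the value used here is on the latent scale and is not the study's threshold.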
Thus, Gibbons and colleagues developed and conducted an initial validation study on the Computerized Adaptive Test-Suicide Scale (CAT-SS) [7]. Using data from individuals receiving outpatient psychiatric treatment, the team was able to calibrate the CAT-SS and demonstrate that the CAT-SS measured suicide risk severity using a mean of 10 items in under two minutes. Moreover, initial validity was demonstrated by comparing CAT-SS and C-SSRS structured clinical interview results among those seeking care in two non-Veterans Affairs emergency departments (University of Chicago and University of Massachusetts). When the CAT-SS high-risk group was contrasted with the no-risk group, a sensitivity of 1.0 and a specificity of 0.92 were found for the C-SSRS active ideation category. Per the authors, additional prospective validation efforts, including prediction of future suicidal events, were warranted. Toward this end, members of this team conducted a longitudinal study among Veterans eligible for Veterans Health Administration (VHA) care to validate the CAT-SS self-report measure in terms of its ability to predict future suicide events based on repeated C-SSRS clinical interviews at 6 months and 12 months following the baseline CAT-SS assessment.

Participants
This study was conducted according to the guidelines laid down in the Declaration of Helsinki, and all procedures involving human participants were approved by the Colorado Multiple Institutional Review Board (COMIRB). Participants (n = 305) were recruited from a mountain state metropolitan VA health care system between April 2017 and February 2019. Recruitment strategies included posting flyers at local facilities, contacting Veterans who had participated in previous research or who indicated interest in participating in research, and encouraging providers to tell patients about the study. Veterans were eligible if they were between the ages of 18 and 89 and able to provide written informed consent, which was obtained. The number of Veterans who completed measures at each timepoint was as follows: baseline, n = 305; 6-month follow-up, n = 249; 12-month follow-up, n = 185.

Measures
Computerized Adaptive Test-Suicide Scale (CAT-SS) [7]† is an adaptive measure, comprising 111 items, which dimensionally measures suicide risk severity on a 100-point scale with 5 points of precision. The scores are also thresholded to yield categories of low, moderate, and high risk.
Structured Clinical Interview for DSM-5 Disorders (SCID-5) Research Version [12] is a reliable and valid semi-structured interview used to diagnose Axis I psychiatric disorders in clinical and research settings. The SCID-5 was used to determine current presence of the following disorders: Bipolar I and II; Major Depressive; Alcohol Use; Substance Use; Generalized Anxiety; and Sleep. The trauma/PTSD L Module of the SCID-5 [12] was used to assess Criterion A events. If a Criterion A event and at least one current symptom were endorsed, the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5) was administered. The CAPS-5 is the gold standard for assessing PTSD and was used to determine current PTSD diagnosis [13].
Rocky Mountain MIRECC Demographic Questionnaire was used to gather information on topics such as participant age, gender, race/ethnicity, education, period of military service, and combat exposure.
† Measures administered at baseline and at the 6- and 12-month follow-up appointments.

Procedures
Data were collected at three timepoints (baseline, 6- and 12-month follow-up). After confirming eligibility, Veterans were invited to an in-person baseline study visit. Informed consent was obtained prior to administration of the clinical interviews listed above, self-report measures (not included in this study), and the CAT-SS. Study team members were clinically trained to administer the measures, and interview schedules were reviewed by licensed clinicians.
To facilitate retention, participants were re-contacted approximately 6 months after the baseline study visit and offered an in-person or telephone visit. During this visit, the CAT-SS was re-administered. In addition, reminder letters inviting completion of the 12-month follow-up were sent 1-3 months prior to the 12-month window. The final in-person study visit was conducted approximately 12 months following the baseline assessment, and the CAT-SS was again re-administered. Participants were compensated for all study visits. Two Veterans who had incomplete data at the 6-month visit and one at the 12-month visit were removed from analyses. Reasons for attrition were not collected; however, Veterans were invited to complete the 12-month follow-up regardless of whether they had completed the 6-month follow-up. The final sample size for analysis was n = 265.

Statistical analyses
Logistic regression was used to relate the CAT-SS scores at baseline to the C-SSRS assessed outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; all outcomes combined) at 6 months, and the CAT-SS scores at baseline and 6-months to the outcomes at 12-months, and all events between baseline and 12-months. From the logistic regression model, we generated a receiver operating characteristic (ROC) curve and computed the area under the ROC curve (AUC). We also examined the unique contribution of the CAT-SS in predicting suicide-related outcomes over and above what has traditionally been considered a robust predictor, a suicide attempt within the past year. To test this, logistic regression models with: (a) previous suicide attempt in the past year; (b) the CAT-SS; and, (c) previous suicide attempt in the past year and the CAT-SS were fitted to these data and the AUCs statistically compared.
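As an illustration of the ROC and AUC quantities described above, the following is a minimal sketch rather than the study's analysis code. It relies on the fact that, for a logistic regression with a single predictor and a positive coefficient, the fitted probabilities are a monotone function of the predictor, so the AUC can be computed directly from the raw scores using the rank-based (Mann-Whitney) identity. The function names and the toy data are hypothetical and are not study data.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event scored higher than non-event
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

def roc_points(scores, outcomes):
    """(false positive rate, true positive rate) at each distinct threshold."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy data for illustration only (not study data)
toy_scores = [10, 20, 30, 40, 50, 60]
toy_outcomes = [0, 0, 1, 0, 1, 1]

toy_auc = auc(toy_scores, toy_outcomes)
toy_curve = roc_points(toy_scores, toy_outcomes)
```

Comparing the AUCs of nested models, as in models (a) through (c), additionally requires a test that accounts for the correlation between curves estimated on the same participants; that step is not shown here.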
To study longitudinal trends in C-SSRS assessed outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; all outcomes combined), a mixed-effects logistic regression model was used to perform a longitudinal analysis of the relationship between the lagged CAT-SS scores and suicide-related outcomes at 6 and 12 months (i.e., CAT-SS score at baseline predicting suicide-related outcomes at month 6 and CAT-SS at month 6 predicting suicide events at 12 months). CAT-SS scores were divided by 10 so that the odds ratios were interpretable as the relationship between a 10-point change in CAT-SS (on a 100-point scale) and the likelihood of a suicide-related outcome. Separate analyses were conducted for each outcome, with and without adjustment of suicide attempt in the past year.
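The rescaling described above can be written out explicitly. The following is a sketch under stated assumptions: the source specifies only a mixed-effects logistic regression, so the subject-level random intercept $u_i$ and the notation ($\beta_0$, $\beta_1$, $\beta^{*}$, $\sigma_u^2$) are assumptions introduced for illustration. Here $Y_{it}$ is the outcome for participant $i$ at follow-up $t$ (6 or 12 months), $\text{CAT-SS}_{i,t-1}$ is the score from the preceding assessment, and $\beta^{*}$ is the coefficient that would be obtained if the score were entered in its original units.

```latex
\begin{align}
\operatorname{logit}\,\Pr(Y_{it}=1)
  &= \beta_0 + \beta_1\,\frac{\text{CAT-SS}_{i,t-1}}{10} + u_i,
  \qquad u_i \sim N(0,\sigma_u^2), \\
\mathrm{OR}_{10} &= e^{\beta_1} = e^{10\,\beta^{*}}, \\
\text{percent change in odds per 10 points} &= 100\,(\mathrm{OR}_{10} - 1).
\end{align}
```

Under this parameterization, an odds ratio of 1.50 corresponds to a 50% increase in the odds of the outcome for each 10-point increase in the lagged CAT-SS score.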
This study was powered to estimate an AUC of 0.8 with a 95% confidence interval of plus or minus 5%. Assuming an event rate of 10%, n = 250 subjects at the 6-month follow-up were required. A total of n = 247 subjects completed the CAT-SS at the 6-month follow-up.

Results
Demographic characteristics of the study sample at baseline are presented in Table 1. Mental health diagnoses (current) at baseline as determined by administration of the SCID-5 [12] included: Bipolar Disorder I and II (3.9%), Major Depressive Disorder (26.6%), Alcohol Use
Administration of the CAT-SS resulted in a median administration time of 107 seconds with median administration of 11 items to meet a precision threshold of less than 5.0 points on the 100-point scale. At baseline, using CAT-SS thresholds [7], 137 (51.6%) of the participants were categorized as being at low, 125 (47.3%) at moderate, and 3 (1.1%) at high risk. Per the baseline C-SSRS, 91 (29.8%) had lifetime active ideation with a plan and intent, and 97 (32.0%) had a lifetime attempt. Data from the C-SSRS across all three study visits (baseline, 6-month, 12-month) are presented in Table 2.
As a continuous measure, the CAT-SS was strongly associated with suicide-related outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; and all outcomes combined) over a 12-month period, with the strength of the associations increasing with repeated longitudinal assessments (see Table 3). Analyses were also conducted to study the added predictive accuracy of the CAT-SS for future suicidal events, above and beyond the predictive accuracy of a suicide attempt within the past year. Findings suggested large increases in AUC for all 4 outcomes (active ideation with plan and intent: chi-square = 15.80, df = 1, p < 0.0001; attempt: chi-square = 5.78, df = 1, p = 0.02; interrupted, aborted or self-interrupted attempt or preparatory acts or behaviors: chi-square = 17.92, df = 1, p < 0.0001; and all outcomes combined: chi-square = 9.86, df = 1, p < 0.002). The ROC curves for active ideation with plan and intent, and for attempt, based on past year suicide attempt, CAT-SS, and past year suicide attempt plus CAT-SS, are displayed in Figs 1 and 2.
Longitudinal analysis of these data revealed that for every 10-point change in the CAT-SS score there was between a 50 and 77% increase in the likelihood of a suicidal event across the 4 outcomes, all of which were statistically significant, or a 5-fold to almost 8-fold increase over the range of the scale. Moreover, adjusting for suicide attempt in the past year revealed similarly strong associations between the CAT-SS and suicidal event outcomes, ranging from 36 to 73%, or a 4-fold to 7-fold increase across the range of the scale (see Table 4).

Discussion
To address the pressing public health problem of suicide, efforts must be aimed at validating measures that can be used to evaluate suicide risk in both primary and specialty care medical settings. Ideally, such measures would be rapidly administered (e.g., self-report) via an electronic platform, and personalized to individual patients. Among Veterans seeking care at a VAMC, the CAT-SS assessed suicide risk severity with a median of 11 items in under two minutes (107 seconds), thereby highlighting feasibility of administration similar to that identified among the initial validation cohort (11 items and 110 seconds) [7]. Moreover, results revealed that CAT-SS scores were strongly associated with future suicide-related outcomes over the 12-month study period. Although results, in terms of such associations, were similar at 6 and 12 months, the strength of associations increased with repeated CAT-SS assessment. These findings highlight the utility of the CAT-SS for both initial identification and continued monitoring of risk. Longitudinal analysis also revealed that for every 10-point change in the CAT-SS score there was between a 50 and 77% increase in the likelihood of a suicidal event across the 4 outcomes, all of which were statistically significant, or a 5-fold to almost 8-fold increase over the range of the scale.
Previous research has shown that history of suicide attempt is one of the most significant risk factors for suicide [14]. Similarly, when clinicians were asked about factors which they considered "most important" in assessing suicide risk, they weighed the presence of suicide-related behaviors (e.g., preparatory behavior) as well as a history of attempts more heavily than other factors [15]. In fact, prior history of suicide attempt is strongly recommended as one of   [14]. Thus, a critical marker of validity for any suicide risk measure is the degree to which it can predict future suicidal events when compared with other empirically robust variables, such as suicide attempt history. That is, the measure should increase the ability to predict future suicidal behavior, above and beyond known epidemiologic risk factors (e.g., history of a suicide attempt). In this study, CAT-SS scores outperformed history of suicide attempt in the past year as a predictor of future suicide-related thoughts and behaviors. As highlighted above, statistically significant increases in AUC were found in models that added CAT-SS results to a model that only included a history of suicide attempt, thereby illustrating the added value of the CAT-SS over traditional predictive models based on past suicidal behavior only.
These findings have important clinical implications for suicide risk screening across healthcare settings. The VHA has developed and implemented an enterprise-wide, evidence-informed approach to suicide risk screening and evaluation, the VA Suicide Risk Identification process (VA RISK ID) [5]. Currently, universal screening is being implemented using the C-SSRS Screener. However, findings from this study provide compelling evidence regarding both the efficiency and the long-term predictive validity of the CAT-SS in a medically diverse patient population. Further research is warranted to evaluate whether the CAT-SS could be feasibly implemented as part of universal screening efforts like the VA RISK ID [5], and whether CAT-SS dimensional scores could facilitate more accurate identification of suicide risk levels while reducing patient and provider burden. Doing so would be expected to provide additional time to facilitate personalized, suicide risk-stratified care management.
As noted above, measures were administered as part of a research protocol; additional work is required to evaluate where the CAT-SS could be implemented in clinical settings. Efforts aimed at exploring this are warranted. Nonetheless, findings from this study suggest that, if implemented in the electronic medical record, the CAT-SS would be expected to rapidly facilitate precise and personalized screening and assessment of suicide risk severity.