To validate the Computerized Adaptive Test Suicide Scale (CAT-SS), Veterans completed measures at baseline (n = 305), 6 months (n = 249), and 12 months (n = 185), including the CAT-SS (median of 11 items; median administration time of 107 seconds) and the Columbia-Suicide Severity Rating Scale (C-SSRS). Logistic regression was used to relate baseline CAT-SS scores to C-SSRS assessed outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; all outcomes combined). A mixed-effects logistic regression model was used to evaluate the relationship between the lagged CAT-SS scores and outcomes at 6 and 12 months. The baseline CAT-SS demonstrated predictive accuracy for all outcomes at 6 months, and similar results were found for baseline and all outcomes at and through 12 months. Longitudinal analysis revealed that for every 10-point change in the CAT-SS there was a 50–77% increase in the likelihood of suicide-related outcomes. The CAT-SS demonstrated added value when compared to current suicide risk prediction practices.
Citation: Brenner LA, Betthauser LM, Penzenik M, Bahraini N, Gibbons RD (2022) Validation of a Computerized Adaptive Test Suicide Scale (CAT-SS) among United States Military Veterans. PLoS ONE 17(1): e0261920. https://doi.org/10.1371/journal.pone.0261920
Editor: Sarah A. Arias, Brown University Warren Alpert Medical School, UNITED STATES
Received: July 13, 2021; Accepted: December 13, 2021; Published: January 21, 2022
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: The full dataset cannot be made publicly available due to privacy concerns and restrictions imposed by the Colorado Multiple Institutional Review Board. All relevant de-identified data are included in the manuscript. For investigators with appropriate authorizations within the Department of Veterans Affairs, requests for data access can be made to VHAECHMIRECCAdmin@va.gov.
Funding: Funding was provided by the Veterans Health Administration, Office of Mental Health and Suicide Prevention; NIMH Grant #RO1 MH100155-06. The Office of Mental Health and Suicide Prevention did not influence the decision to submit this manuscript.
Competing interests: The views, opinions, and/or findings contained in this article are those of the author(s) and should not be construed as an official Department of Veterans Affairs position, policy, or decision unless so designated by other documentation. Dr. Brenner has received royalties from American Psychological Association Publishing and Oxford University Press. Dr. Gibbons has been an expert witness for the US Department of Justice, Merck, Glaxo-Smith-Kline, Pfizer and Wyeth and is a founder of Adaptive Testing Technologies, which distributes the CAT-MH™ battery of adaptive tests in which the CAT-SS is included. The terms of this arrangement have been reviewed and approved by the University of Chicago in accordance with its conflict of interest policies.
In the United States, rates of suicide have been increasing among military and civilian cohorts [1, 2]. According to work by Ahmedani et al., almost 30% of individuals who died by suicide had a healthcare visit in the week prior to their death. Recognizing the importance of risk screening within healthcare systems, in 2016 the Joint Commission released a Sentinel Event Alert, which recommended that universal suicide risk screening be implemented. Ideally, such efforts would facilitate identification of those with occult risk (individuals who may disclose suicidal thoughts and behaviors only if they are directly asked) who may not be engaged in mental health treatment. Nonetheless, options and evidence regarding tools which can be used to facilitate universal risk screening remain limited.
As screening for depression frequently occurs in primary care settings, often using the Patient Health Questionnaire-9 (PHQ-9), efforts to evaluate the utility of the PHQ-9 as a suicide risk screener have been undertaken. However, results have been mixed, likely because the measure contains only one item (item 9) specifically focused on suicidal ideation (“bothered by thoughts of being dead or of hurting yourself in some way”), and because for a sizable number of individuals risk for suicide is related to factors other than depression (e.g., chronic pain, anxiety). Specifically, data regarding psychometric properties (e.g., positive predictive value) have been less than ideal [6–8]. Moreover, results from most rapid screeners like the PHQ-9 item 9 often do not provide the clinician with information regarding risk severity or magnitude.
In addition, many suicide risk screening measures (e.g., the Columbia-Suicide Severity Rating Scale (C-SSRS) Screener) include items solely focused on suicidal ideation and behavior, thereby limiting the ability to measure “the full spectrum of suicidal symptomatology” [7; p. 1376]. Ideally, suicide risk screening approaches would incorporate personalized items associated with a range of risk factors. Tailoring screening measures while maintaining psychometric properties requires implementation of novel approaches such as computerized adaptive testing (CAT) based on unidimensional or multidimensional item response theory (M/IRT). Traditional mental health measures are based on classical test theory, in which all respondents receive all items and the items are equally weighted in deriving the test score, which is often the sum of the individual item scores, rated either dichotomously or as polytomous Likert scale items. In contrast, IRT-based CAT uses unidimensional or multidimensional item response theory (MIRT) to pre-calibrate a large “bank” of symptom items, which are then adaptively selected to match the severity of the person’s disorder, which is in turn adaptively estimated from the responses to previously administered items [7; p. 1377]. As a result, different items are administered to different respondents, targeted to their level of severity on the underlying construct of interest (in our case suicide risk). For further information regarding MIRT-based CAT see Gibbons et al., 2008 and Gibbons et al., 2016.
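To make the idea of adaptive item selection concrete, the following is a minimal sketch of one common selection rule, maximum Fisher information under a unidimensional two-parameter logistic (2PL) model. It is an illustration only, not the CAT-SS algorithm: the CAT-SS is described above as based on multidimensional IRT, and the item bank, item names, and parameter values below are hypothetical.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item with discrimination a and severity b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at severity estimate theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Return the unused item that is most informative at the current estimate."""
    candidates = [name for name in bank if name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))

# Hypothetical three-item bank: name -> (discrimination a, severity b)
bank = {
    "item_low": (1.0, -1.0),
    "item_mid": (1.0, 0.0),
    "item_high": (1.0, 1.0),
}

# With a current severity estimate of 0.9, the item whose severity is closest
# to 0.9 is selected first; once used, the next closest item is chosen.
first_item = select_next_item(0.9, bank, administered=set())
second_item = select_next_item(0.9, bank, administered={first_item})
```

In a full adaptive test, the severity estimate would be updated after each response and administration would stop once a precision threshold is reached; those steps are omitted here.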
Thus, Gibbons and colleagues developed and conducted an initial validation study on the Computerized Adaptive Test-Suicide Scale (CAT-SS). Using data from individuals receiving outpatient psychiatric treatment, the team was able to calibrate the CAT-SS, and demonstrate that the CAT-SS measured suicide risk severity using a mean of 10 items, in under two minutes. Moreover, initial validity was demonstrated comparing CAT-SS and C-SSRS structured clinical interview results among those seeking care in two non-Veterans Affairs emergency departments (University of Chicago and University of Massachusetts). Contrasting the CAT-SS high-risk group to the no-risk group, a sensitivity of 1.0 and specificity of 0.92 were found for the C-SSRS active ideation category. Per the authors, additional prospective validation efforts, including prediction of future suicidal events, were warranted. Towards this end, members of this team conducted a longitudinal study among Veterans eligible for Veterans Health Administration (VHA) care to validate the CAT-SS self-report measure in terms of its ability to predict future suicide events based on repeated C-SSRS clinical interviews at 6 months and 12 months following the baseline CAT-SS assessment.
This study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human participants were approved by the Colorado Multiple Institutional Review Board (COMIRB). Participants (n = 305) were recruited from a mountain state metropolitan VA health care system between April 2017 and February 2019. Recruitment strategies included posting flyers at local facilities, contacting Veterans who had participated in previous research or who indicated interest in participating in research, and encouraging providers to tell patients about the study. Veterans were eligible if they were between the ages of 18 and 89 and able to provide written informed consent, which was obtained. The number of veterans who completed measures at each timepoint is as follows: baseline, n = 305; 6-month follow-up, n = 249; 12-month follow-up, n = 185.
Computerized Adaptive Test-Suicide Scale (CAT-SS)* is an adaptive measure, comprising 111 items, which dimensionally measures suicide risk severity on a 100-point scale with 5 points of precision. The scores are also thresholded to yield categories of low, moderate, and high risk.
The Columbia-Suicide Severity Rating Scale (C-SSRS)* is a clinician-administered interview used to evaluate suicidal ideation (including intensity) and suicide-related behavior (e.g., preparatory, attempt).
Structured Clinical Interview for DSM-5 Disorders (SCID-5) Research Version is a reliable and valid semi-structured interview used to diagnose Axis I psychiatric disorders in clinical and research settings. The SCID-5 was used to determine current presence of the following disorders: Bipolar I and II; Major Depressive; Alcohol Use; Substance Use; Generalized Anxiety; and Sleep. The trauma/PTSD L Module of the SCID-5 was used to assess Criterion A events. If a Criterion A event and at least one current symptom were endorsed, the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5) was administered. The CAPS-5 is the gold standard for assessing PTSD, and was used to determine current PTSD diagnosis.
Rocky Mountain MIRECC Demographic Questionnaire was used to gather information on topics such as participant age, gender, race/ethnicity, education, period of military service, and combat exposure.
*Measures administered at baseline, and 6- and 12-month follow-up appointments.
Data were collected at three timepoints (baseline, 6- and 12-month follow-up). After confirming eligibility, Veterans were invited to an in-person baseline study visit. Informed consent was obtained prior to administration of clinical interviews listed above, self-report measures (not included in this study), and the CAT-SS. Study team members were clinically trained to administer the measures and interview schedules were reviewed by licensed clinicians.
To facilitate retention, participants were re-contacted approximately 6 months after the baseline study visit and offered an in-person or telephone visit. During this visit, the CAT-SS was re-administered. In addition, reminder letters inviting completion of the 12-month follow-up were sent 1–3 months prior to the 12-month window to promote retention. The final in-person study visit was conducted approximately 12 months following the baseline assessment, and the CAT-SS was again re-administered. Participants were compensated for all study visits. Two Veterans who had incomplete data at the 6-month visit and one at the 12-month visit were removed from analyses. Reasons for attrition were not collected; however, Veterans were invited to complete the 12-month follow-up regardless of whether they had completed the 6-month follow-up. The final sample size for analysis was n = 265.
Logistic regression was used to relate the CAT-SS scores at baseline to the C-SSRS assessed outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; all outcomes combined) at 6 months, and the CAT-SS scores at baseline and 6-months to the outcomes at 12-months, and all events between baseline and 12-months. From the logistic regression model, we generated a receiver operating characteristic (ROC) curve and computed the area under the ROC curve (AUC). We also examined the unique contribution of the CAT-SS in predicting suicide-related outcomes over and above what has traditionally been considered a robust predictor, a suicide attempt within the past year. To test this, logistic regression models with: (a) previous suicide attempt in the past year; (b) the CAT-SS; and, (c) previous suicide attempt in the past year and the CAT-SS were fitted to these data and the AUCs statistically compared.
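As an illustration of the AUC summary described above, the following minimal sketch computes the area under the ROC curve directly from predicted scores, using its interpretation as the probability that a randomly chosen participant with an event receives a higher score than a randomly chosen participant without one (ties count as one half). It does not reproduce the study's models or data; the function name, scores, and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of event (1) and non-event (0) scores."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0   # event case correctly ranked higher
            elif e == n:
                wins += 0.5   # ties count as half
    return wins / (len(events) * len(non_events))

# Hypothetical predicted probabilities and observed outcomes
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
example_auc = roc_auc(example_scores, example_labels)  # 3 of 4 pairs ranked correctly
```

In practice, the scores would be the fitted probabilities from each logistic regression model, so that the models with and without the CAT-SS can be summarized on the same scale.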
To study longitudinal trends in C-SSRS assessed outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; all outcomes combined), a mixed-effects logistic regression model was used to perform a longitudinal analysis of the relationship between the lagged CAT-SS scores and suicide-related outcomes at 6 and 12 months (i.e., CAT-SS score at baseline predicting suicide-related outcomes at month 6 and CAT-SS at month 6 predicting suicide events at 12 months). CAT-SS scores were divided by 10 so that the odds ratios were interpretable as the relationship between a 10-point change in CAT-SS (on a 100-point scale) and the likelihood of a suicide-related outcome. Separate analyses were conducted for each outcome, with and without adjustment of suicide attempt in the past year.
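The rescaling step can be sketched as follows. Under a logistic model, dividing a predictor by 10 multiplies its coefficient by 10, so the resulting odds ratio equals exp(10 × β), where β is the per-point log-odds coefficient. This is a minimal sketch with a hypothetical coefficient, not values taken from the study, and it describes changes in odds, which is what a logistic model estimates.

```python
import math

def odds_ratio_per_step(beta_per_point, step=10.0):
    """Odds ratio for a `step`-point change, given a per-point log-odds coefficient."""
    return math.exp(beta_per_point * step)

def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0

# Hypothetical per-point coefficient, chosen so the 10-point odds ratio is 1.5
beta = math.log(1.5) / 10.0
or_10_points = odds_ratio_per_step(beta)
pct_increase = percent_change_in_odds(or_10_points)
```

Fitting the model on the score divided by 10 yields this 10-point odds ratio directly from the estimated coefficient, which is why the rescaling simplifies interpretation without changing model fit.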
This study was powered to estimate an AUC of 0.8 with a 95% confidence interval of plus or minus 5%. Assuming an event rate of 10%, n = 250 subjects at the 6-month follow-up were required. A total of n = 247 subjects completed the CAT-SS at the 6-month follow-up.
Demographic characteristics of the study sample at baseline are presented in Table 1. Mental health diagnoses (current) at baseline as determined by administration of the SCID-5 included: Bipolar Disorder I and II (3.9%), Major Depressive Disorder (26.6%), Alcohol Use Disorder (8.9%), Substance Use Disorder (9.2%), Generalized Anxiety Disorders (3.6%), and Sleep Disorders (6.9%). Current PTSD was determined by responses to the CAPS-5, with 28.5% of the sample meeting PTSD criteria (n = 87).
Administration of the CAT-SS resulted in a median administration time of 107 seconds with median administration of 11 items to meet a precision threshold less than 5.0 points on the 100-point scale. At baseline, using CAT-SS thresholds, 137 (51.6%) of the participants were categorized as being at low, 125 (47.3%) at moderate, and 3 (1.1%) at high risk. Per the baseline C-SSRS, 91 (29.8%) had lifetime active ideation with a plan and intent, and 97 (32.0%) had a lifetime attempt. Data from the C-SSRS across all three study visits (baseline, 6-month, 12-month) are presented in Table 2.
As a continuous measure the CAT-SS was strongly associated with suicide-related outcomes (active ideation with plan and intent; attempt; interrupted, aborted or self-interrupted attempt, or preparatory acts or behaviors; and all outcomes combined) over a 12-month period, with the strength of the associations increasing with repeated longitudinal assessments (see Table 3). Analyses were also conducted to study the added predictive accuracy of the CAT-SS for future suicidal events, above and beyond the predictive accuracy of a suicide attempt within the past year. Findings suggested large increases in AUC for all 4 outcomes (active ideation with plan and intent chi-square = 15.80, df = 1, p<0.0001; attempt chi-square = 5.78, df = 1, p = 0.02; interrupted, aborted or self-interrupted attempt or preparatory acts or behaviors chi-square = 17.92, df = 1, p<0.0001; and, all outcomes combined chi-square = 9.86, df = 1, p<0.002). The ROC curves for active ideation with plan and intent, and attempt for past year suicide attempt, CAT-SS, and past year suicide attempt and CAT-SS are displayed in Figs 1 and 2.
Longitudinal analysis of these data revealed that for every 10-point change in the CAT-SS score there was between a 50 and 77% increase in the likelihood of a suicidal event across the 4 outcomes, all of which were statistically significant, or a 5-fold to almost 8-fold increase over the range of the scale. Moreover, adjusting for suicide attempt in the past year revealed similarly strong associations between the CAT-SS and suicidal event outcomes, ranging from 36 to 73%, or a 4-fold to 7-fold increase across the range of the scale (see Table 4).
To address the pressing public health problem of suicide, efforts must be aimed at validating measures that can be used to evaluate suicide risk in both primary and specialty care medical settings. Ideally, such measures would be rapidly administered (e.g., self-report) via an electronic platform, and personalized to individual patients. Among Veterans seeking care at a VAMC, the CAT-SS assessed suicide risk severity with a median of 11 items in under two minutes (107 seconds), thereby highlighting feasibility of administration similar to that identified among the initial validation cohort (11 items and 110 seconds).
Moreover, results revealed that CAT-SS scores were strongly associated with future suicide-related outcomes over the 12-month study period. Although results, in terms of such associations, were similar at 6- and 12-months, the strength of associations increased with repeated CAT-SS assessment. These findings highlight the utility of the CAT-SS for both initial identification and continued monitoring of risk. Longitudinal analysis also revealed that for every 10-point change in the CAT-SS score there was between a 50 and 77% increase in the likelihood of a suicidal event across the 4 outcomes, all of which were statistically significant, or a 5-fold to almost 8-fold increase over the range of the scale.
Previous research has shown that history of suicide attempt is one of the most significant risk factors for suicide. Similarly, when clinicians were asked about factors which they considered “most important” in assessing suicide risk, they weighed the presence of suicide-related behaviors (e.g., preparatory behavior) as well as a history of attempts more heavily than other factors. In fact, prior history of suicide attempt is strongly recommended as one of the risk factors that should be assessed as part of a comprehensive suicide risk evaluation in the Departments of Veterans Affairs and Defense Clinical Practice Guideline for the Assessment and Management of Suicidal Behavior. Thus, a critical marker of validity for any suicide risk measure is the degree to which it can predict future suicidal events when compared with other empirically robust variables, such as suicide attempt history. That is, the measure should increase the ability to predict future suicidal behavior, above and beyond known epidemiologic risk factors (e.g., history of a suicide attempt). In this study, CAT-SS scores outperformed history of suicide attempt in the past year as a predictor of future suicide-related thoughts and behaviors. As highlighted above, statistically significant increases in AUC were found in models that added CAT-SS results to a model that only included a history of suicide attempt, thereby illustrating the added value of the CAT-SS over traditional predictive models based on past suicidal behavior only.
Recently, the CAT-SS has been shown to be unbiased in a sample of 1,073 sexual and gender minority youth, mean age 20.3 years (SD = 3.2), and to predict future suicidal events (ideation; plan; ideation, plan or attempt). Similar to our study, the CAT-SS improved predictive accuracy over traditional self-reports of ideation from an AUC of 0.70, 95% CI (0.64, 0.76) to AUC = 0.85, 95% CI (0.79, 0.90); suicide plan from AUC of 0.65, 95% CI (0.56, 0.73) to AUC = 0.84, 95% CI (0.77, 0.92); and ideation, plan, or attempt from AUC = 0.71, 95% CI (0.65, 0.77) to AUC = 0.83, 95% CI (0.78, 0.88), all of which were statistically significant improvements in fit. The full model that included demographic characteristics, previous suicidal events, and the CAT-SS at baseline predicted suicidal ideation (AUC = 0.86, 95% CI (0.82, 0.91)), suicide plan (AUC = 0.86, 95% CI (0.80, 0.92)), and ideation, plan, or attempt (AUC = 0.84, 95% CI (0.79, 0.89)) at 6-month follow-up. Berona et al. conducted a separate analysis of these data and showed predictive validity of the baseline CAT-SS in predicting time to suicide attempt during 6 months (HR = 1.34, 95% CI (1.03, 1.74)) overall and HR = 1.51, 95% CI (1.06, 2.15) for the transition from suicidal ideation to suicide attempt for each 10-point increment on the CAT-SS. These findings are remarkably similar to the findings of our study in a very different sample and age group, demonstrating the generalizability and robustness of our results.
These findings have important clinical implications for suicide risk screening across healthcare settings. The VHA has developed and implemented an enterprise-wide evidence-informed approach to suicide risk screening and evaluation, the VA Suicide Risk Identification process (VA RISK ID). Currently, universal screening is being implemented using the C-SSRS Screener. However, findings from this study provide compelling evidence regarding both the efficiency and the long-term predictive validity of the CAT-SS in a medically diverse patient population. Further research is warranted to evaluate whether the CAT-SS could be feasibly implemented as part of universal screening efforts like the VA RISK ID, and whether CAT-SS dimensional scores could facilitate more accurate identification of suicide risk levels, while reducing patient and provider burden. Doing so would be expected to provide additional time to facilitate personalized suicide risk-stratified care management.
As noted above, measures were administered as part of a research protocol; additional work is required to evaluate where the CAT-SS could be implemented in clinical settings. Efforts aimed at exploring this are warranted. Nonetheless, findings from this study suggest that if implemented in the electronic medical record, the CAT-SS would be expected to rapidly facilitate precise and personalized screening and assessment of suicide risk severity.
- 1. Gordon JA, Avenevoli S, Pearson JL. Suicide Prevention Research Priorities in HealthCare. JAMA Psychiatry. 2020;77(9):885–6. Available from: pmid:32432690
- 2. Department of Defense, Under Secretary of Defense of Personnel and Readiness. Annual Suicide Report, Calendar Year 2018. 2019 Sept. 47 p. Available from: https://www.dspo.mil/Portals/113/2018%20DoD%20Annual%20Suicide%20Report_FINAL_25%20SEP%2019_508c.pdf
- 3. Ahmedani BK, Westphal J, Autio K, Elsiss F, Peterson EL, Beck A, et al. Variation in patterns of health care before suicide: A population case-control study. Preventive Medicine. 2019;127:105796. Available from: pmid:31400374
- 4. Horowitz LM, Boudreaux ED, Schoenbaum M, Pao M, Bridge JA. Universal Suicide Risk Screening in the Hospital Setting: Still a Pandora’s Box? Jt Comm J Qual Patient Saf. 2018;44(1):1–3. Available from: pmid:29290241
- 5. Bahraini N, Brenner LA, Barry C, Hostetter T, Keusch J, Post EP, et al. Assessment of Rates of Suicide Risk Screening and Prevalence of Positive Screening Results Among US Veterans After Implementation of the Veterans Affairs Suicide Risk Identification Strategy. JAMA Network Open. 2020;3(10):e2022531. Available from: pmid:33084900
- 6. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–13. Available from: pmid:11556941
- 7. Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017;78(9):1376–82. Available from: pmid:28493655
- 8. Uebelacker LA, German NM, Gaudiano BA, Miller IW. Patient health questionnaire depression scale as a suicide screening instrument in depressed primary care patients: a cross-sectional study. Prim Care Companion CNS Disord. 2011;13(1). Available from: pmid:21731830
- 9. Posner K, Brown GK, Stanley B, Brent DA, Yershova KV, Oquendo MA, et al. The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry. 2011;168(12):1266–77. Available from: pmid:22193671
- 10. Gibbons RD, Weiss DJ, Kupfer DJ, Frank E, Fagiolini A, Grochocinski VJ, et al. Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services. 2008; 361–368. pmid:18378832
- 11. Gibbons RD. Computerized adaptive diagnosis and testing of mental health disorders. Annual Review of Clinical Psychology. 2016; 83–104. pmid:26651865
- 12. First M, Williams J, Karg R, Spitzer R. Structured clinical interview for DSM-5—Research version (SCID-5 for DSM-5, research version; SCID-5-RV). Arlington, VA: American Psychiatric Association. 2015;1–94. https://doi.org/10.1186/s40337-020-00314-3 pmid:32821383
- 13. Weathers FW, Blake DD, Schnurr PP, Kaloupek DG, Marx BP, Keane TM. The clinician-administered PTSD scale for DSM-5 (CAPS-5). National Center for PTSD. 2013.
- 14. Department of Veterans Affairs, Department of Defense. VA/DoD Clinical Practice Guideline For The Assessment and Management of Patients at Risk For Suicide. 2019 May. 142 p. Available from: https://www.healthquality.va.gov/guidelines/MH/srb/VADoDSuicideRiskFullCPGFinal5088212019.pdf
- 15. Pease JL, Forster JE, Davidson CL, Holliman BD, Genco E, Brenner LA. How Veterans Health Administration Suicide Prevention Coordinators Assess Suicide Risk. Clin Psychol Psychother. 2017;24(2):401–10. Available from: pmid:28401708
- 16. Mustanski B, Whitton SW, Newcomb ME, Clifford A, Ryan DT, Gibbons RD. Predicting suicidality using a computer adaptive test: Two longitudinal studies of sexual and gender minority youth. J Consult Clin Psychol. 2021;89(3):166–75. Available from: pmid:33829805
- 17. Berona J, Whitton S, Newcomb ME, Mustanski B, Gibbons RD. Prospective risk and protective factors for the transition from suicide ideation to attempt among sexual and gender minority youth. Psychiatric Services. Forthcoming.