Validation of Six Short and Ultra-short Screening Instruments for Depression for People Living with HIV in Ontario: Results from the Ontario HIV Treatment Network Cohort Study

  • Stephanie K. Y. Choi,

    Affiliations The Ontario HIV Treatment Network, Toronto, Ontario, Canada, The Institute of Medical Science, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada

  • Eleanor Boyle,

    Affiliations Institute of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada

  • Ann N. Burchell,

    Affiliation Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada

  • Sandra Gardner,

    Affiliations The Ontario HIV Treatment Network, Toronto, Ontario, Canada, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada

  • Evan Collins,

    Affiliations University Health Network, Toronto, Ontario, Canada, Department of Psychiatry, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada

  • Paul Grootendorst,

    Affiliations Division of Social and Administrative Pharmacy, Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, Ontario, Canada, Department of Economics, McMaster University, Hamilton, Ontario, Canada

  • Sean B. Rourke,

    Affiliations The Ontario HIV Treatment Network, Toronto, Ontario, Canada, Department of Psychiatry, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada, St. Michael’s Hospital, Toronto, Ontario, Canada

  • OHTN Cohort Study Group

    Membership of the OHTN Cohort Study Group is described in the Acknowledgments.



Abstract

Major depression affects up to half of people living with HIV. However, among HIV-positive patients, depression goes unrecognized 60–70% of the time in non-psychiatric settings. We sought to evaluate three screening instruments and their short forms to facilitate the recognition of current depression in HIV-positive patients attending HIV specialty care clinics in Ontario.


A multi-centre validation study was conducted in Ontario to examine the validity and accuracy of three instruments (the Center for Epidemiologic Studies Depression Scale [CESD20], the Kessler Psychological Distress Scale [K10], and the Patient Health Questionnaire depression scale [PHQ9]) and their short forms (CESD10, K6, and PHQ2) in identifying current major depression among 190 HIV-positive patients. Results from the three instruments and their short forms were compared to the gold standard, as measured by the Mini International Neuropsychiatric Interview (the “M.I.N.I.”).


Overall, the three instruments identified depression with excellent accuracy and validity (area under the curve [AUC]>0.9) and good reliability (Kappa statistics: 0.71–0.79; Cronbach’s alpha: 0.87–0.93). We did not find that the AUCs differed in instrument pairs (p-value>0.09), or between the instruments and their short forms (p-value>0.3). Except for the PHQ2, the instruments showed good-to-excellent sensitivity (0.86–1.0) and specificity (0.81–0.87), excellent negative predictive value (>0.90), and moderate positive predictive value (0.49–0.58) at their optimal cut-points.


Among people in HIV care in Ontario, Canada, the three instruments and their short forms performed equally well and accurately. When further in-depth assessments become available, shorter instruments might find greater clinical acceptance. This could lead to clinical benefits in fast-paced speciality HIV care settings and better management of depression in HIV-positive patients.

Introduction


Depression affects up to half of people living with HIV [1–4]. However, depression goes unrecognized in about 60–70% of HIV-positive patients in non-psychiatric healthcare settings [5–8]. When depression is left untreated in HIV-positive patients, it can reduce immune activity [9–12], increase the risk of co-morbidities and mortality [13,14], and reduce quality of life [15]. Given the advancements made by highly active antiretroviral therapy (HAART), HIV-positive patients are living longer, and physicians and patients are facing long-term challenges in managing depression [16]. Because of the substantive negative impacts of depression on clinical outcomes normally found among HIV-positive patients, recent guidelines from Canada, the U.K. and the U.S. recommend that screening should be undertaken if follow-up in-depth assessments are available [17–19].

Over the past several decades, numerous short and ultra-short screening instruments have been developed to assist in examining depressive symptomatology in non-psychiatric healthcare settings [20,21]. Despite ongoing debates about the effectiveness of these instruments, a recent meta-analysis of 113 studies has shown that most instruments demonstrate adequate performance when used in the initial assessment of depression among patients with physical illness [20].

The 9-item Patient Health Questionnaire (PHQ9), the 20-item Center for Epidemiologic Studies Depression Scale (CESD20), and the 10-item Kessler Psychological Distress Scale (K10) are three screening instruments commonly used with HIV-positive patients [21,22]. The PHQ9 has earned acceptance in primary care and research settings because it is half the length of most other instruments but maintains comparable sensitivity and specificity [23]. Each item of the PHQ9 also corresponds to a specific Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) criterion for the diagnosis of depression [23]. The CESD20 has the longest history of measuring depression in both HIV-positive patients and the general population [21,22]. It was originally designed for community surveys, and its reliability and validity have been extensively demonstrated [20,24]. The K10 is a short instrument that can broadly screen for both anxiety and depressive disorders [25]. It has strong psychometric properties for distinguishing DSM-IV disorders, and its diagnostic accuracy has been shown to have no significant bias by gender or education level [26,27].

Although these three instruments have been extensively evaluated in the general population [24] and in patients with physical illness [20], evaluations of the instruments among HIV-positive patients have been performed mainly in limited-resource settings (i.e., Sub-Saharan Africa) [21,22]. However, the characteristics of the HIV-positive patients in Sub-Saharan Africa—for instance, their literacy levels and their understanding and expression of mental health issues—might be quite different from those of North Americans and affect the evaluation of the instruments. As a result, the psychometric properties of the three instruments and their comparability to a “gold standard” remain unknown for HIV-positive patients in well-resourced settings such as Canada and the United States.

Our multi-centre study sought to determine and compare the diagnostic accuracy and reliability of the three instruments (CESD20, K10, and PHQ9) and their short forms (CESD10, K6, and PHQ2) for current major depression against a gold standard as measured by the Mini International Neuropsychiatric Interview (the “M.I.N.I.”). The study focused on HIV-positive patients receiving HIV primary care in Ontario. Additional study objectives were to determine the optimal cut-points for each screening instrument and to examine potential factors that might affect the diagnostic accuracy of the instruments.

Materials and Methods

Study design

We conducted a cross-sectional validation study nested within a larger cohort of participants in HIV care. The Ontario HIV Treatment Network Cohort Study (OCS) is a multi-site, HIV-positive, clinical cohort. Full details regarding the cohort design can be found in a previous publication [28]. Briefly, participants are HIV-positive patients aged 16 years or older receiving care at one of ten specialty HIV clinics in Ontario. Clinical data recorded during the participants’ routine health care visits are abstracted from clinic records and, since 2008, participants have been interviewed annually.

Three OCS sites were included in this validation study: Maple Leaf Medical Centre in Toronto, St. Joseph’s Health Care in London, Ontario, and Windsor Regional Hospital. Participants who agreed to take part in the study received a $20 CAD honorarium. Ethical approval was received from the University of Toronto Human Subjects Review Committee and from the individual study sites (i.e. Ottawa Health Science Network Research Ethics Board, The University of Western Ontario Research Ethics Board for Health Sciences Research involving Human Subjects, St. Michael's Hospital Research Ethics Board, the Research Ethics Board of Health Sciences North, Sunnybrook Health Sciences Centre Research Ethics Board, University Health Network Research Ethics Board, and Windsor Regional Hospital Research Ethics Board). Our consent procedure was approved by all the ethics boards involved and written informed consent was obtained from each participant.

Recruitment, Data Collection Procedures, and Measures

Between May 1 and December 31, 2014, clinical nurses at each site invited OCS participants to take part in the validation study during their regular appointment. The nurses had received training on how to conduct M.I.N.I. interviews from a psychiatrist specializing in mental disorders and neurocognitive impairments in HIV-positive patients. The nurses were able to consult regularly with the psychiatrist by phone (at the London and Windsor centres) or in person (at the Toronto centre).

Participants completed the three screening instruments (CESD20, K10, and PHQ9). Scores for the short forms (CESD10, K6, and PHQ2) were derived from the responses to the long forms. Details of the three instruments and their short forms are provided in Tables 1 and 2.

Following the completion of the screening instruments, and on the same date, the nurses administered an electronic version of the M.I.N.I. [29] to diagnose current major depressive disorder. The M.I.N.I. is a short and widely adopted structured interview that takes about 15 minutes to complete and can be easily administered by a lay interviewer [29]. The M.I.N.I. has high sensitivity (94–96%) and specificity (79–88%) for identifying major depressive disorder when compared to the structured clinical interviews for the DSM-IV (SCID) and the International Classification of Disease, 10th revision (ICD-10) criteria [29–31]. Nurses and participants were blinded to the results of the M.I.N.I. interviews.


We also assessed whether certain characteristics of patients might affect the diagnostic accuracy of the screening instruments. Patient information was obtained through interviews administered by the nurses on the study date or during a previous appointment [28]. Measurement details for key characteristics are provided in Table 3.

Statistical Analysis

After the data were collected and de-identified, results from the M.I.N.I. diagnoses and total scores for the three screening instruments were generated at the OCS office by the lead investigator (S. C.), who was independent of the data collection. Our statistical analysis plan was four-fold: 1) To examine the diagnostic accuracy of the three screening instruments and their short forms; 2) To identify optimal cut-points for the screening instruments; 3) To examine the effects of seven previously documented somatic symptoms of HIV infection [32] on the diagnostic accuracy and performance of the screening instruments; and 4) To examine inter-rater agreement for pairs of the three instruments and internal consistency of each instrument.

We first used descriptive statistics to describe baseline characteristics, scores of the screening instruments and their short forms, and the prevalence of DSM-IV defined psychiatric disorders among study participants. We also assessed the differences by age (Student’s t-test) and by sex (Pearson’s chi-squared test) between our sample and the rest of the OCS participants who are currently active in the OCS.

We then used non-parametric crude and adjusted Receiver Operating Characteristic (ROC) analyses to examine the criterion validity and accuracy of the three screening instruments and their short forms as compared to the M.I.N.I. First, the overall psychometric property of each instrument was described by a global measure: the area under the ROC curve (AUC). In general, AUC values (range: 0.5 to 1) greater than 0.8 and greater than 0.9 indicated good and excellent performance, respectively. Second, we used the non-parametric Mann-Whitney U-test to assess the equality of the ROC curves of the instruments [33]. For each screening instrument, several criterion validity statistics were reported at each pre-defined cut-point: sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), and negative likelihood ratio (LR-). Finally, adjusted multivariable non-parametric ROC analyses were performed [34] because some covariates may have an impact on the accuracy of the instruments. Bivariate analyses were first performed to examine crude associations between the ROC curve of each instrument and each covariate. Covariates with a p-value<0.25 were entered into the final multivariable model [35]. Coefficients of the adjusted multivariable model reflect the impact of a specific covariate on the adjusted ROC curve, under the assumption that a linear relationship exists between the diagnostic accuracy of the instrument and each covariate. A coefficient of zero indicates no effect. We also assessed the overall impact of the covariates by comparing crude and adjusted AUCs for each instrument.
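As an illustration of the quantities named above, the following minimal sketch computes a non-parametric AUC from its Mann-Whitney interpretation and the criterion validity statistics at a single cut-point. It is not the authors' Stata code; the function names and the toy scores are hypothetical, and a screen is assumed to be positive when the total score is at or above the cut-point.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """Non-parametric AUC: share of (depressed, non-depressed) pairs in which
    the depressed patient scores higher, counting ties as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def validity_at_cutpoint(scores, depressed, cut):
    """Criterion validity statistics at one cut-point.
    Assumes screen-positive means score >= cut and that each cell sum used
    as a denominator below is non-zero."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if s >= cut and d)
    fn = sum(1 for s, d in pairs if s < cut and d)
    fp = sum(1 for s, d in pairs if s >= cut and not d)
    tn = sum(1 for s, d in pairs if s < cut and not d)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "Se": se,
        "Sp": sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "LR+": se / (1 - sp) if sp < 1 else float("inf"),
        "LR-": (1 - se) / sp if sp > 0 else float("inf"),
    }


# Hypothetical toy data: total scores and reference-standard diagnoses.
scores = [3, 5, 8, 12, 15, 7, 20, 2, 9, 11]
depressed = [False, False, False, True, True, False, True, False, True, False]

pos = [s for s, d in zip(scores, depressed) if d]
neg = [s for s, d in zip(scores, depressed) if not d]
auc = auc_mann_whitney(pos, neg)
stats = validity_at_cutpoint(scores, depressed, cut=9)
print(round(auc, 3), stats)
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sample of a few hundred patients; a rank-based formula gives the same AUC for larger data.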

There are many criteria for determining optimal cut-points for screening instruments [36–40]. In our study, we adopted three common criteria: the Youden index (YI) [defined as Se+Sp-1] [41]; the distance (PROC01) between the optimal point on the ROC curve and the point (0, 1), which is an ideal point corresponding to a sensitivity and specificity equal to 1 [defined as (NPV-1)²+(PPV-1)²] [37,39]; and the diagnostic odds ratio (DOR) [defined as LR+/LR-] [40,42]. The YI (range: -1 to 1) is a single index that balances sensitivity and specificity; the greater its value, the better the validity of the cut-point. The PROC01 (range: 0 to 2) is a single index that balances the NPV and PPV; its minimum value indicates the best validity for the cut-point. The DOR (range: 0 to infinity) is a summary statistic that indicates the odds of a positive screening result in a patient with depression relative to the odds in a patient without depression; the greater its value, the better the predictive performance. Because we were evaluating the predictive performance of each screening instrument, we made our final decision on the optimal cut-point by considering the criteria in the following order: DOR, PROC01, and YI.
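The three criteria can be sketched as follows, using the definitions given in the text (including the NPV- and PPV-based form of PROC01). The candidate cut-points and their statistics below are hypothetical values chosen for illustration, and the lexicographic ordering (DOR first, then PROC01, then YI) is one possible reading of the decision order described above.

```python
def cutpoint_criteria(se, sp, ppv, npv):
    """Return (YI, PROC01, DOR) as defined in the text."""
    yi = se + sp - 1
    proc01 = (npv - 1) ** 2 + (ppv - 1) ** 2
    lr_pos = se / (1 - sp) if sp < 1 else float("inf")
    lr_neg = (1 - se) / sp
    dor = lr_pos / lr_neg if lr_neg > 0 else float("inf")
    return yi, proc01, dor


# Hypothetical candidates: cut-point -> (Se, Sp, PPV, NPV).
candidates = {
    8: (0.86, 0.82, 0.46, 0.97),
    10: (0.72, 0.90, 0.57, 0.95),
}


def ranking_key(cut):
    yi, proc01, dor = cutpoint_criteria(*candidates[cut])
    # Larger DOR is better, smaller PROC01 is better, larger YI is better.
    return (-dor, proc01, -yi)


best = min(candidates, key=ranking_key)
print(best, cutpoint_criteria(*candidates[best]))
```

Note that the criteria can disagree: in this toy example the cut-point of 10 has the smaller PROC01, but the cut-point of 8 is chosen because the DOR is consulted first.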

We further examined the diagnostic accuracy of the screening instruments after removing items (i.e., fatigue, sleep, appetite, not being able to shake the blues, feeling bothered, feeling depressed, and lack of concentration) that have previously been reported as somatic symptoms of HIV. It is possible that these items might inflate depression scores [32]. For each instrument, we repeated the adjusted ROC analysis with the items related to the somatic symptoms removed. We then used a Wald test to assess the equality of the adjusted ROC curves of the original instruments and their corresponding reduced scales. The standard error for the hypothesis test was obtained from a bias-corrected bootstrap method [43,44].

Finally, we used Cohen’s Kappa statistic (range: -1 to 1; 0.6–0.7, 0.8–0.9 and >0.9 representing good, very good and excellent agreement, respectively) to examine the inter-rater agreement of each instrument pair after dichotomizing the total scores of each instrument at its optimal cut-point. Cronbach’s alpha (range: 0 to 1; 0.7–0.9 and >0.9 representing good and excellent consistency, respectively) was used to examine the internal consistency of the instruments.
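Both reliability statistics can be computed directly from their standard formulas. The sketch below assumes binary screen results (1 = positive at the optimal cut-point) for the kappa and a respondents-by-items score matrix for the alpha; all data and function names are hypothetical.

```python
from statistics import variance


def cohen_kappa(a, b):
    """Cohen's kappa for two binary classifications of the same patients."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance from the two marginal positive rates.
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of k item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical dichotomized results of two instruments for ten patients.
screen_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
screen_b = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa(screen_a, screen_b)

# Hypothetical item scores: five respondents, three items.
items = [[1, 2, 1], [2, 3, 3], [3, 3, 2], [0, 1, 0], [2, 2, 2]]
alpha = cronbach_alpha(items)
print(round(kappa, 3), round(alpha, 4))
```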

All reported 95% confidence intervals were constructed using the bias-corrected bootstrap method with 2000 replicates [45]. All statistical tests were 2-sided, with statistical significance defined as a p-value less than 0.05, and all analyses were performed using STATA IC v.13.1 [46].
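For readers unfamiliar with the bias-corrected bootstrap, the sketch below shows one common formulation (bias correction without an acceleration term) applied to a simple statistic. It is an assumption-laden illustration rather than a reproduction of the Stata routine: the function name, the toy scores, the fixed seed, and the handling of ties and percentile indices are all choices made here.

```python
import random
from statistics import NormalDist, mean


def bc_bootstrap_ci(data, stat, reps=2000, level=0.95, seed=0):
    """Bias-corrected (BC) bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimate = stat(data)
    # Resample patients with replacement and recompute the statistic.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    normal = NormalDist()
    # z0 measures how far the bootstrap distribution is shifted from the estimate.
    below = sum(1 for b in boots if b < estimate) / reps
    below = min(max(below, 1 / reps), 1 - 1 / reps)  # keep inv_cdf finite
    z0 = normal.inv_cdf(below)
    z = normal.inv_cdf(1 - (1 - level) / 2)
    # Shifted percentiles replace the plain 2.5th and 97.5th percentiles.
    lo_idx = min(reps - 1, int(normal.cdf(2 * z0 - z) * reps))
    hi_idx = min(reps - 1, int(normal.cdf(2 * z0 + z) * reps))
    return boots[lo_idx], boots[hi_idx]


# Hypothetical total scores for twenty patients.
toy_scores = [14, 3, 22, 8, 17, 5, 30, 11, 9, 2,
              25, 13, 7, 19, 4, 16, 10, 6, 28, 12]
ci_low, ci_high = bc_bootstrap_ci(toy_scores, mean)
print(ci_low, ci_high)
```

To bootstrap an AUC instead of a mean, each element of `data` would be a (score, diagnosis) pair so that patients are resampled as whole units.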

Sample size calculation

Based on a power analysis for comparing two receiver operating characteristic (ROC) curves, we required 177 individuals with complete data to achieve 80% statistical power (assuming a prevalence of 17% and a difference of 0.15 in AUC to be detected between the two ROC curves) [47,48].

Results


Two hundred and thirty-seven HIV-positive patients (aged ≥ 18 years) agreed to participate in the validation study. When we compared the characteristics of the validation study participants to the remainder of the cohort, we found that participants were slightly younger (mean age: 47 v. 51 years; p-value: 0.02) and somewhat more likely to be male (86 v. 82%; p-value: 0.08), although the latter difference did not reach statistical significance.

Of the 237 HIV-positive patients initially included, we excluded 47 participants on the basis of information missing from either the M.I.N.I. or one of the screening instruments. Our final analytical sample was 190 patients. Of these, 179 had provided demographic, psychosocial, and behavioural information during a regular OCS interview conducted before the validation study began.

Prevalence of Depression and Characteristics of the Sample

Table 4 presents baseline characteristics and the prevalence of DSM-defined psychiatric disorders of the sample. Of the 179 patients who provided demographic information, the mean age was 47 (SD = 11) years and 87% were male. Based on DSM-IV criteria from the M.I.N.I., twenty-nine patients (16%) were identified with current major depression within the past two weeks. The means (standard deviations) of the total scores of the CESD20, K10, PHQ9, CESD10, K6, and PHQ2 were 14 (13), 18 (8), 5 (5), 8 (7), 11 (5), and 1 (2), respectively. About half of the HIV-positive patients reported annual household incomes of less than $20,000 CAD and about half were recipients of Ontario Disability Support Program subsidies. About 40% of patients had at least one of the nine psychiatric disorders that we examined.

Table 4. Baseline Characteristics, the Mean Scores of the Screening Instruments, and the Prevalence of DSM-defined Psychiatric Disorders of the Sample (N = 179a).

Overall Psychometric Properties and Criterion Validity from ROC Analysis

Fig 1 presents the unadjusted non-parametric AUCs of the screening instruments and their short forms against the M.I.N.I. Overall, we found that the three instruments were able to discriminate current major depression with excellent accuracy and validity (AUC >0.9). We estimated that the AUCs of the CESD20, K10, and PHQ9 were approximately 0.96 (95% CI: 0.92, 0.98), 0.93 (95% CI: 0.88, 0.96) and 0.91 (95% CI: 0.83, 0.96), respectively. Their short forms performed comparably: CESD10 (AUC: 0.95; 95% CI: 0.91, 0.98), K6 (AUC: 0.92; 95% CI: 0.87, 0.95), and PHQ2 (AUC: 0.89; 95% CI: 0.81, 0.94). We did not find that the AUCs were significantly different between each pair of instruments (e.g., |AUC(CESD20) - AUC(PHQ9)| = 0.05, p-value >0.1) or between the instruments and their corresponding short forms (e.g., |AUC(PHQ9) - AUC(PHQ2)| = 0.02, p-value >0.3) (Table 5).

Fig 1. Crude ROC Curves of the Index Screening Instruments and their Short Forms for Current Major Depression (N = 190); Footnotes: All reported 95% confidence intervals were constructed by bias-corrected bootstrap method with 2000 replicates (Efron & Tibshirani, 1994)

Table 5. Comparison of AUCs between Pairs of Index Screening Instruments and the AUCs between Original and the Short-form of Each Instrument (N = 190).

Among the 179 patients who provided demographic information, our multivariable ROC analysis indicated that the receipt of Ontario Disability Support Program subsidies might weaken the discriminatory ability of the CESD10 and PHQ9 (Table 6). Additionally, although the ROC curves and AUCs after controlling for covariates were broadly similar to those without the adjustment, some differences between the crude and adjusted ROC curves remained for each instrument (Fig 2).

Fig 2. Adjusted ROC Curves of the Index Screening Instruments and their Short Forms for Major Depressive Disorder (N = 179a); Footnotes: All reported 95% confidence intervals were constructed by bias-corrected bootstrap method with 2000 replicates (Efron & Tibshirani, 1994); AUC = Area under the curve; aOf 190 patients, 179 provided demographic, psychosocial and behavioural information;

Table 6. Multivariable ROC Analysisa for the Index Screening Instruments and their Short Forms for Current Major Depression Disorder (N = 179b).

Optimal Cut-points

Table 7 presents results for the diagnostic accuracy of the instruments at a range of possible cut-points evaluated in prior studies. Based on the best results for DOR, PROC01, and YI, we identified optimal cut-points of 22 (Se:0.97;Sp:0.81) for K10, 23 (Se:1.0;Sp:0.87) for CESD20, 8 (Se:0.86;Sp:0.82) for PHQ9, 13 (Se:0.97;Sp:0.81) for K6, 12 (Se:0.97;Sp:0.82) for CESD10, and 4 (Se:0.45;Sp:0.97) for PHQ2. Except for PHQ2, these instruments showed an excellent NPV (>0.90) for ruling out major depression, but a moderate PPV (0.49–0.58) for ruling in the condition at their optimal cut-points. Although PHQ2 showed a moderate PPV (0.7), its sensitivity was poor (0.45); hence, it was likely to miss some depression cases.
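The combination of high sensitivity and specificity with only moderate PPV follows from the relatively low prevalence of depression in the sample. As a worked illustration, the sketch below applies Bayes' rule to the sensitivity and specificity reported above for the CESD20, taking the prevalence to be 29 of 190 patients; that denominator is an assumption made here, and the function name is hypothetical.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV implied by sensitivity, specificity and prevalence."""
    true_pos = se * prevalence
    false_neg = (1 - se) * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    true_neg = sp * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# CESD20 at a cut-point of 23 (Se and Sp from the text); prevalence assumed 29/190.
ppv, npv = predictive_values(se=1.0, sp=0.87, prevalence=29 / 190)
print(round(ppv, 2), round(npv, 2))
```

Under these assumptions the PPV comes out at roughly 0.58: even with perfect sensitivity, a 13% false-positive rate applied to the large non-depressed majority yields almost as many false positives as true positives.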

Table 7. Diagnostic Accuracy of the Index Screening Instruments and their Short Forms by Cut-offs for Current Major Depression (N = 190).

Impacts of Somatic Symptoms of HIV Infection on Diagnostic Accuracy

When we removed items (i.e., fatigue, sleep, appetite, not being able to shake the blues, feeling bothered, feeling depressed, and lack of concentration) [32] that were previously reported as somatic symptoms of HIV infection from the original screening instruments and their short forms for current major depression, we found that the results of adjusted AUCs of CESD20 (p-value = 0.0019), CESD10 (p-value = 0.017) and PHQ2 (p-value = 0.023) were significantly reduced (Fig 3).

Fig 3. Comparison Between Adjusted ROC Curves of the Original Instruments for Current Major Depression and that of their Corresponding Reduced Scales After Removing Items Related to Somatic Symptoms of HIV (N = 179a); Footnotes: All reported 95% confidence intervals were constructed by bias-corrected bootstrap method with 2000 replicates (Efron & Tibshirani, 1994); AUC = Area under the curve; aOf 190 patients, 179 provided demographic, psychosocial and behavioural information; bItems (i.e., fatigue, sleep, appetite, not being able to shake the blues, feeling bothered, feeling depressed, and lack of concentration) correspond to previously reported somatic symptoms of HIV infection (Kalichman, Rompa, & Cage, 2000).


Table 8 presents the results of inter-rater agreement of pairs of the three instruments and internal consistency for each instrument. Each pair of the three instruments demonstrated good inter-rater agreement (Cohen’s Kappa statistics: 0.71–0.79). The instruments also showed good-to-excellent internal consistency (Cronbach’s alpha: 0.87–0.93).

Table 8. Inter-rater Agreement of Pairs of Index Screening Instruments and Internal Consistency for each Instrument.

Discussion


To our knowledge, this is the first study to examine and compare the diagnostic accuracy and reliability of three common depression screening instruments (CESD20, K10, and PHQ9) and their short forms against a DSM-IV defined gold standard in an HIV-positive population in a developed country. Overall, each of the screening instruments identified depression with excellent accuracy and reliability. The diagnostic accuracy of the three instruments and their short forms was comparable. Except for the PHQ2, each of the instruments showed good-to-excellent sensitivity and specificity, excellent negative predictive value, and moderate positive predictive value at optimal cut-points. The diagnostic accuracy of all instruments may vary according to the presence or absence of physical and mental disability. Previously reported somatic symptoms of HIV infection might have affected the diagnostic accuracy of the CESD20, CESD10, and PHQ2.

Our results of overall performance are generally consistent with findings previously reported with HIV-positive patients. First, the AUCs and criterion validity statistics of the CESD20 and PHQ9 were similar to prior findings from HIV-positive patients in Uganda [48]. Although our results were better than the pooled estimates (Se:0.82; Sp:0.73) reported in a recent meta-analysis, substantive between-study heterogeneity was reported in that analysis [22]. Second, the short forms of the three instruments performed comparably, a finding that is consistent with a recent systematic review [21]. Third, as with other studies, most of our test instruments showed moderate rates of false positives when ruling-in for depression [20].

A few differences were noted when we compared our results to the studies conducted in Sub-Saharan Africa. First, unlike in Akena et al. (2013) [48], none of the three instruments was diagnostically superior according to AUC values among HIV-positive patients. Additionally, unlike the recent meta-analysis of 113 studies of patients with chronic physical illness, we did not find that the PHQ9 was the most sensitive [20]. However, our results for the psychometric properties of the PHQ9 were generally comparable to those reported for the general population (Se = 0.88; Sp = 0.88) [23]. Second, the performance of the K10 in OCS participants was better than previous findings of sensitivity (0.67–0.83) and specificity (0.72–0.77) reported by Akena et al. (2013) and Spies et al. (2009) [48,49]. This may be due to systematic differences between the HIV-positive populations in Sub-Saharan Africa and Canada [48,49].

In terms of the optimal cut-points, our results differ from prior findings. For the PHQ9, our optimal cut-point was a total score of 8; previously reported optimal cut-points have typically been a score of 10 [23,48]. However, results from a recent meta-analysis have shown that cut-points between 8 and 11 all have acceptable diagnostic properties for identifying major depression [50]. For the CESD20, our optimal cut-point was slightly higher than those previously reported (i.e., between 16 and 22) among HIV-positive patients [21,22,48], but an optimal cut-point of 23 has also been reported in diabetic populations [51,52]. For the K10, our optimal cut-point was within the range reported in prior studies [48,49]. These differences may be due to the different criteria that we used when identifying the optimal cut-points. Our optimal cut-points were determined based on three common criteria: 1) diagnostic odds ratios; 2) PROC01; and 3) Youden index. The criteria that were used in prior Sub-Saharan Africa studies focused on maximizing sensitivity and specificity; however, sensitivity and specificity are only one way to measure diagnostic accuracy, and criteria built on them may not capture the predictive performance of a screening instrument.

Our results suggest that shorter instruments are desirable in primary HIV care settings because resource constraints are often found in these settings. Therefore, shorter instruments may find a greater acceptance and yield larger clinical benefits. However, similar to the original screening instruments, the shorter screening instruments also come with moderate positive predictive values, indicating that false positives are likely. We advise that the screening instruments should only be administered when in-depth follow-up assessments are available to properly diagnose depression.

Our results from the multivariable ROC analysis indicated that, in general, the presence of physical and mental disability may reduce the diagnostic accuracy of screening instruments, thereby making it more difficult for the instruments to detect depression cases. It is possible that patients who are eligible for the Ontario Disability Support Program (ODSP) are sicker and have more severe physical and mental conditions than patients who are not eligible. Similar to prior evidence [20,32], our results may imply that symptoms of chronic conditions overlap with symptoms of depression, especially among patients who have received ODSP subsidies. This would inflate the total scores on the screening instruments and produce more false positives, leading to a lower PPV. As we showed in our further analysis, after we removed some items related to HIV somatic symptoms from the screening instruments, the diagnostic accuracy indicated by the adjusted AUCs was reduced. Therefore, our results suggest that careful consideration must be taken and in-depth follow-up assessments should be available when applying these instruments to patients with chronic illness, especially those with severe physical and mental impairments.

Our study has several strengths. First, this was a multi-center study whose participants may represent typical HIV-positive patients receiving care in Ontario [28]. Second, this is the first study to compare three common screening instruments for depression in a developed country. Unlike Akena et al.(2013) [48], our sample size calculation allowed for detecting differences between AUCs of the instruments, thereby allowing for direct comparison of their diagnostic accuracy. Comparing instruments within a single sample may overcome the heterogeneity issues that have been reported in a recent meta-analysis [21,22]. Third, our analysis also considered the potential impacts of somatic symptoms of HIV infection on the diagnostic accuracy of the instruments [32]. Finally, we adopted advanced statistical techniques to examine the impacts of potential factors that might affect the performance of the instruments [34].

There may be some limitations to our results. First, although the M.I.N.I. has frequently been adopted as a “gold standard” in validation studies among the general population and HIV-positive patients [20,22], it is an abbreviated structured interview for psychiatric diagnoses and is therefore imperfect when compared with the SCID or ICD-10 criteria. This may affect the estimated discriminatory accuracy of the instruments. However, prior evidence has shown the M.I.N.I. to have high sensitivity (94–96%) and specificity (79–88%) for identifying major depressive disorders when compared against SCID or ICD-10 criteria [29–31]. Any misclassification arising from use of the M.I.N.I. as the gold standard would have produced underestimates of sensitivity and specificity. Second, interviewer bias is likely because the M.I.N.I. interviews were conducted by nurses familiar with the clinical histories of their patients. It is possible that the nurses recalled the mental health conditions of their patients from previous appointments and that these recollections affected the interviews. Third, completing the screening instruments first may have primed participants (i.e., exposure to the screening instruments may have influenced how they responded to the M.I.N.I.), so that the subsequent M.I.N.I. may have been more likely to detect depression. Future studies should replicate our results while randomizing the order of the M.I.N.I. and the screening instruments to determine whether priming occurs. Fourth, our study might have been under-powered when testing for equality of the AUCs of the instruments because the difference in AUCs (0.15) that we assumed from Akena et al. (2013) was larger than the difference observed in our study [48]. Replication with a larger sample is desirable. Fifth, although efforts were made to ensure that our sample represented typical HIV-positive patients in Ontario, differences have been noted between the overall OCS cohort and non-OCS participants [53].
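The direction of the bias described in the first limitation can be checked with a short calculation. The sketch below assumes that errors in the screening instrument and in the reference interview are independent given true disease status; the prevalence and the instrument's true accuracy are hypothetical values chosen for illustration, while the reference sensitivity and specificity are taken from within the ranges quoted above. The function name `apparent_accuracy` is also hypothetical.

```python
def apparent_accuracy(prevalence, se_test, sp_test, se_ref, sp_ref):
    """Sensitivity and specificity of a test as measured against an imperfect reference.

    Assumes test and reference errors are conditionally independent
    given true disease status (an assumption, not a study finding).
    """
    p = prevalence
    # Joint and marginal probabilities of a positive reference result
    both_pos = p * se_test * se_ref + (1 - p) * (1 - sp_test) * (1 - sp_ref)
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    # Joint and marginal probabilities of a negative reference result
    both_neg = p * (1 - se_test) * (1 - se_ref) + (1 - p) * sp_test * sp_ref
    ref_neg = p * (1 - se_ref) + (1 - p) * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical instrument (true Se 0.90, Sp 0.85) at an assumed prevalence of 20%,
# judged against a reference with Se 0.95 and Sp 0.85
app_se, app_sp = apparent_accuracy(0.20, 0.90, 0.85, 0.95, 0.85)
print(f"apparent sensitivity = {app_se:.3f}, apparent specificity = {app_sp:.3f}")
```

Under these assumptions both apparent values fall below the true values, consistent with the statement that misclassification by the reference standard leads to underestimates. If the errors of the instrument and the reference were positively correlated, the bias could instead run in the opposite direction.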

Despite the limitations noted above, our findings demonstrate excellent diagnostic accuracy and reliability of the CESD20, K10, and PHQ9 for current major depression in HIV-positive patients in Ontario. Additionally, the diagnostic accuracy of the three instruments and their short forms was comparable. When follow-up assessments become available, shorter instruments may find greater acceptance and yield clinical benefits in relation to depression when incorporated into fast-paced speciality HIV care.

Acknowledgments

We gratefully acknowledge all of the people living with HIV who volunteered to participate in the OHTN Cohort Study and the work and support of the past and present members of the OCS Governance Committee (Past: Darien Taylor, Dr. Evan Collins, Dr. Greg Robinson, Shari Margolese, Tony Di Pede, Rick Kennedy, Michael Hamilton, Ken King, Brian Finch, Dr. Ahmed Bayoumi, Dr. Clemon George, Dr. Curtis Cooper, Dr. Troy Grennan, and present: Patrick Cupido (Chair), Anita Benoit, Breklyn Bertozzi, Adrian Betts, Les Bowman, Lisungu Chieza, Tracey Conway, Brian Huskins, Claire Kendall, Nathan Lachowsky, Joanne Lindsay, John MacTavish, Mark McCallum, Colleen Price, Lori Stoltz, Rosie Thein).

We thank all the interviewers, data collectors, research associates and coordinators, nurses and physicians who provided support for data collection and extraction. The authors wish to thank their OHTN colleagues and their teams for professional editing and knowledge translation support (Emily White), M.I.N.I. training and support (Dr. Adriana Carvalhal), statistical support (Veronika Moravan), data management and IT support (Robert Hudder, Nahid Qureshi), and study coordination (Kevin Challacombe, OCS Data, and Brooke Ellis, OCS Research). The OHTN Cohort Study is supported by the Ontario Ministry of Health and Long-Term Care. We also acknowledge the Public Health Ontario Laboratories for supporting record linkage with the HIV viral load test database.

The findings, opinions and conclusions are those of the authors and no endorsement of these by the Ontario HIV Treatment Network is intended or should be inferred.

The OHTN Cohort Study Research Team: The OHTN Cohort Study Team consists of Dr. Sean B. Rourke (Principal Investigator), University of Toronto and OHTN; Dr. Ann N. Burchell (Co-Principal Investigator), OHTN; Dr. Sandra Gardner, OHTN; Dr. Sergio Rueda, OHTN; Dr. Ahmed Bayoumi and Dr. Kevin Gough, St. Michael’s Hospital; Dr. Jeffrey Cohen, Windsor Regional Hospital; Dr. Curtis Cooper, Ottawa General Hospital; Dr. Don Kilby, University of Ottawa Health Services; Dr. Mona Loutfy and Dr. Fred Crouzat, Maple Leaf Medical Clinic; Dr. Anita Rachlis and Dr. Nicole Mittmann, Sunnybrook Health Sciences Centre; Dr. Janet Raboud and Dr. Irving Salit, Toronto General Hospital; Dr. Edward Ralph, St. Joseph’s Health Care; Dr. Roger Sandre, Sudbury Regional Hospital; and Dr. Gerald Evans and Dr. Wendy Wobeser, Hotel Dieu Hospital.

Author Contributions

Conceived and designed the experiments: SBR ANB SG EB SKYC. Analyzed the data: SKYC EB SG. Wrote the paper: SKYC. Acquisition, analysis, or interpretation of data: SBR ANB SG EB EC PG SKYC. Critical revision of the manuscript for important intellectual content: SBR ANB SG EB EC PG SKYC. Obtained funding: SBR ANB.

References

  1. Williams P, Narciso L, Browne G, Roberts J, Weir R, Gafni A. The prevalence, correlates, and costs of depression in people living with HIV/AIDS in Ontario: implications for service directions. AIDS Educ Prev. 2005;17(2):119–30. pmid:15899750
  2. Pence BW, Miller WC, Whetten K, Eron JJ, Gaynes BN. Prevalence of DSM-IV-defined mood, anxiety, and substance use disorders in an HIV clinic in the Southeastern United States. J Acquir Immune Defic Syndr. 2006;42(3):298–306. pmid:16639343
  3. Bing EG, Burnam MA, Longshore D, Fleishman JA, Sherbourne CD, London AS, et al. Psychiatric disorders and drug use among human immunodeficiency virus-infected adults in the United States. Arch Gen Psychiatry. 2001;58(8):721–8. pmid:11483137
  4. Parhami I, Fong TW, Siani A, Carlotti C, Khanlou H. Documentation of psychiatric disorders and related factors in a large sample population of HIV-positive patients in California. AIDS Behav. 2013;17(8):2792–801. pmid:23247363
  5. Burnam MA, Bing EG, Morton SC, Sherbourne C, Fleishman JA, London AS, et al. Use of mental health and substance abuse treatment services among adults with HIV in the United States. Arch Gen Psychiatry. 2001;58(8):729–36. pmid:11483138
  6. Vitiello B, Burnam MA, Bing EG, Beckman R, Shapiro MF. Use of psychotropic medications among HIV-infected patients in the United States. Am J Psychiatry. 2003;160(3):547–54. pmid:12611837
  7. Cook JA, Burke-Miller JK, Grey DD, Cocohoba J, Liu C, Schwartz RM, et al. Do HIV-positive women receive depression treatment that meets best practice guidelines? AIDS Behav. 2014;18(6):1094–102. pmid:24402689
  8. Asch SM, Kilbourne AM, Gifford AL, Burnam MA, Turner B, Shapiro MF, et al. Underdiagnosis of Depression in HIV: who are we missing? J Gen Intern Med. 2003;18(6):450–60. pmid:12823652
  9. Leserman J. HIV disease progression: depression, stress, and possible mechanisms. Biol Psychiatry. 2003;54(3):295–306. pmid:12893105
  10. Cruess DG, Douglas SD, Petitto JM, Leserman J, Ten Have T, Gettes D, et al. Association of depression, CD8+ T lymphocytes, and natural killer cell activity: implications for morbidity and mortality in Human immunodeficiency virus disease. Curr Psychiatry Rep. 2003;5(6):445–50. pmid:14609499
  11. Leserman J, Petitto JM, Gu H, Gaynes BN, Barroso J, Golden RN, et al. Progression to AIDS, a clinical AIDS condition and mortality: psychosocial and physiological predictors. Psychol Med. 2002;32(6):1059–73. pmid:12214787
  12. Leserman J, Petitto JM, Golden RN, Gaynes BN, Gu H, Perkins DO, et al. Impact of stressful life events, depression, social support, coping, and cortisol on progression to AIDS. Am J Psychiatry. 2000;157(8):1221–8. pmid:10910783
  13. Cook JA, Grey D, Burke J, Cohen MH, Gurtman AC, Richardson JL, et al. Depressive symptoms and AIDS-related mortality among a multisite cohort of HIV-positive women. Am J Public Health. 2004;94(7):1133–40. pmid:15226133
  14. Ickovics JR, Hamburger ME, Vlahov D, Schoenbaum EE, Schuman P, Boland RJ, et al. Mortality, CD4 Cell Count Decline, and Depressive Symptoms Among HIV-Seropositive Women. JAMA J Am Med Assoc. 2001;285(11):1466–1474.
  15. Jia H, Uphold CR, Wu S, Reid K, Findley K, Duncan PW. Health-related quality of life among men with HIV infection: effects of social support, coping, and depression. AIDS Patient Care STDS. 2004;18(10):594–603. pmid:15630787
  16. Kaaya S, Eustache E, Lapidos-Salaiz I, Musisi S, Psaros C, Wissow L. Grand challenges: Improving HIV treatment outcomes by integrating interventions for co-morbid mental illness. PLoS Med. 2013;10(5):e1001447. pmid:23700389
  17. National Collaborating Centre for Mental Health. Depression in adults with a chronic physical health problem. Treatment and management [NICE Clinical Guidelines, no. 91]. London (UK); 2009.
  18. Ramasubbu R, Beaulieu S, Taylor V. The CANMAT task force recommendations for the management of patients with mood disorders and comorbid medical conditions: diagnostic, assessment, and treatment principles. Ann Clin Psychiatry. 2012;24(1):82–90. pmid:22303524
  19. U.S. preventive services task force. Screening for depression in adults: U.S. preventive services task force recommendation statement. Ann Intern Med. 2009;151(11):784–92. pmid:19949144
  20. Meader N, Mitchell AJ, Chew-Graham C, Goldberg D, Rizzo M, Bird V, et al. Case identification of depression in patients with chronic physical health problems: a diagnostic accuracy meta-analysis of 113 studies. Br J Gen Pract. 2011;61(593):e808–20. pmid:22137418
  21. Akena D, Joska J, Obuku EA, Amos T, Musisi S, Stein DJ. Comparing the accuracy of brief versus long depression screening instruments which have been validated in low and middle income countries: a systematic review. BMC Psychiatry. 2012;12:187. pmid:23116126
  22. Tsai AC. Reliability and validity of depression assessment among persons with HIV in sub-Saharan Africa: systematic review and meta-analysis. J Acquir Immune Defic Syndr. 2014;66(5):503–11. pmid:24853307
  23. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–13. pmid:11556941
  24. Radloff LS. The CES-D Scale: A Self-Report Depression Scale for Research in the General Population. Appl Psychol Meas. 1977;1(3):385–401.
  25. Kessler RC, Andrews G, Colpe LJ, Hiripi E, Mroczek DK, Normand SLT, et al. Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychol Med. 2002;32(6):959–76. pmid:12214795
  26. Brooks RT. Factor Structure and Interpretation of the K10. Psychol Assess. 18(1):62–70. pmid:16594813
  27. Baillie AJ. Predictive gender and education bias in Kessler’s psychological distress Scale (k10). Soc Psychiatry Psychiatr Epidemiol. 2005;40(9):743–8. pmid:16142511
  28. Rourke SB, Gardner S, Burchell AN, Raboud J, Rueda S, Bayoumi AM, et al. Cohort profile: the Ontario HIV Treatment Network Cohort Study (OCS). Int J Epidemiol. 2013;42(2):402–11. pmid:22345312
  29. Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59 Suppl 2:22–33.
  30. Sheehan D, Lecrubier Y, Harnett Sheehan K, Janavs J, Weiller E, Keskiner A, et al. The validity of the Mini International Neuropsychiatric Interview (MINI) according to the SCID-P and its reliability. Eur Psychiatry. 1997;12(5):232–41.
  31. Lecrubier Y. The Mini International Neuropsychiatric Interview (MINI). A short diagnostic structured interview: reliability and validity according to the CIDI. Eur Psychiatry. 1997;12(5):224–231.
  32. Kalichman SC, Rompa D, Cage M. Distinguishing between overlapping somatic symptoms of depression and HIV disease in people living with HIV-AIDS. J Nerv Ment Dis. 2000;188(10):662–70. pmid:11048815
  33. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45. pmid:3203132
  34. Janes H, Pepe MS. Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika. 2009;96(2):371–82. pmid:22822245
  35. Vittinghoff E. Regression methods in biostatistics: linear, logistic, survival, and repeated measures models. Springer; 2005.
  36. Schäfer H. Constructing a cut-off point for a quantitative diagnostic test. Stat Med. 1989;8(11):1381–91. pmid:2692111
  37. Vermont J, Bosson JL, François P, Robert C, Rueff A, Demongeot J. Strategies for graphical threshold determination. Comput Methods Programs Biomed. 1991;35(2):141–50. pmid:1914452
  38. Bohning D, Bohning W, Holling H. Revisiting Youden’s index as a useful measure of the misclassification error in meta-analysis of diagnostic studies. Stat Methods Med Res. 2008;17(6):543–54. pmid:18375457
  39. Gallop RJ, Crits-Christoph P, Muenz LR, Tu XM. Determination and interpretation of the optimal operating point for ROC curves derived through generalized linear models. Understanding Statistics. 2003;2(4):219–242.
  40. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PMM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56(11):1129–35. pmid:14615004
  41. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5. pmid:15405679
  42. Böhning D, Holling H, Patilea V. A limitation of the diagnostic-odds ratio in determining an optimal cut-off value for a continuous diagnostic test. Stat Methods Med Res. 2011;20(5):541–50. pmid:20639268
  43. Janes H, Longton G, Pepe M. Accommodating Covariates in ROC Analysis. Stata J. 2009;9(1):17–39. pmid:20046933
  44. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148(3):839–43. pmid:6878708
  45. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. CRC Press; 1994.
  46. StataCorp. Statistical Software. College Station, TX: Stata StataCorp LP; 2013.
  47. Obuchowski NA, McClish DK. Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Stat Med. 1997;16(13):1529–42. pmid:9249923
  48. Akena D, Joska J, Obuku EA, Stein DJ. Sensitivity and specificity of clinician administered screening instruments in detecting depression among HIV-positive individuals in Uganda. AIDS Care. 2013;25(10):1245–52. pmid:23398282
  49. Spies G, Kader K, Kidd M, Smit J, Myer L, Stein DJ, et al. Validity of the K-10 in detecting DSM-IV-defined depression and anxiety disorders among HIV-infected individuals. AIDS Care. 2009;21(9):1163–8. pmid:20024776
  50. Manea L, Gilbody S, McMillan D. Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis. CMAJ. 2012;184(3):E191–6. pmid:22184363
  51. Khamseh ME, Baradaran HR, Javanbakht A, Mirghorbani M, Yadollahi Z, Malek M. Comparison of the CES-D and PHQ-9 depression scales in people with type 2 diabetes in Tehran, Iran. BMC Psychiatry. 2011;11(1):61.
  52. Hermanns N, Kulzer B, Krichbaum M, Kubiak T, Haak T. How to screen for depression and emotional problems in patients with diabetes: comparison of screening characteristics of depression questionnaires, measurement of diabetes-specific emotional problems and standard clinical assessment. Diabetologia. 2006;49(3):469–77. pmid:16432706
  53. Raboud J, Su D, Burchell AN, Gardner S, Walmsley S, Bayoumi AM, et al. Representativeness of an HIV cohort of the sites from which it is recruiting: results from the Ontario HIV Treatment Network (OHTN) cohort study. BMC Med Res Methodol. 2013;13(1):31.