The Patient Health Questionnaire (PHQ-9) is a self-report questionnaire commonly used to screen for depression, with a cut-off score between 8 and 11 generally recommended. In Japan, studies of the validity of the PHQ-9 and PHQ-2 have been limited. In this study, we examined the utility of the PHQ-9 and PHQ-2 at an outpatient clinic in a medical university hospital in Japan.
New consecutive outpatients were included in the study. We administered the PHQ-9 to 574 patients, and acquired complete PHQ-9 and PHQ-2 data for 521 patients. Major depressive disorders were diagnosed according to the DSM-IV-TR.
Forty-two patients were diagnosed with major depressive disorders. The mean PHQ-9 (15.7) and PHQ-2 (3.8) scores of the patients with major depressive disorders were significantly higher than the scores of the patients without depression (6.0 for the PHQ-9 and 1.8 for the PHQ-2). The best cut-off points for the PHQ-9 and PHQ-2 summary scores were ≥11 (sensitivity 0.76, specificity 0.81) and ≥3 (sensitivity 0.76, specificity 0.82), respectively. No relationship was observed between age and PHQ-9 scores.
Citation: Suzuki K, Kumei S, Ohhira M, Nozu T, Okumura T (2015) Screening for Major Depressive Disorder with the Patient Health Questionnaire (PHQ-9 and PHQ-2) in an Outpatient Clinic Staffed by Primary Care Physicians in Japan: A Case Control Study. PLoS ONE 10(3): e0119147. https://doi.org/10.1371/journal.pone.0119147
Academic Editor: James Bennett Potash, University of Iowa Hospitals & Clinics, UNITED STATES
Received: October 30, 2014; Accepted: January 9, 2015; Published: March 19, 2015
Copyright: © 2015 Suzuki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: All relevant data are within the paper.
Funding: The authors have no support or funding to report.
Competing interests: The authors have declared that no competing interests exist.
Depression is a mental illness that is associated with disability and a reduced quality of life for the person with the disorder. Patients with unrecognized depression consult their physician more frequently than those without, and consume greater health care resources. The World Health Organization (WHO) Psychological Problems in General Health Care study reported that primary care physicians diagnosed major depression in only 42% of adult patients who had the condition. Two-thirds of primary care patients with depression presented with somatic symptoms (e.g., headache, back problems or chronic pain), making the detection of depression more difficult. Improvements in detection can lead to earlier treatment, and treatment of major depressive disorders is thought to result in improved outcomes, such as a better quality of life, a better work life and a lower risk of suicide. These findings suggest that an easy and reliable method to detect depression should be used routinely, especially in the primary care setting.
The Patient Health Questionnaire (PHQ-9) is a self-report questionnaire consisting of nine questions about depressive symptoms. It is commonly used to screen for depression, with a cut-off score between 8 and 11 generally recommended, although the optimal cut-off score may differ depending on the setting. The PHQ-2 comprises the first two questions of the PHQ-9. For example, in New Zealand, the sensitivity and specificity of the PHQ-2 for diagnosing major depression were 86% and 76% at a cut-off point of ≥2, and 61% and 92% at a cut-off point of ≥3; for the PHQ-9, they were 74% and 91% at a cut-off point of ≥10. In the United States, the sensitivity and specificity of the PHQ-2 for diagnosing major depression were 91% and 65% at a cut-off point of ≥1, and 61% and 92% at a cut-off point of ≥3; for the PHQ-9, they were 54% and 90% at a cut-off point of ≥10. In Japan, on the other hand, studies of the validity of the PHQ-9 and PHQ-2 have been limited. Inagaki et al. reported that a PHQ-9 cut-off score of ≥5 (sensitivity 0.77, specificity 0.95) and a PHQ-2 cut-off score of ≥3 were useful for detecting depression in the primary care center of a rural Japanese hospital. Inoue et al. reported that the optimal cut-off score was ≥14 (sensitivity 0.86, specificity 0.67) for “current major depressive episodes” in a clinic specializing in psychiatric care. A clear cut-off score for the PHQ-9 has therefore not yet been established in Japan. In this study, we examined the utility of the PHQ-9 and PHQ-2 for detecting depression at an outpatient clinic in a medical university hospital in Japan.
Materials and Methods
Study design and participants
From October 2013 to July 2014, consecutive outpatients who visited the Department of General Medicine, Asahikawa Medical University Hospital, as new patients were included in the study. Asahikawa Medical University Hospital is located in Asahikawa City, which has a population of approximately 350,000 in the middle of Hokkaido Island, in the northernmost part of Japan. The hospital has 602 beds, and approximately 250 doctors work at the hospital to cover almost all medical problems. Among them, there are five primary care physicians working in the Department of General Medicine.
For ethical reasons, we excluded patients who could not answer the PHQ-9 questionnaire because they were critically ill and needed emergency treatment. The PHQ-9 is a self-report questionnaire consisting of nine questions about symptoms of depression. The Japanese version of the PHQ-9 was administered before consultation to patients who agreed to participate in this study and provided written informed consent.
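The scoring implied by this description can be sketched as follows. The sketch assumes the standard PHQ-9 response coding of 0 to 3 per item (total score 0 to 27), which is not spelled out in this text. The function names `phq_scores` and `screen` are hypothetical, and the default cut-offs are the ones this study goes on to report.

```python
from typing import Dict, Sequence, Tuple


def phq_scores(items: Sequence[int]) -> Tuple[int, int]:
    """Return (PHQ-9 total, PHQ-2 total) for nine item responses.

    Assumption: each item is coded 0-3, so the PHQ-9 total ranges
    from 0 to 27 and the PHQ-2 total from 0 to 6.
    """
    if len(items) != 9:
        # Incomplete questionnaires are rejected rather than imputed.
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(not isinstance(v, int) or v < 0 or v > 3 for v in items):
        raise ValueError("each item must be an integer from 0 to 3")
    phq9 = sum(items)        # all nine questions
    phq2 = sum(items[:2])    # first two questions only
    return phq9, phq2


def screen(items: Sequence[int],
           phq9_cutoff: int = 11,
           phq2_cutoff: int = 3) -> Dict[str, object]:
    """Flag a respondent as screen-positive when a score is >= its cut-off."""
    phq9, phq2 = phq_scores(items)
    return {
        "phq9": phq9,
        "phq2": phq2,
        "phq9_positive": phq9 >= phq9_cutoff,
        "phq2_positive": phq2 >= phq2_cutoff,
    }


# Hypothetical respondent, for illustration only.
example = screen([2, 2, 1, 1, 1, 1, 1, 1, 1])
```

A screen-positive result in this sketch is only a flag for further assessment; in the study the diagnosis itself came from the structured interview.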
We diagnosed major depressive episodes using the Japanese version of the Major Depression Episode module of the Mini-International Neuropsychiatric Interview (MINI). The MINI is a short, structured diagnostic interview used as a tool to diagnose DSM-IV disorders. The Japanese version of the MINI has been shown to be reliable and valid for making DSM-III-R diagnoses, and can be performed in less than half of the time required for the Structured Clinical Interview for the DSM-III-R. Major depressive disorders were diagnosed according to the DSM-IV-TR. This study was approved by the ethics committee of Asahikawa Medical University Hospital in Japan.
To investigate the cut-off scores of the PHQ-2 and PHQ-9 for major depressive disorders, we generated receiver operating characteristic (ROC) curves and calculated the areas under the curves (AUC). The ROC curve plots the true positive rate against the false positive rate over a range of cut-off values. The best cut-off point is considered to be at or near the “shoulder” of the ROC curve, because as the sensitivity is progressively increased, there is little or no loss in specificity until very high levels of sensitivity are achieved. We calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), odds ratio, positive and negative likelihood ratios and overall accuracy of the PHQ-9 and PHQ-2. To test for an association between age and the PHQ-9 summary scores of the depressed and non-depressed subjects, we calculated Pearson’s product-moment correlation. Student’s t-test was used to test for significant differences between the depressed and non-depressed subjects in age, PHQ-9 total scores and PHQ-2 scores.
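The accuracy measures listed above all follow from a single 2×2 table of screening result against interview diagnosis. The sketch below uses the textbook definitions. The counts in `example` are hypothetical, chosen only so that sensitivity and specificity land near the values reported for the PHQ-9 in this study; they are not taken from the paper's tables. The names `Confusion` and `diagnostic_metrics` are likewise illustrative, and the sketch assumes no cell of the table is zero.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # screen-positive, depressed
    fn: int  # screen-negative, depressed
    fp: int  # screen-positive, not depressed
    tn: int  # screen-negative, not depressed


def diagnostic_metrics(c: Confusion) -> Dict[str, float]:
    """Standard screening-accuracy measures from a 2x2 table."""
    sens = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "odds_ratio": (c.tp * c.tn) / (c.fp * c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn),
    }


# Hypothetical counts (42 depressed, 479 non-depressed), for illustration only.
example = diagnostic_metrics(Confusion(tp=32, fn=10, fp=93, tn=386))
```

Because PPV and NPV depend on prevalence, the same sensitivity and specificity would give different predictive values in a setting where depression is more or less common than in this sample.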
A total of 650 outpatients who visited the Department of General Medicine, Asahikawa Medical University Hospital, as new patients were included in the study. For ethical reasons, we excluded 76 patients who were too critically ill to fill out the PHQ-9 forms or who did not consent to participate. We administered the PHQ-9 to 574 patients and acquired complete PHQ-9 data for 521 patients. The age of the 521 patients was 51.0 ± 19.4 years (mean ± SD). As shown in Table 1, the ICD-10 diagnoses of these 521 patients were widely distributed across almost all disease fields. Among them, 42 patients (8.1%) were diagnosed with a major depressive disorder. The PHQ-9 and PHQ-2 scores in patients with depression were significantly higher than those in patients without depression (Table 2), supporting the ability of the PHQ-9 and PHQ-2 scores to discriminate depression from other conditions.
Table 3 shows the sensitivity and specificity at different cut-off scores for the PHQ-9 and PHQ-2 summary scores in the patients with a major depressive disorder. Fig. 1 shows the ROC curves of the cut-off points of the PHQ-9 and PHQ-2 summary scores for major depressive disorders based on the sensitivity and specificity of the PHQ scores. The areas under the ROC curves of the PHQ-9 and PHQ-2 were 0.880 and 0.845, respectively. The best cut-off points for the PHQ-9 and PHQ-2 summary scores, determined as the points nearest to the “shoulder” of the ROC curve, were ≥11 (sensitivity 0.762, specificity 0.806) and ≥3 (sensitivity 0.762, specificity 0.814), respectively. The overall accuracy, sensitivity, specificity, positive predictive value, negative predictive value, odds ratio, positive likelihood ratio and negative likelihood ratio when the cut-off points were set at ≥11 (PHQ-9) and ≥3 (PHQ-2) are shown in Table 4, and indicate a good screening performance.
TPF: True-positive fraction, FPF: False-positive fraction
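As a rough illustration of how a cut-off near the “shoulder” might be located, the sketch below computes the TPF and FPF at every candidate cut-off, estimates the AUC by the trapezoidal rule, and picks the cut-off whose ROC point lies closest to the top-left corner (FPF 0, TPF 1). Treating the shoulder as the point nearest that corner is an assumption made here; the text does not state the exact criterion used. The data are toy values, not study data, and all function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

RocPoint = Tuple[int, float, float]  # (cut-off, TPF, FPF)


def roc_points(scores: Sequence[int], labels: Sequence[int],
               cutoffs: Iterable[int]) -> List[RocPoint]:
    """TPF and FPF when 'score >= cut-off' counts as screen-positive."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one depressed and one non-depressed case")
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= c)
        points.append((c, tp / pos, fp / neg))
    return points


def auc_trapezoid(points: Sequence[RocPoint]) -> float:
    """Area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    xy = sorted({(fpf, tpf) for _, tpf, fpf in points} | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def nearest_to_corner(points: Sequence[RocPoint]) -> int:
    """Cut-off whose ROC point is closest to (FPF=0, TPF=1)."""
    return min(points, key=lambda p: math.hypot(p[2], 1 - p[1]))[0]


# Toy data for illustration only: label 1 = depressed, 0 = not depressed.
toy_scores = [0, 2, 3, 5, 6, 8, 12, 9, 13, 15, 20]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

toy_points = roc_points(toy_scores, toy_labels, range(0, 28))
toy_auc = auc_trapezoid(toy_points)
toy_cutoff = nearest_to_corner(toy_points)
```

Other criteria, such as maximizing sensitivity plus specificity, can select a different cut-off from the same curve, so the chosen rule should be reported alongside the result.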
Fig. 2 shows the relationship between age and the PHQ-9 total scores. Pearson’s product-moment correlation revealed no correlation between age and the PHQ-9 total scores in either the depressed patients or the patients without depression.
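For reference, Pearson's product-moment correlation can be computed directly from its definition. The sketch below is a minimal version; the function name `pearson_r` and the age and score values are hypothetical and are not study data. It assumes neither variable is constant, since the coefficient is undefined in that case.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages and PHQ-9 totals, for illustration only.
ages = [23, 35, 47, 58, 66, 79]
phq9_totals = [7, 3, 12, 4, 9, 5]
r = pearson_r(ages, phq9_totals)
```

In the study the correlation was examined separately within the depressed and non-depressed groups, which corresponds to calling such a function once per group.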
Several meta-analyses of the PHQ-9 for diagnosing depression have been reported [5, 12]. For example, Manea et al. reported that the PHQ-9 had acceptable diagnostic properties for detecting major depressive disorder at cut-off scores between 8 and 11. Gilbody et al. reported that the PHQ-9 (cut-off point ≥10) is an acceptable instrument for detecting major depressive disorders in primary care patients (sensitivity 0.80, specificity 0.92).
In Japan, some previous studies have described cut-off points for the PHQ-9. Inagaki et al. reported that the best PHQ-9 cut-off point was ≥5 for depression in the primary care setting at a Japanese rural hospital (mean age of patients: 73.5 years). In the present study, the PHQ-9 score of patients without depression was 6.0 ± 5.2 (mean ± SD), suggesting that more than half of the patients without depression would be classified as having depression if the PHQ-9 cut-off point were set at ≥5. We therefore suggest that the cut-off point of ≥5 proposed by Inagaki et al. would be too low. As demonstrated in the present report, the best cut-off point for the PHQ-9 total score was ≥11 (sensitivity 0.76, specificity 0.81) in an outpatient clinic at a Japanese medical university hospital (mean age 51.7 years). This cut-off point was similar to that found in the meta-analysis of studies from other countries, suggesting that the best cut-off point for the PHQ-9 total score should also be around 10 in Japan.
We considered that the different cut-off points among the Japanese studies might be due to differences in the mean age of the study populations. However, the present study found no relationship between age and the PHQ-9 total scores in either the depressed patients or the patients without depression. These results suggest that the age difference is not a key factor explaining the different best cut-off points for the PHQ-9 score between the present study and the study by Inagaki et al. Further studies are needed to explain the differences and to confirm the best cut-off point.
In our study, the PHQ-2 (cut-off point ≥3) had a sensitivity of 0.76 and a specificity of 0.82, which were similar to those of the PHQ-9 (sensitivity 0.76, specificity 0.81, cut-off point ≥11). Löwe et al. reported that the PHQ-2 (cut-off point ≥3) had a sensitivity of 0.87 and a specificity of 0.78 for major depressive disorder in several outpatient clinics. In Japan, Inagaki et al. reported that the PHQ-2 (cut-off point ≥3) had a sensitivity of 0.77 and a specificity of 0.95, and mentioned that the PHQ-2 may be preferred in screening for patients with major depression in internal medicine outpatient clinics. These findings suggest that the PHQ-2 cut-off point should also be ≥3 in Japan.
The PHQ-9 and PHQ-2 are both useful instruments for screening for major depressive disorder in an outpatient clinic in a Japanese hospital. Based on this study, the PHQ-2 with a cut-off point of ≥3 and the PHQ-9 with a cut-off point of ≥11 should be applied to identify patients with depression in the primary care setting in Japan.
Conceived and designed the experiments: KS TO. Performed the experiments: KS SK MO TN TO. Analyzed the data: KS TO. Contributed reagents/materials/analysis tools: KS TO. Wrote the paper: KS TO.
- 1. Simon GE, Chisholm D, Treglia M, Bushnell D, LIDO Group. Course of depression, health services costs, and work productivity in an international primary care study. Gen Hosp Psychiatry. 2002;24(5): 328–35. pmid:12220799
- 2. Sartorius N, Üstün TB, Lecrubier Y, Wittchen HU. Depression comorbid with anxiety: results from the WHO study on psychological disorders in primary health care. Br J Psychiatry Suppl. 1996;30: 38–43. pmid:8864147
- 3. Tylee A, Gandhi P. The importance of somatic symptoms in depression in primary care. Prim Care Companion J Clin Psychiatry. 2005;7(4): 167–76. pmid:16163400
- 4. National Collaborating Centre for Mental Health. Depression: management of depression in primary and secondary care [NICE Clinical Guidelines, no. 23]. London (UK): National Institute for Health and Clinical Excellence. 2004; Rep. no. CG90.
- 5. Manea L, Gilbody S, McMillan D. Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis. CMAJ. 2012;184(3): e191–6. pmid:22184363
- 6. Arroll B, Goodyear-Smith F, Crengle S, Gunn J, Kerse N, Fishman T, et al. Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population. Ann Fam Med. 2010;8(4): 348–53. pmid:20644190
- 7. McManus D, Pipkin SS, Whooley MA. Screening for depression in patients with coronary heart disease (data from the Heart and Soul study). Am J Cardiol. 2005;96(8): 1076–81. pmid:16214441
- 8. Inagaki M, Ohtsuki T, Yonemoto N, Kawashima Y, Saitoh A, et al. Validity of the Patient Health Questionnaire (PHQ)-9 and PHQ-2 in general internal medicine primary care at a Japanese rural hospital: a cross-sectional study. Gen Hosp Psychiatry. 2013;35(6): 592–7. pmid:24029431
- 9. Inoue T, Tanaka T, Nakagawa S, Nakato Y, Kameyama R, Boku S, et al. Utility and limitations of PHQ-9 in a clinic specializing in psychiatric care. BMC Psychiatry. 2012;12: 73. pmid:22759625
- 10. Muramatsu K, Miyaoka H, Kamijima K, Muramatsu Y, Yoshida M, Otsubo T, et al. The patient health questionnaire, Japanese version: validity according to the mini-international neuropsychiatric interview-plus. Psychol Rep. 2007;101(3 Pt 1): 952–60. pmid:18232454
- 11. Otsubo T, Tanaka K, Koda R, Shinoda J, Sano N, Tanaka S, et al. Reliability and validity of Japanese version of the Mini-International Neuropsychiatric Interview. Psychiatry Clin Neurosci. 2005;59(5): 517–26. pmid:16194252
- 12. Gilbody S, Richards D, Brealey S, Hewitt C. Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): a diagnostic meta-analysis. J Gen Intern Med. 2007;22(11): 1596–602. pmid:17874169
- 13. Löwe B, Kroenke K, Gräfe K. Detecting and monitoring depression with a two-item questionnaire (PHQ-2). J Psychosom Res. 2005;58(2): 163–71. pmid:15820844