
PHQ-9 and PHQ-2 for Screening Depression in Chinese Rural Elderly

  • Zi-wei Liu,

    Affiliation Department of Social Medicine and Health Management, Xiangya School of Public Health, Central South University, Changsha, Hunan, China

  • Yu Yu,

    Affiliation Department of Social Medicine and Health Management, Xiangya School of Public Health, Central South University, Changsha, Hunan, China

  • Mi Hu,

    Affiliation Department of Social Medicine and Health Management, Xiangya School of Public Health, Central South University, Changsha, Hunan, China

  • Hui-ming Liu,

    Affiliation Department of Social Medicine and Health Management, Xiangya School of Public Health, Central South University, Changsha, Hunan, China

  • Liang Zhou,

    Affiliation Department of Social Medicine and Health Management, Xiangya School of Public Health, Central South University, Changsha, Hunan, China

  • Shui-yuan Xiao

    Affiliation Department of Social Medicine and Health Management, Xiangya School of Public Health, Central South University, Changsha, Hunan, China



This study aimed to explore cut-off scores of the 9-item Patient Health Questionnaire (PHQ-9) and 2-item Patient Health Questionnaire (PHQ-2) for depression screening in Chinese rural elderly.


A cross-sectional study was conducted on 839 residents aged 60 years and above in rural areas of Liuyang County. The PHQ-9 was administered to assess depressive symptoms, and the Structured Clinical Interview for DSM Disorders (SCID-I) was used to diagnose major depressive disorder (MDD) as the gold standard. Sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios, Youden’s index, and the receiver operating characteristic (ROC) curve were analyzed for the PHQ-9 and PHQ-2.


The Cronbach's alphas of the PHQ-9 and PHQ-2 were 0.82 and 0.76, respectively. A PHQ-9 score of 8 showed the highest Youden’s index (0.85), with a sensitivity of 0.97 and a specificity of 0.89; the area under the ROC curve (AUC) was 0.97 (95% CI: 0.96–0.98). A PHQ-2 score of 3 showed the highest Youden’s index (0.79), with both sensitivity and specificity of 0.90; the AUC was 0.94 (95% CI: 0.90–0.97).


Both PHQ-9 and PHQ-2 are valid screening instruments for depression in the rural elderly in China, with recommended cut-off scores of 8 and 3 respectively.


Depression is common among the elderly. In China, the prevalence of depressive symptoms among the elderly was 22.7%, and it was higher in rural than in urban areas [1]. Depression is associated with various severe health-related outcomes such as functional impairment and suicide [2–5]. Such catastrophic outcomes can be prevented if depression is detected and treated in a timely and appropriate manner. However, questionnaires such as the Geriatric Depression Scale (GDS), the Beck Depression Inventory (BDI), and the Centre for Epidemiologic Studies Depression Scale (CES-D) are complicated and time-consuming, which has hindered the effectiveness and efficiency of depression screening among the rural elderly [6–8]. The PHQ-9 is relatively favored for depression screening among the rural elderly because of its simplicity and time efficiency [9, 10].

The PHQ-9 is a brief, self-explanatory questionnaire developed for evaluating depressive symptoms [11]. Its items were designed according to the diagnostic criteria for major depressive disorder (MDD) in the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) [11]. The PHQ-9 has been widely used among the rural elderly, in whom it has shown good reliability and validity [12–15]. The PHQ-2, an abridged version of the PHQ-9, consists of the first two items of the PHQ-9 [16]. There is evidence of good reliability and validity of the PHQ-2 in urban populations, but none in the rural elderly [17].

A cut-off score on the PHQ-9 is used to distinguish subjects with MDD from those without it, for the purpose of early intervention. Determining a cut-off score should take into account not only indicators such as sensitivity and specificity but also the aims and settings of use. A cut-off of 10 points has been adopted for depression screening in most studies in China [12, 18–20]. One study conducted in the general population in Shanghai recommended 7 points as the PHQ-9 cut-off [21], and another study conducted among the urban elderly of Hangzhou suggested a cut-off of 9 [17]. For the PHQ-2, a score of 3 was recommended as the cut-off for depression screening in most published studies [12, 20, 22, 23]. However, there is little evidence on optimal cut-off values of the PHQ-2 and PHQ-9 in rural areas of China. Considering the differences in socioeconomic situation and depression-related factors between rural and urban elderly [13, 24], the present study aimed to fill this knowledge gap by exploring cut-off scores of the PHQ-9 and PHQ-2 for detecting depression among the rural elderly in China.

Materials and Methods

Study setting

Liuyang County is located in the northeast of Hunan Province, China, and has a population of 1.42 million. The number of people aged 60 and above was 194,000 in 2010 [25]. Administratively, Liuyang is divided into 4 districts in urban areas and 33 towns in rural areas. In 2011, the average annual family income in Liuyang was approximately 24,236 CNY ($3,740 USD) in urban areas and 13,193 CNY ($2,036 USD) in rural areas [26].

Design and participants

This was a cross-sectional study conducted as part of “the 2010 National Science and Technology Support Program: The Assessment, Warning and Intervention Study on the Emotional Problems of the Chinese Population”. Ethics approval was granted by the Institutional Review Board of the Xiangya School of Public Health, Central South University. The target population was residents aged 60 and above who had lived in rural areas of Liuyang for over 6 months. Eligibility criteria were being 60 years of age or above at the time of interview and having resided at the survey site for half a year or more. A multistage cluster-sampling method was adopted to identify subjects (Fig 1). In the first stage, two towns (Gaoping and Yong’an) were randomly selected from the 33 rural towns. In the second stage, two administrative villages were randomly selected from each town. An administrative village is the basic administrative unit in rural areas and is composed of several geographically adjacent natural villages. In the third stage, two natural villages were randomly selected from each administrative village. Natural villages are villages formed naturally by residents living together for a long time in a particular natural environment. Finally, all elderly residents (n = 1228) in the eight selected natural villages were invited to take part in the study. Those who were not living in the area during the research period and those with difficulty communicating due to serious physical or mental illness were excluded, leaving 860 eligible residents. Of these 860, 15 refused to participate and 6 withdrew midway. In total, 839 subjects completed the survey, a response rate of 97.6%.
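The three-stage selection and the response-rate arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: simple random sampling without replacement at each stage is assumed (the text only says "randomly selected"), and the sampling frame, including its village counts and names (`town0`, `admin0-0`, and so on), is hypothetical. Only the 33 towns, the 2 × 2 × 2 design, and the 860/15/6 counts come from the text.

```python
import random


def multistage_sample(frame, n_towns=2, n_admin=2, n_natural=2, seed=None):
    """Three-stage cluster sample: towns -> administrative villages -> natural villages.

    `frame` is a hypothetical sampling frame mapping each town to a dict that maps
    each administrative village to a list of its natural villages.
    Returns (town, administrative village, natural village) tuples.
    """
    rng = random.Random(seed)
    selected = []
    # Stage 1: simple random sample of towns, without replacement.
    for town in rng.sample(sorted(frame), n_towns):
        admins = frame[town]
        # Stage 2: administrative villages within each selected town.
        for admin in rng.sample(sorted(admins), n_admin):
            # Stage 3: natural villages within each selected administrative village.
            for village in rng.sample(sorted(admins[admin]), n_natural):
                selected.append((town, admin, village))
    return selected


def response_rate(completed, eligible):
    """Share of eligible residents who completed the survey."""
    return completed / eligible


# Hypothetical frame: 33 towns, each with 4 administrative villages of 5 natural villages.
frame = {
    f"town{t}": {
        f"admin{t}-{a}": [f"natural{t}-{a}-{v}" for v in range(5)]
        for a in range(4)
    }
    for t in range(33)
}
clusters = multistage_sample(frame, seed=1)
print(len(clusters))  # 2 towns x 2 administrative x 2 natural = 8 villages

eligible = 860
completed = eligible - 15 - 6  # subtract refusals and mid-study withdrawals
print(completed, f"{response_rate(completed, eligible):.1%}")  # 839 97.6%
```

Every elderly resident of the eight selected natural villages is then invited, which is what makes this a cluster sample rather than a sample of individuals.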


The survey was conducted from November 2010 to August 2011. The interviewers were eight postgraduate students with a medical education background and two psychiatrists with SCID training experience. One of the psychiatrists, a qualified SCID trainer, was responsible for the SCID training of all interviewers. All investigators received standardized training before the investigation, covering the study objectives, the scales, the principles and requirements of interviewing, questioning skills, and wording. Interviewers conducted face-to-face interviews with each participant in their household after obtaining written informed consent. Each interview took approximately one hour, and each household received a small gift such as a kitchen utensil (worth about USD $2). Because the PHQ-2 consists of the first two items of the PHQ-9, only the PHQ-9 was administered; the PHQ-2 was not administered separately. After the respondents completed the survey, a quality control person checked all interview data to ensure that there were no inconsistencies or missing items.



The PHQ-9 is a nine-item scale used to assess depressive symptoms. Each item asks about the frequency of a depressive symptom during the two weeks prior to survey administration. Each item is scored from 0 (never) to 3 (almost every day), so the total score ranges from 0 to 27. We used the Chinese print version of the PHQ-9 in this study [27].
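A minimal scoring sketch may help make the relation between the two instruments concrete: the PHQ-9 total is the sum of nine 0–3 items, and the PHQ-2 total is the sum of the first two of those same items. The function names are hypothetical, and the convention that a total at or above the cut-off counts as a positive screen is an assumption, since the paper does not spell out the comparison.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum of the nine PHQ-9 item scores (each 0-3); the total ranges from 0 to 27."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(not 0 <= score <= 3 for score in items):
        raise ValueError("each item is scored from 0 to 3")
    return sum(items)


def phq2_total(items: Sequence[int]) -> int:
    """PHQ-2 total derived from a full PHQ-9 response: items 1 and 2 (range 0-6)."""
    phq9_total(items)  # reuse the PHQ-9 validation
    return sum(items[:2])


def screen_positive(total: int, cutoff: int) -> bool:
    """Assumed convention: a total at or above the cut-off counts as a positive screen."""
    return total >= cutoff


# Hypothetical response from one participant (items 1-9).
responses = [2, 1, 1, 0, 2, 1, 0, 1, 0]
print(phq9_total(responses), phq2_total(responses))  # 8 3
print(screen_positive(phq9_total(responses), 8), screen_positive(phq2_total(responses), 3))  # True True
```

Deriving the PHQ-2 from the PHQ-9 responses, rather than administering it separately, mirrors the procedure described in the study.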


The SCID-I is a semi-structured diagnostic interview used to determine DSM-IV Axis I disorders (major mental disorders). The major depressive episode module was used to diagnose MDD. The Chinese version of the SCID-I has been validated in the Chinese population [28, 29]. In this study, the diagnosis obtained from the MDD module of the SCID-I was considered the gold standard. The version of the SCID-I used is based on DSM-IV, whose diagnostic criteria closely resemble those of DSM-5.

Statistical analysis

Statistical analyses were performed using SPSS 13.0 software (SPSS/IBM, Chicago, IL). Cronbach’s alpha coefficients were calculated to assess the internal consistency of the PHQ-9 and PHQ-2. The distribution of the data was examined using histograms combined with the Kolmogorov-Smirnov test. The total PHQ-9 score was treated as a continuous variable and the individual item scores as ordinal variables. The relationship between the total score and individual items was explored with Spearman’s correlation analyses. Receiver operating characteristic (ROC) curve analysis was used to measure the overall accuracy of the tools, and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were used to evaluate their validity. Cut-off scores balancing sensitivity and specificity were determined using the Youden index, calculated as sensitivity + specificity − 1 [30, 31].
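The validity indicators listed above all derive from the 2 × 2 table of screening result against the SCID-I diagnosis. The sketch below shows the standard definitions; the counts are hypothetical and not the study's data, and returning NaN for an undefined ratio (for example, PLR when specificity is exactly 1) is a design choice of this sketch, not something the paper specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    plr: float
    nlr: float
    youden: float


def _ratio(num: float, den: float) -> float:
    """Division that returns NaN instead of raising when the denominator is zero."""
    return num / den if den else math.nan


def accuracy(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Screening accuracy against a gold-standard diagnosis from 2x2 counts."""
    sens = _ratio(tp, tp + fn)    # screen positives among true cases
    spec = _ratio(tn, tn + fp)    # screen negatives among non-cases
    ppv = _ratio(tp, tp + fp)     # true cases among screen positives
    npv = _ratio(tn, tn + fn)     # non-cases among screen negatives
    plr = _ratio(sens, 1 - spec)  # sensitivity / false-positive rate
    nlr = _ratio(1 - sens, spec)  # false-negative rate / specificity
    youden = sens + spec - 1      # Youden's J
    return Accuracy(sens, spec, ppv, npv, plr, nlr, youden)


# Hypothetical counts, not the study's data: 50 cases, 900 non-cases.
result = accuracy(tp=45, fp=90, fn=5, tn=810)
print(result)
```

Sensitivity and specificity depend only on the screen, whereas PPV and NPV also depend on the prevalence of MDD in the sample, which is why the two groups of indicators are reported separately.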


Participant characteristics and scores on the PHQ-9

The distribution of socio-demographic characteristics of the 839 subjects is shown in Table 1. Subjects ranged in age from 60 to 90 years, with a mean age of 69.0 ± 7.1 years (mean ± SD). Four hundred and forty-eight (53.4%) were male and 391 (46.6%) were female. The prevalence of MDD diagnosed by the SCID-I was 6.8% (95% CI: 5.1%–8.5%). The total PHQ-9 score ranged from 0 to 27, with a median of 2. The median score of each item was 0, with an inter-quartile range of 0–1. The total PHQ-2 score ranged from 0 to 6, with a median of 0.
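The paper does not state how the confidence interval for prevalence was computed. Assuming a normal-approximation (Wald) interval, the reported 5.1%–8.5% is reproduced from the stated prevalence of 6.8% and n = 839, as this short sketch shows; the function name is hypothetical.

```python
import math
from typing import Tuple


def wald_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se


low, high = wald_ci(0.068, 839)
print(f"{low:.1%} to {high:.1%}")  # 5.1% to 8.5%
```

With a prevalence this far from 0 and a sample this large, the Wald interval is a reasonable approximation; exact or Wilson intervals would differ only slightly.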

Reliability and item analysis

The PHQ-9 had a Cronbach’s alpha of 0.82, with correlations between the total score and each item ranging from 0.45 to 0.71 (p < 0.001) (see Table 2). The Cronbach’s alpha of the PHQ-2 was 0.76, and the correlations between the PHQ-2 total score and its two items were 0.81 and 0.90 (p < 0.001), respectively.
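Cronbach’s alpha compares the sum of the item variances with the variance of the total score: α = k/(k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items. A minimal sketch of that formula follows; the function name and the toy matrices are hypothetical and are not study data.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used; the n vs n - 1 choice cancels in the ratio.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose to one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var / total_var)


# Toy matrices, not study data (two items, four respondents each).
identical = [[0, 0], [1, 1], [2, 2], [3, 3]]   # items agree perfectly
unrelated = [[0, 1], [1, 0], [0, 0], [1, 1]]   # items are uncorrelated
print(cronbach_alpha(identical), cronbach_alpha(unrelated))  # 1.0 0.0
```

For the two-item PHQ-2, alpha depends only on the correlation between items 1 and 2, which helps explain why it is lower than the nine-item PHQ-9 value.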

Cut-off score for PHQ-9

For the PHQ-9, the sensitivity, specificity, PPV, NPV, and likelihood ratios at different cut-off scores are presented in Table 3. A PHQ-9 score of 8 showed the highest Youden’s index (0.85), with a sensitivity of 0.97 and a specificity of 0.89. The area under the ROC curve (AUC) was 0.97 (standard error = 0.01, 95% confidence interval: 0.96–0.98), supporting the criterion validity of the PHQ-9 (Fig 2).
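The empirical (trapezoidal) area under an ROC curve equals the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The sketch below computes the AUC that way; it is an illustration under that standard equivalence, not a reproduction of SPSS's implementation, and the scores are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(cases: Sequence[float], noncases: Sequence[float]) -> float:
    """Empirical ROC AUC as the Mann-Whitney probability.

    Counts, over all case/non-case pairs, how often the case scores higher;
    tied pairs count one half.
    """
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


# Hypothetical PHQ-9 totals, not study data.
case_scores = [12, 9, 8, 15]        # SCID-I positive
noncase_scores = [2, 0, 8, 5, 3]    # SCID-I negative
print(auc_mann_whitney(case_scores, noncase_scores))  # 0.975
```

Because the AUC summarizes every possible cut-off at once, it speaks to the overall discriminating ability of the scale; the choice of a particular cut-off rests on Youden’s index and the other threshold-specific indicators.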

Fig 2. The receiver operating characteristic (ROC) curve of the PHQ-9 and PHQ-2 versus the SCID-I for a depression diagnosis.

Table 3. Sensitivity, specificity, predictive values, and likelihood ratios at various cut-off scores of the PHQ-9.
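A table such as Table 3 can be built by sweeping every possible cut-off and keeping the one with the largest Youden’s J. The sketch assumes that a score at or above the cut-off counts as positive, and it breaks ties in favor of the lowest cut-off; both are choices of this sketch rather than details given in the paper. The data are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, float, float, float]  # (cut-off, sensitivity, specificity, Youden's J)


def cutoff_table(scores: Sequence[int], has_mdd: Sequence[bool], max_score: int) -> List[Row]:
    """Accuracy at every cut-off, treating score >= cut-off as a positive screen."""
    n_cases = sum(has_mdd)
    n_noncases = len(has_mdd) - n_cases
    rows = []
    for c in range(1, max_score + 1):
        tp = sum(1 for s, d in zip(scores, has_mdd) if d and s >= c)
        fp = sum(1 for s, d in zip(scores, has_mdd) if not d and s >= c)
        sens = tp / n_cases
        spec = (n_noncases - fp) / n_noncases
        rows.append((c, sens, spec, sens + spec - 1))
    return rows


def best_cutoff(rows: Sequence[Row]) -> int:
    """Cut-off with the largest Youden's J; ties go to the lowest cut-off."""
    return max(rows, key=lambda r: r[3])[0]


# Hypothetical PHQ-9 totals and SCID-I diagnoses, not study data.
scores = [12, 9, 8, 15, 2, 0, 7, 5, 3]
has_mdd = [True, True, True, True, False, False, False, False, False]
table = cutoff_table(scores, has_mdd, max_score=27)
print(best_cutoff(table))  # 8
```

The same function covers the PHQ-2 by setting `max_score=6`.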

Cut-off score for PHQ-2

For the PHQ-2, the sensitivity, specificity, predictive values, and likelihood ratios at different cut-off scores are presented in Table 4. A PHQ-2 score of 3 showed the highest Youden’s index (0.79), with both sensitivity and specificity of 0.90. ROC curve analysis gave an AUC of 0.94 (standard error = 0.02, 95% CI: 0.90–0.97).

Table 4. Sensitivity, specificity, predictive values, and likelihood ratios at various cut-off scores of the PHQ-2.

Comparison of screening performance between PHQ-9 and PHQ-2

Cronbach’s alpha was higher for the PHQ-9 than for the PHQ-2, as was the AUC. At a cut-off of 8, the PHQ-9 showed higher sensitivity and accuracy than the PHQ-2 at a cut-off of 3, but similar PPV, NPV, and PLR. With cut-offs of 8 for the PHQ-9 and 3 for the PHQ-2, 16.9% and 15.7% of subjects, respectively, screened positive for possible depression.
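The share of subjects who screen positive follows directly from sensitivity, specificity, and prevalence: rate = Se × p + (1 − Sp) × (1 − p), and by Bayes’ rule PPV = Se × p / rate. The sketch below plugs in the rounded values reported in this study; the function names are hypothetical.

```python
def screen_positive_rate(sens: float, spec: float, prevalence: float) -> float:
    """Expected share screening positive: true positives plus false positives."""
    return sens * prevalence + (1 - spec) * (1 - prevalence)


def ppv_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """Bayes' rule: true positives divided by all screen positives."""
    return sens * prevalence / screen_positive_rate(sens, spec, prevalence)


PREVALENCE = 0.068  # SCID-I prevalence of MDD reported in this study
for label, sens, spec in [("PHQ-9 >= 8", 0.97, 0.89), ("PHQ-2 >= 3", 0.90, 0.90)]:
    rate = screen_positive_rate(sens, spec, PREVALENCE)
    ppv = ppv_from_rates(sens, spec, PREVALENCE)
    print(f"{label}: {rate:.1%} screen positive, PPV about {ppv:.2f}")
```

With these rounded inputs the sketch gives about 16.8% and 15.4% screening positive, close to the observed 16.9% and 15.7%. The small gaps are plausibly due to sensitivity and specificity being rounded to two decimals. Both implied PPVs come out near 0.4, in line with the similar PPVs reported for the two instruments.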


The results of this study suggest that the PHQ-9 and PHQ-2 are valid instruments for depression screening among the elderly population in rural areas of China. Based on indicators such as Youden’s index, sensitivity, specificity, and AUC, a score of 8 for the PHQ-9 and a score of 3 for the PHQ-2 are recommended as cut-off scores.

Previous studies have shown that the PHQ-9 is suitable for the elderly population, with good reliability and validity [7, 9]. A score of 10 is commonly adopted as the cut-off to distinguish individuals with MDD from those without it [32]. In this study, a score of 8 showed the highest Youden’s index, with a better balance between sensitivity and specificity than a score of 10. This suggests that the PHQ-9 may perform better in identifying MDD among the elderly in rural China when the cut-off is set at 8. However, determining a cut-off score should take into account not only indicators such as sensitivity and specificity but also the aims and settings of use. When the PHQ-9 is used as a screening instrument to identify elderly people at high risk for MDD, failing to identify those at risk carries potential dangers [3, 5, 33], so a higher false-positive rate is acceptable. In this case, according to our results, scores of 6 or 7 could be adopted as cut-offs, since the sensitivity of screening reached 1.00 and 0.98, respectively. On the other hand, when the PHQ-9 is used for research purposes and high specificity is required, scores of 8 or higher could be used as cut-offs to avoid too many false positives.
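The screening-oriented choice above can be framed as a simple rule: pick the highest cut-off whose sensitivity still meets a required minimum. Raising a "score ≥ cut-off" threshold never increases sensitivity and never decreases specificity, so that cut-off gives the best specificity among those meeting the target. The sketch below applies this rule to the three PHQ-9 sensitivities quoted in the text; the function name and the target values are illustrative assumptions.

```python
from typing import Mapping


def highest_cutoff_with_sensitivity(sensitivity: Mapping[int, float], target: float) -> int:
    """Largest cut-off whose sensitivity still meets `target`.

    Raising a 'score >= cut-off' threshold cannot raise sensitivity and cannot
    lower specificity, so the largest qualifying cut-off gives the best
    specificity among those meeting the sensitivity target.
    """
    eligible = [c for c, se in sensitivity.items() if se >= target]
    if not eligible:
        raise ValueError("no cut-off reaches the target sensitivity")
    return max(eligible)


# PHQ-9 sensitivities at cut-offs 6, 7 and 8 as reported in the text.
reported = {6: 1.00, 7: 0.98, 8: 0.97}
print(highest_cutoff_with_sensitivity(reported, 1.00))  # 6
print(highest_cutoff_with_sensitivity(reported, 0.98))  # 7
print(highest_cutoff_with_sensitivity(reported, 0.95))  # 8
```

A research application that prioritizes specificity would instead fix a minimum specificity and choose the lowest cut-off that meets it.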

In our study, the PHQ-2 had a balanced sensitivity of 0.90 and specificity of 0.90 at a cut-off score of 3, consistent with previous studies in urban Chinese elderly [17] and in other populations [20, 27, 34]. Based on the results of the present study and previous evidence, a score of 3 is recommended for the PHQ-2 to screen for depression among the rural elderly in China.

Both the PHQ-9 and the PHQ-2 perform well in identifying elderly people with depression in rural areas. The choice between them should depend on the purposes and settings. Because the PHQ-2 uses only the first two items of the PHQ-9, it takes markedly less time to administer. When time is the priority, for instance for busy primary care providers, the PHQ-2 may be more suitable. In addition, Youden’s index peaks at a score of 8 for the PHQ-9 and 3 for the PHQ-2, meaning that each instrument achieves its best balance of sensitivity and specificity at these cut-offs. However, sensitivity, Youden’s index, and AUC were all higher for the PHQ-9 than for the PHQ-2, indicating that the PHQ-9 is more accurate and more suitable for research purposes.

This study has a limitation. For each participant, the PHQ-9 and the SCID-I were administered by the same interviewer, so SCID-I results may have been influenced by the PHQ-9 through a priming effect. Although PHQ-9 scores were not calculated during the interview and there was a 30-minute interval between administration of the PHQ-9 and the SCID-I interview, priming effects cannot be completely excluded.


The results of this study suggest that the PHQ-9 and PHQ-2 are valid screening tools for depression in Chinese rural elderly, with a recommended cut-off score of 8 for the PHQ-9 and a cut-off score of 3 for the PHQ-2.

Author Contributions

Conceived and designed the experiments: SYX LZ. Performed the experiments: MH HML. Analyzed the data: ZWL. Contributed reagents/materials/analysis tools: YY. Wrote the paper: ZWL YY.


  1. Zhang L, Xu Y, Nie H, Zhang Y, Wu Y. The prevalence of depressive symptoms among the older in China: a meta-analysis. International journal of geriatric psychiatry. 2012;27(9):900–6. pmid:22252938.
  2. Reinlieb M, Ercoli LM, Siddarth P, St Cyr N, Lavretsky H. The patterns of cognitive and functional impairment in amnestic and non-amnestic mild cognitive impairment in geriatric depression. The American journal of geriatric psychiatry: official journal of the American Association for Geriatric Psychiatry. 2014;22(12):1487–95. pmid:24315561.
  3. Beekman AT, Penninx BW, Deeg DJ, de Beurs E, Geerling SW, van Tilburg W. The impact of depression on the well-being, disability and use of services in older adults: a longitudinal perspective. Acta psychiatrica Scandinavica. 2002;105(1):20–7. pmid:12086221.
  4. Hall CA, Reynolds-Iii CF. Late-life depression in the primary care setting: challenges, collaborative care, and prevention. Maturitas. 2014;79(2):147–52. pmid:24996484; PubMed Central PMCID: PMC4169311.
  5. Schulz R, Drayer RA, Rollman BL. Depression as a risk factor for non-suicide mortality in the elderly. Biological psychiatry. 2002;52(3):205–25. pmid:12182927.
  6. Lakkis NA, Mahmassani DM. Screening instruments for depression in primary care: a concise review for clinicians. Postgraduate medicine. 2015;127(1):99–106. pmid:25526224.
  7. Phelan E, Williams B, Meeker K, Bonn K, Frederick J, Logerfo J, et al. A study of the diagnostic accuracy of the PHQ-9 in primary care elderly. BMC family practice. 2010;11:63. pmid:20807445; PubMed Central PMCID: PMC2940814.
  8. Maurer DM. Screening for depression. American family physician. 2012;85(2):139–44. pmid:22335214.
  9. Li ZH, Xiao YZ, Xie Z, Chen LZ, Xiao SY. Use of Patient Health Questionnaire-9 (PHQ-9) among Chinese Rural Elderly. Chinese Journal of Clinical Psychology. 2011;19(2):171–4 (in Chinese).
  10. Bland P. Tackling anxiety and depression in older people in primary care. The Practitioner. 2012;256(1747):17–20, 2–3. pmid:22720455.
  11. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. Journal of general internal medicine. 2001;16(9):606–13. pmid:11556941; PubMed Central PMCID: PMC1495268.
  12. Inagaki M, Ohtsuki T, Yonemoto N, Kawashima Y, Saitoh A, Oikawa Y, et al. Validity of the Patient Health Questionnaire (PHQ)-9 and PHQ-2 in general internal medicine primary care at a Japanese rural hospital: a cross-sectional study. General hospital psychiatry. 2013;35(6):592–7. pmid:24029431.
  13. Su D, Wu XN, Zhang YX, Li HP, Wang WL, Zhang JP, et al. Depression and social support between China' rural and urban empty-nest elderly. Archives of gerontology and geriatrics. 2012;55(3):564–9. pmid:22776885.
  14. Naik AD, White CD, Robertson SM, Armento ME, Lawrence B, Stelljes LA, et al. Behavioral health coaching for rural-living older adults with diabetes and depression: an open pilot of the HOPE Study. BMC geriatrics. 2012;12:37. pmid:22828177; PubMed Central PMCID: PMC3542105.
  15. Daniulaityte R, Falck R, Wang J, Carlson RG, Leukefeld CG, Booth BM. Predictors of depressive symptomatology among rural stimulant users. Journal of psychoactive drugs. 2010;42(4):435–45. pmid:21305908; PubMed Central PMCID: PMC3320035.
  16. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a two-item depression screener. Medical care. 2003;41(11):1284–92. pmid:14583691.
  17. Chen S, Conwell Y, Vanorden K, Lu N, Fang Y, Ma Y, et al. Prevalence and natural course of late-life depression in China primary care: a population based study from an urban community. Journal of affective disorders. 2012;141(1):86–93. pmid:22464006; PubMed Central PMCID: PMC3566241.
  18. Manea L, Gilbody S, McMillan D. A diagnostic meta-analysis of the Patient Health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression. General hospital psychiatry. 2015;37(1):67–75. pmid:25439733.
  19. Li C, Friedman B, Conwell Y, Fiscella K. Validity of the Patient Health Questionnaire 2 (PHQ-2) in identifying major depression in older people. Journal of the American Geriatrics Society. 2007;55(4):596–602. pmid:17397440.
  20. Yu X, Stewart SM, Wong PT, Lam TH. Screening for depression with the Patient Health Questionnaire-2 (PHQ-2) among the general population in Hong Kong. Journal of affective disorders. 2011;134(1–3):444–7. pmid:21665288.
  21. Wang W, Bian Q, Zhao Y, Li X, Wang W, Du J, et al. Reliability and validity of the Chinese version of the Patient Health Questionnaire (PHQ-9) in the general population. General hospital psychiatry. 2014;36(5):539–44. pmid:25023953.
  22. Arroll B, Goodyear-Smith F, Crengle S, Gunn J, Kerse N, Fishman T, et al. Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population. Annals of family medicine. 2010;8(4):348–53. pmid:20644190; PubMed Central PMCID: PMC2906530.
  23. Lowe B, Kroenke K, Grafe K. Detecting and monitoring depression with a two-item questionnaire (PHQ-2). Journal of psychosomatic research. 2005;58(2):163–71. pmid:15820844.
  24. Wu XL, Li J, Wang L. The Analysis of Depressive Symptom among Chinese Elderly People. POPULATION JOURNAL. 2010;183(5):43–7 (in Chinese).
  25. Bureau of Statistics of Changsha. The Aging in Liuyang County. 2012.
  26. Bureau of Statistics of Changsha. Statistical Communiqué of the Liuyang county on the 2011 National Economic and Social Development. 2011.
  27. Yeung A, Fung F, Yu SC, Vorono S, Ly M, Wu S, et al. Validation of the Patient Health Questionnaire-9 for depression screening among Chinese Americans. Comprehensive psychiatry. 2008;49(2):211–7. pmid:18243896; PubMed Central PMCID: PMC2268021.
  28. Qin X, Wang W, Jin Q, Ai L, Li Y, Dong G, et al. Prevalence and rates of recognition of depressive disorders in internal medicine outpatient departments of 23 general hospitals in Shenyang, China. Journal of affective disorders. 2008;110(1–2):46–54. pmid:18261805.
  29. Xu J, Jiang C, Gao Y, Liu Q, Jia S, Zhou L. The Research of DSM-IV SCID in Psychological Autopsy. Journal of International Psychiatry. 2011;38(4):201–4 (in Chinese).
  30. McGuire AW, Eastwood JA, Macabasco-O'Connell A, Hays RD, Doering LV. Depression screening: utility of the patient health questionnaire in patients with acute coronary syndrome. American journal of critical care: an official publication, American Association of Critical-Care Nurses. 2013;22(1):12–9. pmid:23283084.
  31. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5. pmid:15405679.
  32. Kroenke K, Spitzer RL, Williams JB, Lowe B. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. General hospital psychiatry. 2010;32(4):345–59. pmid:20633738.
  33. Phillips MR, Zhang J, Shi Q, Song Z, Ding Z, Pang S, et al. Prevalence, treatment, and associated disability of mental disorders in four provinces in China during 2001–05: an epidemiological survey. Lancet. 2009;373(9680):2041–53. pmid:19524780.
  34. Suzuki K, Kumei S, Ohhira M, Nozu T, Okumura T. Screening for major depressive disorder with the Patient Health Questionnaire (PHQ-9 and PHQ-2) in an outpatient clinic staffed by primary care physicians in Japan: a case control study. PloS one. 2015;10(3):e0119147. pmid:25789476; PubMed Central PMCID: PMC4366166.