People with diabetes need a lower cut-off than others for depression screening with PHQ-9

Aims This study evaluated the psychometric characteristics of the Polish version of the PHQ-9 in detecting major depressive disorder (MDD) and ‘MDD and/or dysthymia’ in people with and without type 2 diabetes. Methods Participants were randomly selected from a diabetes outpatient facility (N = 216) and from among patients admitted to a medical center and a psychiatric hospital (N = 99). The participants completed the PHQ-9. The Hamilton Depression Rating Scale and the Mini International Neuropsychiatric Interview were used to identify the presence of psychiatric symptoms. The optimal PHQ-9 cut-offs for people with and without type 2 diabetes were investigated using two methods: 1) Youden’s index, which identifies cut-off points useful in scientific research; 2) two-stage screening for depressive disorders, which provides guidance for clinical practice. Results The Polish version of the PHQ-9 is a reliable and valid screening tool for depression in people with and without type 2 diabetes. In the group with type 2 diabetes, the optimal cut-off for screening for both MDD and ‘MDD and/or dysthymia’ was ≥ 7 according to Youden’s index and ≥ 5 according to the two-stage method. Among people without diabetes, a cut-off of ≥ 11 was optimal for screening for both MDD and ‘MDD and/or dysthymia’ (Youden’s index), while the two-stage approach suggested ≥ 10 for MDD and ≥ 9 for ‘MDD and/or dysthymia’. Conclusions A lower PHQ-9 cut-off score is recommended for people with type 2 diabetes than for the general population.

current major depression but may rather take the form of dysthymia. Given the higher risk of depression in people with diabetes, it is essential to screen routinely for depressive symptoms in this group, as prompt provision of appropriate treatment can improve the mental and physical condition of patients with comorbidities [2,15].
Short, simple screening instruments for depression are available, e.g. the self-report Beck Depression Inventory (BDI) [27] and Patient Health Questionnaire-Nine Item (PHQ-9) [29], and the clinician-rated Hamilton Depression Rating Scale (HDRS) [28]. Nevertheless, the recognition rate of mental health problems such as depression or anxiety does not exceed 50% [29,30]. Most screening tools have a limited capability to discriminate between the overlapping symptoms of mental and somatic disorders [31,32]. Symptoms shared by mood disorders and diabetes, e.g. sleep problems, tiredness/loss of energy or appetite changes, are included in the BDI [27], HDRS [28], and PHQ-9 [29], yet these problems are also commonly reported by people with diabetes [33–35]. The effective detection of depressive symptoms requires a suitable threshold, i.e. the lowest score on a standardized test that a patient must achieve to be considered as having significant depressive symptoms. Such cut-offs, however, vary with the population being considered [36]. Depression may therefore be under-recognized in people with diabetes when cut-offs are used that are adequate for people without diabetes but not for this group. There is thus a need to examine whether cut-offs differ between people with and without diabetes, and to establish the appropriate balance between sensitivity and specificity of the screening tool in this specific population. Given the partial overlap of some symptoms of diabetes and depression, the optimal balance between sensitivity and specificity for people with T2DM may be achieved with a different cut-off score than in healthy people. van Steenbergen-Weijenburg et al. [37] showed that a higher cut-off point may be needed to correctly identify MDD in chronically ill patients than in populations with less severe illnesses.
However, depending on the patient population, there may also be an opposite tendency, among both patients and clinicians, to neglect the symptoms of depressive disorders and attribute them to a worsening course of diabetes. In such cases depression may be overlooked, which in turn may lead to further deterioration of glycemic control.
Given the high prevalence of depressive disorders in people with T2DM and the detrimental interactions between the two disorders, it has been suggested that the ideal depression therapy for this population would be an intervention that concurrently reduces depressive symptoms and improves glycemic control [38]. It is therefore important to establish a cut-off point that provides good sensitivity and allows effective detection of comorbid depressive disorder. The optimal cut-off point may be lower in people with diabetes than in the general population, since lowering the cut-off point generally increases the sensitivity of a screening tool, at the cost of specificity.
The recognition of depressive symptoms is crucial for providing appropriate health care, and the PHQ-9 was developed for this purpose [29,39]. Compared with other tools for screening for depressive symptoms, the PHQ-9 meets five requirements: it is brief, self-administered, multipurpose, in the public domain, and easy to score [40,41]. The PHQ-9 has shown adequate reliability and validity in various populations and is widely used as a validated screening tool in primary care [29,39,42]. Despite its widespread use and availability in many languages, few studies have evaluated the psychometric properties of the adapted versions, and the cut-offs established for the English original are commonly applied [43]. This practice may not yield accurate results.
Using a structured mental health professional interview as the criterion standard, a PHQ-9 score of ≥ 10 was reported as the optimal cut-off for major depression, with 88% sensitivity and 88% specificity in primary care [29] and 91.7% sensitivity and 78.3% specificity in medical settings [39]. A validation of the PHQ-9 for screening for MDD in the general population in Brazil identified > 9 as the optimal cut-off point [44], while in the Chinese general population a cut-off of > 7 was reported against the Mini International Neuropsychiatric Interview (MINI) as the diagnostic standard [41]. In a Polish sample of the general patient population, the sensitivity and specificity of the PHQ-9 in detecting MDD were 82% and 89%, respectively [42,43], and the authors reported an optimal cut-off score of > 12 (≥ 13).
Stafford et al. [45] validated the PHQ-9 among cardiac patients in general hospitals. Using the MINI as the criterion standard, the participants were assigned to two groups, MDD or 'any depressive disorder'. The optimal cut-off score for 'any depressive disorder' among cardiac patients was ≥ 5 (sensitivity 81.5%, specificity 80.6%), whereas ≥ 6 was recommended for MDD (sensitivity 82.9%, specificity 78.7%). The standard cut-off score of ≥ 10 thus appears inappropriate for recognizing depression in cardiac patients. Likewise, in a Polish sample of hospitalized elderly patients, > 6 was the optimal cut-off point [46].
When screening for depression in patients with diabetes attending specialized outpatient clinics, a cut-off point of ≥ 12 has been reported to have a sensitivity of 75.7% and a specificity of 80.0% [38]. However, the characteristics of the population must be taken into account if physicians, nurses and researchers are to recognize depression accurately.
The main aims of our study were therefore: (1) to assess the reliability and validity of the Polish version of the PHQ-9 in patients with and without diabetes; (2) to determine the optimal cut-off points indicating a high probability of MDD or of 'MDD and/or dysthymia', and to test the hypothesis that the cut-off point for the Polish version (in Polish patients) differs from that of the original English version in the general population.

Participants
The data reported here pertain to the Polish sample of the International Prevalence and Treatment for Diabetes and Depression (INTERPRET-DD) study, a collaborative study of invited outpatient clinic attendees with T2DM in 14 countries [47,48]. The investigators were psychiatrists recruited from leading university centers in Poland. The exclusion criteria were: a diagnosis of type 1 diabetes; diabetes duration of less than 12 months; an incomplete set of measures due to communication and/or cognitive difficulties; any life-threatening or severe condition; current or planned admission to hospital for inpatient care; pregnancy or childbirth in the last 6 months; a clinical diagnosis of alcohol or other substance (not tobacco) dependence; or a diagnosis of schizophrenia. A total of 216 Polish individuals with T2DM (101 women, 115 men) took part in this study.
The data for the comparison group are derived from another study carried out by the authors of this paper [43]. The sample consisted of casually selected outpatients of the Department of Internal Diseases, Nephrology and Transplantology of the Central Clinical Hospital of the Ministry of Interior and Administration in Warsaw and of randomly selected outpatients of the Department of Psychiatry at Bródno Hospital. The comparison group was diverse in terms of health and consisted of 99 persons (54 women and 45 men).

Procedure
As part of the INTERPRET-DD study, each eligible individual completed a survey recording his/her age, duration of diabetes, family history of diabetes and presence/history of diabetes complications, any medications for mental health problems or documented diagnosis or treatment of any psychiatric condition(s), the most recent blood pressure measurement, HbA1c, height, weight, location of his/her accommodation (rural or urban area), level of education, marital status, and financial status.
The diagnostic status of all the participants in both studies was determined by the MINI Version 5.0.0 [49,50]. Written informed consent was obtained from each participant.
Finally, in both studies the participants completed a set of questionnaires [43,48], including the same version of the PHQ-9 [29], and were assessed by a clinician using the HDRS [27,28]. The original English version of the PHQ-9 was developed by Drs. Robert L. Spitzer, Janet B. W. Williams and Kurt Kroenke. According to the Instruction Manual for the Patient Health Questionnaire (available at: https://www.phqscreeners.com/images/sites/g/files/g10016261/f/201412/instructions.pdf), the translations were developed by the MAPI Research Institute using standard forward/back-translation procedures and are linguistically valid. The Polish version of the PHQ-9 was downloaded from https://www.phqscreeners.com (accessed 19 July 2020). It has also been validated in the hospitalized elderly population in Poland [46]. In addition, the Polish investigators checked the cultural applicability of the PHQ-9 by discussing the content of the translated items and testing them among healthcare professionals and people with T2DM, focusing on the semantic meaning of the expressions and language used in the questionnaire.

Measures
To validate the Polish version of the PHQ-9, relevant data were extracted from the INTERPRET-DD study dataset [47] and from a study by Kokoszka et al. [43]. We considered the presence of depression established using the MINI Version 5.0.0 and the participants' scores on the PHQ-9 and the HDRS; the HDRS served as an external scale to verify the convergent validity of the Polish version of the PHQ-9.
The PHQ-9 is the 9-item depression module of the full PHQ [29]. It is a self-report screening tool that detects the presence and severity of depressive symptoms. The items are based on DSM-IV criteria for depressive symptoms during the previous two weeks [29]. Each item is scored from 0 (not at all) to 3 (nearly every day), giving a total score from 0 to 27. Severity can be classified as follows: 5-9 (mild), 10-14 (moderate), 15-19 (moderately severe), and 20-27 (severe) [29]. The psychometric properties of the Polish version of the PHQ-9 in the diabetes population have not been examined so far.
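The scoring rule above is simple enough to express directly. The Python sketch below sums the nine items and maps the total to the severity bands listed above. The function names are hypothetical, and the label for totals of 0-4 ("none/minimal") is an assumption, since only the bands from 5 upwards are given here. A screen is treated as positive when the total reaches the cut-off, matching the "≥ c" notation used for cut-offs throughout.

```python
"""Sketch: scoring a PHQ-9 response set and mapping the total to a severity band."""

from typing import Sequence

# Bands from the text (lower bound, label); the 0-4 label is an assumption.
SEVERITY_BANDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "none/minimal"),
]


def phq9_total(items: Sequence[int]) -> int:
    """Sum nine item scores, each 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 9:
        raise ValueError("PHQ-9 needs exactly 9 item scores")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("each item must be scored 0-3")
    return sum(items)  # total ranges from 0 to 27


def phq9_severity(total: int) -> str:
    """Return the severity band for a total score in 0-27."""
    if not 0 <= total <= 27:
        raise ValueError("total must be in 0-27")
    for lower_bound, label in SEVERITY_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


def screen_positive(total: int, cutoff: int) -> bool:
    """A screen is positive when the total reaches the cut-off (score >= cutoff)."""
    return total >= cutoff


if __name__ == "__main__":
    answers = [1, 1, 0, 2, 1, 0, 1, 0, 0]  # hypothetical respondent
    total = phq9_total(answers)
    print(total, phq9_severity(total), screen_positive(total, 5), screen_positive(total, 10))
```

The same total can be positive under one cut-off and negative under another, which is exactly why the choice of cut-off matters for the groups compared in this study.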
The HDRS [28] is a commonly used tool for assessing the severity of depressive symptoms by interview and observation. Each item is rated by a clinician according to specified criteria, on a 0-4 or a 0-2 scale depending on the item. The most commonly used 17-item version of the HDRS was employed in this study.
The diagnostic status of all the participants at the time of the PHQ-9 assessment was determined by the MINI Version 5.0.0 [49,50], which has been widely used in different populations, including people with serious illnesses. It is a reliable diagnostic instrument based on the criteria of the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) [49]. MDD was diagnosed when participants fulfilled at least one core DSM-IV criterion (depressed mood or loss of interest/pleasure) plus enough additional criteria to give at least five symptoms in total, present almost every day (or night) for 2 weeks: significant weight loss (or poor appetite) or weight gain; insomnia or hypersomnia; psychomotor agitation or retardation; fatigue or loss of energy; feelings of worthlessness or guilt; diminished ability to think or concentrate, or indecisiveness; recurrent thoughts of death, or suicidal ideation, plan or attempt. Dysthymia was diagnosed if participants had felt sad, low or depressed most of the time over the last two years and fulfilled at least two additional criteria during that period: weight loss (or poor appetite) or weight gain; insomnia or hypersomnia; fatigue or loss of energy; low self-esteem; diminished ability to think or concentrate, or indecisiveness; feelings of hopelessness. For comparison, participants were classified into the 'major depressive disorder' (MDD) group or into a broader category, 'MDD and/or dysthymia', comprising participants who met the MINI 5.0.0 diagnostic criteria for current major depression and/or dysthymia.
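To make the relation between the two outcome groups concrete, the sketch below encodes a simplified version of the DSM-IV symptom-count rules (for MDD, at least one core symptom and at least five symptoms in total; for dysthymia, low mood for two years plus at least two of six further symptoms). It is an illustrative assumption, not the MINI 5.0.0 scoring algorithm: the MINI's exclusion, chronicity and impairment questions are omitted, and all class and function names are hypothetical. The point it shows is that 'MDD and/or dysthymia' is the union of the two diagnoses, so every MDD case also belongs to the broader group.

```python
"""Simplified sketch of DSM-IV symptom-count rules for the two outcome groups.

Assumption: this is NOT the MINI 5.0.0 algorithm; exclusion, chronicity and
impairment questions are omitted. All names here are hypothetical.
"""

from dataclasses import dataclass
from typing import Dict


@dataclass
class CurrentEpisode:
    depressed_mood: bool      # core symptom 1, nearly every day for >= 2 weeks
    loss_of_interest: bool    # core symptom 2
    n_additional: int         # how many of the 7 other MDD symptoms are present


@dataclass
class TwoYearHistory:
    low_mood_most_of_the_time: bool  # sad/low/depressed most of the time, 2 years
    n_additional: int                # how many of the 6 dysthymia symptoms are present


def meets_mdd(ep: CurrentEpisode) -> bool:
    core = int(ep.depressed_mood) + int(ep.loss_of_interest)
    # At least one core symptom and at least five symptoms in total.
    return core >= 1 and core + ep.n_additional >= 5


def meets_dysthymia(hist: TwoYearHistory) -> bool:
    return hist.low_mood_most_of_the_time and hist.n_additional >= 2


def outcome_groups(ep: CurrentEpisode, hist: TwoYearHistory) -> Dict[str, bool]:
    mdd = meets_mdd(ep)
    # The broader group is the union: MDD, dysthymia, or both.
    return {"MDD": mdd, "MDD and/or dysthymia": mdd or meets_dysthymia(hist)}


if __name__ == "__main__":
    episode = CurrentEpisode(depressed_mood=False, loss_of_interest=False, n_additional=3)
    history = TwoYearHistory(low_mood_most_of_the_time=True, n_additional=2)
    print(outcome_groups(episode, history))
```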

Statistical analyses
Internal consistency reliability and convergent validity of the Polish version of the PHQ-9.
The statistical analyses were carried out using SPSS version 25 for Windows. To determine the internal consistency reliability of the PHQ-9, Cronbach's alpha was calculated; α values between 0.80 and 0.90 usually indicate good internal consistency [51].
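For k items, Cronbach's alpha is α = k/(k − 1) × (1 − Σ s_i² / s_T²), where s_i² is the variance of item i and s_T² is the variance of the total score. A minimal Python sketch of this formula follows; the function name is hypothetical, and the actual analysis was run in SPSS. Any consistent variance estimator gives the same result because only the ratio of variances enters the formula; sample variances are used here.

```python
"""Sketch: Cronbach's alpha from a respondents-by-items score matrix."""

from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows holds one list of item scores per respondent (e.g. 9 PHQ-9 items)."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 items and the same number of items per row")
    # zip(*rows) transposes the matrix so that each tuple holds one item's scores.
    item_variances = [variance(item) for item in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical PHQ-9 item scores for four respondents.
    data = [
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 2, 1, 0, 1, 0, 0],
        [2, 2, 1, 2, 2, 1, 2, 1, 0],
        [3, 3, 2, 3, 2, 2, 3, 2, 1],
    ]
    print(round(cronbach_alpha(data), 3))
```

Items that rise and fall together across respondents push the total-score variance well above the sum of item variances, which drives alpha towards 1.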
Pearson product-moment correlations were used to measure convergent validity. We assumed that PHQ-9 scores would be positively associated with HDRS scores. A moderate or strong relationship (|r| from 0.50 to 0.80) between PHQ-9 and HDRS scores indicates satisfactory convergent validity [52].
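The convergent-validity check reduces to computing Pearson's r between paired PHQ-9 and HDRS totals and checking that the correlation is positive and reaches the lower bound of 0.50 named above. The sketch below illustrates that reading; the function names and example scores are hypothetical.

```python
"""Sketch: Pearson's r between paired PHQ-9 and HDRS totals."""

from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def supports_convergent_validity(r: float, lower: float = 0.50) -> bool:
    """True when the expected positive correlation reaches the 0.50 bound."""
    return r >= lower


if __name__ == "__main__":
    phq9 = [2, 5, 9, 14, 20]    # hypothetical PHQ-9 totals
    hdrs = [3, 6, 12, 15, 24]   # hypothetical HDRS-17 totals, same respondents
    r = pearson_r(phq9, hdrs)
    print(round(r, 3), supports_convergent_validity(r))
```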

Screening accuracy for likely depression.
Logistic regression was performed separately in the general population group and in the group with T2DM to assess the discriminatory validity of the Polish version of the PHQ-9 as a screening tool for current 'MDD and/or dysthymia'. The positive predictive value (PPV) and negative predictive value (NPV) were calculated using logistic regression. The PPV is the probability of having the disorder given a positive test result, whereas the NPV is the probability of not having the disorder given a negative test result [53,54]. We then used the Wald statistic to test whether the PHQ-9 score was a significant predictor of depression. Odds ratios (ORs) and their confidence intervals (CIs) were estimated.
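Two quantities in this paragraph can be sketched directly. First, PPV and NPV follow from sensitivity, specificity and prevalence by Bayes' rule, which is why they are only meaningful when the sample's case mix reflects real prevalence. Second, the Wald statistic for a single predictor is (β/SE)², where β = ln(OR); SE can be recovered from a reported 95% CI as (ln upper − ln lower)/(2 × 1.96). The Python sketch below uses hypothetical function names and assumes a symmetric Wald-type CI on the log-odds scale.

```python
"""Sketch: predictive values via Bayes' rule, and Wald chi-square from OR + CI."""

from math import log
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Predictive values depend on prevalence, not only on the test itself."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def wald_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                    z: float = 1.959964) -> float:
    """Wald chi-square (1 df) = (beta / SE)**2 with beta = ln(OR).

    Assumes the 95% CI was computed as exp(beta +/- z * SE).
    """
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2 * z)
    return (beta / se) ** 2


if __name__ == "__main__":
    # The same hypothetical test (sensitivity 0.80, specificity 0.90) at two prevalences.
    for prev in (0.10, 0.30):
        ppv, npv = ppv_npv(0.80, 0.90, prev)
        print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
    # OR and 95% CI reported for MDD in the T2DM group.
    print(round(wald_from_or_ci(1.588, 1.380, 1.829), 2))
```

As a consistency check under that assumption, `wald_from_or_ci(1.588, 1.380, 1.829)` and `wald_from_or_ci(1.401, 1.225, 1.603)` give approximately 41.42 and 24.15, matching the W(1) values reported in the Results.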
Criterion validity was investigated by computing sensitivity and specificity for all PHQ-9 cut-off scores, separately for MDD and for 'MDD and/or dysthymia' diagnoses, with the MINI as the criterion standard. To determine the sensitivity and specificity of the PHQ-9, Receiver Operating Characteristic (ROC) curves were plotted, and the area under the curve (AUC) was calculated as a measure of the accuracy of the PHQ-9 in identifying 'major depressive disorder' and 'MDD and/or dysthymia' in the two groups. Most previous studies did not report the criteria used to choose the optimal cut-off [29,39,41,43,46]. We identified the optimal cut-off values in one step using Youden's index (sensitivity + specificity − 1), which ranges from 0 to 1, with higher values indicating better diagnostic performance [55]. We also used a second method, two-stage screening for depressive disorders, in which the cut-off score with maximal sensitivity among those with specificity ≥ 75% is recommended [45,56]. The two-stage approach is more appropriate in clinical settings, where positive screening results are usually verified further by a diagnostic interview, observation and treatment [45]. The one-stage method is more suitable for research studies in which the results are used to estimate depression prevalence rates and do not lead to clinical decisions [56]. Note, however, that if screening were used to assess whether a study's eligibility criteria were met, the two-stage approach would be more appropriate [45]. Statistical significance for all analyses was set at p < .05.
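The two selection rules can be sketched as follows. For every possible cut-off c (a screen is positive when the PHQ-9 total is ≥ c), the code computes sensitivity and specificity, then picks (a) the cut-off maximizing Youden's index and (b) the two-stage cut-off, i.e. maximal sensitivity among cut-offs with specificity ≥ 75%. Tie-breaking is an assumption, since the text does not specify it: Youden ties go to the lowest cut-off, and two-stage ties on sensitivity go to the higher specificity. The AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case (ties count one half), which equals the area under the empirical ROC curve. Function names and the toy data are hypothetical.

```python
"""Sketch: per-cut-off sensitivity/specificity, Youden and two-stage cut-offs, AUC."""

from typing import List, Optional, Sequence, Tuple

MAX_SCORE = 27  # PHQ-9 totals range from 0 to 27
Row = Tuple[int, float, float]  # (cut-off, sensitivity, specificity)


def split_groups(scores: Sequence[int], has_disorder: Sequence[bool]):
    cases = [s for s, d in zip(scores, has_disorder) if d]
    non_cases = [s for s, d in zip(scores, has_disorder) if not d]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    return cases, non_cases


def operating_table(scores: Sequence[int], has_disorder: Sequence[bool]) -> List[Row]:
    """Sensitivity and specificity for every cut-off c, positive if score >= c."""
    cases, non_cases = split_groups(scores, has_disorder)
    table = []
    for c in range(MAX_SCORE + 1):
        sens = sum(s >= c for s in cases) / len(cases)
        spec = sum(s < c for s in non_cases) / len(non_cases)
        table.append((c, sens, spec))
    return table


def youden_cutoff(table: List[Row]) -> int:
    # max() keeps the first maximum, i.e. the lowest cut-off on ties (assumption).
    return max(table, key=lambda row: row[1] + row[2] - 1)[0]


def two_stage_cutoff(table: List[Row], min_spec: float = 0.75) -> Optional[int]:
    eligible = [row for row in table if row[2] >= min_spec]
    if not eligible:
        return None
    # Maximal sensitivity; ties broken by higher specificity (assumption).
    return max(eligible, key=lambda row: (row[1], row[2]))[0]


def auc(scores: Sequence[int], has_disorder: Sequence[bool]) -> float:
    """Mann-Whitney estimate: P(case score > non-case score), ties count 0.5."""
    cases, non_cases = split_groups(scores, has_disorder)
    wins = 0.0
    for p in cases:
        for n in non_cases:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Hypothetical toy data: 5 cases followed by 12 non-cases.
    scores = [5, 9, 10, 11, 12] + [0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 7, 8]
    labels = [True] * 5 + [False] * 12
    table = operating_table(scores, labels)
    print("Youden cut-off: >=", youden_cutoff(table))
    print("Two-stage cut-off: >=", two_stage_cutoff(table))
    print("AUC:", round(auc(scores, labels), 3))
```

With this toy data, Youden's index selects ≥ 9 while the two-stage rule selects ≥ 5, illustrating how the two-stage rule accepts lower specificity in exchange for higher sensitivity.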

Demographic, clinical and psychological sample characteristics
The demographic, clinical, and psychological characteristics of the participants are presented in Table 1.
Quantitative data (e.g. age) are presented as mean (M) and standard deviation (SD). The chi-square test of independence was used to test whether gender and diabetes status were independent; diabetes status was not associated with gender, χ2(1) = 1.647, p = .199. The one-sample chi-square test was used to verify whether the gender distribution followed a hypothesized population distribution. In both the control group (χ2(1) = 0.818, p = .366) and the group of patients with T2DM (χ2(1) = 0.907, p = .341), the gender ratios were consistent with the expected distributions. Age was compared between the control group and patients with T2DM using the nonparametric Mann-Whitney U test because of the unequal sample sizes; mean ranks are therefore presented. The control group was significantly younger than the patients with T2DM, U = 2752.50, p < .001 (see Table 1).
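The three chi-square statistics above can be reproduced from the gender counts given in the Participants section (T2DM: 101 women, 115 men; comparison group: 54 women, 45 men). The Python sketch below assumes a Pearson chi-square without continuity correction and, for the one-sample tests, an equal (50:50) gender split as the hypothesized distribution. Both are our inferences, since these choices are not stated, but they reproduce χ2(1) = 1.647, 0.818 and 0.907. For one degree of freedom, the upper-tail p-value is erfc(√(χ2/2)). Function names are hypothetical.

```python
"""Sketch: Pearson chi-square tests (no continuity correction) on gender counts."""

from math import erfc, sqrt
from typing import Sequence


def chi2_goodness_of_fit(observed: Sequence[int], expected_props: Sequence[float]) -> float:
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))


def chi2_independence(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def p_value_df1(stat: float) -> float:
    """Upper-tail p for chi-square with 1 df: P(Z**2 > stat) = erfc(sqrt(stat / 2))."""
    return erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Rows: T2DM group, comparison group; columns: women, men.
    gender_by_group = [[101, 115], [54, 45]]
    x_ind = chi2_independence(gender_by_group)
    print(f"independence: chi2(1) = {x_ind:.3f}, p = {p_value_df1(x_ind):.3f}")
    for name, counts in [("comparison", [54, 45]), ("T2DM", [101, 115])]:
        x = chi2_goodness_of_fit(counts, [0.5, 0.5])  # assumed 50:50 split
        print(f"{name}: chi2(1) = {x:.3f}, p = {p_value_df1(x):.3f}")
```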
PHQ-9 scores were not significantly associated with age either in the group of people with T2DM (r = -0.103, p = .130) or in the control group (r = 0.065, p = .521). In the group of patients with T2DM, PHQ-9 scores did not differ by gender (t(214) = 1.79, p = .075) or education level (H(2) = 2.925, p = .232) and were not significantly associated with diabetes duration (r = 0.07, p = .279).
In the comparison group (99 persons: 54 women and 45 men), PHQ-9 scores did not differ by gender (t(97) = -0.569, p = .57) (see Table 2).
We also tested the differences and associations between participants' characteristics and the presence of MDD and of 'MDD and/or dysthymia' (see Table 3). The groups with and without MDD, and with and without 'MDD and/or dysthymia', did not differ with regard to gender, age, education level or diabetes duration.

Reliability and validity of the PHQ-9
We assessed the reliability of the PHQ-9 by calculating Cronbach's alpha coefficients separately for patients with T2DM and for the comparison group. Cronbach's alpha for the Polish version was 0.858 and 0.883, respectively. The internal consistency of the PHQ-9 is thus satisfactory, indicating a homogeneous structure of the measure.

PLOS ONE
The lower cut-off on the PHQ-9 when screening for depression in people with diabetes
In terms of convergent validity, PHQ-9 scores showed a strong, significant positive correlation with the HDRS (patients with diabetes: r = 0.781, p < .001; group without diabetes: r = 0.846, p < .001; whole group: r = 0.882, p < .001).

Screening accuracy for major depressive disorder
In the first step, we performed logistic regression analysis for patients with T2DM. The model containing the PHQ-9 score as a predictor of current MDD was statistically significant, χ2(1) = 99.183, p < .001. The Hosmer-Lemeshow test indicated good fit of the prediction model (H-L χ2(6) = 6.51, p = .369). Approximately 65% of the variability in MDD was explained by PHQ-9 scores in this group (Nagelkerke's R2 = 0.65). The Wald test showed that the PHQ-9 score was a significant predictor of the presence of MDD, W(1) = 41.42, p < .001 (OR = 1.588; 95% CI: 1.380-1.829).

specificity 81.58% (see Table 5). A cut-off point of ≥ 10 was selected by the two-stage screening approach (sensitivity 100%; specificity 78.95%).

Screening accuracy for MDD and/or dysthymia
The PHQ-9 total score was a statistically significant predictor of 'MDD and/or dysthymia'. The ROC curve is presented in Fig 1C. The AUC was 0.950 (p < .001; 95% CI 0.920-0.980). Youden's index (0.75) indicated that a cut-off score of ≥ 7 yielded the best diagnostic effectiveness: sensitivity 85.29% and specificity 90.11% (see Table 6). A cut-off of ≥ 5 was selected by the two-stage screening approach (sensitivity 94.12%; specificity 78.57%).

Logistic regression analysis indicated that the model including the PHQ-9 total score as a predictor of 'MDD and/or dysthymia' in the group without diabetes was statistically significant (χ2(1) = 57.62, p < .001) and fitted the data well (H-L χ2(6) = 4.80, p = .57). Approximately 65% of the variability in 'MDD and/or dysthymia' was explained by PHQ-9 scores in this group (Nagelkerke's R2 = 0.646; W(1) = 24.15, p < .001; OR = 1.401, 95% CI: 1.225-1.603).

The operating characteristics for the diagnosis of 'MDD and/or dysthymia' in the group without diabetes are shown in Fig 1D and Table 7. The AUC was 0.942 (p < .001; 95% CI 0.898-0.985). Youden's index (YI = 0.80) identified ≥ 11 as the optimal cut-off score (sensitivity = 96.0%, specificity = 83.56%), and the two-stage approach identified ≥ 9 (sensitivity = 100%, specificity = 75.34%) for screening for 'MDD and/or dysthymia'.

Discussion
The main aims of this study were to assess the reliability and validity of the Polish version of the PHQ-9 in patients with and without T2DM and to determine the optimal cut-off points for recognizing MDD or 'MDD and/or dysthymia' in people with and without T2DM. To our knowledge, this is the first study to examine both the psychometric and screening properties of the PHQ-9 simultaneously in these two groups.
Our results provide empirical evidence for the internal consistency reliability and convergent validity of the Polish version of the PHQ-9, with high Cronbach's alpha values and the expected strong positive correlation with the HDRS in people both with and without T2DM.
To our knowledge, this is the first attempt to establish the criterion validity of the PHQ-9 for identifying both MDD and 'MDD and/or dysthymia' in people with T2DM. This issue is critical, as improved recognition and treatment of depressive symptoms are needed to prevent severe depression and reduce treatment-related costs [57]. Previous research has shown that individuals with dysthymia were 5.5 times more likely to develop MDD within a year [58]. The stepped care model for treating depression in patients with diabetes assumes that the first step for patients with 'MDD and/or dysthymia' identified as being at the highest risk of major depression is to monitor their symptoms and then re-evaluate their mental health [59]. The operating characteristics of the screening instrument for 'MDD and/or dysthymia' were very similar to those for MDD; however, the AUCs for 'MDD and/or dysthymia' were slightly lower in both the non-diabetic group and the group with diabetes. One possible reason is that the diagnostic criteria for 'MDD and/or dysthymia' are more heterogeneous than those for MDD alone and may pose a diagnostic challenge [45].

Table 5. Accuracy of the PHQ-9 cut-off values for detecting major depression (diagnosed with the MINI) in adults without diabetes in the Polish sample (N = 99). (*) If the sample sizes in the positive (disorder present) and negative (disorder absent) groups do not reflect the real prevalence of the disorder, the positive and negative predictive values and accuracy cannot be estimated and should be disregarded. a) Optimal cut-off scores according to the maximal Youden index (sensitivity + specificity − 1). b) Recommended cut-off scores for two-stage screening (maximal sensitivity and ≥ 75% specificity).

In our research, the optimal PHQ-9 cut-offs in people with and without diabetes were investigated using two methods: 1) a one-step method using Youden's index [55], whose cut-off points can be useful in scientific research where the results are used to estimate depression prevalence rates and have no impact on clinical decisions; 2) two-stage screening for depressive disorders, to provide guidance for clinical practice. In the latter approach, the cut-off with maximal sensitivity among those with specificity ≥ 75% is considered appropriate [45,56]. The ROC analysis supports the use of the PHQ-9 as a screening tool for likely current MDD and any depression in people with type 2 diabetes and in a non-diabetic group.
Youden's index indicated that a cut-off of ≥ 7 yielded the best sensitivity/specificity trade-off and is useful for screening for MDD for research purposes among people with diabetes. On the other hand, the two-stage method showed that a cut-off score of ≥ 5 was appropriate for screening for MDD in clinical practice among people with diabetes. Thus, for MDD, both the optimal cut-off indicated by Youden's index (≥ 7) and the two-stage cut-off (≥ 5) were lower than the generally recommended cut-off score of ≥ 10 (sensitivity = 68.75%, specificity = 95.65%). Similar results were obtained in a study of patients with coronary artery disease, in which a PHQ-9 cut-off score of ≥ 10 was 54% sensitive and 90% specific [45]. Among elderly hospitalized patients, > 6 was indicated as the cut-off point (sensitivity 70.4%, specificity 78.2%) [46]. In a study of patients with diabetes in Iran, Khamseh [60] evaluated the PHQ-9 for depression screening; a cut-off score of ≥ 13 provided an optimal balance between sensitivity (73.80%) and specificity (76.20%). High sensitivity is more important than high specificity for screening purposes, and the low sensitivity of the ≥ 10 cut-off makes it inappropriate for this objective. Among people without diabetes, Youden's index indicated that a cut-off score of ≥ 11 yielded the best diagnostic effectiveness for scientific research, and the two-stage approach selected ≥ 10 as adequate for screening for MDD in clinical practice. For this group, the cut-off scores are therefore very similar to the generally recommended cut-off score of ≥ 10.
Regarding the recognition of 'MDD and/or dysthymia' in the group with diabetes, the one-step approach indicated that a cut-off score of ≥ 7 yielded the best diagnostic effectiveness for research purposes. According to the two-stage method, a cut-off score of ≥ 5 was appropriate for screening for 'MDD and/or dysthymia' in clinical practice in people with diabetes. This is consistent with the recommended threshold marking the lower limit of mild depression [28,29]. In the non-diabetic group, the optimal cut-off score for screening for 'MDD and/or dysthymia' was ≥ 11 for scientific research (Youden's index) and ≥ 9 for clinical practice (two-stage method).
Therefore, the cut-off scores for both MDD and 'MDD and/or dysthymia' are lower for people with diabetes than for the group without diabetes. To our knowledge, this is the first study to apply Youden's index and a two-stage approach to find optimal cut-off values among people with diabetes, an important contribution given the discrepancies in sensitivity and specificity found so far for the conventional cut-off score of ≥ 10 [45]. Our study used a structured diagnostic interview (the MINI), applying DSM-IV criteria, as the reference standard for the PHQ-9; this approach is considered the gold standard.
This research supports the usefulness of the Polish version of the PHQ-9 for both clinical practice and empirical research in people with diabetes, as it has satisfactory psychometric properties.
Note that the previous analyses of the Polish version of the questionnaire [43] suggested an optimal cut-off point of > 12 for depression in the general patient population. In our study, non-diabetic subjects had a mean PHQ-9 score of 8.67, markedly below this threshold. This relatively high score, which is nevertheless below the threshold for depression, is consistent with the findings of the most robust assessment of the prevalence of mental disorders conducted in Poland, the EZOP Poland study [61], which showed that the prevalence of MDD (diagnosed by the CIDI) in the general population in Poland was 3.0%, whereas the prevalence of individual depressive symptoms ranged up to 40.2%.
Validation of each language version is necessary for clinical practice to ensure that a screening instrument is adapted to the patient's culture, language, and literacy [62]. Future studies should therefore aim to establish appropriate cut-offs for people with other conditions. Indeed, a recent meta-analysis of 36 studies indicated that the optimal PHQ-9 cut-off score can vary between populations, ranging from 4 to 16 [63]. However, the authors emphasized that firm conclusions were difficult to draw because cut-off points were reported selectively; they suggested that a uniform threshold might not be adequate for all settings and recommended that validation studies be required to report data for all cut-off points, a criterion most studies did not meet [63]. Most previous studies have not indicated the approach used to obtain an optimal trade-off between sensitivity and specificity (e.g. [29,39,41,43,46]). Our study fulfilled these requirements. Note that the PHQ-9 cannot be considered a sufficient tool for diagnosing depression, as screening instruments cannot replace a full clinical examination.
A limitation of our study with regard to patients with diabetes is that most participants lived in urban rather than rural areas, and access to psychological support might differ between these settings. Patients in specialist clinics may also differ from the wider diabetes population in the severity of depressive symptoms. In the second study [43], the limitations were the relatively small sample size and the lack of random sampling. However, that study included casually selected participants, patients with chronic kidney disease, and patients of a psychiatric day ward. The control group thus comprised patients with other, mostly chronic diseases, and the lack of a group of healthier participants is a further limitation of this study. Despite the relatively small sample, the results obtained in this study are very similar to those of other validation studies and confirm the good psychometric properties of the PHQ-9, which has important clinical implications.