Validation of the Spanish Center for Epidemiological Studies Depression and Zung Self-Rating Depression Scales: A Comparative Validation Study

Background Depressive disorders are leading contributors to the burden of disease in developing countries. Research aimed at improving their diagnosis and treatment is fundamental in these settings, and psychometric tools are widely used to support mental health research. Our aim was to validate and compare the psychometric properties of the Spanish versions of the Center for Epidemiologic Studies Depression Scale (CES-D) and the Zung Self-Rating Depression Scale (ZSDS).
Methodology/Principal Findings A Spanish version of the CES-D was revised by five native Spanish-speaking psychiatrists using the English version as reference. A locally standardized Spanish version of the ZSDS was used. These Spanish versions were administered to 70 patients with a clinical diagnosis of DSM-IV Major Depressive Episode (MDE), 63 without major depression but with a clinical diagnosis of other psychiatric disorders (OPD), and 61 with no evidence of psychiatric disorders (NEP). For both scales, Cronbach's alpha (C-α) and the hierarchical McDonald's omega for polychoric variables (MD-Ω) were estimated, and receiver operating characteristic (ROC) analyses were performed. For the CES-D and ZSDS, C-α was 0.93 and 0.89, respectively, while MD-Ω was 0.90 and 0.75, respectively. The area under the ROC curve for MDE+OPD was 0.83 for the CES-D and 0.84 for the ZSDS, and for MDE+NEP it was 0.98 for the CES-D and 0.96 for the ZSDS. The cut-off scores (co) with the highest proportion of correctly classified (cc) individuals in MDE+OPD were ≥29 for the CES-D (sensitivity (ss) = 77.1%/specificity (sp) = 79.4%/cc = 78.2%) and ≥47 for the ZSDS (ss = 85.7%/sp = 71.4%/cc = 78.9%). In MDE+NEP, co were ≥24 for the CES-D (ss = 91.4%/sp = 96.7%/cc = 93.9%) and ≥45 for the ZSDS (ss = 91.4%/sp = 91.8%/cc = 91.6%).
Conclusion The Spanish versions of the CES-D and ZSDS are valid instruments for detecting depression in clinical settings and could be useful for both epidemiological research and primary care in settings similar to those of public hospitals in Lima, Peru.


Introduction
Globally, unipolar depression is a major contributor to the burden of disease, and its impact is growing [1,2]. Research on risk factors, treatment, and the associations of depressive disorders with other chronic and acute diseases also highlights the growing importance of these mental disorders in global health and their harmful effects on health [3][4][5].
Depression scales such as the Beck Depression Inventory (BDI) [6], the Center for Epidemiologic Studies Depression Scale (CES-D) [7], the Zung Self-Rating Depression Scale (ZSDS) [8], and the Patient Health Questionnaire (PHQ) [9,10] are widely used as depression screening tools, both to support diagnosis and in research. Recent meta-analyses report similar discriminative performance for the Hospital Anxiety and Depression Scale (HADS) (cut-off score (co) ≥8; sensitivity (ss) = 0.82; specificity (sp) = 0.74) and the Geriatric Depression Scale (GDS) (ss = 0.92; sp = 0.77) [11,12]. A meta-analysis of the Hamilton Rating Scale for Depression (HRSD), one of the most widely used scales, reported an intraclass correlation coefficient of 0.94, a kappa coefficient of 0.81 and a Pearson correlation coefficient of 0.94 [13]. Current evidence suggests that these instruments do not differ as screening tools for major depression [14] and that their scores correlate well [15].
Several scales for screening depressive disorders have shown good performance in Spanish-speaking populations, including in Latin America. For example, the Depressive Psychopathology Scale (DPS) showed good internal consistency (Cronbach's alpha (C-α) = 0.86) and discriminative power (ss = 77.7%; sp = 72.3%) for detecting major depressive disorder in a sample of Peruvian patients attending the National Institute of Mental Health [16]. Both the 5- and 15-item versions of the GDS showed good psychometric properties in Spanish elderly people (ss = 86.4/81.8%; 85.6/97.7%) [17], as did a custom questionnaire for the elderly used in Mexico (C-α = 0.74; co ≥5; ss = 80.7%; sp = 68.7%) [18] and the Edinburgh Postnatal Depression Scale (EPDS) in puerperal Mexican women (C-α = 0.75; area under the receiver operating characteristic curve (auROC) = 0.80; co ≥7; ss = 75%; sp = 84%) [19].
In 2009, Reuland et al reviewed the diagnostic accuracy of Spanish-language depression screening instruments. Based on quality appraisal, the authors selected only three studies conducted among non-US Spanish speakers (EPDS, MHAS and PHQ-9). They concluded that, based on their review (including US-based validations), the CES-D and the PRIME-MD-9 might be useful for detecting depressed patients in primary care in the United States. However, there was little evidence on the performance of depression scales in primary care among non-US Spanish speakers [34].
The availability of free-to-use, valid instruments might encourage independent research on the diagnosis, treatment and prevention of MDE in low- and middle-income countries, and help reduce the underdiagnosis of depressive disorders in primary care [35]. Apart from the DPS, which has been shown to discriminate MDE appropriately in a population with a high prevalence of psychiatric disorders (that of the Peruvian National Institute of Mental Health), few studies have examined the psychometric properties of depression screening scales in general hospital settings.
Both the CES-D and ZSDS have already been validated in university and middle school students, the elderly and the general population in Latin American countries; however, we found no studies that specifically validated these scales in general hospital settings. We chose to validate the CES-D and ZSDS because both have already been validated in Latin America, their use is free of charge for researchers, and there is evidence supporting their use, albeit in different settings [28,29,33,36]. This study aimed to evaluate the internal consistency and validity of adapted Spanish versions of the CES-D and ZSDS for the detection of Major Depressive Episode (MDE), using a psychiatrist's diagnosis according to DSM-IV criteria as reference, and to compare the psychometric properties of the adapted scales.

Objectives
This study aimed to estimate the internal consistency of the CES-D and ZSDS and the cut-off points that maximize the proportion of correctly classified persons, with their corresponding sensitivity and specificity, in a sample of participants with a clinical diagnosis of MDE, participants with other mental disorders, and persons without evidence of mental disorders, recruited in the waiting rooms of a public hospital in a developing country. The discriminative properties of both scales are also compared.

Participants
All patients waiting for services at the psychiatry and internal medicine outpatient clinics of the "Hospital Nacional Cayetano Heredia" (HNCH) from January to December 2006 were invited to participate in the study. The HNCH is a third-level public hospital that serves three of the most populated districts in northern Lima. In Peru, mental health care is usually provided in the psychiatry outpatient areas of hospitals like HNCH. All participants spoke Spanish as their native language. Patients attending HNCH are usually of low socio-economic status, and participants were between 18 and 65 years of age. Participants recruited from the psychiatry clinic comprised 70 people with Major Depressive Episode (MDE) and 63 people with other DSM-IV diagnoses (OPD), and those recruited from the internal medicine clinic comprised 61 people with no evidence of psychiatric disorders (NEP). Consecutive patients in the waiting rooms of the psychiatry and internal medicine clinics were invited to participate until the calculated sample size was reached. All participants gave informed consent before taking part. A psychiatrist's diagnosis of MDE using DSM-IV criteria was the inclusion criterion for the MDE group, while a DSM-IV diagnosis by the same psychiatrists of a psychiatric disorder other than MDE was the criterion for the OPD group. In the NEP group, we ruled out MDE using the major depression section of the Structured Clinical Interview for DSM-IV (SCID-I) and excluded those who were positive for MDE. We also excluded patients with physical or mental conditions that could prevent them from completing the psychometric tests. Literacy was checked by the investigators as part of the informed consent process.
For the sample size calculation, a sensitivity and specificity of 80% were assumed, with a minimal acceptable value of 70% for both. To estimate sensitivity and specificity with a confidence interval of ±10 percentage points and a Type I error probability of 5%, at least 61 participants were needed in each group (MDE, OPD and NEP).
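As an illustration of how a figure of this size can arise, the sketch below applies one common normal-approximation formula for the precision of a single proportion, n = z^2 × p(1 − p) / d^2, with p = 0.80 (assumed sensitivity or specificity), d = 0.10 (confidence interval half-width) and a two-sided 5% Type I error. This is our assumption about the calculation, not the authors' documented method; in particular, it does not use the 70% minimal acceptable value. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p: float, half_width: float, alpha: float = 0.05) -> float:
    """Participants needed so that a proportion near `p` (e.g. sensitivity)
    is estimated with a two-sided (1 - alpha) CI of +/- half_width.

    Normal-approximation formula: n = z^2 * p * (1 - p) / d^2.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z ** 2 * p * (1 - p) / half_width ** 2


n = n_for_proportion(p=0.80, half_width=0.10)
print(f"{n:.1f}")  # unrounded requirement per group
print(ceil(n))     # conservative convention: round up
```

The formula gives about 61.5 participants per group; dropping the fraction matches the 61 reported above, while the more conservative convention of rounding up would give 62.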

Study Design
A cross-sectional design was used to establish the validity, internal consistency and psychometric differences of the Spanish versions of the CES-D and ZSDS. Three groups of participants were established: one with no evidence of psychiatric disorders ("NEP" group), one with major depressive episode ("MDE" group), and one without major depressive episode but with another psychiatric disorder ("OPD" group). A sample of outpatients from the psychiatry and internal medicine clinics of the Hospital Nacional Cayetano Heredia (HNCH) in Lima, Peru, was used to establish these groups.

Data Collection
All participants completed pen-and-paper versions of the CES-D and ZSDS in the waiting room of the internal medicine or psychiatry outpatient clinic, and the investigators recorded their sociodemographic data. The psychiatric diagnosis based on the DSM-IV classification, any other medical disease, and the Clinical Global Impression-Severity Scale (CGI-S) were assessed and recorded by the attending physician in charge of the consulting room, except in the NEP group, for whom no CGI-S values were recorded.

Instruments
The CES-D is a 20-item scale designed as a case-detection instrument for depressive disorders in the general population. It has been validated in several languages, including Spanish [29,37,38], and in a wide range of populations, including adolescents [39][40][41][42] and elderly people [43][44][45][46]. Previous studies support the use of the CES-D as a sound psychometric test in cross-cultural contexts such as Latin America [47]. Each of the 20 items is scored from "0" to "3" according to how many days during the previous week the person felt as described by the item, so the total score ranges from 0 to 60.
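A minimal scoring sketch follows. It assumes the item order of the original English CES-D, in which the positively worded items 4, 8, 12 and 16 are reverse-scored; reverse scoring is not described in the text above and should be checked against the Spanish version actually used. `score_cesd` is a hypothetical helper.

```python
# Assumption: original English CES-D item order, where items 4, 8, 12 and 16
# are positively worded and therefore reverse-scored.
REVERSED_CESD = {4, 8, 12, 16}


def score_cesd(responses: list[int]) -> int:
    """Total CES-D score (0-60) from 20 responses coded 0-3, in item order."""
    if len(responses) != 20:
        raise ValueError("the CES-D has 20 items")
    total = 0
    for item, value in enumerate(responses, start=1):
        if not 0 <= value <= 3:
            raise ValueError(f"item {item}: response {value} outside 0-3")
        # Reverse-score positively worded items (0<->3, 1<->2).
        total += 3 - value if item in REVERSED_CESD else value
    return total
```

For example, a respondent answering 0 to every item would score 12 under this assumption, because each of the four reversed items contributes 3.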
The ZSDS is also a 20-item scale, designed to help physicians in primary care settings identify depressive symptoms [48,49]. The ZSDS has been validated and its factor structure analyzed in different languages [50][51][52] and in diverse specific populations such as oncology patients [53], people with cognitive impairment [54] or Parkinson's disease [55], and college and university students [56,57]. Each of its 20 items is scored from 1 to 4, so the total score ranges from 20 to 80.
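The same idea applied to the ZSDS is sketched below. It assumes the standard Zung item order, in which the ten positively worded items (2, 5, 6, 11, 12, 14, 16, 17, 18 and 20) are reverse-scored; the PNMHI standardized version may order or word items differently. `score_zsds` is a hypothetical helper.

```python
# Assumption: standard Zung SDS item order; these ten items are positively
# worded and reverse-scored (1<->4, 2<->3).
REVERSED_ZSDS = {2, 5, 6, 11, 12, 14, 16, 17, 18, 20}


def score_zsds(responses: list[int]) -> int:
    """Total ZSDS raw score (20-80) from 20 responses coded 1-4, in item order."""
    if len(responses) != 20:
        raise ValueError("the ZSDS has 20 items")
    total = 0
    for item, value in enumerate(responses, start=1):
        if not 1 <= value <= 4:
            raise ValueError(f"item {item}: response {value} outside 1-4")
        total += 5 - value if item in REVERSED_ZSDS else value
    return total
```

Because half of the items are reversed, answering 1 (or 4) to every item yields 50, near the middle of the range.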
The CES-D version used was the Spanish online version from the Center for Epidemiologic Studies adapted by the Patient Education Center of the Stanford University School of Medicine. It was revised by a group of five native Spanish-speaking psychiatrists, using as reference the English version provided by the same organization, until a consensus version was reached. The psychiatrists suggested a few minor changes in word choice. The ZSDS version used was a standardized revised version used by the Peruvian National Mental Health Institute (PNMHI) in Lima.

Statistical Methods
Differences in age, educational level, gender and CGI, CES-D and ZSDS scores across the MDE, OPD and NEP groups were analyzed with non-parametric methods (the Kruskal-Wallis test and Spearman's rank correlation coefficient, "rho"), since the data did not meet the normality assumption required for parametric tests. Sensitivity and specificity were assessed for each scale independently using ROC curves, and the best cut-off point was defined as the score with the highest percentage of correctly classified individuals.
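The cut-off rule described above can be sketched as a search over observed scores, classifying a score ≥ c as positive, consistent with the ≥ notation used for cut-offs throughout this paper. The tie-breaking rule (the lowest cut-off wins) is our assumption, since the text does not say how ties were resolved; `best_cutoff` is a hypothetical name and the example data are invented.

```python
def best_cutoff(case_scores, noncase_scores):
    """Return (cutoff, sensitivity, specificity, correctly_classified) for the
    rule "score >= cutoff is positive" that maximizes the proportion of
    correctly classified individuals. Ties keep the lowest cut-off."""
    n_cases, n_non = len(case_scores), len(noncase_scores)
    best = None
    # With integer scale scores, observed values cover every distinct rule
    # except "everyone negative".
    for c in sorted(set(case_scores) | set(noncase_scores)):
        tp = sum(s >= c for s in case_scores)     # cases flagged positive
        tn = sum(s < c for s in noncase_scores)   # non-cases flagged negative
        cc = (tp + tn) / (n_cases + n_non)
        if best is None or cc > best[3]:
            best = (c, tp / n_cases, tn / n_non, cc)
    return best


print(best_cutoff([30, 35, 28, 40, 25], [10, 20, 22, 26, 15]))
# → (25, 1.0, 0.8, 0.9)
```

Maximizing the proportion correctly classified weights errors by group size; other criteria (for example Youden's index) can select a different cut-off when groups are unbalanced.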
We also tested for statistically significant differences in the area under the ROC curve between the CES-D and the ZSDS. ROC curve analyses were performed first for MDE versus NEP (MDE+NEP) and then for MDE versus OPD (MDE+OPD). ROC curves were compared using a two-sided test of the equality of the auROC (the "roccomp" command of STATA V.10). Internal consistency was assessed using C-α and the hierarchical McDonald's omega coefficient, with maximum likelihood estimation for polychoric correlations.
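To our knowledge, when both scales are applied to the same participants, Stata's roccomp compares the correlated ROC areas with the nonparametric method of DeLong, DeLong and Clarke-Pearson (1988). The sketch below is an independent re-implementation of that idea for two scales, written for illustration only: it is not Stata's code and has not been checked against roccomp output. Function names are hypothetical, and the inputs must be paired, so the i-th case (or non-case) in the first scale's lists is the same person as in the second scale's lists.

```python
from math import sqrt
from statistics import NormalDist


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the case outscores the non-case, 0.5 if tied."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(cases, noncases):
    """Empirical AUC plus DeLong's structural components V10 and V01."""
    v10 = [sum(_psi(x, y) for y in noncases) / len(noncases) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in noncases]
    return sum(v10) / len(cases), v10, v01


def _cov(a, b):
    """Sample covariance (n - 1 denominator) of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(cases1, noncases1, cases2, noncases2):
    """Two-sided test of equal AUCs for two scales given to the same people.

    Returns (auc1, auc2, z, p).
    """
    auc1, v10a, v01a = _components(cases1, noncases1)
    auc2, v10b, v01b = _components(cases2, noncases2)
    m, n = len(cases1), len(noncases1)
    # Var(AUC1 - AUC2) from the covariance of the paired structural components.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        raise ValueError("the AUC difference has zero estimated variance")
    z = (auc1 - auc2) / sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc1, auc2, z, p
```

For example, with invented paired scores where cases score [5, 7, 8, 6] and non-cases [2, 6, 3, 4] on one scale, and [4, 7, 6, 8] and [5, 3, 6, 2] on the other, the two AUCs are 0.90625 and 0.84375. Ties count as one half, which is why the AUC equals the Mann-Whitney probability that a case outscores a non-case.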

Ethics
Prior to their participation in the study, all persons gave written informed consent. The research protocol was reviewed and approved by the Universidad Peruana Cayetano Heredia Ethics Committee.

Results
A total of 194 participants were recruited: 70 for the MDE, 63 for the OPD and 61 for the NEP group. Only three potential participants in the NEP group refused to participate. Among OPD participants, 22 received a DSM-IV diagnosis of an anxiety disorder, 15 of a psychotic disorder, 17 of a mood disorder other than MDE, 3 of a substance use disorder, and 6 of other DSM-IV disorders.
Age, gender, educational level and median CGI, CES-D and ZSDS scores are compared across groups in Table 1. We found statistically significant differences in the proportion of women and in symptom severity as measured by the CGI-S, CES-D and ZSDS. The proportion of women was significantly higher in the MDE group than in the OPD and NEP groups (p<0.01), and CES-D and ZSDS scores were also significantly higher (p<0.01) in the MDE group than in the OPD and NEP groups. The median CES-D and ZSDS scores in the MDE group (35.5 and 53.5) differed significantly (p<0.01) from those in the OPD (20 and 40) and NEP (9 and 33) groups. CGI-S scores differed significantly (p<0.01) between the MDE (median 4, 25th percentile 4, 75th percentile 5) and OPD (median 4, 25th percentile 4, 75th percentile 5) groups.
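For comparisons across the three groups, a sketch of the Kruskal-Wallis H statistic (average ranks for tied scores plus the standard tie correction) is given below. With exactly three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail has the closed form exp(−H/2); that shortcut does not hold for other numbers of groups. Function names are hypothetical and the example data are invented.

```python
from math import exp


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Average (mid) rank for each distinct value, so tied scores share a rank.
    first, counts = {}, {}
    for i, v in enumerate(pooled, start=1):
        first.setdefault(v, i)
        counts[v] = counts.get(v, 0) + 1
    rank = {v: first[v] + (counts[v] - 1) / 2 for v in first}
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_three_groups(h):
    """Upper tail of chi-square with 2 df (valid only for three groups)."""
    return exp(-h / 2)
```

For instance, the fully separated groups [1, 2, 3], [4, 5, 6] and [7, 8, 9] give H ≈ 7.2 and p ≈ 0.027. The chi-square approximation is rough for very small groups such as these, but adequate for groups of 61 to 70 participants.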
For the CES-D and ZSDS, C-α was 0.93 and 0.89, respectively. Inter-item covariance, item-test and item-rest correlations, and the change in alpha when each item is removed are shown for both scales in Table 2 (Figure 2). Both the CES-D and ZSDS showed good sensitivity and specificity; however, the best cut-off scores for MDE+OPD were higher for both the CES-D (≥29) and the ZSDS (≥47) than those for MDE+NEP (≥24 and ≥45, respectively; see Tables 3 and 4).
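Cronbach's alpha and the "alpha if item deleted" values summarized in Table 2 follow the standard formula α = k/(k − 1) × (1 − Σσᵢ² / σ_T²), where σᵢ² are the item variances and σ_T² the variance of the total score. The sketch below computes both with sample variances; function names are hypothetical. It does not compute McDonald's omega, which requires a factor model fitted to polychoric correlations.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # total score per person
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]
```

For three invented items answered by four people, [[2, 3, 4, 5], [1, 3, 3, 5], [2, 2, 4, 4]], alpha is 14/15 ≈ 0.933, and dropping the first item lowers it to 0.8.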
No statistically significant differences were found between the area under the ROC curve in the CES-D and ZSDS for the MDE+OPD (p = 0.94) or the MDE+NEP (p = 0.14) groups.
We also found a strong correlation between ZSDS and CES-D scores (rho = 0.86, p<0.001), and statistically significant correlations between CES-D and CGI scores (rho = 0.51, p<0.001) and between ZSDS and CGI scores (rho = 0.50, p<0.001).
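Spearman's rho can be computed as the Pearson correlation of average ranks, which handles the many tied scores typical of summed rating scales. The sketch below is a minimal stdlib version with hypothetical function names; it returns only the coefficient, not the p-values reported above.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

For example, average_ranks([10, 20, 20, 30]) gives [1.0, 2.5, 2.5, 4.0], and any strictly increasing relationship yields rho = 1.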

Discussion
We studied the psychometric properties of the ZSDS and CES-D in two contexts: a general hospital population (MDE+NEP), and a setting in which major depression must be detected among a high prevalence of other psychiatric disorders (MDE+OPD).

Internal Consistency and Discriminant Validity
Cronbach's alpha for both scales (CES-D = 0.93, ZSDS = 0.89) was consistent with that reported for the CES-D in a Spanish-language study of a population with affective disorders (0.90) [29], and with the internal consistency of the DPS (0.86), which was validated at the PNMHI in a similar population [16]. Studies conducted outside general hospitals or specialized health care centers report internal consistency estimates ranging from 0.85 to 0.93 for the CES-D in Spain [25,27], 0.80 for a 10-item Colombian version of the ZSDS, and 0.85 for the full scale in the general population of Puerto Rico [31,33]. McDonald's omega (CES-D = 0.90, ZSDS = 0.75) has been proposed as a more appropriate alternative to Cronbach's alpha as an index of how well the items of an instrument measure the same latent variable [59,60]; however, little published literature is available for comparing these estimates for the studied scales across settings.
In the MDE+OPD comparisons, sensitivity and specificity (≈80%) were below those reported by Soler et al (ss and sp ≈90%) and slightly above those of the DPS validation (ss ≈77%; sp ≈72%) when the clinical diagnosis of a psychiatrist was used as the gold standard [16,29]. The difference from the study of Soler et al might result from their use of a different gold standard, the HRSD. In the MDE+NEP comparisons, sensitivity and specificity for both scales were ≈92%, which appears slightly above the findings of other Latin American general-population studies of the CES-D and the ZSDS (ss = 73-95.5%, sp = 70.7-70.4%) [25-27,31,33].
In previous studies, including non-Spanish validations, the recommended CES-D cut-off score has varied among populations, from ≥12 to ≥21 for clinically significant depressive symptoms [61][62][63] and from ≥23 to ≥26 for major depression [38,40,64,65]. Since our gold standard for MDE was a psychiatrist's clinical assessment according to DSM-IV criteria, our result (co ≥24) is fairly consistent with the latter range. For the ZSDS, the best cut-off score (co ≥45) was similar to that of the Greek validation of the ZSDS [52]. The recommended cut-off scores for MDE+OPD were higher (CES-D ≥29; ZSDS ≥47), with lower sensitivity and specificity than those for MDE+NEP (Tables 3 and 4).
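The reported percentages can be reconciled with the group sizes (70 MDE, 63 OPD, 61 NEP) by simple arithmetic. The 2×2 counts below were back-calculated by us: for each reported percentage they are the only integer counts consistent with the group size, and they are not copied from Tables 3 and 4. `classification_summary` is a hypothetical helper.

```python
def classification_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and % correctly classified from a 2x2 table."""
    ss = 100 * tp / (tp + fn)
    sp = 100 * tn / (tn + fp)
    cc = 100 * (tp + tn) / (tp + fn + tn + fp)
    return round(ss, 1), round(sp, 1), round(cc, 1)


# (tp, fn, tn, fp) back-calculated from the reported percentages and the
# group sizes; these are derived values, not data taken from the tables.
checks = {
    "CES-D >=29, MDE+OPD": ((54, 16, 50, 13), (77.1, 79.4, 78.2)),
    "ZSDS >=47, MDE+OPD": ((60, 10, 45, 18), (85.7, 71.4, 78.9)),
    "CES-D >=24, MDE+NEP": ((64, 6, 59, 2), (91.4, 96.7, 93.9)),
    "ZSDS >=45, MDE+NEP": ((64, 6, 56, 5), (91.4, 91.8, 91.6)),
}
for label, (counts, reported) in checks.items():
    print(label, classification_summary(*counts), "reported:", reported)
```

All four reported triples are reproduced, which also shows that the percentage correctly classified is a group-size-weighted average of sensitivity and specificity.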
No statistically significant differences (p>0.05) were found between the areas under the ROC curves of the CES-D and the ZSDS in either the MDE+NEP or the MDE+OPD comparisons. These findings suggest that both scales have a similar ability to discriminate depression in the studied groups and are valid and consistent psychometric tools both in general hospitals and in settings with a high prevalence of mental disorders, although their discriminative properties seem to be diminished in the latter, as discussed below.
Table 3. Sensitivity, specificity and percentage of correctly classified subjects for CES-D cut-off scores.

Variation of discriminative properties in non-population settings
Evidence suggests that comorbid conditions whose symptoms resemble or overlap with depressive syndromes might affect the performance of psychometric instruments. In studies of patients with rheumatoid arthritis, seven items (items 2, 4, 8, 11, 12, 16 and 18) had to be removed from the CES-D to achieve good fit; once they were removed, the 13-item scale showed ss = 89.6% and sp = 95.8% with co ≥9 [66]. The influence of the somatic components of rheumatoid arthritis on scores, regardless of whether MDE was present, was also supported by Callahan et al [67]. Another study used the ZSDS to compare patients with chronic pain and patients from a psychology service with comparable depressive symptoms; the chronic pain patients endorsed items referring to psychomotor retardation, sleep disturbance, constipation and fatigue, suggesting that the scale might overestimate depressive symptoms in similar samples [68]. In a sample of patients with multiple sclerosis, the CES-D cut-off score of 16, which is recommended for detecting clinically significant depressive symptoms, yielded low positive predictive values for both any depressive disorder (74.5%) and major depressive disorder (59.6%), suggesting that the score used for the general population might not be appropriate in this particular setting [69]. Together, these studies suggest that somatic symptoms resembling those of depressive disorders might impair the performance of both scales and raise their optimal cut-off scores. By the same reasoning, symptoms that major depression shares with other psychiatric disorders could diminish the discriminative properties of the scales, as suggested by the similar discriminative properties found for the CES-D and ZSDS in this study and for the DPS in its validation, which was carried out in a similar, though not identical, setting [16].

Limitations
The main limitation of this study is that the gold standard for the diagnosis of major depression was not a standardized interview but the clinical diagnosis of a psychiatrist. Furthermore, interobserver agreement for the clinical DSM-IV-based diagnosis could not be assessed. These issues, together with a possible rearrangement of items for settings with a high prevalence of mental disorders, should be addressed in future studies. The results of this study cannot be generalized to samples that do not share the conditions and characteristics of this specific setting.
Regarding the choice of gold standard, Vega-Dienstmaier et al used both a structured clinical interview (SCID) and the clinical diagnosis of a psychiatrist as gold standards for the validation of the DPS. The auROC was slightly higher with the SCID as gold standard (0.87 vs. 0.83), as were sensitivity (81.3% vs. 77.7%) and specificity (80% vs. 72.3%). The optimal cut-off score was 1 point lower with the clinical diagnosis as gold standard (≥27 vs. ≥26) [16]. Although the psychometric estimates appear to favor the SCID, that study does not report whether these differences, particularly in the auROC, are statistically significant. In general, studies validating Spanish versions of depression screening instruments have used structured interviews [26,27], clinical diagnosis of MDE [18,30], or both [16] as gold standards. The most reasonable approach might be to collect both clinical and structured-interview-based diagnoses in future evaluations of psychometric tools.
While the psychometric properties of both the CES-D and ZSDS in the MDE+OPD comparison appear to differ from their reported performance in general population settings, in the MDE+NEP comparison both the cut-off scores and the discriminative properties appear congruent with estimates based on general population data.

Use of Results and Further Studies
The availability of validated screening instruments such as the CES-D and ZSDS might be an important contribution to both research on and screening for MDE. This is especially important given that substantial underdiagnosis of depressive disorders has been reported in primary care settings of developing countries [35], and that very limited resources for independent research have been identified as obstacles to mental health policies supported by solid scientific evidence [70,71]. Reported interventions focusing on the prevention, treatment and rehabilitation of major contributors to the burden of disease, such as depressive disorders, are still scarce in Latin America [72]. We expect that the availability of free-to-use instruments will encourage independent research on depressive disorders, as interventions designed in more developed countries might not be useful in Latin American or other developing settings [73].
The factor structure, test-retest reliability and general-population psychometric properties of both the CES-D and ZSDS are potential targets for future research. Alternative forms of these instruments, such as shorter, telephone or online versions, might also help facilitate independent research in developing countries.
We conclude that both CES-D and ZSDS are reliable and consistent instruments for detection of MDE in psychiatric and general hospital settings.