Full evaluation of the psychometric properties of COPSOQ II. One-year longitudinal study on Polish human service staff

Purpose The aim of the study was the full evaluation of the psychometric properties of the COPSOQ II in a one-year longitudinal study of human service staff in Poland. Data were collected from 599 employees representing three occupational groups related to human service work. Methods CFA was conducted on the structure proposed by the authors of the original tool, based on one model, which included 119 observable variables forming 33 latent variables (single-item subscales were excluded from the analysis). To our knowledge, this was the first complete validation of the entire model using CFA. Reliability analysis was performed using two methods: internal consistency analysis and test-retest analysis. Predictive validity was assessed by correlating COPSOQ II variables with ten criterion variables related to job demands, job resources, work-family conflicts, mental health and well-being. Results CFA supported the original structure of the COPSOQ II. Most of the 33 subscales were characterized by good or very good psychometric parameters. The results also confirmed the fairly high reliability and the high convergent validity of all subscales of the COPSOQ II. Conclusion The COPSOQ II is characterized by satisfactory psychometric properties and could be successfully used to meet the demand for reliable and comprehensive assessment methods, also in Polish job market settings.


Introduction
Psychosocial hazards are serious risk factors in the work environment. A number of meta-analyses [1][2][3][4], prospective studies [5][6][7] and large national population studies [8][9][10][11] reveal that harmful working conditions can become a source of severe physical diseases and mental disorders. According to the European Commission [12], a healthy and safe work environment may contribute to increased work efficiency, general social well-being and the economic development of a country. The European Framework Directive on Safety and Health at Work (89/391 EEC) [13] obliges employers to provide healthy and safe working conditions, including the identification and assessment of psychosocial risk factors in the work environment as well as the implementation of preventive actions. One of the most difficult problems for an employer to cope with when diagnosing risk factors is how to measure psychosocial risks.
Leka and Jain [14] made a critical analysis of 37 popular tools for measuring psychosocial hazards in the workplace. Most of these tools have been validated in other countries and also used in international research. A certain limitation, however, is that they are based on classic job stress models [15], which were developed relatively long ago and do not fully reflect current labor market challenges, such as new forms of work organization, technological changes, information overload, multitasking, the need for continuous learning and a faster pace of life [16]. Moreover, they were developed from an analysis of industrial work; therefore, they emphasize mainly quantitative workload and disregard the aggravating role of other types of job demands, e.g. emotional demands or demands for hiding emotions [14], which are crucial for professionals working in human service organizations [17], whose "principal function is to protect, maintain, or enhance the personal well-being of individuals by defining, shaping, or altering their personal attributes" ([18], p. 1).
The Copenhagen Psychosocial Questionnaire (COPSOQ II, the revised version of COPSOQ) is a measurement tool that covers a wide spectrum of psychosocial work conditions and includes the specificity of the modern labor market and professions [19]. The International Labor Organization [20] and World Health Organization [14] have referred to it as an available measurement tool to evaluate psychosocial hazards in the workplace. In developing COPSOQ II, the authors followed the recommendations formulated for the previous questionnaire version, as well as other tools for studying psychosocial working conditions [19,21]. First of all, they wanted the questionnaire to cover the widest possible number of areas of psychosocial working environments. This is why they did not base the questionnaire on one theoretical model, as is the case for the majority of questionnaires, but referred to several different concepts. The concepts included the ones which Kompier [15] listed as the seven most influential models of occupational stress. COPSOQ II covers a wide range of psychosocial working conditions and can, therefore, be used in all labor market sectors, e.g. industry and social service. Secondly, in developing the questionnaire, the authors took into account recommendations concerning the length of the developed tools and the number of questions for each dimension of psychosocial working conditions [21]. COPSOQ II consists of 41 subscales (related to seven work domains), the majority of which include two to four questions (in total: 127 questions). Thirdly, COPSOQ II contains questions which refer to different levels of human functioning at work (e.g. organization, department, employee) and, as such, analyses can be carried out on different levels of generality, ranging from general demands at work to particular ones (e.g. emotional demands). Moreover, the questionnaire refers not only to potential sources of job stress but also to a human's own resources (e.g. social support and self-efficacy), as well as mental health (e.g. depression) and well-being at work (e.g. job satisfaction).

Current study
The COPSOQ has been translated into at least 25 languages and has been validated in a number of countries worldwide [22]. However, none of these validation studies fully confirmed the factor structure of the entire instrument, taking into account both the number of subscales and domains. As stated before, COPSOQ II consists of 41 subscales (including a total of 127 items), each of which is assigned to one of the seven work domains, i.e.: Demands at work; Work organization and job content; Interpersonal relations and leadership; Work-individual interface; Values at the workplace; Health and well-being; and Offensive behavior. Testing its structure in a single model therefore raises technical and computational problems: it is very difficult to test such an expanded model using Confirmatory Factor Analysis (CFA). The review of the literature suggests that other authors have tried to overcome the technical problems in various ways. Some used Exploratory Factor Analysis [23], while others analyzed the tool by means of CFA, but within each of the seven major domains separately [24,25]. Moreover, some previous studies did not examine theoretical relevance at all [25,26], or they investigated it with only a few criterion variables, which did not apply to all work domains [27]. The reliability of the tool, in turn, was usually assessed on the basis of data collected in cross-sectional studies, using Cronbach's alpha rather than the test-retest method [21,28,29]. The majority of earlier validation studies tested the short [27][28][29] or medium [23,26] version of COPSOQ, mainly in the countries of Western (but not Eastern) Europe and in sectors other than social services, e.g. mechanics, food production, cleaning, textiles, garment and trading [29], education, construction, wholesale, manufacturing, financial and insurance [25].
The purpose of the present study is to comprehensively evaluate the psychometric properties of the full version of COPSOQ II in human service institutions in Poland. This study focused on testing the general structure of the tool using Confirmatory Factor Analysis, based on one model containing 119 observable variables forming 33 latent variables. Eight COPSOQ II subscales measured with a single question were not included in the analysis. It is worth noting that none of the studies the authors came across have extensively tested the original structure of the COPSOQ II in this way. Theoretical validity was also examined extensively. The relevance of COPSOQ II variables was tested based on correlations with ten criterion variables representing four areas (i.e.: job demands, job resources, work-life balance and mental health/well-being) that seem to be compatible with the seven work domains of COPSOQ II. The reliability of the COPSOQ II was assessed by means of two methods: one based on internal consistency analysis and the other using the test-retest method.

Participants and procedure
The study population (N = 599) included human service staff from the following professions: teachers in resocialization centers for children and youth (n = 200); care workers in centers for intellectually disabled children and youth with chronic mental illnesses (n = 199); medical psychiatric staff for children and youth (n = 200). The selection criterion for the study groups was the specificity of the occupation: intensive and direct contact with other people, related to the need to provide different forms of aid, such as saving life and health and providing regular care for people who are ill, have social problems or are in conflict with the law.
A longitudinal study was carried out, with a one-year interval between the two measurement points (T1 and T2). The study was conducted from September to November 2017 (T1) and from September to November 2018 (T2) at the premises of the facilities where the respondents were employed. All participants were treated in accordance with the ethical guidelines of the Helsinki Declaration. Full confidentiality of the data and anonymity were secured. Participants were asked to fill out the questionnaires and seal them in envelopes, which were subsequently collected by research assistants. Out of 1,000 distributed questionnaires, 751 (75%) were completed in the first stage of the study (T1) and 599 (60% of the original pool) in the second stage (T2). Finally, 599 subjects were included in the analysis. The analyzed group consisted of 494 women (82.6%) and 105 men (17.4%), between 20 and 70 years of age (M = 42.5, SD = 9.39). Work experience ranged from 1 to 39 years (M = 14.40, SD = 9.96). There were no significant differences in the distribution of age among the three analyzed occupational groups, F(2, 550) = 2.33, p = .099, partial η² = .01. There were, however, small (judged by the effect size) but significant differences in the length of service, F(2, 532) = 6.62, p = .001, partial η² = .02. Care workers (M = 14.23, SD = 9.36) on average had less seniority than the medical staff (M = 18.26, SD = 11.37, p = .001).
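The reported effect sizes can be cross-checked against the F statistics. For a one-way design, partial eta squared follows from F and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The sketch below is only an illustrative check, not the authors' procedure; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# F statistics and degrees of freedom as reported in the text.
age_effect = partial_eta_squared(2.33, 2, 550)
seniority_effect = partial_eta_squared(6.62, 2, 532)

print(round(age_effect, 2), round(seniority_effect, 2))  # prints: 0.01 0.02
```

Both values round to the effect sizes given above (.01 and .02), which are small by conventional standards.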

Measures
Apart from the subscales of the COPSOQ II, a few other variables were included in the study and used in the criteria analysis. These variables were categorized into four general groups: job demands, job resources, work-life imbalance, and health and well-being. Job demands. Job demands included interpersonal conflicts at work and workload. Interpersonal conflicts relate to the quality of relationships at work and involve burdensome interactions with superiors and colleagues. These can be of varying severity, from minor quarrels to mental struggles [30]. Workload covers the physical and psychological costs incurred by the employee in carrying out tasks, and is usually measured by the number of working hours, the amount of work performed, the number of activities performed per unit of time, and the subjectively assessed physical and mental effort put into work. These variables were measured with two instruments developed by Spector and Jex [30]: Interpersonal Conflicts at Work Scale (ICAWS) and Quantitative Workload Inventory (QWI). The tools consist of four and five items respectively (e.g. "How often do you get into arguments with others at work?"; "How often does your job require you to work very fast?"), with a five-point response scale, ranging from 1 (less than once a month) to 5 (a few times daily).
Job resources. Job resources included social support at work from supervisors and coworkers. In order to assess these variables, a subscale of the Psychosocial Work Conditions was used [31]. This 16-item subscale measures two sources of social support: from supervisors and from co-workers (e.g. "To what extent can you count on your superiors to give you directions on how to resolve a difficult situation?"). All items are scored from 1 (very limited extent) to 5 (very large extent).
Work-life imbalance. This is represented by work-family and family-work conflicts. These are types of role conflict, in which role requirements related to one area of life make it difficult or impossible to fulfill role requirements related to another area of life [32]. These variables were assessed with the Work-Family and Family-Work Conflict Scale [33]. This is a ten-item instrument (e.g. "The demands of my work interfere with my home and family life"; "I have to put off doing things at work because of demands on my time at home") with a seven-point response scale, ranging from 1 (strongly disagree) to 6 (strongly agree).
Health and well-being. These include three variables-depression, job burnout and job satisfaction. Depression was assessed using the Center for Epidemiological Studies Depression Scale (CES-D: [34]). The CES-D consists of 20 statements, which measure the frequency of depressive symptoms experienced in the past week. The statements refer to depressed mood, feelings of guilt and hopelessness, psychomotor slowdown and sleep disorders (e.g. "I didn't want to eat; I didn't have an appetite"). Answers are provided on a four-point scale, from 0 = rarely or not at all (less than one day) to 3 = most of the time or all the time (five to seven days).
Job burnout was measured with the Oldenburg Burnout Inventory (OLBI: [35]). This 16-item scale consists of two subscales for exhaustion and disengagement from work. Exhaustion is a response to intensive physical, affective, and cognitive strain; it manifests in fatigue, weariness, and a decrease in energy. Disengagement is expressed by distancing oneself from work and by experiencing negative affect towards it [36]. A five-point response scale ranged from 1 (I completely disagree) to 5 (I completely agree). Job satisfaction was measured with the Job Satisfaction Survey [37]. This is a 36-item scale with a six-point response scale, ranging from 1 (highly disagree) to 6 (highly agree), that concerns employee attitudes towards the job and different aspects of the job, such as pay, promotion, communication or nature of work (e.g. "There is really too little chance for promotion on my job").

Analytical procedures
For each subscale, the following descriptive statistics were calculated at both measurement points: mean (M), median (Me), standard deviation (SD) and the value of standard [...]. In the CFA, the structure proposed by the authors of the original tool [19] was followed. Eight COPSOQ II subscales measured with a single question (related to: general health, rumors, conflicts at work, disruptive behavior, sexual harassment, threats of violence, physical violence, and mobbing) were not included in the analysis. Hence, CFA was carried out on 33 subscales based on one model, which included 119 observable variables forming 33 latent variables. Next, reliability analysis was performed using two methods: one based on internal consistency analysis and the other using the test-retest method. Theoretical validity was analyzed by means of correlation coefficients between the COPSOQ II variables and the criterion variables, which covered interpersonal conflict at work, workload, co-worker and supervisor support, work-family and family-work conflicts, depression, job burnout (including exhaustion and disengagement from work) and job satisfaction. The subscales of COPSOQ II and all criterion variables fell within the scope of the questionnaire. Descriptive statistics and reliability analyses were carried out for the data collected in both measurements. Other analyses were based on the results obtained in the first measurement.

Ethics
The study was approved by the Regional Ethics and Bioethics Committee of the Cardinal Wyszyński University in Warsaw (KEiB-31/2020 of June 10, 2020) and informed consent was obtained from all individual participants included in the study.

Confirmatory factor analysis
In order to perform the CFA on the model with 119 items and 33 scales, it was necessary to impute the missing data using an expectation-maximization (EM) algorithm. Missing data constituted only 1.8% of all data points. The analyses were carried out using the lavaan and semTools libraries in the R statistical environment. The confirmatory analysis used maximum likelihood (ML) estimation. The following standard fit parameters were computed: Root Mean Square Error of Approximation (RMSEA), Standardized Root Mean Squared Residual (SRMR), Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI). Although there is no complete consensus among authors as to the criteria for a "good fit" of the model [38], most assume that CFI and TLI > .90, RMSEA < .06 and SRMR < .08 indicate that the model represents a good or very good fit to the data. According to more relaxed criteria, the acceptable values are: CFI and TLI > .85, RMSEA < .08 and SRMR < .10 [39][40][41]. Table 2 presents the indices of the model fit to the data. Despite the restrictive assumptions of the confirmatory analysis, the model showed a good fit. Although the χ² statistic proved to be statistically significant, which is typical for large sample sizes, most other parameters indicated acceptable or good fit. The CMIN/df (χ²/df) ratio showed almost ideal values (close to 2), RMSEA was below .05, and SRMR was below .08, which indicates a very good fit. The CFI and TLI values were slightly lower than acceptable, but considering that they were obtained without additional "interfering" with the model by means of modification indices (e.g. imposing measurement error covariances or removing items from subscales), they do not seem to be that problematic. Further, the factor loadings for each item were analyzed. According to one standard, acceptable factor loadings should be greater than .32 [42].
All factor pathways proved to be significant, which indicates a good fit of the model. The values of factor loadings were mostly high or very high and ranged from .29 to .95 for individual subscales. Although the value of .29 may seem relatively low, given the complexity of the model and the number of analyzed parameters, it can be assumed that it is an acceptable value. Table 3 presents factor loadings for all subscales included in the model. In all tables, factor numbers reflect the subscale numbers of the original COPSOQ II. Subscale no 27 (General Health) was excluded from all analyses since it consisted of only one item.
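The cut-offs cited above can be applied mechanically. The following minimal Python sketch encodes the stricter and the more relaxed thresholds quoted in the text and labels a single fit index accordingly. The function name and the example index values are hypothetical and serve only as an illustration; they are not the values reported in Table 2, and the analyses in this study were run in R with lavaan, not with this code.

```python
# Cut-offs quoted in the text, stored as (good, acceptable) per index.
HIGHER_IS_BETTER = {"CFI": (0.90, 0.85), "TLI": (0.90, 0.85)}
LOWER_IS_BETTER = {"RMSEA": (0.06, 0.08), "SRMR": (0.08, 0.10)}


def classify_fit(index: str, value: float) -> str:
    """Label one fit index as 'good', 'acceptable' or 'below acceptable'."""
    if index in HIGHER_IS_BETTER:
        good, acceptable = HIGHER_IS_BETTER[index]
        if value > good:
            return "good"
        return "acceptable" if value > acceptable else "below acceptable"
    if index in LOWER_IS_BETTER:
        good, acceptable = LOWER_IS_BETTER[index]
        if value < good:
            return "good"
        return "acceptable" if value < acceptable else "below acceptable"
    raise KeyError(f"No cut-off defined for index {index!r}")


# Hypothetical values, chosen only to mirror the pattern described in the text.
example = {"RMSEA": 0.045, "SRMR": 0.070, "CFI": 0.84, "TLI": 0.83}
labels = {name: classify_fit(name, value) for name, value in example.items()}
print(labels)
```

With these illustrative inputs, RMSEA and SRMR are labeled good while CFI and TLI fall below the relaxed cut-off, which is the pattern the text describes.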

Reliability analysis
Next, we assessed the reliability of each subscale. Internal consistency was measured with Cronbach's alpha (α) and three omega (ω) coefficients, following Raykov, Bentler and McDonald. According to Nunnally's criterion, an internal consistency coefficient above .60 is considered acceptable in some cases, above .70 is considered good and above .80 is considered very good [43]. Table 4 presents the reliability coefficients. The results of the analysis indicate that the vast majority of the COPSOQ II subscales demonstrate acceptable, good or very good reliability. Four subscales proved problematic in both measurements: variation, influence and commitment to the workplace, as well as demands for hiding emotions. Their internal consistency coefficients fell below acceptable values; therefore, these subscales should be used with caution.
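To make the internal consistency measure concrete, the sketch below computes Cronbach's alpha from its standard definition, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), and labels the result with the thresholds quoted above. The response matrix is hypothetical (five respondents, three items) and the function names are illustrative; this is not the code used in the study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (rows = respondents)."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("Alpha requires at least two items")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


def nunnally_label(alpha: float) -> str:
    """Label a coefficient with the thresholds quoted in the text."""
    if alpha > 0.80:
        return "very good"
    if alpha > 0.70:
        return "good"
    if alpha > 0.60:
        return "acceptable"
    return "below acceptable"


# Hypothetical answers of five respondents to a three-item subscale.
answers = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 5, 5], [1, 2, 2]]
alpha = cronbach_alpha(answers)
print(round(alpha, 3), nunnally_label(alpha))
```

Because the illustrative items move closely together across respondents, alpha is high here; a subscale whose items vary independently would yield a value near zero.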
Although the model fit was acceptable, we found discriminant validity problems with several subscales. Table 1A in the Appendix presents Pearson's r correlation coefficients for the examined subscales/factors. In eight cases, the correlation coefficient exceeded the standard criterion of r = .85, which might indicate poor discrimination and, thus, excessive similarity between some subscales. Cognitive demands (F3) and Emotional demands (F4) had a correlation coefficient of r = .98; Meaning of work (F9) and Commitment to the workplace (F10), r = .92; Rewards (F12) and Trust regarding management (F24), r = .86; Quality of leadership (F15) and Social support from supervisor (F17), r = .86; Trust regarding management (F24) and Justice and respect (F25), r = 1.00 (especially problematic); Stress (F30) and Depressive symptoms (F31), r = .89; Stress (F30) and Cognitive stress symptoms (F33), r = .90; Depressive symptoms (F31) and Cognitive stress symptoms (F33), r = .98. Generally, the results obtained indicate a good fit of the model to the data, despite its complexity and the discriminant validity problems of several subscales. In all likelihood, removing questions with low factor loadings from the questionnaire and, especially, modifying the model based on modification indices (e.g. imposing measurement error covariances) would allow for a better fit. These measures, however, were decided against since the model already had an acceptable fit. Also, interference in the structure of the tool would make it difficult to compare the results obtained with the Polish version of the questionnaire with those obtained in other countries. In addition, the fit parameters obtained in the study are satisfactory enough to confirm the structure of the tool in the version postulated by its authors [19].
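The screening described above can be summarized in a few lines of Python. The sketch below uses only the eight coefficients reported in the text, applies the r = .85 criterion, and lists the factors that appear in more than one flagged pair, since these are the ones whose overlap is most widespread. Variable names are illustrative.

```python
from collections import Counter

THRESHOLD = 0.85  # criterion quoted in the text

# Inter-factor correlations as reported in the text (COPSOQ II factor numbers).
reported = {
    ("F3", "F4"): 0.98,
    ("F9", "F10"): 0.92,
    ("F12", "F24"): 0.86,
    ("F15", "F17"): 0.86,
    ("F24", "F25"): 1.00,
    ("F30", "F31"): 0.89,
    ("F30", "F33"): 0.90,
    ("F31", "F33"): 0.98,
}

# Pairs whose correlation exceeds the criterion.
flagged = [pair for pair, r in reported.items() if r > THRESHOLD]

# Factors that take part in more than one flagged pair.
counts = Counter(factor for pair in flagged for factor in pair)
repeated = sorted(factor for factor, n in counts.items() if n > 1)

print(len(flagged), repeated)  # prints: 8 ['F24', 'F30', 'F31', 'F33']
```

The repeated factors are Trust regarding management (F24) and the three strain-related subscales Stress (F30), Depressive symptoms (F31) and Cognitive stress symptoms (F33).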
The results of the test-retest reliability analysis are presented in Table 5. All observed Pearson's r correlation coefficients between the first and second measurements were positive and statistically significant at the level of p < .001. The values of the correlation coefficients, however, are not high and range from r = .15 to r = .34. A broader reference to these results can be found in the discussion.

Convergent validity
Table 6 presents the results of the correlation analysis for the convergent validity of the Polish version of COPSOQ II. In general, high levels of five types of demands at work (measured with the COPSOQ II) were related to high quantitative workload (from r = .29 for quantitative demands to r = .56 for cognitive demands), work-family conflict (from r = .14 for cognitive demands to r = .45 for quantitative demands) and exhaustion (except cognitive demands, from r = .12 for emotional demands to r = .50 for quantitative demands). Work organization and job contents variables correlated positively with job satisfaction (from r = .21 for variation to r = .62 for commitment to the workplace) and negatively with disengagement from work (from r = -.27 for influence to r = -.60 for commitment to the workplace). Interpersonal relations and leadership variables were associated negatively with interpersonal conflicts at work (except predictability, from r = -.13 for social support from colleagues to r = -.36 for social community at work), and positively with high job satisfaction (from r = .28 for social support from colleagues to r = .48 for rewards), supervisor support (from r = .34 for role conflicts to r = .60 for quality of leadership) and coworker support (from r = .29 for predictability to r = .53 for social community at work).
Work-individual interface variables were associated with high work-family conflict (from r = .14 for job insecurity to r = .63 for work-family conflict) and family-work conflict (from r = -.18 for job satisfaction to r = .42 for family-work conflict), as well as with low job satisfaction (except job insecurity, from r = -.19 for family-work conflict to r = .46 for job satisfaction). Values at the workplace variables were positively related to job satisfaction (from r = .21 for mutual trust between employees to r = .45 for trust regarding management) and negatively related to exhaustion (from r = -.24 for social inclusiveness to r = -.39 for trust regarding management) and disengagement from work (from r = -.25 for mutual trust between employees to r = -.51 for trust regarding management). Health and well-being variables correlated with high levels of depression (from r = .31 for burnout to r = .51 for depressive symptoms) and exhaustion (from r = .34 for sleeping problems to r = .48 for stress). The results of the analysis have confirmed the high theoretical accuracy of the Polish version of COPSOQ II.
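Both the test-retest coefficients and the validity coefficients reported above are Pearson correlations. As a minimal sketch, assuming paired scores for the same respondents, the coefficient can be computed as below; squaring it gives the share of variance the two measurements have in common, which helps in reading the test-retest range of r = .15 to r = .34. The T1 and T2 scores in the example are hypothetical and the function name is illustrative.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two paired samples with at least two observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical subscale scores of five respondents at T1 and T2.
t1_scores = [1, 2, 3, 4, 5]
t2_scores = [2, 1, 4, 3, 5]
print(round(pearson_r(t1_scores, t2_scores), 2))  # prints: 0.8

# Shared variance implied by the lowest and highest reported test-retest r.
shared = {r: round(r ** 2, 4) for r in (0.15, 0.34)}
print(shared)
```

On this reading, the reported coefficients correspond to roughly 2% to 12% of shared variance between the two measurement points.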

Discussion
The aim of the paper was the full evaluation of the psychometric properties of COPSOQ II among social service professionals who work in direct contact with other people in the Eastern part of Europe. The paper has examined the reliability measures, criteria accuracy and theoretical accuracy of the 33 COPSOQ II subscales. The analyses excluded single-item theoretical constructs. Reliability was tested using two methods: internal consistency analysis and an absolute stability estimation method (the test-retest method). The analyses confirmed acceptable internal consistency parameters (Cronbach's alpha coefficients and three omega coefficients) for the vast majority of subscales. Four subscales (demands for hiding emotions, variation, influence, and commitment to the workplace), which yielded a reliability of less than .60, should be used with caution. Slightly poorer results were obtained for the absolute stability method. The Pearson's r correlation coefficients for each subscale between the first and second measurements ranged from r = .15 to r = .34. All were statistically significant at the p < .001 level, although they cannot be considered high. The rather low correlation coefficients may be related to the length of the interval between measurements. The optimal time interval has not been determined, as it depends to a large extent on the characteristics of the subject of measurement, the number of test items and the specificity of the research sample; however, it is recommended that the interval between measurements should range from a few weeks to a few months [44]. In the present study, the adopted interval was about 12 months, so it was longer than recommended. Regarding the accuracy of the criteria, the CFA has supported the 33-subscale design of the tool. Importantly, the analyses of this study have been based on the model proposed by the authors of the original version of the questionnaire [19].
The single-model analysis, including 119 observable variables and 33 latent variables, produced satisfactory or good fit results for the individual COPSOQ II subscales. Note that no other studies are known to have confirmed the original COPSOQ II design in a similar way. Moreover, no other studies, to our knowledge, have even attempted to verify the structure of COPSOQ II treated as a complete instrument (without picking particular subscales). To some extent, this is understandable since COPSOQ II is a very extensive and multidimensional tool, and this type of analysis requires considerable effort and computational power. Nevertheless, every psychological instrument requires psychometric validation. Thus, we believe that our study is without precedent and delivers important knowledge on the psychometric properties of the COPSOQ, also confirming its structure.
In our study, the theoretical accuracy of COPSOQ II, measured by correlations with ten criterion variables, has been confirmed. For example, demands at work variables in COPSOQ II were highly correlated with both job stressors (quantitative workload and work-family conflict) and poor occupational health (exhaustion). Work organization and job contents variables were related to high job resources (supervisor and coworker support), while health and well-being variables were positively associated with depression and negatively associated with job satisfaction. Notably, a sufficiently high and consistent pattern of correlations has been observed between variables relating to the same phenomena but measured with different tools: depressive symptoms and depression, burnout and exhaustion/disengagement from work, colleagues' support and co-workers' support. Conversely, the relationships between some variables measuring very similar constructs were not very high (e.g. quantitative demands and quantitative workload).
A limitation of the Polish COPSOQ database and the derived reference values, however, is that the data were collected from a relatively small population and are not based on a representative sample of the employed population. Therefore, the results may be limited to human service workers only. The overall conclusions are: (1) the first full validation confirmed the structure proposed by the authors of the tool; (2) COPSOQ II is characterized by good psychometric parameters and could be successfully used to fulfill the demand for good and comprehensive assessment methods, also in the Polish job market settings. The research outcomes may enable further international comparative studies, particularly in the Polish context, where there has been very limited availability of developed measurement methods accounting for the changing nature of the work environment, including working conditions, contemporary requirements of modern organizations, the specificity of the current labor market and new forms of work.
The present study has been conducted among workers of the so-called social mission professions. Recent political and economic changes in Poland have transformed these professions. There was rapid professionalization and bureaucratization of human services institutions. From small local agencies, these gradually evolved into large, modern organizations with extensive bureaucratic structures [45]. This process was accompanied by weakening formalized relations between employees. The professional attitude of human service workers also changed from being vocation-based and exhibiting strong commitment to more business-based, profit-oriented services. For many employees who had a sense of mission, these changes resulted in a weakening of the importance of their work and their role in it, a decrease in identification with their profession and institution, as well as a tendency to leave the profession. According to the latest data on Poland, these occupations have been affected by personnel shortages [46]. According to the OECD report Health at a Glance: Europe 2018 [47], Poland has a lower number of employed nurses per 1,000 inhabitants than the average in the EU (5.2 compared to 11.1 in Sweden, 14.3 in Finland, 16.9 in Denmark and 17.5 in Norway). Similar disproportions occur in the case of practicing doctors per 1,000 citizens (2.4 in Poland compared to 4.3 in Sweden, 3.2 in Finland, 3.7 in Denmark and 4.5 in Norway). Regarding doctors, the problem has been caused by medical studies admission quotas, long career paths and qualified specialists undertaking employment abroad. The shortage of nurses and midwives has been due to the reluctance of young people to enter the profession, the lack of valid permissions of qualified employees returning to work after a career break, and the retirement of experienced workers.