The Polish COVID Stress Scales: Considerations of psychometric functioning, measurement invariance, and validity

The COVID Stress Scales (CSS) were developed to measure stress in response to the COVID-19 pandemic. To further investigate the psychometric properties of the CSS, we used data collected in Poland across two waves of assessment (N = 556 at T1 and N = 264 at T2) to evaluate the factor structure, reliability (at the item and scale level), measurement invariance (across the Polish and Dutch translations of the CSS, and over time), stability over time, and external associations of the Polish-language version of the CSS (CSS-PL). Overall, results suggest that the CSS-PL is psychometrically robust and largely invariant across the countries and time lags considered. The CSS-PL was also positively related to other measures of COVID-19 fear, health anxiety, obsessive compulsive symptoms, anxiety, depression, and intent to receive a COVID-19 vaccine. This study thus provides considerable information about the CSS’s items and scales, and lays the foundation for future investigations into COVID stress across time and different populations.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has been one of the largest, most widespread pandemics since the end of World War 2 [1,2]. The COVID-19 pandemic, by nature of its scope and complexity, represents a psychosocial stressor [3] that is distinct from other natural disasters (such as earthquakes or tsunamis) and previous epidemics such as severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and Ebola [4]. The current pandemic has disrupted healthcare systems, political dynamics, economic stability, and social functioning around the world [4]. The COVID-19 pandemic has also negatively affected mental health and well-being in a variety of ways [1,5-7]. One specific psychological consequence of COVID-19 may be the emergence of a multidimensional stress response linked to the pandemic. This has been termed the COVID Stress Syndrome, and is measured by the COVID Stress Scales (CSS) [7,8].
Given the global nature of the COVID-19 pandemic, we anticipate an increasing amount of cross-cultural research into its psychological consequences. This research will require psychometrically vetted instruments that can be used and linked across diverse countries and . We also considered how strongly the CSS-PL was related to a similar assessment of COVID-related stress, the Fear of COVID-19 Scale (FCV-19S) [21], and, to examine divergent associations, the extent to which the CSS-PL was associated with impulsive behavior measured by the Impulsive Behavior Scale Short Version (SUPPS-P) [22]. Finally, we examined concurrent and lagged (across a 1-month interval) associations between the CSS-PL and intent to receive a COVID-19 vaccine.
We emphasized intent to be vaccinated as an important outcome since hesitancy toward COVID-19 vaccination may constitute a major obstacle in achieving herd immunity [23], and the issue of vaccination has been intensively investigated in the recent literature [23-25]. Although links between the CSS and intent to receive a COVID-19 vaccine were not examined in the original study [8], prior research on the H1N1 influenza (also known as swine flu) pandemic [26], as well as other research on the COVID-19 pandemic [27], is instructive. In these instances the protection motivation theory (PMT) has been invoked. This theory postulates that protection motivation, i.e., "a behavioral intention to perform a maladaptive or adaptive behavior" [26: 816], is a result of two factors: threat and coping appraisals [BISH]. Indeed, there is evidence that higher general anxiety is associated with increased likelihood of protective behaviors [26], and that threat appraisal of potential infection is also associated with the motivation for social distancing in the COVID-19 pandemic [27]. Based on this past work and the protection motivation theory, we expected that stress and anxiety as measured by the Polish version of the CSS at Time 1 would be positively related to intent to have a COVID-19 vaccine at T1 and four weeks later at T2.

Participants and procedure
All questionnaires were administered to the Polish sample via an online survey between March 29, 2021 and April 29, 2021. During this period there was a significant increase in COVID-19 infections in Poland, and several new safety measures were imposed, including the closure of big-box stores and shopping malls, limits on people attending religious worship, closure of hair and beauty salons, nurseries and kindergartens (www.gov.pl).
The data collection protocol was approved by the Research Ethics Committee at the Faculty of Psychology and Cognitive Science, Adam Mickiewicz University in Poznań (decision number: 1/02/2021) and all respondents consented prior to beginning the survey. The inclusion criteria required potential participants to be at least 18 years old and provide informed consent. No exclusion criteria were applied in the current study. Compensation for participation was provided in the form of a lottery in which participants could win vouchers ranging from 10 PLN to 25 PLN for an online store. We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.
The minimal sample size of the validation study was determined based on an a priori power calculation (https://sample-size.net/correlation-sample-size/). Specifically, to detect small-sized correlation coefficients (.20) with sufficient statistical power (.80) the study would require at least 194 respondents. Further, because our analyses employed IRT-based methodology, which requires a large sample size (N > 500) [28], we recruited a larger sample to meet the requirements of IRT analysis.
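As an illustration of the a priori calculation described above, the following minimal sketch applies the standard Fisher z approximation for the sample size needed to detect a correlation. The two-sided alpha of .05 is an assumption on our part (the text reports only the effect size and the power), the function name is hypothetical, and the online calculator may differ in its details.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation with a two-sided test.
    alpha = .05 is an assumed value, not one reported in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    fisher_z = math.atanh(r)                       # 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


required_n = n_for_correlation(0.20)
print(required_n)
```

Under these assumptions the sketch returns 194, in line with the minimum number of respondents reported above.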
The Polish sample originally consisted of 556 respondents (62.10% women; aged 18-85 years with an average age of 43.55). Among the 556 participants at T1, 19.10% (n = 106) had had COVID-19, whereas 274 participants (49.30%) definitively endorsed not having had coronavirus. At the second wave of assessment 264 participants from the first assessment completed the CSS-PL (attrition rate of 52.50%). In turn, the Dutch sample comes from Vos and colleagues' study [2] and includes 382 Dutch individuals (45.80% women; aged 19-76 years with an average age of 30.49). The participants in the study by Vos and colleagues [2] were recruited through the Prolific online working platform and the social networks of involved students. The materials were in the Dutch language, and respondents mainly originated from the Netherlands and Belgium. The final sample in the study by Vos and colleagues [2] included 546 participants (M age = 29.81, SD = 10.36). Among the eligible 546 respondents, 382 individuals were from the Netherlands, and 112 were from Belgium. In the present investigation, for IRT analysis that requires larger samples [2], we utilized only the subset of the data collected from the respondents from the Netherlands. For a detailed description of the Dutch sample and the procedures used to collect this sample, see Vos and colleagues [2].
The data for the Polish sample is freely available through the Open Science Framework (https://osf.io/k5234/?view_only=b3a0973e00664747b148f661d67c11d4). The data collected in the Netherlands by Vos and colleagues [2] are also available through the Open Science Framework (https://osf.io/xb865/).

Measures
The COVID Stress Scales (CSS) [8] are a 36-item self-report instrument consisting of five subscales: (1) COVID danger and contamination (12 items), (2) COVID fears about economic consequences (6 items), (3) COVID xenophobia (6 items), (4) COVID traumatic stress symptoms (6 items), and (5) COVID compulsive checking and reassurance seeking (6 items). All 36 items are rated on a five-point Likert scale from 0 to 4. The CSS was originally developed in English. With the permission of the original authors of the CSS [8], the CSS was translated into the Polish language following the recommendations of the "ISPOR Task Force for Translation and Cultural Adaptation" [29]. A series of pilot studies suggested an effective translation process, and that the CSS-PL functioned similarly to the CSS in terms of the metrics used in the original development paper (e.g., factor structure, reliability). Greater detail about the initial translation and evaluation of the CSS-PL can be found in the supporting information (along with the translation itself; see S1 and S2 Files and S1-S3 Tables). The mean scores of each subscale were calculated and used in the analyses.
The Fear of COVID-19 Scale (FCV-19S) [21] includes seven items assessing fear related to COVID-19. Participants are asked to rate each item using a five-point Likert scale from 1 (strongly disagree) to 5 (strongly agree). In the present study a Polish translation of the FCV-19S was used, and the Omega coefficient for the scale was .89. The mean score of the FCV-19S was calculated and used in the analyses.
The Patient Health Questionnaire-4 (PHQ-4) [18] (Polish translation derived from the website: https://www.phqscreeners.com/select-screenert) includes four items assessing anxiety (2 items) and depression (2 items) over the past two weeks. Participants rate the four items on a four-point Likert scale ranging from 0 (not at all) to 3 (almost every day). In the current study, the Omega coefficient of the PHQ-4 was .90. The mean score of the PHQ-4 was calculated and used in the analyses.
The Short Health Anxiety Inventory (SHAI) [16] in Polish translation [30] assesses health anxiety independently of physical health status. Participants rate items on a four-point scale from 0 to 3 describing their anxiety over the past six months. In this study, the Omega coefficient for the SHAI was .93. The mean score of the SHAI was calculated and used in the analyses.
The Social Desirability Scale (SDS-17) [20] in Polish adaptation [31] includes 16 items assessing social desirability. Respondents rate each statement as either true (1) or false (0) for them. In the current study, the Omega coefficient for the SDS-17 was .75. The mean score of the SDS-17 was calculated and used in the analyses.
The Obsessive Compulsive Inventory-Revised (OCI-R) [17] in Polish adaptation [32] is an 18-item instrument measuring obsessive-compulsive symptoms. Participants rate the items using a five-point Likert scale ranging from 0 (not at all) to 4 (extremely). In the current study, we used the items pertaining to compulsive washing (3 items) and checking (3 items); the Omega coefficients of the washing and checking scales were .82 and .80, respectively. The mean scores of these two OCI-R subscales were calculated and used in the analyses.
The Political Beliefs Questionnaire (PBQ) [19] was designed and validated in Poland to measure religious fundamentalism, xenophobia, and economic beliefs [19]. The PBQ consists of 19 items rated on a five-point Likert scale from 1 (strongly disagree) to 5 (strongly agree). In the current study, the xenophobia subscale was used; the Omega coefficient was .91. The mean score of the xenophobia subscale was calculated and used in the analyses.
The Impulsive Behavior Scale Short Version (SUPPS-P) [22] in Polish adaptation [33] assesses different facets of impulsivity, including urgency, premeditation, perseverance, and sensation seeking. Items are rated on a four-point Likert scale from 1 (strongly agree) to 4 (strongly disagree). In the current study, the sensation seeking subscale (4 items) was used; the Omega coefficient was .87. The mean score of the sensation seeking subscale was calculated and used in the analyses.

Demographic information.
Respondents were asked to indicate their age, the gender they identify with the most, highest educational level, place of residence, whether they have been infected by the virus, whether they knew anyone that is/was infected by the virus, if they have received a COVID-19 vaccine, and if they intend to receive a COVID-19 vaccine (rated on a 5-point Likert scale).

Data analytic strategy
The analytic strategy consisted of five major steps. In the first step we used exploratory structural equation modeling (ESEM) [34] to examine the dimensional structure of the final Polish CSS (CSS-PL). ESEMs were fit in both the Polish (Time 1) and Dutch samples. Based on the CSS' construction (i.e., 6 distinct scales tapping into an overarching syndrome), results from the original dimensionality assessment, and pilot data on the CSS-PL, single factor, 4-factor, 5-factor, and 6-factor solutions were all specified. In the 5-factor solution the Danger and Contamination scales were specified to target the same factor. In the 4-factor solution the Traumatic Stress and Checking scales were also combined into a single factor. The ESEMs were fit in Mplus version 8.5 [35] with oblique target rotations using mean and variance adjusted weighted least squares (WLSMV) estimation.
In the second step, we examined the extent to which the CSS demonstrated measurement invariance across the Polish and Dutch translations. Tests of measurement invariance were based on the graded response model [36], and two variants of the improved Wald Test for DIF [37,38]. The graded response model (GRM) is an IRT model for polytomous items (i.e., items with more than two response categories); the GRM is effectively a categorical confirmatory factor model in which the responses to a given CSS item are modeled as a probabilistic function of the latent trait of interest [39]. For the DIF analysis, all items and item parameters were first simultaneously tested for measurement non-invariance (NI) in an initial sweep. An advantage of this approach is that all items are simultaneously tested for NI; however, it is prone to an inflated false-positive rate [38]. Thus, this was used primarily to identify anchor items, and flag items that might contain NI across groups. More focused, robust tests of NI were subsequently conducted based on these initial results. All items that showed no evidence of NI in the initial sweep were constrained to equality across groups in the subsequent NI model, while all items that exhibited evidence of NI were freely estimated across groups. Additional NI models were then fit in which the item parameters that did not exhibit NI in a prior model were constrained to equality across groups, while those item parameters that did evidence NI were freely estimated. This was done in order to identify the most parsimonious multi-group model that accounted for NI, and because more anchor items increase the power and robustness of NI tests. The item response models and NI tests were conducted in flexMIRT version 3.6 [40] using full information maximum likelihood estimation with the supplemented expectation maximization (SEM) algorithm [37].
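To make the graded response model described above more concrete, the sketch below computes category response probabilities for a single five-category item in the slope-threshold form. The item parameters are hypothetical values chosen for illustration, the function name is our own, and estimation software may report an equivalent but differently parameterized version of the model. With five response categories an item has one discrimination value and four difficulty values, which corresponds to the 1, 4, and 5 degrees of freedom of the a, b, and total item tests.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta      : latent trait level
    a          : item discrimination
    thresholds : increasing difficulty values b_1 < ... < b_{K-1}
    Returns K probabilities, one for each response category 0..K-1.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item parameters (not taken from Table 4).
probs = grm_category_probs(theta=0.0, a=2.6, thresholds=[-0.5, 0.5, 1.2, 2.0])
for category, p in enumerate(probs):
    print(f"P(X = {category} | theta = 0) = {p:.3f}")
```

In this form, non-invariance in a discrimination value means the slope a differs across groups, whereas non-invariance in difficulty means that at least one threshold differs.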
In the third step, we considered the reliability of the CSS-PL at the item and the scale level. Item discrimination values, and scale information functions, from the final graded response model identified in the previous step (i.e., the most justifiably constrained model across groups) were used to help characterize the reliability of the CSS-PL. Coefficient omegas (ω) [41] were also computed for each scale of the CSS-PL in order to provide a single-value, holistic summary of scale reliability.
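For readers unfamiliar with coefficient omega, the following minimal sketch shows one common way of computing it from standardized factor loadings. It assumes a single-factor scale with uncorrelated residuals, and the six equal loadings are hypothetical values used only for illustration; this is not necessarily the exact computation used for the values reported here.

```python
def coefficient_omega(loadings):
    """Coefficient omega for a unidimensional scale.

    Assumes standardized loadings and uncorrelated residuals, so each
    item's unique variance is 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - loading ** 2 for loading in loadings)
    return common / (common + unique)


# Hypothetical six-item scale with equal standardized loadings of .83.
omega = coefficient_omega([0.83] * 6)
print(round(omega, 2))
```

Higher and more homogeneous loadings, or a larger number of items, push the coefficient toward 1.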
In the fourth step, we examined measurement invariance in the CSS-PL from Time 1 to Time 2, as well as rank-order stability over time. Tests of measurement invariance here followed the same procedure outlined in step two; however, instead of comparing the Polish and Dutch samples, the comparisons were between the Polish sample at Time 1 (reference assessment) and Time 2 (focal assessment).
In the last step, we assessed the convergent, discriminant, and criterion validity of the CSS-PL by calculating Pearson's r correlations using SPSS v. 27.00. Convergent associations were examined between the CSS-PL and measures of anxiety-related traits (the SHAI), obsessive-compulsive symptoms (the OCI-R), as well as COVID-19-related fear (the FCV-19S). These correlates have notably been included in other examinations of the CSS and its translations [8,11]. Discriminant associations were examined between the CSS-PL and current anxiety (the PHQ-4), depression (the PHQ-4), xenophobia (the PBQ), social desirability (the SDS-17) and sensation seeking (the SUPPS-P). Concurrent and prospective criterion associations were examined (among unvaccinated participants) between the CSS-PL and intent to receive a COVID-19 vaccine within Time 1, and from Time 1 to Time 2.

Attrition rate
In the first step of analysis, we determined the attrition rate between the first assessment and the second assessment four weeks later. The attrition rate was estimated to be 52.50% (292 participants dropped out of the study by T2). On average, participants who did not complete the second assessment were older, less willing to receive a COVID-19 vaccine, and more likely to be unsure whether they knew anyone infected with COVID-19 compared to those who completed the second assessment (see Table 1).

Factor structure of the CSS in the Polish and Dutch samples
Model fit indices for the 1-, 4-, 5-, and 6-factor ESEM solutions in the Polish and Dutch samples are presented in Table 2. The factor loadings associated with the various solutions are presented in S4-S7 Tables.
Scree plots for COVID Stress Scales item factor analyses in the Polish and Dutch samples are provided in Fig 1. The 4-, 5-, and 6-factor solutions all fit well on the basis of conventional fit thresholds (i.e., RMSEA/SRMR < .08; CFI/TLI > .90) [42], suggesting that these three solutions represent reasonable characterizations of the CSS in both samples. In both samples the pattern of factor loadings in the 6-factor solution was consistent with the original six scales of the CSS. Consistent with the 5-factor solutions reported by Taylor and colleagues [8], in the 5-factor solutions the Danger and Contamination scales loaded onto a single factor. However, in both the Polish and Dutch samples the Traumatic Stress and Checking scales also appeared to cohere into a single factor in the 5-factor solutions. Four factors thus appeared adequate to characterize the CSS in both the Polish and Dutch samples; however, the 5-factor solution was ultimately retained for the remainder of the analyses. This was done to remain consistent with the original results provided by Taylor and colleagues [8], as well as with other non-English language versions of the CSS [11,43,44]; that is, most applications of the CSS use the 5-factor solution, and so it was retained here to maintain conceptual coherence across the relevant literature. Also, the 5-factor solution still appeared reasonable in the Polish and Dutch samples, even if it was not the most parsimonious (it simply involved splitting the Traumatic Stress and Checking scales into distinct factors). As a follow-up to the ESEMs, a correlated factors model was fit based on the 5-factor solution. The results for this model can be found in S8 Table. Items loaded strongly on their associated factors, with average factor loadings of λ = .83 for the Polish sample, and λ = .82 for the Dutch sample.

Measurement invariance in the CSS across Poland and the Netherlands
The results from the MI tests are presented in Table 3.

PLOS ONE
Table 3 includes the Wald test statistics for the total item MI tests (I χ²), item discrimination value MI tests (a χ²), and item difficulty value MI tests (all item difficulty values are tested simultaneously; b χ²). Degrees of freedom were either 5 (for the total item tests), 1 (for the discrimination tests), or 4 (for the difficulty tests). Reliable NI was detected for three items of the Danger-Contamination Scale, zero items of the Socioeconomic Consequences Scale, two items of the Xenophobia Scale, four items of the Traumatic Stress Scale, and two items of the Checking Scale. Overall, 11 out of 36 CSS items (31%) demonstrated NI in discrimination and/or difficulty values across countries; on average 35% of the items in a given scale demonstrated some NI.
Item parameter estimates from the final, constrained multi-group graded response models are presented in Table 4; item parameters that were not constrained to equality across groups are presented in bold.
To provide a sense of the practical impact of the NI that was detected, Table 4 also includes Cohen's ds for the mean scale differences across countries while assuming MI (i.e., with all item parameter estimates constrained across groups), and with NI modeled (i.e., with the pattern of parameter constraints presented in Table 4). The items that demonstrated NI tended to be more discriminating, and less difficult, for the Polish sample (average a = 2.63; average b = .50) compared to the Dutch sample (average a = 2.56; average b = .97). That is, these items were somewhat more informative, or better at differentiating between individuals, in the Polish sample; the exception here was the items in the Danger-Contamination scale, which were more discriminating for the Dutch sample. Lower values on the latent factors were also generally needed for endorsing higher categories in the Polish sample compared to the Dutch sample. That is, among individuals with similar levels of COVID stress, those in the Polish sample were more likely to endorse a higher response category than those in the Dutch sample.
Table 3 note. I χ² = overall test of measurement invariance across items; a χ² = test of measurement invariance in item discrimination values; b χ² = test of measurement invariance in item difficulty parameters. Chi square values from specific measurement invariance tests presented in cells; bold denotes a statistically significant chi square value at p < .05 (the degrees of freedom for the total item tests were 5, 1 for the a tests, and 4 for the b tests). Significant values here suggest that an item or item parameter may be noninvariant across groups (i.e., significant differences across groups). The initial "All Items" sweep was conducted to identify anchor items and items that may demonstrate non-invariance. This process may over-identify non-invariance, however, and so more targeted follow-up tests were conducted using the items and parameters that demonstrated invariance at a previous stage as anchors. The exception was that the presence of non-invariance in the discrimination value suggests that the whole item should be treated as functioning differently across groups, even if there is no evidence for non-invariance in the difficulty parameters (i.e., equal difficulty in the absence of equal discrimination values is not particularly meaningful). In all models the Polish sample was treated as the reference group (factor mean and variance fixed to 0 and 1, respectively) and the Dutch sample was treated as the focal group (factor mean and variance freely estimated). https://doi.org/10.1371/journal.pone.0260459.t003

Table 4. Results from COVID Stress Scales item response models with parameter constraints across Polish and Dutch samples supported by measurement invariance tests.

Although there was evidence of NI in several items, it generally did not appear to have a large effect on group comparisons across countries. Compared to the Dutch sample, the Polish sample scored higher on the Danger-Contamination, SES Consequences, Traumatic Stress, and Checking scales, and lower on the Xenophobia scale, regardless of whether NI was modeled (Table 4). NI generally had the effect of slightly exaggerating these group differences. Modeling NI reduced the observed (MI assumed) Cohen's ds by between 19% (from d = .37 to d = .30 for the Xenophobia Scale) and 46% (from d = -.26 to d = -.14 for the Danger-Contamination Scale); there was no NI in the SES Consequences Scale, and NI actually increased the mean difference in the Traumatic Stress Scale by 7%. Overall, when MI was assumed the average Cohen's d was d = ± .48, and when NI was modeled the average Cohen's d was d = ± .38. Thus, although NI inflated the mean differences, it did not fundamentally change the conclusions regarding the initially observed differences between the two samples.
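The percentage reductions reported above can be checked with simple arithmetic. The sketch below is a minimal illustration that uses only the Cohen's d values quoted in the text; the function name is hypothetical, and the comparison is made on absolute values so that the sign of d (the direction of the group difference) does not affect the result.

```python
def percent_reduction(d_mi_assumed, d_ni_modeled):
    """Percent reduction in the magnitude of Cohen's d once NI is modeled."""
    before = abs(d_mi_assumed)
    after = abs(d_ni_modeled)
    return 100.0 * (before - after) / before


# Values quoted in the text (MI assumed vs. NI modeled).
xenophobia = percent_reduction(0.37, 0.30)
danger_contamination = percent_reduction(-0.26, -0.14)
average = percent_reduction(0.48, 0.38)

print(round(xenophobia), round(danger_contamination), round(average))
```

A negative return value would indicate that modeling NI increased, rather than reduced, the size of the group difference.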

Reliability of the CSS-PL
The item parameters presented in Table 4 highlight that in general the CSS scales are psychometrically robust. Item discrimination values were universally large (i.e., greater than a = .80, which roughly corresponds to a standardized factor loading of λ = .40), suggesting that all items effectively index the target construct of the specific scales (average a across samples and scales = 2.60). Furthermore, item difficulty values were generally spread across a wide range of latent trait levels. The high discrimination values of the items, and spread of item difficulty values, are jointly captured in Fig 2A-2E, which depicts the scale information functions across samples for each CSS scale (information functions are based on the parameter estimates presented in Table 4).
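The rough correspondence between discrimination values and standardized loadings mentioned above can be illustrated with a standard conversion. The sketch below assumes that the discrimination values are on the logistic metric and rescales them with the conventional constant D = 1.702; this choice of metric is our assumption, as it is not stated in the text, and the function name is hypothetical.

```python
import math

LOGISTIC_TO_NORMAL = 1.702  # conventional scaling constant D (assumed metric)


def loading_from_discrimination(a, d=LOGISTIC_TO_NORMAL):
    """Approximate standardized loading implied by an IRT discrimination value."""
    a_normal = a / d  # rescale the logistic slope to the normal-ogive metric
    return a_normal / math.sqrt(1.0 + a_normal ** 2)


threshold_loading = loading_from_discrimination(0.80)
average_loading = loading_from_discrimination(2.60)
print(round(threshold_loading, 2), round(average_loading, 2))
```

Under this assumption, a = .80 maps to a loading a little above .40, and the average discrimination of 2.60 maps to a loading close to the average loadings reported for the correlated factors model.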
For ease of interpretability, the original information logits have been converted into rough estimates of reliability [45]. The scales generally provide a reliability of at least r_xx = .75 from one standard deviation below the mean up to three standard deviations above the mean. Reliability was generally lowest below one standard deviation below the mean, suggesting that the CSS scales are least effective at measuring (or making finer grained distinctions between) individuals who are not experiencing much COVID-19 stress; on the other hand, these scales appear quite reliable when it comes to measuring individuals experiencing average and higher levels of COVID-19 stress.
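One common way of translating test information into a conditional reliability estimate is sketched below. It assumes a latent trait variance of 1 and uses the relation reliability = 1 - 1/information; this is offered only as an illustration, and the conversion actually applied in [45] may differ (for example, information / (information + 1) is another convention). The function names are hypothetical.

```python
import math


def reliability_from_information(information):
    """Conditional reliability at a trait level, assuming latent variance of 1."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 - 1.0 / information


def standard_error_from_information(information):
    """Conditional standard error of measurement at a trait level."""
    return 1.0 / math.sqrt(information)


for information in (2.0, 4.0, 10.0):
    r_xx = reliability_from_information(information)
    se = standard_error_from_information(information)
    print(f"information = {information:4.1f}  reliability = {r_xx:.2f}  SE = {se:.2f}")
```

Under this convention, a reliability of .75 corresponds to a test information value of 4, that is, a standard error of half a standard deviation on the latent trait.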
The general reliability of the scales across the latent trait was further reflected in the coefficient omega values, which were .94, .91, .93, .93 and .80 for the Danger and Contamination, Socio-Economic Consequences, Xenophobia, Traumatic Stress, and Compulsive Checking scales, respectively. These scales were all moderately to strongly intercorrelated (see S9 Table), with correlations ranging from r = .26 (between Danger and Contamination and Socio-Economic Consequences) to r = .69 (between Traumatic Stress and Compulsive Checking).

Test-retest reliability and measurement invariance of the CSS-PL over time
The results from the NI tests across time are presented in S10 Table. This table follows the same format as Table 3. Reliable NI was detected for three items of the Danger-Contamination Scale, zero items of the Socioeconomic Consequences Scale, one item of the Xenophobia Scale, zero items of the Traumatic Stress Scale, and two items of the Checking Scale. Overall, 6 out of 36 CSS items (17%) demonstrated NI in discrimination and/or difficulty values across time; on average 15% of the items in a given scale demonstrated some NI.
Item parameter estimates from the final, constrained NI graded response models across time are presented in S11 Table. In general, the items that demonstrated NI tended to be more difficult at Time 2 compared to Time 1, especially the items of the Danger-Contamination scale. Overall, though, only a small number of items demonstrated NI, and the NI that was identified appeared to be relatively modest in size. Differences between parameter estimates were small, and mean differences over time were largely unaffected (SX). Scores on all 5 scales

Convergent, discriminant and criterion associations of the CSS-PL
Correlations between the CSS-PL scales and the various criterion measures are presented in Table 5.
The associations between the five CSS-PL scales and COVID-19-related fear (measured by the Fear of COVID-19 Scale; FCV-19S), health anxiety-related traits (measured by the Short Health Anxiety Inventory; SHAI), and obsessive-compulsive washing and checking (measured by the Obsessive Compulsive Inventory-Revised; OCI-R) were all positive and strong (Table 5), confirming the convergent validity of the CSS-PL.
Fig 2 note. Reliability is on the Y axis, latent factor scores are on the X axis in standard deviation units (i.e., 0 corresponds to the factor mean, 1 is one standard deviation above the mean, etc.). Reliability curves are based on the Information Functions from the item response models with parameter constraints supported by DIF tests (see Table 4).
The divergent associations between the CSS-PL scales and general anxiety and depression (measured by the Patient Health Questionnaire-4; PHQ-4), xenophobia (measured by the Political Beliefs Questionnaire; PBQ) and social desirability (measured by the Social Desirability Scale; SDS-17) were all consistent with the pattern of associations expected based on the results of the original CSS development study by Taylor and colleagues [8] (see S12 and S13 Tables for tests of the differences between correlations). All the CSS-PL scales were positively correlated with anxiety and depression, and these correlations ranged from small to very large in magnitude. In turn, the CSS-PL Xenophobia Scale was the only CSS-PL scale that was correlated with xenophobia. None of the CSS-PL scales were correlated with the SDS-17 assessing social desirability. Finally, an additional discriminant variable, sensation seeking as measured by the Impulsive Behavior Scale Short Version (SUPPS-P), only demonstrated a moderate, negative correlation with the COVID Danger-Contamination scale.
With respect to concurrent criterion validity, correlations between most of the CSS-PL scales at T1 (the exception was the COVID Socioeconomic Consequences scale) and intent to receive a COVID-19 vaccine at T1 ranged from small to strong. With respect to predictive criterion validity, correlations between the CSS-PL at T1 and intent to receive a COVID-19 vaccine four weeks later at T2 were significant and positive (the exception was the non-significant correlation with the COVID Xenophobia scale), and their magnitude ranged from small to strong.

Discussion
The present study described the initial development and evaluation of the Polish-language version of the COVID Stress Scales (CSS-PL) and in so doing also managed to extend prior psychometric vetting of the CSS more broadly [8].

Psychometric functioning of CSS-PL across countries
Consistent with the original CSS [8], as well as other language versions of the CSS (e.g., Persian and Arabic) [11,43,44], a 5-factor structure of the CSS-PL appeared reasonable. This further suggests that across cultural contexts the items of the CSS tend to cohere into a similar handful of factors (though in certain samples more or fewer factors may also be defensible), and that these factors are moderately to strongly interrelated, providing evidence for the COVID Stress Syndrome across diverse nations. Our evaluation of measurement invariance also helped to highlight areas of potential similarity and difference when measuring COVID stress across distinct cultural contexts. Overall, slightly less than a third of the items demonstrated some non-invariance (NI) across the Polish and Dutch samples. However, this NI did not seem to be particularly large or impactful in magnitude. Still, results illustrate the importance of at least considering MI when examining and comparing COVID stress across cultural contexts (including developing different language versions), and accounting for it in subsequent analyses when necessary (again, by either revising items, dropping them, or incorporating NI into the measurement model). Tests of NI can also help identify potentially important cultural differences across contexts (i.e., NI may be substantively meaningful), and be used to detect when certain items of the CSS have not translated well across contexts [46,47].
Despite the minor differences in psychometric functioning, both the Polish and Dutch versions of the CSS appeared very reliable. Item discrimination values were universally large (i.e., a > .80), and reliability estimates were greater than r_xx = .75 across the range of all scales, especially in the higher range. That is, the CSS is well-suited to measuring individuals with slightly below average to well above average levels of COVID stress; conversely, the scales are least reliable for those individuals experiencing little to no COVID stress. This somewhat asymmetric distribution of the reliability curves is consistent with the overall intention of the scale, however. That is, the goal of the CSS is primarily to identify and rank-order individuals experiencing moderate to large degrees of COVID stress. Relatedly, the pattern of correlations between the CSS-PL scales (ranging from medium to very large) was consistent with the pattern of correlations obtained by Taylor and colleagues [8] and by Khosravani and colleagues [11] in regard to the Persian CSS. This, along with our other psychometric results, provides further support for the notion that symptoms measured in the CSS (here CSS-PL) constitute a coherent COVID Stress Syndrome that appears to be a cross-cultural experience [8,11].

Psychometric functioning of CSS-PL across time
Measurement invariance is critical to consider both when comparing groups and when analyzing change over time. Overall, the CSS-PL appeared largely invariant over time. Few items demonstrated NI, and for those that did, the differences in the parameter estimates across time, and the impact on estimates of mean change, were negligible. Admittedly, four weeks is not a large gap, and so major psychometric changes might not be expected. However, in the context of the COVID-19 pandemic there has been a steady, rapid stream of developments regarding the threat level, appropriate safety behaviors, and vaccine availability, which give rise to hopes for effective treatment options for COVID-19 [14]. The dynamically changing nature of pathogenesis, diagnosis, prognosis, and treatment options in the COVID-19 pandemic [14] may, therefore, encourage researchers to use shorter intervals between assessments in future studies to capture how the rapidly changing nature of the pandemic may affect stress and anxiety related to COVID-19.
Although the measurement parameters were generally stable over time, the scale means were lower at T2. That is, the psychometric properties of the CSS might not change much in response to pandemic developments (which is preferable, all else being equal), but the scales themselves still detected a non-trivial amount of change over a relatively short period. Rank-order stability was generally high, though, which was consistent with what was observed in the recent examination of the Persian CSS [11]. It is important to note that the CSS was created as a measure of symptoms, not as a measure of traits [7], so it is likely that CSS scores may fluctuate in response to increasing or decreasing rates of infection [6]. Taken together, the results suggest that on average COVID stress decreased over the assessment period; however, individuals experiencing more COVID stress than others at T1 were more likely to score high on COVID stress at T2 as well.
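The distinction between mean-level change and rank-order stability can be made concrete with a small sketch. The scores below are hypothetical and purely illustrative (they are not CSS-PL data); Cohen's d is computed with Time 1 as the reference, so negative values denote lower scores at Time 2, and the pooled standard deviation is one of several possible standardizers.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def cohens_d(t1, t2):
    """Standardized mean difference (T2 minus T1) using the pooled SD."""
    pooled_sd = sqrt((variance(t1) + variance(t2)) / 2.0)
    return (mean(t2) - mean(t1)) / pooled_sd


# Hypothetical scale scores for six respondents at two assessments.
t1_scores = [2, 5, 8, 11, 14, 17]
t2_scores = [1, 3, 7, 8, 12, 14]

stability = pearson_r(t1_scores, t2_scores)  # rank-order stability
change = cohens_d(t1_scores, t2_scores)      # mean-level change

print(f"rank-order stability r = {stability:.2f}")
print(f"mean change d = {change:.2f}")
```

In this toy example every respondent scores lower at the second assessment while the ordering of respondents is almost perfectly preserved, so a high stability coefficient and a negative mean difference are not in conflict.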

Correlates of the CSS-PL
Convergent associations between the five CSS-PL subscales and the Fear of COVID-19 Scale (FCV-19S), which assesses fear of COVID-19 [21], were moderate to strong in size. Positive correlations between the FCV-19S and the CSS have also been demonstrated in other studies aimed at developing non-English versions of the CSS (e.g., the Persian and Arabic versions of the CSS [11,44]). This pattern of associations between the CSS and the FCV-19S also suggests that the FCV-19S appears to capture two constructs measured by the CSS, i.e., the danger and contamination and the traumatic symptoms and checking constructs [48]. The CSS-PL scales also demonstrated moderate to large associations with measures of anxiety-related traits and obsessive-compulsive symptoms. This is consistent with past findings on the CSS [8,11] and further supports the convergent validity of the CSS-PL.
Discriminant associations for the CSS-PL were also consistent with past findings [8,11]. That is, the CSS-PL was not significantly correlated with measures of social desirability or impulsivity; the CSS-PL Xenophobia scale was more strongly correlated with the General Xenophobia Scale than were the other scales; and the other CSS scales were more strongly correlated with anxiety and depression than was the Xenophobia scale. Relatedly, most of the CSS scales were more strongly correlated with current anxiety than with depression; however, this difference was only statistically significant for the Traumatic Stress scale. The lack of a consistent difference between associations with anxiety and depressive symptoms may be related to the timing of data collection. Data on the CSS were originally collected at the beginning of the pandemic [8], while the current data were collected roughly a year after the outbreak of the pandemic. It could be the case that fear and anxiety surrounding the outbreak were more distinct from depression in the early days of the outbreak than after one year of dealing with the outbreak's consequences.
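Comparing a scale's correlation with anxiety against its correlation with depression involves two dependent correlations that share one variable. The text above does not state which test was used; as an assumption, the sketch below uses Williams's t (as described by Steiger, 1980), one common option for this situation. All correlation values and the sample size in the example are hypothetical, not results from this study.

```python
from math import sqrt


def williams_t(r12, r13, r23, n):
    """Williams's t for H0: rho12 == rho13 when variable 1 is shared.

    r12: correlation of the shared variable with variable 2 (e.g., anxiety)
    r13: correlation of the shared variable with variable 3 (e.g., depression)
    r23: correlation between variables 2 and 3
    n:   sample size
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1.0 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2.0 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2.0
    numerator = (n - 1) * (1.0 + r23)
    denominator = (2.0 * ((n - 1) / (n - 3)) * det
                   + r_bar ** 2 * (1.0 - r23) ** 3)
    t = (r12 - r13) * sqrt(numerator / denominator)
    return t, n - 3


# Hypothetical example: a stress scale correlates .55 with anxiety and .45
# with depression, and anxiety and depression correlate .70 (n = 556).
t_value, df = williams_t(r12=0.55, r13=0.45, r23=0.70, n=556)
print(f"t({df}) = {t_value:.2f}")
```

The resulting statistic would be compared against a t distribution with n - 3 degrees of freedom. Because anxiety and depression are themselves strongly correlated, the test is more sensitive than one that treats the two correlations as independent.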
Finally, we assessed both concurrent and predictive associations between the CSS-PL and intent to receive a COVID-19 vaccine. We found moderate to strong correlations between three CSS-PL scales at T1 (Danger and Contamination, Traumatic Stress Symptoms, Compulsive Checking) and intent to receive a COVID-19 vaccine at T1. That is, higher levels of these fears were associated with greater intent to be vaccinated. Further, the Danger and Contamination, Socioeconomic Consequences, Traumatic Stress Symptoms, and Checking scales assessed at T1 were associated with intent to receive a COVID-19 vaccine after a 1-month interval. Thus, the intent to be vaccinated at T2 was stronger among those experiencing more COVID stress at T1. These results are consistent with prior research showing that higher anxiety is related to a greater likelihood of protective behaviors in past pandemics [26], and to greater motivation for social distancing in the case of the current COVID-19 pandemic [27]. Thus, our findings help to extend prior work concerning protective hygiene behaviors by demonstrating how the CSS can predict important COVID-related hygiene behaviors such as vaccination, while highlighting the role of psychological factors in helping to understand the intent to receive a COVID-19 vaccine [23-25].

Limitations
Despite a number of strengths, the present work also has limitations that need to be considered when interpreting the findings. First, although the pandemic in Poland, as in other countries, is likely to be a source of extreme stress [9,10], it is also probable that the state of the pandemic varies across countries and that, as a consequence, the levels and experiences of stress and fear related to COVID-19 may also differ across countries.
Second, the sample utilized in the current study may not be representative of the entire Polish population in several respects. The recruitment of participants via Facebook did not reach people who do not use this website, or those who do not use the Internet (as may be the case for older individuals). Indeed, although our investigation included participants across a wide age range (from 18 to 85 years), participants 60 years and older represented only 17.30% of the total sample (n = 96). This age group is particularly salient given the mortality rates of COVID-19, and potentially greater COVID-19 fear, among older individuals [48]. Further, our sample was over 60% female, and there is evidence for gender differences in attitudes and behaviors towards COVID-19 (e.g., women were more likely to perceive COVID-19 as a very serious health problem and to support or comply with restrictive public policies in response to COVID-19) [49]. Third, data for this study were collected between February and April 2021, one year after the outbreak of the COVID-19 pandemic, whereas the original study on the CSS was conducted in parallel with the beginning of the pandemic. Moreover, the current investigation was performed after the development and distribution of several highly effective COVID-19 vaccines. These circumstances are likely to have influenced the stress and fears experienced by our study participants at the time of the assessment and could affect certain aspects of the findings (e.g., item difficulty estimates; some fears may be more difficult to endorse given the availability of vaccines in certain countries). Further, in the current investigation we used a 4-week interval between assessments, which is relatively short. Future research would benefit from employing both much shorter and longer intervals to better understand how the measurement and manifestation of COVID-19 stress change over different stretches of time in the context of a rapidly changing pandemic [14].
Fourth, although we asked participants about their employment status, we did not ask about the type of employment they were engaged in. Essential workers, for example, experience a greater risk of being infected, which likely has consequences for COVID-19 stress. Participants were also not asked whether they take care of a child or children, older parents, or other dependents, which also might contribute to various facets of stress related to COVID-19, especially given how prominent concerns for the health of loved ones have been in the current pandemic [6]. Future research would benefit from the inclusion of these two additional variables.
Finally, the high attrition rate (52.50%) between the first and second assessment is a major limitation. However, although we detected some differences in baseline demographic variables across time, these differences were small in magnitude, so the risk of serious bias appears low. Relatedly, as noted above, four weeks may be too short a time interval to expect NI to become a major concern, which leaves the broader implications of the over-time tests of invariance ambiguous (i.e., to what extent are the measurement properties of the CSS stable over longer periods?). However, given the rapidly changing situation regarding the pandemic, short-term measurement invariance is still a potentially important issue worth examining.

Conclusion
The current study demonstrates that the Polish version of the CSS (CSS-PL), like the original CSS, has a defensible 5-factor structure and robust psychometric properties in terms of reliability and validity. Furthermore, the CSS may be invariant across countries and time, at least over brief time intervals. The CSS-PL thus appears to be a promising multidimensional instrument for assessing COVID-related stress in the Polish population. The current findings based on the Polish CSS also extend past work on the general psychometric properties of the CSS, which can be useful for researchers across contexts who are interested in assessing the COVID-19 Stress Syndrome.

Table. Factor loadings from 5 factor exploratory structural equation model. CSS = COVID Stress Scale Item; F1...F5 = Factors 1 through 5. Models fit using weighted least squares with mean and variance adjustment (WLSMV) estimation and targeted oblique rotation. The rotation targets for items not associated with a factor were set to 0. Factor loadings greater than λ = ±.40 presented in bold. Factor correlations ranged from r = .05 to .55 (average r = .28) in the Polish sample, and from r = .04 to .49 (average r = .31) in the Dutch sample. (DOCX)
S7 Table. Factor loadings from 6 factor exploratory structural equation model. CSS = COVID Stress Scale Item; F1...F6 = Factors 1 through 6. Models fit using weighted least squares with mean and variance adjustment (WLSMV) estimation and targeted oblique rotation. The rotation targets for items not associated with a factor were set to 0. Factor loadings greater than λ = ±.40 presented in bold. Factor correlations ranged from r = .05 to .55 (average r = .28) in the Polish sample, and from r = .04 to .49 (average r = .31) in the Dutch sample. (DOCX)
S8 Table. Factor loadings from 5 factor correlated factors model. CSS = COVID Stress Scale Item; F1...F5 = Factors 1 through 5. Models fit using weighted least squares with mean and variance adjustment (WLSMV) estimation. The rotation targets for items not associated with a factor were set to 0. Factor correlations ranged from r = .31 to .82 (average r = .54) in the Polish sample, and from r = .43 to .75 (average r = .57) in the Dutch sample. Model fit for the Polish sample was: χ² = 3447.67, df = 584, p < .01; RMSEA = .094; SRMR = .08; CFI = .919; TLI = .912. Model fit for the Dutch sample was: χ² = 1720.01, df = 585, p < .01; RMSEA = .071; SRMR = .076; CFI = .937; TLI = .933. The Dutch sample has one extra degree of freedom because the factor loading for CSS-10 was fixed to .98 to avoid convergence with negative residual variances. (DOCX)
S9 Table. Reliability of the Polish COVID-Stress Scales (CSS-PL) and correlations among the CSS-PL Scales. *** p < .001. (DOCX)
S10 Table. Results from COVID Stress Scale measurement invariance tests across times 1 and 2. DC = Danger-Contamination Scale; SES = Socioeconomic Consequences Scale; XN = Xenophobia Scale; TR = Traumatic Stress Scale; CK = Checking Scale; I χ² = overall test of measurement invariance across items; a χ² = test of measurement invariance in item discrimination values; b χ² = test of measurement invariance in item difficulty parameters. Chi square values from specific measurement invariance tests presented in cells; bold denotes a statistically significant chi square value at p < .05. Significant values here suggest that an item or item parameter may be non-invariant across time (i.e., significant differences across time). The initial "All Items" sweep was conducted to identify anchor items and items that may demonstrate non-invariance. This process may over-identify non-invariance, however, and so more targeted follow-up tests were conducted using the items and parameters that demonstrated invariance at a previous stage as anchors. The exception was that the presence of non-invariance in the discrimination value suggests that the whole item should be treated as functioning differently across time, even if there is no evidence for non-invariance in the difficulty parameters (i.e., equal difficulty in the absence of equal discrimination values is not particularly meaningful). In all models Time 1 was treated as the reference group (factor mean and variance fixed to 0 and 1, respectively) and Time 2 was treated as the focal group (factor mean and variance freely estimated). (DOCX)
S11 Table. Results from COVID Stress Scale item response models with parameter constraints across times 1 and 2 supported by DIF tests. DC = Danger-Contamination Scale; SES = Socioeconomic Consequences Scale; XN = Xenophobia Scale; TR = Traumatic Stress Scale; CK = Checking Scale; a = item discrimination; b1...b4 = item difficulty parameters; d MiAssumed = Cohen's d for the scale mean difference across time with measurement invariance assumed (i.e., all item parameters constrained to equality); d MiModeled = Cohen's d for the scale mean difference across time with measurement invariance modeled (i.e., only invariant item parameters constrained across time). Items that demonstrated non-invariance are presented in bold. Cohen's ds were computed with Time 1 as the reference group (i.e., positive values denote that scores were higher at Time 2). (DOCX)
S12 Table. Tests