Development and calibration of a novel positive mindset item bank to measure health-related quality of life (HRQoL) in Singapore

Background Positive mindset (PM) is an important domain of health-related quality of life in Singapore, a multi-ethnic urban city state in Southeast Asia. We therefore developed and calibrated a novel item bank to measure and improve PM. Methods We developed an initial candidate pool of 48 items from focus groups, in-depth interviews and existing instruments locally developed and validated for use in Singapore. We administered all items in English to a multi-stage sample stratified for age and gender, of subjects with and without medical conditions recruited from the community and a hospital, and calibrated their responses using Samejima’s Graded Response Model. We evaluated a final 36-item bank with respect to Item Response Theory (IRT) model assumptions, model fit, differential item functioning (DIF), concurrent and known-groups validity. Results Among 493 participants (49.3% male, 41.6% above 50 years old, 33% Chinese, Malay and Indian), bifactor model analyses supported unidimensionality: explained common variance of the general factor was 0.86 and omega hierarchical was 0.97. Local independence was deemed acceptable: the average absolute residual correlations were <0.06 and 3.3% of the total item-pair residuals were flagged for local dependence. The overall model fit was adequate and provided good coverage of the PM construct (theta range: -3.6 to +2.4). Five items exhibited DIF with respect to ethnicity and gender, but were retained without modification of scores because they measured important aspects of PM. Scores correlated in the hypothesized direction with a self-reported measure of global health (Spearman’s rho = -0.28, p<0.001) and discriminated between groups of participants with and without a self-reported diagnosis of a mood disorder (p = 0.007) adjusting for age, gender, ethnicity, education and marital status. Conclusion The 36-item PM item bank demonstrated satisfactory psychometric properties for the English-speaking Singaporean population. 
IRT model assumptions were sufficiently met and scores showed concurrent and known-groups validity. Future studies to evaluate the validity of PM scores when items are administered adaptively are needed.


Introduction
The World Health Organization (WHO) states that health is a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity. [1] PM is defined as thinking positively in life. [2] Although static instruments have been developed, we were not able to identify an item bank specifically measuring this latent construct. [3,4] Early item banks focused largely on latent constructs related to physical traits or functions. [5] Item banks that measure psychological constructs such as resilience and emotional distress have also been developed recently. [6,7] In Singapore, the Positive Mental Health Instrument (PMHI) and the Singapore Mental Wellbeing (SMWEB) scales have been developed. [3,4] Although they measure similar latent constructs, PMHI and SMWEB were conceptualized as multidimensional constructs encompassing positive affect, satisfaction, and psychological functioning. In contrast, PM refers to the amount of optimism one has. [3] Being able to measure how positively an individual thinks in life will allow interventions to be created to reduce the negative impact of poor PM. [8] Also, given the high prevalence of mental health conditions, the ability to maintain a PM may reduce the number of patients with mental health conditions. [9] The development of a PM item bank is a foundation for measuring PM and will enable development of short static instruments or computer adaptive testing (CAT) to measure PM as a latent construct.
Further, despite the popularity of HRQoL instruments such as the World Health Organisation Quality of Life Scale (WHOQOL-BREF) and Short Form-36 Health Surveys (SF-36), most of these instruments were developed and used in Western populations and only later adapted for use in other populations. Hence, to address the above gaps, we developed a comprehensive and culturally sensitive PM item bank to measure PM in Singapore. The aim of this study was to calibrate a PM item bank that includes important and culturally appropriate items and can be used across different age, gender and ethnic groups. A successfully calibrated item bank will allow us to develop CAT or short static instruments that measure PM accurately and precisely in Singapore.

Methods
This institutional review board-approved study (Ref 2014/916/A) consisted of the following sequential steps: development of a candidate item bank, administration of the candidate item bank via a community and hospital-based survey, and item bank calibration through assessing the assumptions of item response theory (IRT), fitting an IRT model to the responses, testing for differential item functioning (DIF) and testing the PM scores of the item bank using a priori hypotheses.

Development of a candidate item bank
The detailed methodology for the development of candidate items has been reported separately. [2,10-12] In brief, we adapted the Patient Reported Outcome Measurement Information System (PROMIS) Qualitative Item Review (QIR) protocol [13], with input and endorsement from expert panels (comprising patients, members of the general public, and experts in psychology, social work and psychometrics). Items were generated from thematic analyses of focus groups and in-depth interviews, and from a literature search to identify studies that developed or validated a health-related quality of life instrument among adults in Singapore. Items from these sources were "binned" and "winnowed" (as detailed in the PROMIS QIR protocol) by two independent reviewers, blinded to the source of the items, who harmonized their selections to generate a list of candidate items (each item representing a subdomain). An expert panel reviewed and refined the face and content validity of these candidate items.

A community and hospital-based survey
We recruited Singapore citizens or permanent residents from the community and from the Singapore General Hospital Campus. We sampled 75% English and 25% Chinese (Mandarin) speaking participants separately. Within each language sampling frame, a purposive sample of participants was drawn based on age, gender, ethnicity and presence or absence of chronic illnesses. The list of chronic illnesses was based on the Singapore Burden of Disease Study [14] and is detailed in S1 Table. The presence or absence of a chronic illness was based on a participant's self-report of having been diagnosed with an illness by a physician. Participants were categorized as well, mildly unwell or unwell, according to the number and severity of chronic illnesses. We excluded individuals who had impairments that precluded a meaningful exchange of ideas or other conditions that prohibited them from carrying out a normal interview, such as severe mental illness and cognitive impairment. In order to include participants with a wide spectrum of health, we predefined the proportions of participants recruited in each health category as 35% well, 15% mildly unwell, and 50% unwell.
Participants from the community were sampled using a residential household sampling frame of public housing, in which 82% of Singaporeans reside [15]. The primary sampling units were plots of land with approximately equal numbers of households, stratified according to geographic location and dwelling type. Households in each primary sampling unit were selected based on fixed route rules and skip patterns based on pre-specified ethnic and age quotas. Only one respondent per household was selected for a face-to-face interview. Three call attempts to each household were made at different times of the day, with at least one visit on a non-work day (Saturday or Sunday). This residential household-based sampling method has been used in the Singapore National Health Survey since 2004 [16,17]. The response rate of the study was computed using the standard set by the Council of American Survey Research Organizations [18], generally defined as the number of completed interviews divided by the number of eligible reporting units in the sample. We engaged Nielsen Research Company to conduct the standardized surveys on behalf of the study team.
Interviewers administered, in English, the items developed in a previous study conducted by our team. [10,11,19] Interviewer administration was selected so that illiterate subjects (who form 20% of the Singapore population) could be included and the resulting item bank could be applied to the entire English-speaking population in Singapore (test administration for illiterate subjects could be accomplished through interviewer- or technology-assisted formats). [20] Participants were presented with 48 items with 5-level response options adapted from PROMIS. The response options were "Never", "Seldom", "Sometimes", "Usually" and "Always" for items on frequency and "Not at all", "Mildly", "Moderately", "Quite a lot" and "Extremely" for items on intensity. We collected demographics including age, gender, ethnicity, education and current marital status. We also collected a single-item, participant-reported assessment of global health for comparison.

Item bank calibration
We adapted the methodology published by PROMIS to calibrate the English version of the PM item bank. To test IRT model assumptions, we evaluated unidimensionality using exploratory factor analysis (EFA), confirmatory factor analysis (CFA) and exploratory bifactor analyses (with orthogonal rotation). We reported the latter if EFA and CFA showed evidence of secondary dimensions. In the exploratory bifactor analyses, we fitted models with two, three and four group factors to clarify any underlying secondary dimensions. After ascertaining adequacy via conventional fit criteria, we used the average relative parameter bias (ARPB), explained common variance (ECV) of the general factor and omega hierarchical (omegaH) to assess whether any multidimensionality present would disqualify interpretation of the instrument as primarily unidimensional. To calculate these bifactor indices, we used a Microsoft Excel-based calculator [21]. We checked for monotonicity using individual category response curves. We evaluated local independence by examining the residual correlation matrix from the single-factor CFA. The specific criteria we used are given in Table 1. We used Mplus Version 8.0 software to check for unidimensionality and local independence [22]. We fitted Samejima's graded response model (GRM), a non-Rasch model, to calibrate the items and estimated parameters via marginal maximum likelihood using the Xcalibre 4.2 IRT software (Assessment Systems Corporation, USA). We tested adequacy of overall model fit as well as individual item fit using a chi-square fit statistic. We checked for DIF across these subgroups: age (age <50 versus age ≥50), gender (male/female) and ethnicity (Chinese versus non-Chinese), using likelihood-ratio chi-square statistics from ordinal logistic regression, comparing models with and without subgroup membership as a predictor.
We tested for uniform and non-uniform DIF using specially written syntax in IBM Statistics Version 23.0 (http://www-01.ibm.com/support/docview.wss?uid=swg21572191, downloaded on 18 December 2017). We assessed concurrent validity against a self-reported measure of global health ("In general, would you say your health is: Excellent, Very Good, Good, Fair or Poor?"), hypothesizing a moderate negative correlation (Spearman's rho < -0.25) between PM theta scores and the global health self-report. A negative correlation was hypothesized because a higher PM score indicates a more positive mindset, whereas a lower score on global health indicates better health. We also assessed known-groups validity using analysis of variance (ANOVA), hypothesizing that PM scores could discriminate between participants with and without a self-reported physician diagnosis of anxiety or depression, adjusted for participant's age (20-35, 36-49, 50 and above), gender (male/female), completion of secondary education (yes/no) and current marital status (single, married, divorced/widowed/separated) as covariates. We used a 5% significance level. Concurrent and known-groups validity analyses were carried out using the IBM Statistics Version 25.0 software.
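As an illustration of the graded response model named in the calibration step, the following minimal sketch computes category probabilities for a single 5-category item. It is not the Xcalibre implementation; the function name and the threshold values are hypothetical, and the discrimination of 1.27 is used only as an illustrative value.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait value
    a: item discrimination
    thresholds: increasing boundary locations b_1 < ... < b_{K-1}
    """
    # Cumulative probability of responding in category k or higher,
    # padded with 1 (lowest boundary) and 0 (beyond the highest boundary).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item: thresholds are illustrative, not taken from Table 4.
probs = grm_category_probs(theta=0.0, a=1.27, thresholds=[-3.0, -2.0, -0.5, 1.0])
```

With ordered thresholds the five probabilities are positive and sum to one, and raising theta shifts probability toward the highest category.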

Results
Thirty-six of 48 items were retained in the final PM item bank after reviewing initial IRT model fits and adequacy checks and consulting with the expert panel. As this paper focuses on the calibration of the PM item bank, the detailed results of the item generation are being reported separately. [11] A total of 676 subjects completed the PM item bank survey in English (n = 493) or Chinese (n = 183). As this paper focuses on the analysis of the English PM item bank, a total of 493 participants were analysed in this PM item bank calibration study. Characteristics of the study participants are shown in Table 2.

Item analyses
Cronbach's alpha was 0.97, indicating very high inter-item consistency. The mean item-to-total score correlation was 0.68 (SD = 0.08). Correlations ranged from 0.47 to 0.78. Item means ranged from 3.16 to 4.49. The percentage of non-response at the item level was practically nil, ranging from zero to at most 0.4%. As shown in Fig 1, there is good coverage of the PM construct (theta range: -3.6 to +2.4).

IRT Assumptions of unidimensionality, local independence and monotonicity
Unidimensionality was assessed using EFA, CFA and bifactor analyses. In the EFA, 20.9% of the variance was explained by the first factor. The ratio of the first to the second highest eigenvalue was 11.0 (Table 1). Both findings met recommended criteria for assessing unidimensionality. However, CFA showed Comparative Fit Index (CFI) <0.95, Tucker-Lewis Index (TLI) [...] Item loadings of the PM item bank on the single-factor CFA and on the general factor of the bifactor models did not differ meaningfully according to the average relative parameter bias, which was under 5% in all models (Table 3). Moreover, across all bifactor models, the attained ECVs and omegaHs of the general factor were above 0.80 and 0.95 respectively, much higher than Reise et al.'s suggested criteria (ECV>0.6 and omegaH>0.7) [23]. Consequently, the instrument can be interpreted as being primarily unidimensional despite the presence of some multidimensionality. Examination of the residual correlation matrix indicated little local dependence: the average value of the residual correlations was <0.06, which was less than the 0.1 threshold. The proportion of item pairs having problematic residual correlations (i.e., greater than 0.20) was 3.3% (21 of 630). Items 25 and 26, which covered religion and spirituality, accounted for 12 of the 21 problematic residual correlations. We appraised the extent of local dependency to be minor enough not to compromise the accuracy of IRT parameter estimation. In terms of monotonicity, we found that none of the items departed from monotonicity in terms of improper ordering.
No threshold given
Abbreviations: Average relative parameter bias (ARPB), explained common variance (ECV), item explained common variance (IECV), omega hierarchical (OmegaH).
§ Maximum ARPB among three exploratory bifactor models with 2, 3 and 4 group factors. See Table 3.
‡ Minimum general factor ECV attained among three exploratory bifactor models with 2, 3 and 4 specific factors. See Table 3.
¥ Minimum OmegaH attained among three exploratory bifactor models. See Table 3.
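The bifactor indices reported above can be sketched as follows. This is a minimal illustration that assumes standardized loadings, orthogonal factors and each item loading on exactly one group factor; it is not the Excel-based calculator used in the study. The function name and all loadings are hypothetical, and averaging the absolute relative bias is one common convention for ARPB that is assumed here.

```python
def bifactor_indices(general, specific, groups, unidim):
    """Return (ECV, omegaH, ARPB) from standardized loadings.

    general[i]:  loading of item i on the general factor
    specific[i]: loading of item i on its group factor
    groups[i]:   label of the group factor item i belongs to
    unidim[i]:   loading of item i in the single-factor model
    """
    # ECV: share of common variance attributable to the general factor.
    gen_sq = sum(g * g for g in general)
    spec_sq = sum(s * s for s in specific)
    ecv = gen_sq / (gen_sq + spec_sq)

    # Sum specific loadings within each group factor before squaring.
    group_sums = {}
    for s, label in zip(specific, groups):
        group_sums[label] = group_sums.get(label, 0.0) + s
    # Unique variance of each standardized item.
    unique = sum(1.0 - g * g - s * s for g, s in zip(general, specific))
    gen_var = sum(general) ** 2
    total_var = gen_var + sum(v * v for v in group_sums.values()) + unique
    omega_h = gen_var / total_var

    # ARPB: mean absolute relative difference between the single-factor
    # loadings and the general-factor loadings.
    arpb = sum(abs((u - g) / g) for u, g in zip(unidim, general)) / len(general)
    return ecv, omega_h, arpb

# Hypothetical four-item example with two group factors.
ecv, omega_h, arpb = bifactor_indices(
    general=[0.7, 0.7, 0.7, 0.7],
    specific=[0.2, 0.2, 0.2, 0.2],
    groups=["A", "A", "B", "B"],
    unidim=[0.72, 0.72, 0.72, 0.72],
)
```

In this toy example the general factor dominates, so ECV and omegaH are both high and the loadings shift by under 5% when the group factors are ignored.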

IRT Calibration and Fit
PM items were scored so that higher scores indicated a more positive mindset. The overall fit of the GRM was found adequate (chi-square = 1616.3, df = 1952, p = 1.000). The items and parameter estimates are presented in Table 4. Using a significance value of 0.01, no item was found to misfit the GRM and p values ranged from 0.029 to 1.000 with a mean of 0.70. Item discrimination parameters varied from 0.68 to 1.72 (mean = 1.27, SD = 0.26) and item difficulty parameters ranged from -4.97 to 1.44. The latent PM trait covered by the items ranged from -3.6 to +2.4, showing more extensive coverage in the lower compared to higher PM traits. Test information was highest at latent trait scores between -3.5 and +0.5 with maximum attained at -2.90. At this range, the conditional standard error of measurement (CSEM) was less than 0.20 and at the maximum, it was 0.144. The CSEM was less than 0.31 (roughly, a reliability of 0.90) for scores below +1.5, and greater than 0.5 for scores above +2.0. Hence lower PM trait scores are measured with greater precision than higher PM trait scores.
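The link between the CSEM values above and reliability can be checked with a short calculation. This sketch assumes the latent trait is scaled to unit variance, so that reliability is approximately 1 minus the squared CSEM and test information is the reciprocal of the squared CSEM; the function names are hypothetical.

```python
def reliability_from_csem(csem, trait_variance=1.0):
    """Approximate reliability implied by a conditional standard error.

    Assumes the latent trait has variance `trait_variance` (1.0 by default).
    """
    return 1.0 - csem ** 2 / trait_variance

def information_from_csem(csem):
    """Test information implied by a conditional standard error."""
    return 1.0 / csem ** 2

rel_at_031 = reliability_from_csem(0.31)    # CSEM bound for scores below +1.5
rel_at_050 = reliability_from_csem(0.50)    # CSEM bound for scores above +2.0
info_at_peak = information_from_csem(0.144)  # CSEM at maximum information
```

Under this assumption a CSEM of 0.31 corresponds to a reliability of about 0.90, while a CSEM of 0.5 corresponds to 0.75, which is why precision is described as lower at high PM trait levels.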

Differential item functioning detection
At the 1% level of significance, none of the items showed significant age-related DIF. Five items were flagged for statistically significant DIF with respect to gender or ethnicity. Two items showed uniform DIF with respect to gender: MQ19 ("I do not let my worries overwhelm me") and MQ07 ("I am able to appreciate what each day brings"). For the former item, men had greater odds of endorsing higher frequency options than women, whereas for the latter, men had lower odds of endorsing higher frequency options than women. Three items showed uniform DIF with respect to ethnicity: MQ25 ("I find comfort in my religion or spiritual beliefs"), MQ26 ("I find comfort in my religious beliefs") and MQ13 ("I am able to deal with stress"). On the first two items, Malays and Indians (i.e., non-Chinese) showed greater odds of endorsing higher frequency options than the Chinese. On the third item, on dealing with stress, Chinese had greater odds of choosing higher frequency options than Malays and Indians. Response options were ordered in frequency as Always, Usually, Sometimes, Seldom and Never, and the response variable in the ordinal regression was the log odds of endorsing higher versus lower frequency options.
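The model-comparison logic behind the DIF flags can be sketched as below. It assumes the usual nested sequence of ordinal logistic models for a two-group comparison (trait only, trait plus group, trait plus group plus trait-by-group interaction), each step adding one parameter; the log-likelihood values and function names are hypothetical and this is not the SPSS syntax used in the study.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lr_test(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic and p-value for nested models
    that differ by a single parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_df1(max(stat, 0.0))

# Hypothetical log-likelihoods for one item.
ll_trait_only = -612.4        # item response ~ PM score
ll_plus_group = -607.1        # ... + group membership
ll_plus_interaction = -606.9  # ... + PM score x group

# Uniform DIF: does adding group membership improve fit?
uniform_stat, uniform_p = lr_test(ll_trait_only, ll_plus_group)
# Non-uniform DIF: does the group effect vary with the trait level?
nonuniform_stat, nonuniform_p = lr_test(ll_plus_group, ll_plus_interaction)

ALPHA = 0.01  # the 1% level used for the DIF tests
flag_uniform = uniform_p < ALPHA
flag_nonuniform = nonuniform_p < ALPHA
```

In this made-up case the item would be flagged for uniform DIF only, the pattern reported for the five flagged items.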

Concurrent and known-groups validity evaluation
The Spearman correlation between PM scores and a self-reported measure of global health was rho = -0.28, supporting the hypothesis of a moderate negative correlation between the two measures and therefore demonstrating concurrent validity. After adjusting for age, gender, completion of secondary education and current marital status, a statistically significant mean difference in PM theta scores was found between participants with and without a self-reported diagnosis of anxiety or depression. The mean difference was 0.62 (95% CI: 0.17 to 1.07; p = 0.007), demonstrating known-groups validity. We categorized marital status as single, married or widowed/divorced/separated. We also explored categorizing marital status as currently married versus not currently married, and found the same result, i.e., subjects with a diagnosed mood disorder had significantly lower mean PM scores than those without. This further supports the robustness of this assessment of known-groups validity. The results of the validity evaluation are in Table 5.
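The direction of the concurrent-validity correlation depends on how the global health item is coded. The sketch below assumes the coding 1 = Excellent through 5 = Poor, so that a lower score means better health, and computes Spearman's rho with average ranks for ties. The data and function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical participants: higher theta = more positive mindset,
# global health coded 1 (Excellent) to 5 (Poor).
pm_theta = [-1.2, -0.4, 0.1, 0.8, 1.5]
global_health = [5, 4, 4, 2, 1]
rho = spearman_rho(pm_theta, global_health)
```

Because better health receives a lower code, participants with a more positive mindset and better health produce a negative rho, matching the sign of the hypothesized correlation.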

Discussion
This study described the calibration of a culturally sensitive item bank for PM for the Singapore population. Items in this PM item bank were derived from (1) extensive qualitative research to identify and incorporate perspectives from subjects in the population, representing a wide spectrum of healthy and ill subjects (with chronic diseases), and (2) items from, and the involvement of, investigators who developed static instruments measuring related concepts in the same population. The item bank we developed has high content validity in terms of relevance to the values of people in the appropriate socio-cultural context, and can be generalized to both healthy adults and those with chronic illnesses. The calibration processes aligned with the approach espoused by the PROMIS group [23-28]. The findings of this successful calibration indicate that this psychological item bank is a promising tool for measuring PM in the population. PM is a novel construct that is increasingly important to measure in order to improve optimism, which is an important construct with a wide-ranging positive impact on health. For example, high PM has been shown to reduce the incidence of mental health issues and may also ameliorate the impact of mental disease. [9] The PM item bank can be used in mindfulness-based intervention trials in community- or hospital-based settings. [29] Also, the PM item bank can potentially be used to assess the effectiveness of workplace-based programs to improve PM. [30] Given the potential applications of the PM item bank, more research is needed to understand in depth the impact of PM on HRQoL, for example in the areas of workplace stress and resilience building. [31]
The analyses of the IRT assumptions show that the required assumptions of unidimensionality and local independence were met. As we did not have prior expectations about the structure of any underlying dimensions, we ran exploratory rather than confirmatory bifactor analyses, positing models with two, three and four group factors. After ascertaining via conventional fit criteria that these models were adequate, we evaluated bifactor-specific indices, which verified essential unidimensionality. [24] The GRM and the DIF tests for gender and ethnicity together flagged five items. This represents less than 15% of the total number of items in the PM item bank. Although this is not ideal in the calibration of an item bank, other research groups have encountered a similar situation [32,33]. The expert panel recommended retaining these items because of their important content validity and the modest impact of the DIF [19,33,34]. In the future, these items may be removed or revised. This study also supports the concurrent construct validity of the PM item bank. Our hypothesis testing showed good discriminative properties between participants with and without a reported diagnosis of anxiety or depression. Mental well-being comprises various psychological constructs, and overall physical well-being contributes to mental well-being as well. [35] However, the number of participants with a mood disorder was very small. Further tests are needed to assess the reproducibility of the results in a purposive sample of subjects with and without a diagnosis of a mood disorder.
We recognize several limitations of this study. First, a significant number of eligible subjects were excluded because the quota for these subjects had been met. However, partly because of the use of quota sampling, the demographics of our sample are comparable to those of the population in Singapore. [36] Second, we included only 493 subjects in the analyses. However, our results fulfilled the assumptions needed for IRT calibration as set out by PROMIS. Third, the PM item bank may have poorer coverage of higher PM trait levels but better coverage of lower PM trait levels. However, this is unlikely to be a problem if the item bank is used in clinical trials, where the patients of interest will likely have poorer PM than the general population. [3] Last, because of resource constraints and questionnaire fatigue, we used only self-reported measures of global health and self-reported diagnosis of depression and anxiety (rather than a psychological or mood scale) to assess concurrent validity and known-groups validity respectively. Future research should consider using a validated scale such as the Hospital Anxiety and Depression Scale to further evaluate the validity of the PM item bank.
In conclusion, we developed and calibrated a 36-item bank for PM that is relevant to the English-speaking Singaporean population and applicable to healthy adults and those with chronic illnesses. This is a promising item bank for the subsequent development of relevant short forms or CAT to facilitate routine clinical use.
Supporting information S1