Examination of an eHealth literacy scale and a health literacy scale in a population with moderate to high cardiovascular risk: Rasch analyses

Introduction: Electronic health (eHealth) strategies are evolving, making it important to have valid scales to assess eHealth and health literacy. Item response theory methods, such as the Rasch measurement model, are increasingly used for the psychometric evaluation of scales. This paper aims to examine the internal construct validity of an eHealth and health literacy scale using Rasch analysis in a population with moderate to high cardiovascular disease risk.
Methods: The first 397 participants of the CONNECT study completed the electronic Health Literacy Scale (eHEALS) and the Health Literacy Questionnaire (HLQ). Overall Rasch model fit as well as five key psychometric properties were analysed: unidimensionality, response thresholds, targeting, differential item functioning and internal consistency.
Results: The eHEALS had good overall model fit (χ2 = 54.8, p = 0.06), ordered response thresholds, reasonable targeting and good internal consistency (person separation index (PSI) 0.90). It did, however, appear to measure two constructs of eHealth literacy. The HLQ subscales (except subscale 5) did not fit the Rasch model (χ2: 18.18–60.60, p: 0.00–0.58) and had suboptimal targeting for most subscales. Subscales 6 to 9 displayed disordered thresholds, indicating participants had difficulty distinguishing between response options. All subscales did, nonetheless, demonstrate moderate to good internal consistency (PSI: 0.62–0.82).
Conclusion: Rasch analyses demonstrated that the eHEALS has good measures of internal construct validity, although it appears to capture different aspects of eHealth literacy (e.g. using eHealth and understanding eHealth). Whilst further studies are required to confirm this finding, it may be necessary for these constructs of the eHEALS to be scored separately. The nine HLQ subscales were shown to measure a single construct of health literacy. However, participants’ scores may not represent their actual level of ability, as distinction between response categories was unclear for the last four subscales. Reducing the response categories of these subscales may improve the ability of the HLQ to distinguish between different levels of health literacy.


Introduction
For patients to be able to optimally manage their health, they require an adequate level of understanding about their condition and associated management strategies [1][2][3]. This is related to an individual's level of health literacy and is an increasingly important area of research, particularly in people with chronic disease. Cardiovascular disease (CVD), for example, is a major health burden for patients and often requires life-long behaviour changes across multiple risk factors [4]. CVD management depends on active patient participation, which requires a sufficient level of health literacy [5]. It has been shown that patients with CVD who have a low level of health literacy are less likely to adhere to prescribed medications [4][5][6]. Similarly, lack of adherence to prescribed medication has also been found to be associated with inadequate or marginal health literacy [4]. Electronic health (eHealth) literacy has become increasingly relevant in recent years with the development of eHealth tools to support healthcare delivery and management [7][8][9]. However, with these innovative developments, eHealth literacy becomes an equally important area of investigation. eHealth literacy is defined as "the ability to seek, find, understand, and appraise health information from electronic sources and apply the knowledge gained to addressing or solving a health problem" [10]. The additional challenge of eHealth relates to the skills required to use an electronic device as well as having an adequate level of health literacy to effectively make decisions regarding health [2,9]. To ensure that the existing scales for health and eHealth literacy are clinically applicable, it is therefore important to examine how they perform in various populations when developing, evaluating and implementing health management strategies.
Several scales can be used to measure health literacy, such as the Test of Functional Health Literacy in Adults (TOFHLA) [11] and the Rapid Estimate of Adult Literacy in Medicine (REALM) [12]. However, these scales assess different areas of health literacy, which means that an individual can have a very different level of health literacy depending on the scale used [13]. A more comprehensive measure of health literacy is the Health Literacy Questionnaire (HLQ) [14]. In contrast, there are currently very few scales available to evaluate eHealth literacy [7]. One such scale is the electronic Health Literacy Scale (eHEALS) [15]. The psychometric properties of both the eHEALS and the HLQ have been examined using classical test theory and item response theory methods. Prior studies using confirmatory factor analysis (CFA) have demonstrated the eHEALS to be a valid and reliable scale [16,17]. More recent item response theory studies have confirmed these findings [18,19] and demonstrated that the scale measures the same underlying concept or construct. However, a recent study found that the eHEALS was not able to capture the full range of eHealth literacy levels in its study population (i.e. ceiling and floor effects) [18]. The factor structure of the HLQ has also been tested with CFA, and the three subscales tested were found to be measuring the same underlying construct [19]. It has also demonstrated satisfactory reliability [14]. However, discrimination between response categories was questioned for three of the nine scales [14,20], suggesting that response categories could be revised to improve the scales' measurement properties.
There is growing evidence to indicate that the Rasch measurement model, which measures a scale's internal construct validity, is the gold standard for psychometric evaluations of outcome scales [21]. Rasch analysis has several well-recognised advantages over classical test theory including analysis of item summation legitimacy, response category distinction, hierarchical structure of item difficulty and discrepancies in item response for a given level of ability [22,23], which all strengthen scale construct validation. Having valid scales is particularly relevant in a population at risk for CVD where an adequate level of health literacy has been shown to improve effective patient disease management [24]. Moreover, there has not yet been a Rasch validation study in a population with CVD. Given that CFA may not fully resolve issues associated with the conceptual structures of psychological scales such as health literacy [20], further evaluation using the Rasch measurement model is required to extend the current knowledge on the validity of these scales. The aim of this study was to extend prior validation studies by examining the internal construct validity of an eHealth and a health literacy scale using Rasch analysis to provide clinicians and researchers with information on the usefulness of these scales.

Materials and methods

Design
The internal construct validity of the eHEALS and the HLQ was analysed using Rasch analysis in a population with moderate to high CVD risk. The first 397 consented participants in the CONNECT (Consumer Navigation of Electronic Cardiovascular Tools) Study [25] comprised the sample and completed the eHEALS and HLQ scales. All participants provided written, informed consent and ethical approval was obtained from the University of Sydney Human Research Ethics Committee (Project number 2013/091).

Participants and setting
Details of the CONNECT Study have been published elsewhere [25]. In brief, it is a randomised controlled trial examining whether an eHealth strategy improves risk factor control when compared with usual health care in patients at risk of or with CVD. Participants were recruited via Australian primary care practices. To be eligible, they had to be 18 years or older, have access to the Internet at least once a month (mobile phone, tablet or computer) and have moderate to high risk for CVD. Moderate to high CVD risk was defined as (a) ≥10% 5-year CVD risk using the Framingham risk equation; (b) a clinically high risk condition (Aboriginal or Torres Strait Islander >75 years, diabetes and >60 years, diabetes and albuminuria, estimated glomerular filtration rate <45 ml/min, systolic blood pressure ≥180 mmHg, diastolic blood pressure ≥110 mmHg, total cholesterol >7.5 mmol/L); or (c) an established CVD diagnosis (ischaemic heart disease, stroke/transient ischaemic attack, peripheral vascular disease). Participants with an insufficient level of English to provide informed consent or with severe intellectual disability were excluded.

Scales used for assessment of eHealth literacy and general health literacy
The eHEALS was used to assess eHealth literacy (Study 1). This scale aims to measure an individual's perception of their knowledge and skills in relation to using electronic health information and determine whether an eHealth approach is suited to the individual [15,26]. It is an 8-item scale with each item scored on a 5-point Likert scale (Table 1). The sum across the eight equally weighted items is presented as a score out of 40. There is no fixed cut-off to distinguish high from low eHealth literacy but higher scores reflect a higher level of eHealth literacy [15,27]. The scale was completed online directly by participants themselves.
The HLQ was used to assess health literacy (Study 2). This scale aims to measure an individual's capacity to effectively use health information and services [14]. It is a 44-item questionnaire with 9 subscales (Table 1). Subscales 1 to 5 are scored on a 4-point Likert scale, while subscales 6 to 9 are scored on a 5-point Likert scale. The scores for the items in each subscale are summed and divided by the number of items, providing nine individual scores. There is no total score across subscales. Although there are no fixed values to classify the level of health literacy, similar to the eHEALS, higher scores indicate higher health literacy in all subscales [14,28].
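The scoring rules described above can be illustrated with a short Python sketch. It is only an illustration of the arithmetic: the function names, the example responses and the number of items in the example subscale are hypothetical and are not taken from the study data.

```python
from statistics import mean


def score_eheals(responses):
    """Sum eight equally weighted items (each scored 1-5), giving a total from 8 to 40."""
    if len(responses) != 8 or any(r not in range(1, 6) for r in responses):
        raise ValueError("eHEALS expects eight responses scored 1 to 5")
    return sum(responses)


def score_hlq_subscale(responses, n_categories):
    """Mean item score for one HLQ subscale.

    n_categories is 4 for subscales 1 to 5 and 5 for subscales 6 to 9.
    """
    if not responses or any(r not in range(1, n_categories + 1) for r in responses):
        raise ValueError("responses outside the expected category range")
    return mean(responses)


# Hypothetical respondent (values invented for illustration only)
eheals_total = score_eheals([4, 4, 3, 4, 3, 3, 4, 2])
hlq_subscale_score = score_hlq_subscale([4, 5, 4, 4, 5], n_categories=5)
```

Because the HLQ has no total score, a respondent would carry nine such subscale means rather than a single number.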

Model used to assess psychometric properties
The Rasch model was used to examine the psychometric properties of the eHEALS and HLQ. Rasch analysis is a form of item response theory, where the ordinal ratings of the questionnaire are transformed to estimates of interval measures that demonstrate the essential features of the scale [23]. Analysis with the Rasch model provides difficulty measures for each item and ability estimates for each participant, located on the same measurement scale in log-odds units, or logits. This allows expected and observed results to be compared and the internal construct validity of each item to be determined [23]. It also determines how well the items and participants fit the Rasch model, i.e. overall model fit. We also examined five key psychometric parameters, detailed in Table 2. To ensure an appropriate degree of precision from the Rasch analysis, a minimum sample size of between 108 and 243 participants is required [29].
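For readers less familiar with logits, the relationship between person ability and item difficulty can be written out as follows. The notation is an assumption made for illustration (θ_n for the ability of person n, δ_i for the difficulty of item i, τ_ik for the k-th threshold of item i) and is not taken from the paper; the second expression is the usual partial credit form for items with more than two response categories, with the empty sum for x = 0 taken as zero.

```latex
% Dichotomous Rasch model: log-odds (logit) that person n endorses item i
\ln\!\left(\frac{P_{ni}}{1 - P_{ni}}\right) = \theta_n - \delta_i

% Partial credit model for item i with categories x = 0, \dots, m_i
P(X_{ni} = x) =
  \frac{\exp\!\left(\sum_{k=1}^{x} (\theta_n - \tau_{ik})\right)}
       {\sum_{j=0}^{m_i} \exp\!\left(\sum_{k=1}^{j} (\theta_n - \tau_{ik})\right)}
```

Because persons and items share the same logit scale, a person located exactly at an item's difficulty has a 50% chance of endorsing it in the dichotomous case.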

Analysis
The CONNECT data were analysed using IBM SPSS Statistics 22.0 (IBM SPSS Statistics for Windows, Armonk, NY: IBM Corp), with the Rasch analysis completed using the RUMM2030 package (RUMM Laboratory Pty Ltd, Perth, Australia) and a partial credit model for polytomous data. Given that both the 8-item eHEALS and the HLQ have multiple response category options (e.g. 'strongly disagree' to 'strongly agree'), a partial credit Rasch model was used to examine the internal construct validity of both scales, which is not possible with, for example, a two-parameter logistic (2PL) model [21]. To determine whether the observed data fit the expectations of the Rasch model (overall model fit), the item-trait interaction statistic was used, which was reported as a χ2 statistic. A significant value (p<0.05) indicated that the observed data did not fit the expectations of the Rasch model [23]. It is important to keep in mind that the χ2 test is sensitive to sample size, with larger samples having a tendency to generate a significant value [32]. Model fit was also assessed by examining item-person interaction statistics, where a residual standard deviation (SD) of >1.5 suggested there may be an issue with fit [23,33], as well as individual item- and person-fit residuals, where values beyond ±2.5 [33] indicated misfitting items or persons. The details of the five key psychometric parameters assessed are described in Table 2 [27].

Table 2 (Parameter; Definition/Aim; Measurement)

Unidimensionality
Definition/Aim: The extent to which the items of a scale measure a single construct (or concept). All items must measure a single construct for them to be summed.
Measurement: Subsets of items were defined by positive and negative loadings on the first factor extracted using a principal component analysis of residuals [21,23,30]. Independent t-tests were then used to compare person estimates derived from the two most dissimilar subsets of scale items. The scale was considered unidimensional if the proportion of significant t-tests was <0.05, i.e. less than 5% of participants showed a significant difference between their scores on the two subsets. If the proportion was >0.05, the value of 5% should fall within the 95% CI around the observed proportion, calculated with a binomial test of proportions [23,31].

Response thresholds
Definition/Aim: Reflects the distance between response categories to determine whether participants had difficulty discriminating between them.
Measurement: Category probability curves [23] were used to identify the presence of disordered thresholds, and attempts to order them were made by collapsing response categories. Response categories were deemed ordered when each response systematically had a point along the location/ability continuum where it was the most likely response (indicated by a peak in the curve).

Targeting
Definition/Aim: Representation of the extent to which the spread of items reflects the levels of ability (e.g. health literacy) within the sample.
Measurement: Person-item threshold distribution maps, which compare the mean location score obtained for the persons with the value of zero [23], were analysed for (1)
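The unidimensionality criterion in Table 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the critical value of 1.96 is assumed, and the confidence interval uses a simple normal approximation, which may differ from the binomial interval computed by RUMM2030.

```python
import math


def proportion_significant(t_values, critical=1.96):
    """Share of per-person t-tests whose absolute value exceeds the critical value."""
    flags = [abs(t) > critical for t in t_values]
    return sum(flags) / len(flags)


def binomial_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed among n persons."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def supports_unidimensionality(p, n, limit=0.05):
    """True if p is below 5%, or if 5% lies at or above the lower CI bound."""
    lower, _ = binomial_ci(p, n)
    return p < limit or lower <= limit


# Hypothetical t-values comparing person estimates from two item subsets
example_proportion = proportion_significant([0.5, -2.3, 1.0, 2.1])
```

Under this approximation, a proportion of 0.06 among 396 persons would still be compatible with unidimensionality because the interval covers 0.05, whereas a proportion of 0.13 would not.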

Results
Demographic characteristics of the 397 consenting participants are presented in Table 3. Three-quarters of the sample were male, the mean age was 66.3 years (SD 8.1), 88% were Caucasian and the majority (79%) were married or in a de facto relationship. Due to incomplete scale completion, one participant was removed from both the eHEALS and HLQ for the Rasch analysis. The mean total score for the eHEALS was 27.1 (range: 8-40; SD 6.67), and the mean score was 3.03 (range: 1-4; SD 0.52) for the first five HLQ subscales and 4.19 (range: 1-5; SD 0.47) for subscales 6 to 9 (Table 4).

Study 1: eHEALS
The eHEALS met the Rasch model expectations as demonstrated by the χ2 item-trait interaction statistic (p = 0.06) (Table 4). The scale did, however, indicate some degree of item misfit (fit residual mean -0.65, SD 2.31) and person misfit (fit residual mean -0.81, SD 1.69) as reflected in the item-person interaction statistics [23]. Individual person-fit statistics also revealed several participants with fit residuals beyond ±2.5, which could be due to the scale's limited ability to detect mid to high levels of eHealth literacy (see 'Targeting' below).
Unidimensionality. We found limited evidence to support unidimensionality of the eHEALS. Analysis using principal components analysis suggested that the eHEALS may be measuring two separate constructs of eHealth literacy (p = 0.13; 95% CI 0.11, 0.15), which may indicate that items must be scored separately.
Response thresholds. Inspection of thresholds maps and category probability curves showed ordered thresholds for the five response categories ('strongly disagree' to 'strongly agree') used in the eHEALS scale, demonstrating that participants were able to distinguish between response options.
Targeting. The eHEALS scale displayed reasonable targeting (Fig 1: overall spread of items on bottom half matched spread of persons in top half) with a mean logit score of 0.64 (N.B. ideal: 0). The scale was not, however, able to detect small but clinically important changes in participants with mid to higher levels of eHealth literacy (Fig 1: gap between logit 1 and 3 in the bottom half of the graph).
DIF (Item bias). No significant DIF (or item bias) was evident in the eHEALS for age, gender, polypharmacy and education, indicating no influence of these characteristics on response to any of the items of the scale.
PSI (internal consistency). With a PSI of 0.90 (Table 4), the scale demonstrated good internal consistency.

Study 2: HLQ
Only subscale 5 met the Rasch model expectations for good overall model fit (χ2 item-trait interaction statistic p>0.05). The remaining HLQ subscales did not reflect hierarchical ordering of the items across all levels of health literacy (Table 4). Inspection of individual item-fit statistics revealed the presence of misfits and redundancies (S1 Table).
Response thresholds. Ordered thresholds were obtained for subscales 1 to 5 ('strongly disagree' to 'strongly agree'). However, inspection of category probability curves for subscales 6 to 9 showed that participants had difficulty distinguishing between their response categories ('cannot do' to 'very easy'), as seen by the confluence of peaks in Fig 2A. When the response categories 'very difficult' and 'quite difficult' were collapsed into one category, response thresholds for these subscales became ordered (Fig 2B: distinct peaks for each response category).
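The idea behind category probability curves and category collapsing can be sketched in Python. This is an illustrative sketch only, not the RUMM2030 procedure: the function names, threshold values and numeric coding of the response options (1 = 'cannot do' to 5 = 'very easy') are assumptions made for the example.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Partial credit model: probability of each category 0..m for one item."""
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """Thresholds are ordered when each one is higher than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Assumed coding: 1 'cannot do', 2 'very difficult', 3 'quite difficult',
# 4 'quite easy', 5 'very easy'. Categories 2 and 3 are merged.
COLLAPSE_MAP = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}


def collapse(responses):
    """Recode responses so 'very difficult' and 'quite difficult' share a score."""
    return [COLLAPSE_MAP[r] for r in responses]


# Hypothetical threshold sets: the first is ordered, the second is disordered
ordered_example = thresholds_ordered([-2.0, -0.5, 0.7, 1.9])
disordered_example = thresholds_ordered([-1.0, 0.4, 0.1, 1.5])
probs = pcm_probabilities(0.0, [-1.0, 1.0])
```

With ordered thresholds, each category is the most probable response somewhere along the ability continuum; after collapsing, the item would be re-estimated with one fewer threshold.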
Targeting. Targeting was suboptimal for all subscales of the HLQ (mean logit: 1.02 to 4.26), indicating the sample was slightly overqualified for the level of health literacy measured by the scale. This was further highlighted by the ceiling effect (Fig 3: absence of items in bottom half of graph for higher levels of health literacy represented on the top half). There were also gaps in the mid to high health literacy range.
DIF (Item bias). Uniform DIF (p<0.05) was observed in HLQ subscales 2, 3, 4 and 8 for education, polypharmacy, gender and age, respectively. Subscale 3 displayed non-uniform DIF (p<0.05) for item 13 ("Despite other things in my life, I make time to be healthy"). However, in all cases, the DIF was not marked and was isolated to one item per subscale, with a minor effect on each respective subscale.
PSI (internal consistency). The PSI varied between 0.62 and 0.82 indicating moderate to good internal consistency for all nine HLQ subscales (Table 4).
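As a rough illustration of what the PSI expresses, the following Python sketch uses a commonly cited definition: the share of observed variance in person location estimates that is not attributable to measurement error. The exact computation in RUMM2030 is not described in this paper, so the formula, function name and example values here are assumptions.

```python
from statistics import mean, variance


def person_separation_index(locations, standard_errors):
    """(observed variance - mean squared standard error) / observed variance."""
    observed_var = variance(locations)  # sample variance of person locations
    error_var = mean([se ** 2 for se in standard_errors])
    return (observed_var - error_var) / observed_var


# Hypothetical person locations (logits) and their standard errors
psi_example = person_separation_index(
    [-1.0, 0.0, 1.0, 2.0],
    [0.4, 0.4, 0.4, 0.4],
)
```

Values closer to 1 indicate that the scale separates persons well relative to the error in their estimates, which is how values such as 0.62 to 0.82 are read as moderate to good.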

Discussion
This study found that the eHEALS had good overall fit to the Rasch model, distinct response categories, no item bias and reasonable targeting. This provides evidence supporting the use of the 8-item eHEALS as a measure of eHealth literacy, with higher scores truly representing higher levels of eHealth literacy. All subscales of the HLQ except for subscale 5, on the other hand, did not fit the Rasch measurement model and had suboptimal targeting. Nevertheless, all subscales apart from subscale 9 did demonstrate unidimensionality, had no major item bias and had moderate to good internal consistency. This indicates that the HLQ suitably measures nine aspects of health literacy. Both scales were, however, unable to reflect mid to high ranges of health literacy. Consequently, when they are used to measure changes in ability over time, an individual would have to acquire a substantial increase in health or eHealth literacy for it to be reflected in their score. This could account for the presence of misfitting items and persons and highlights the potential for further scale testing and refinement.
Prior studies using factor analysis or principal component analysis (PCA) and item response theory (IRT) have found the eHEALS to be internally consistent (Cronbach's α: 0.87 to 0.94) with modest to good stability [7,16,17,34] and have supported the construct validity of the scale. The high internal consistency found in this study (PSI: 0.90) further supports these findings [7,16]. A recent study using IRT in a population with diverse chronic diseases demonstrated good model fit and distinct response categories for the eHEALS [19]. Good model fit was also found in another IRT study (in a student population and adults who use the Internet) with similar findings around targeting, notably a marked ceiling effect [18]. Unlike the current study, however, response categories were not distinct in the 'strongly disagree' and 'disagree' categories [18]. Further research is needed to test whether reducing the number of eHEALS response categories would better reflect the range of eHealth literacy levels.
Contrary to our findings, previous studies using classical test theory and item response theory (in a student population, healthy adults using the Internet and patients with rheumatic disease) have supported unidimensionality of the eHEALS [17,34]. In our analysis, we identified two constructs, or concepts, of eHealth literacy: (1) items 1-5 (around knowledge about resources) and (2) items 6-8 (around evaluation of resources). A Likert scale, such as the eHEALS, depends on unidimensionality to be able to sum its items [35]. A possible explanation for this discrepancy may be related to the age of the population studied, as this study focused on an older population (mean age: 66 years). It could be that the knowledge to use the Internet and the ability to evaluate online resources are two distinct skills for an older population but not for a younger one that is more familiar with the electronic medium. Further research is therefore required in different populations to explore whether the eHEALS is indeed composed of two separate constructs, which may then mean summing item scores across constructs is not appropriate.
The HLQ has previously been found to have a good level of construct validity and a composite reliability of ≥0.8 for almost all subscales [14,36]. However, the scale developers had noted that participants had difficulty distinguishing between the response categories for subscales six to nine ('very difficult', 'quite difficult', 'quite easy' and 'very easy') [14]. This was confirmed in their follow-up paper using a Bayesian model approach, where subscales six to eight were found to have disordered thresholds [20]. Similar findings were observed in this Rasch analysis, as subscales six to nine displayed disordered thresholds. Disordered thresholds are not uncommon in scales with multiple response categories or when wording between them is similar [23,35]. If participants are not able to distinguish between response categories, the sum of the items does not truly reflect the individual's ability, i.e. selecting 'quite difficult' would be equivalent to selecting 'very difficult'. The Rasch analysis demonstrated that collapsing two response categories ('very difficult' and 'quite difficult') resulted in ordered thresholds. This improved the ability of participants to distinguish between the response categories and, ultimately, could enable healthcare providers to accurately measure the range of health literacy levels.
This is the first study to have assessed and compared the internal construct validity of the eHEALS and the HLQ scales using Rasch analysis in a population with CVD. Validation of such scales is particularly important in order to accurately measure the health literacy of patients with chronic disease, such as CVD, who are active participants in the management of their health. This study was able to provide new insight into the measurement properties of two commonly used scales for eHealth and health literacy by comparing prior findings. It was also able to provide information regarding item bias (i.e. DIF), which has not yet been examined in the eHEALS. Furthermore, Rasch analysis was able to examine several psychometric aspects of the scales, such as the hierarchical structure of items, ordering of response categories, DIF and the summation of the scale, which are beyond the scope of classical test theory. We were, therefore, able to highlight aspects of both scales which require further investigation, such as whether the eHEALS measures two distinct constructs of eHealth literacy, and to provide further support for rescoring the response categories of HLQ subscales six to nine in order to better reflect the hierarchical distribution of health literacy. Further validation is now required to test these changes in a different population.
This research does have several limitations. Firstly, the cohort is mostly male and the group is known to have home Internet access, reducing generalizability of the findings. Secondly, both scales had many misfitting items and persons, which may in turn have contributed to the degree of misfit observed between the data and the Rasch measurement model. However, achieving model fit would have entailed deletion of items or persons, which was beyond the scope of this paper. Thirdly, this study did not seek to confirm whether individuals with higher eHealth or health literacy scores truly had better disease management. This is particularly noteworthy since a Dutch study found that perceived skills as measured by the eHEALS did not predict actual performance [34], although it is the only study to have done this analysis and further research is required to confirm this finding. Furthermore, we were not able to assess the external validity of both scales, which is an important area of health literacy. Finally, Rasch analysis is one among many tools for internal construct validity and the results of this study should be considered among the already existing validation work that has been done on these two scales.

Conclusions
Electronic resources are valuable tools to access health information; however, patients must acquire the skills to effectively engage with and benefit from the plethora of resources available online. This is particularly true for CVD, in which disease management and prevention depend largely on patients' health literacy. For healthcare providers to optimally assess a patient's level of health literacy, valid and reliable scales are essential. This study demonstrates the good psychometric properties of the eHEALS and highlights that the HLQ appropriately measures its nine distinct aspects of health literacy. Further research is now needed to determine the extent to which higher eHEALS scores correctly identify those individuals with greater eHealth capacities and whether collapsing response categories in the last four subscales of the HLQ improves boundaries between response categories.

Supporting information
S1 Table. Rasch item and fit statistics for the electronic Health Literacy Scale (eHEALS) and Health Literacy Questionnaire (HLQ) scales. (DOCX)