
Patient knowledge in anaesthesia: Psychometric development of the RAKQ–The Rotterdam Anaesthesia Knowledge Questionnaire

  • Sander F. van den Heuvel ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    s.vandenheuvel@erasmusmc.nl

    Affiliation Department of Anaesthesiology, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Hester van Eeren,

    Roles Data curation, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychiatry, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Sanne E. Hoeks,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Anaesthesiology, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Anna Panasewicz,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Anaesthesiology, Albert Schweitzer Hospital, Dordrecht, The Netherlands

  • Philip Jonker,

    Roles Writing – review & editing

    Affiliation Department of Anaesthesiology, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Sohal Y. Ismail,

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Psychiatry, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Jan J. van Busschbach,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Psychiatry, Erasmus MC University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Robert Jan Stolker,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation Department of Anaesthesiology, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands

  • Jan-Wiebe H. Korstanje

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Department of Anaesthesiology, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands

Abstract

The transition from in-person to digital preoperative patient education requires effective methods for evaluating patients’ understanding of the perioperative process, risks, and instructions to ensure informed consent. A knowledge questionnaire covering different anaesthesia techniques and instructions could fulfil this need. We constructed a set of items covering common anaesthesia techniques requiring informed consent and developed the Rotterdam Anaesthesia Knowledge Questionnaire (RAKQ) using a structured approach and Item Response Theory. A team of anaesthetists and educational experts developed the initial set of 60 multiple-choice items, ensuring content and face validity. Next, based on exploratory factor analysis, we identified seven domains: General Anaesthesia–I (regarding what to expect), General Anaesthesia–II (regarding the risks), Spinal Anaesthesia, Epidural Anaesthesia, Regional Anaesthesia, Procedural Sedation and Analgesia, and Generic Items. This item set was completed by 577 patients in the Erasmus MC, Rotterdam, and the Albert Schweitzer Hospital, Dordrecht, the Netherlands. Based on factor loadings (≥0.25) and clinical relevance, this initial item set was reduced to 50 items distributed over the seven domains. Each domain was processed to produce a separate questionnaire. Through an iterative process of item selection to ensure that the questionnaires met the criteria for Item Response Theory modelling, 40 items remained in the definitive set of seven questionnaires. Finally, we developed an Item Response Theory model for each questionnaire and evaluated its reliability. 1-PL and 2-PL models were chosen based on best model fit. No item misfit (S-χ², p<0.001 indicating misfit) was detected in the final models. The newly developed RAKQ allows practitioners to assess their patients’ knowledge before consultation to better address knowledge gaps during consultation. Moreover, they can decide whether the level of knowledge is sufficient to obtain digital informed consent without face-to-face education. Researchers can use the RAKQ to compare new methods of patient education with traditional methods.

Introduction

Informed consent is legally required for any medical procedure or therapy. Preoperative patient education on the procedure, risks, and necessary preparations is essential for obtaining informed consent for any type of anaesthesia (e.g. general, spinal, epidural, or regional anaesthesia). Therefore, it is an integral part of preoperative anaesthetic consultations. Today’s technological advancements make digital preoperative education and screening feasible, with less direct involvement of the anaesthetist, less need for hospital real estate, and a lower burden on patients in terms of time and costs spent on travelling [1]. To facilitate the transition from face-to-face education and screening to digital education and screening, there is a need to compare novel teaching methods, for example educating patients using video animation, with a conventional face-to-face setting in terms of patients’ knowledge level on anaesthesia. A psychometrically validated knowledge questionnaire covering all aspects of anaesthesia could be a tool to evaluate patients’ knowledge before obtaining informed consent and a first step in digitally screening patients.

Although several questionnaires have been developed to measure knowledge on anaesthetic topics [2–5], only the questionnaire developed by Miller et al. was validated [3]. However, it was intended for the parents of paediatric patients. Moreover, these earlier questionnaires covered only general anaesthesia and preoperative instructions. None of the questionnaires tested knowledge on other anaesthesia techniques. Furthermore, none of the questionnaires were validated for use in different populations, and none were developed for use in daily practice to test individual patients’ knowledge.

To assess patients’ knowledge on anaesthesia effectively and validly in a digital setting, new psychometrically validated questionnaires are needed for a wider variety of anaesthesia types. Therefore, we designed this study to construct a set of items covering the most common anaesthesia techniques that require informed consent (i.e. general anaesthesia, spinal anaesthesia, epidural anaesthesia, regional anaesthesia, procedural sedation and analgesia (PSA), and generic items). The primary objectives were to (i) develop a set of items covering knowledge on different anaesthesia techniques, (ii) develop scales along which this knowledge can be graded, and (iii) construct models using Item Response Theory (IRT) and determine the psychometric properties of the items.

Materials and methods

Based on the methodology described by Boateng et al. [6] and Reeve et al. [7], we divided the process of developing knowledge scales into two distinct phases: I) scale development, and II) scale evaluation, with each phase having multiple steps, as shown in Fig 1. This study aimed to create a comprehensive set of questionnaires covering six distinct knowledge domains regarding anaesthesia: general anaesthesia, spinal anaesthesia, epidural anaesthesia, regional anaesthesia, PSA, and generic items (e.g. preoperative instructions). The development and adaptation of the questionnaires are explained in the following steps, adhering to psychometric and clinical guidelines. See S1 Table in the Online Supporting Information for a glossary of terms pertaining to the development of questionnaires.

Fig 1. Steps in developing the RAKQ (adapted from Boateng et al. [6] and Reeve et al. [7]).

*Item Response Theory; †Differential Item Functioning.

https://doi.org/10.1371/journal.pone.0299052.g001

Phase I–scale development

1. Item generation.

A. Identification of domains and item generation. First, we conducted a literature search to collect sample questions from published questionnaires on patient knowledge regarding anaesthesia to guide item generation. Two anaesthetists (SH and JK) then developed an item bank of 51 items covering the six aforementioned knowledge domains. The items were formulated as multiple-choice items with three to five answer categories, including an ‘I do not know’ option to discourage guessing. All items had only one correct answer.

B. Content validity. To ensure content validity, all items were presented to 14 anaesthetists and three registrars in both a general hospital and an academic hospital. They were asked to accept or reject an item or suggest adaptation of an item. The resulting item bank was reviewed by an educational expert and psychologist, and the formulation of the items was adapted to ensure that language level B1 [8] was not exceeded and to prevent ambiguous answer categories.

C. Face validity. To ensure face validity, this set of items was presented to three patients visiting the outpatient preoperative screening clinic. After completing the questionnaires, the patients were interviewed about the appropriateness of the questionnaires to measure knowledge on anaesthesia, and comprehension of all items was assessed. Their comments were used to adapt the items.

2. Sampling and survey administration.

As a rule of thumb, the sample size for IRT analysis requires ten participants per item [9]. Therefore, the appropriate number of participants was calculated based on the number of items generated during phase I-1 (Item generation). These participants were recruited at two hospital sites. Between September 2020 and November 2020, participants were included at the outpatient preoperative clinic of the Erasmus Medical Centre (Erasmus MC) in Rotterdam, the Netherlands, a university hospital. From April 2021 to May 2021, participants were included at the outpatient preoperative clinic of the Albert Schweitzer Hospital (ASZ) in Dordrecht, the Netherlands, a general teaching hospital. Patient inclusion criteria were a minimum age of 18 years, the ability to read and understand Dutch, and planned elective surgery. Before administering the questionnaire in the ASZ, slight textual adjustments were made by a team of anaesthetists and a psychologist, based on remarks from participants in Erasmus MC. Care was taken not to change the content of the items or the answer categories (S2 Document in S1 File). Special attention was paid to differential item functioning of the altered items during the scale evaluation phase.
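The rule of thumb above can be made explicit in a small sketch (the function name is ours; the counts are from the text):

```python
def required_sample_size(n_items: int, per_item: int = 10) -> int:
    """Rule-of-thumb sample size for IRT analysis:
    ten participants per item."""
    return n_items * per_item

# 60 items were generated in Phase I-1, so roughly 600 participants
# were targeted; 577 were ultimately included.
print(required_sample_size(60))
```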

The questionnaires were sent by e-mail after the preoperative consultation had taken place. All participants were asked to complete all questionnaires, regardless of the anaesthesia technique they were educated on. Furthermore, the anaesthetists were not informed about the participation of their patients in this study, and the education provided was limited to anaesthesia techniques relevant to the planned surgery and in accordance with standard practice. The questionnaires were presented on a secure online survey platform.

Age, sex, and the planned anaesthesia technique were extracted from the hospital information system during the period of inclusion. These data were pseudonymised, and the authors did not have access to information that could identify individual participants.

3. Extraction of factors.

To explore latent factors within each predefined anaesthesia domain, we performed exploratory factor analysis using Multidimensional IRT (MIRT). To determine the number of latent factors that best described the data, we first fitted multiple MIRT models with an increasing number of factors using the Metropolis-Hastings Robbins-Monro algorithm [10]. We compared nested models using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and chi-square difference tests. Although we took the BIC into account, in this exploratory phase we preferred the AIC, because the AIC applies a smaller penalty than the BIC and thus reduces the risk of losing too much information.
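The AIC/BIC trade-off described above can be illustrated with a minimal Python sketch; the log-likelihoods and parameter counts below are hypothetical, not values from the study:

```python
import math

def aic(log_lik: float, k: int) -> float:
    # Akaike information criterion: lower is better
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    # Bayesian information criterion: its penalty k*ln(n) exceeds the
    # AIC penalty 2k whenever n > e^2 (about 7 observations)
    return k * math.log(n) - 2 * log_lik

# Hypothetical nested MIRT models: a 2-factor model with more
# parameters fits somewhat better than a 1-factor model
ll_1f, k_1f = -5200.0, 20
ll_2f, k_2f = -5150.0, 40
n = 577

print(aic(ll_2f, k_2f) < aic(ll_1f, k_1f))        # True: AIC prefers 2 factors
print(bic(ll_2f, k_2f, n) < bic(ll_1f, k_1f, n))  # False: BIC's penalty wins
```

This is exactly the situation the text describes: with the same data, the BIC's heavier penalty can reject a larger model that the AIC still prefers.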

Next, we explored the factor loadings in the optimal MIRT model for each domain. To allow the factors to correlate, we used oblique rotation (Oblimin) [11]. The exploratory nature of the analysis and the large sample size made us consider items with factor loadings as low as 0.25 [12]. Guided by the factor loadings, we assembled item sets based on the following: factors needed to be conceptually interpretable, the first factor was prioritised above the next (since every subsequent factor is extracted from the residual of the previous factor), and a minimum of three items per factor was needed (to facilitate further analysis) [11]. Cross-loading of an item on multiple factors was not used as a reason for removal if the item contributed conceptually to one of the factors. When an item did not load on the factor selected for the final item set, it was nevertheless kept during this phase of the scale development, provided that it conceptually fitted that factor, was deemed relevant from a clinical perspective, and there were no doubts about its quality. Each selected factor represented a separate scale, corresponding to a separate questionnaire, and was evaluated in the next phase.

Phase II–scale evaluation

1. Checking of assumptions IRT.

Answer categories were dichotomised into correct vs. incorrect answers, where the category ‘I do not know’ was categorised as incorrect. To fit an IRT model, the assumptions of unidimensionality, local independence, and monotonicity must be met [7]. We assessed these assumptions for each scale derived in Phase I (item generation and scale development). The assumptions were tested in an iterative process in which an assumption was tested again if the next assumption was not met for that scale.
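The scoring rule can be sketched as follows (the function name and the option labels are illustrative, not from the questionnaire):

```python
def score(response: str, correct: str) -> int:
    """Dichotomise a multiple-choice response: 1 if correct, 0 otherwise.
    The 'I do not know' option is scored as incorrect."""
    if response == "I do not know":
        return 0
    return int(response == correct)

assert score("Option B", correct="Option B") == 1
assert score("Option A", correct="Option B") == 0
assert score("I do not know", correct="Option B") == 0
```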

A. Unidimensionality. Confirmatory factor analysis with the weighted least square mean- and variance-adjusted estimator was performed to assess the unidimensionality of each scale. Unidimensionality was accepted when the following criteria were met: a standardised root mean square residual (SRMR) value < 0.08, a root mean square error of approximation (RMSEA) value < 0.06, and scaled Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) values > 0.95 [13–18]. When unidimensionality was not satisfactory, items were excluded from subsequent analyses based on factor loadings and clinical relevance. The final set of items in each scale was subjected to an additional check for unidimensionality with Modified Parallel Analysis [19].
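The acceptance rule amounts to a simple conjunction of the cut-offs above; a sketch (function name is ours, and the example fit values are illustrative):

```python
def unidimensional(srmr: float, rmsea: float, cfi: float, tli: float) -> bool:
    """Acceptance rule for unidimensionality:
    SRMR < 0.08, RMSEA < 0.06, scaled CFI > 0.95, scaled TLI > 0.95."""
    return srmr < 0.08 and rmsea < 0.06 and cfi > 0.95 and tli > 0.95

# Illustrative fit values: accepted...
assert unidimensional(srmr=0.05, rmsea=0.04, cfi=0.97, tli=0.96)
# ...and rejected on SRMR alone
assert not unidimensional(srmr=0.086, rmsea=0.04, cfi=0.97, tli=0.96)
```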

B. Monotonicity. We assessed monotonicity, scalability, and invariant item ordering (IIO) by fitting nonparametric IRT functions using Mokken scaling [20]. The assumption of monotonicity was considered met when no significant violations of monotonicity were detected based on the z-test statistics. Scalability was assessed using the scalability coefficient H of the entire scale and of individual items (Hi). The H value of the entire scale was considered strong if H > 0.50, moderate if 0.40 ≤ H ≤ 0.50, and weak if 0.30 ≤ H < 0.40 [21]. IIO was determined by assessing the number of significant violations of IIO per item based on the z-test statistic. When significant violations of monotonicity or IIO were detected, or when unscalable items were encountered, backward selection was applied by stepwise removal of the items with the most violations or the lowest H coefficient until no violations were detected. This process ended when the H coefficient of the entire scale was satisfactory [22].
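For readers unfamiliar with Loevinger's H: for dichotomous items it is the ratio of the summed inter-item covariances to their maxima given the item marginals. A minimal sketch, assuming 0/1 item scores (the real analysis used the 'mokken' R package):

```python
from itertools import combinations

def loevinger_H(data):
    """Scalability coefficient H for dichotomous item scores
    (rows = respondents, columns = items)."""
    n = len(data)
    k = len(data[0])
    # proportion correct per item
    p = [sum(row[j] for row in data) / n for j in range(k)]
    cov_sum = cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(1 for row in data if row[i] and row[j]) / n
        cov_sum += p_ij - p[i] * p[j]
        # maximum covariance given the marginals: min(p_i, p_j) - p_i * p_j
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum

# A perfect Guttman pattern (harder items answered correctly only by
# respondents who also answer the easier items) scales maximally: H = 1
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
assert abs(loevinger_H(guttman) - 1.0) < 1e-9
```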

C. Local independence. Local independence was met if the residual correlations between all the item pairs in the CFA model were < 0.20 [7]. When local dependencies were detected, we reviewed those items to determine the nature of the dependency and removed them when deemed necessary.

2. IRT model fitting.

Once the assumptions were met, the item response functions, that is, 1-PL, 2-PL, and 3-PL models, were fitted per scale using the Expectation Maximization estimation algorithm. Nested models were compared using a likelihood ratio test, the AIC, and the BIC, and the best-fitting model was chosen [23]. The goodness of fit of the IRT models was assessed at item level using the S-χ² statistic for item fit, for which a p-value of <0.001 for an item was considered a misfit [7, 24, 25].
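The 1-PL/2-PL/3-PL family can be summarised by a single item response function; a sketch with hypothetical parameter values:

```python
import math

def p_correct(theta: float, a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """Probability of a correct response under the 3-PL model.
    With guessing parameter c = 0 this reduces to the 2-PL model;
    additionally fixing the discrimination a to be equal across items
    gives the 1-PL (Rasch) model."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta equal to the difficulty b, a 2-PL item is answered
# correctly with probability 0.5
assert abs(p_correct(0.0, a=1.5, b=0.0) - 0.5) < 1e-9

# A higher discrimination a makes the curve steeper around b
assert p_correct(1.0, a=2.0) > p_correct(1.0, a=0.5)
```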

3. Differential item functioning.

We evaluated Differential Item Functioning (DIF) using ordinal regression modelling, with a change in McFadden’s pseudo R² ≥ 0.02 being indicative of DIF (the R² change between models 1 and 2 suggests uniform DIF; the change between models 2 and 3 suggests non-uniform DIF) [26]. We evaluated age (two groups divided by the median age of the study sample), sex, the hospital the patient visited for preoperative screening (Erasmus MC vs. ASZ), level of education (tertiary vs. primary, secondary, and other), and whether the anaesthesia technique discussed with the participant during the preoperative consultation matched the knowledge domain of the respective scale (yes vs. no). To assess the magnitude and statistical relevance of DIF when an item was flagged, we compared the initial thetas (i.e. the estimated levels of ability) with the thetas computed without the DIF items (i.e. purified thetas) at scale level. This difference was then evaluated by plotting the differences due to DIF against the median standard error of measurement (SEM) [27]. Differences larger than the mean SEM were considered noticeable.
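The flagging criterion can be sketched as follows. The 'lordif' package fits the nested ordinal regression models internally; the log-likelihoods below are hypothetical, purely to show the arithmetic:

```python
def mcfadden_delta_r2(ll_null: float, ll_base: float, ll_full: float) -> float:
    """Change in McFadden's pseudo R2 between a base model (theta only)
    and a fuller model that adds group terms; a change >= 0.02 flags DIF."""
    r2_base = 1 - ll_base / ll_null
    r2_full = 1 - ll_full / ll_null
    return r2_full - r2_base

# Hypothetical item: adding the group term barely improves fit -> no DIF
assert mcfadden_delta_r2(-400.0, -300.0, -298.0) < 0.02
# A larger improvement would flag the item for DIF
assert mcfadden_delta_r2(-400.0, -300.0, -280.0) >= 0.02
```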

4. Information.

In IRT, each response pattern results in a different theta and a different associated standard error. Therefore, the precision or reliability of an IRT model differs across the range of theta and is conceptualised as information. To investigate the information of the models, Test Information Curves were plotted together with SE(theta), which was estimated using the Expected A Posteriori estimator. Compared with classical testing, an SE(theta) of 0.32 or lower corresponds to a reliability of 0.90 or higher and can thus be considered a reliable measurement [28].
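The relation between test information, SE(theta), and classical reliability can be sketched as follows (the three-item parameter set is hypothetical):

```python
import math

def item_information(theta: float, a: float, b: float) -> float:
    # Fisher information of a 2-PL item: a^2 * P * (1 - P)
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a * a * p * (1 - p)

def se_theta(theta: float, items) -> float:
    # SE(theta) is the reciprocal square root of the test information,
    # i.e. the sum of the item informations
    return 1 / math.sqrt(sum(item_information(theta, a, b) for a, b in items))

def reliability(se: float) -> float:
    # Classical-test analogue (theta variance fixed at 1): rel = 1 - SE^2
    return 1 - se * se

# An SE(theta) of 0.32 corresponds to a reliability of about 0.90
assert abs(reliability(0.32) - 0.8976) < 1e-4

# Hypothetical (a, b) parameters for a three-item scale
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]
print(round(se_theta(0.0, items), 2))
```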

Statistics and data analysis

Data were collected using LimeSurvey [29] and GemsTracker [30]. Data analysis was performed using R (version 4.2.0) [31]. Unidimensionality and local independence were assessed with the R packages ‘lavaan’ [32] and ‘ltm’ [33]. Monotonicity, scalability, and invariant item ordering (IIO) were assessed with the R package ‘mokken’ [21, 34]. Differential Item Functioning (DIF) was evaluated with the R package ‘lordif’ [35]. Exploratory factor analysis, IRT modelling, and assessment of the test information were performed with the R package ‘mirt’ [23, 36].

Demographic and clinical characteristics were compared between the two hospitals and expressed as mean (SD) or number (percentage), where appropriate. Continuous variables were compared using Welch’s t-test, the Mann-Whitney U test, or the Kruskal-Wallis test, where appropriate; categorical variables were compared using the chi-square test.

Ethical considerations

Ethical approval for this study was granted by the Medical Ethics Committee of Erasmus MC Rotterdam (MEC-2020-0468). Because there was no infringement on the physical and/or psychological integrity of the subject, the Medical Ethics Committee deemed the trial not to be subject to the Dutch Law on Medical Research [37]. Written informed consent was obtained from all subjects. This study was conducted in compliance with the principles of the Declaration of Helsinki [38].

Results

In this study, we developed sets of items covering knowledge on different anaesthesia techniques and constructed scales in which this knowledge can be graded. We then constructed models using IRT and determined the psychometric properties of the items.

Phase I–scale development

1. Item generation.

Initially, six knowledge domains were defined: Generic items, General anaesthesia, Spinal anaesthesia, Epidural anaesthesia, Regional anaesthesia, and PSA. Then, 51 items covering these knowledge domains were formulated, which were subsequently supplemented by nine items following review by a larger group of experts, yielding 60 items in total. The full list of 60 items that resulted after checking for content and face validity, presented as a non-validated English translation of the Dutch original, is shown in Table 1. Additionally, this table indicates the phase at which each item was removed from the item set and the reason for its removal.

Table 1. List of items resulting from Phase I-1 –Item generation.

https://doi.org/10.1371/journal.pone.0299052.t001

2. Sampling and survey administration.

Given that 60 items were generated during phase I-1 (Item generation), a sample size of 600 participants was needed, which was approximated by including 577 patients visiting the preoperative outpatient clinics. In the Erasmus MC, 610 patients consented to be approached regarding this study. The response rate was 55% (n = 336), and of these respondents, 95% provided informed consent for inclusion and completed the questionnaire (n = 319). In the ASZ, 950 patients consented to be approached regarding this study. The response rate was 39% (n = 370), and of these respondents, 70% provided informed consent for inclusion and completed the questionnaire (n = 258).
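The recruitment figures can be checked arithmetically (all counts are taken from the text):

```python
# Recruitment counts per site, as reported
approached = {"Erasmus MC": 610, "ASZ": 950}
responded = {"Erasmus MC": 336, "ASZ": 370}
included = {"Erasmus MC": 319, "ASZ": 258}

for site in approached:
    response_rate = responded[site] / approached[site]
    completion = included[site] / responded[site]
    print(f"{site}: response {response_rate:.0%}, consented and completed {completion:.0%}")

# The two sites together give the 577 participants analysed
assert sum(included.values()) == 577
```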

Table 2 shows the demographic and clinical characteristics of the 577 participants who completed the questionnaire. All participants answered all items in the questionnaire. Across all participants, the mean age was 60 (±15) years and 274 (48%) were female. The percentage of female participants was 41% in Erasmus MC versus 56% in ASZ (p = 0.002). In ASZ, spinal and regional anaesthesia were planned more often than in Erasmus MC (20% versus 2% and 10% versus 3%, respectively, p<0.001). Other types of planned anaesthesia techniques, as well as age and highest level of education, did not differ significantly between the two hospitals.

Table 2. Demographics and clinical characteristics of the two sample sites.

https://doi.org/10.1371/journal.pone.0299052.t002

3. Extraction of factors.

Table 3 shows the factor loadings >0.25 for all items within each predefined knowledge domain, based on the optimal number of factors as determined by model comparison (S2 Table). Items retained in the final item set(s) are printed in bold. The domains Regional anaesthesia, Spinal anaesthesia, Epidural anaesthesia, and PSA all resulted in two factors, but were reduced to a single set of items per domain, including several items that did not load on the first factor, because of their clinical and conceptual relevance. The Generic items domain consisted of three factors and was reduced to a single set of items, because factors 2 and 3 consisted of too few items after allocating cross-loading items to factor 1. The General anaesthesia domain was reduced to two sets of items. Conceptually, factor 1 consisted of items relating to direct perioperative care (‘General anaesthesia–I’), and factor 4 comprised items concerning complications or side-effects (‘General anaesthesia–II’). Factor 2 within General anaesthesia consisted of too few items when cross-loading was taken into account, and factor 3 was conceptually not interpretable. Items from factors 2 and 3 that fitted the construct of factor 4 were included in the ‘General anaesthesia–II’ item set for further analysis. In Table 1, the last column indicates the rationale for the exclusion of particular items from the item set. In total, seven factors were deemed suitable for further evaluation as separate scales in the next phase.

Table 3. Factor loadings for exploratory factor analysis.

https://doi.org/10.1371/journal.pone.0299052.t003

Phase II–scale evaluation

1. Checking of assumptions IRT.

The assessment of the IRT assumptions and the consecutive adjustments of the item sets is an iterative process, which is elaborated on below. The final assessment of all IRT assumptions is presented in Tables 4–6. The scree plots following Modified Parallel Analysis are presented in S1 Fig in the Online Supporting Information, showing unidimensionality for all scales.

Table 4. Final assessment of Unidimensionality before IRT modelling can be undertaken.

https://doi.org/10.1371/journal.pone.0299052.t004

Table 5. Final assessment of Monotonicity before IRT modelling can be undertaken.

https://doi.org/10.1371/journal.pone.0299052.t005

Table 6. Final assessment of Local Independence before IRT modelling can be undertaken.

https://doi.org/10.1371/journal.pone.0299052.t006

Generic items. The initial set of seven items met the unidimensionality criteria, but was not scalable (H = 0.202; all items Hi < 0.3). To improve scalability, the items GEN10, GEN4, and GEN12 were removed. The reduced set of four items met the unidimensionality criteria and was evaluated as weakly scalable (H = 0.307; two items Hi < 0.3). No serious violations of monotonicity, violations of IIO, or local dependencies were detected.

General anaesthesia–I. The initial set of eight items met the unidimensionality criteria and was weakly scalable (H = 0.329; two items Hi < 0.3). Analysis of IIO revealed that after removing two items (GA1 and GA11), no violation of IIO was detected. The reduced set of six items met the unidimensionality criteria and was moderately scalable (H = 0.441; no item Hi < 0.3). No serious violations of monotonicity were detected. Two items showed local dependence (GA4 and GA5), but apart from both being very easy questions, no clinically relevant correlation could be found, and they were preserved in the item set.

General anaesthesia–II. The initial set of seven items did not meet the unidimensionality criteria (SRMR = 0.086). After reviewing the three items that did not load on the latent factor in Phase II and removing the clinically least relevant item (GA13), the reduced set of six items met the unidimensionality criteria. The item set was weakly scalable (H = 0.329; no item Hi < 0.3). No serious violations of monotonicity, violations of IIO, or local dependencies were detected.

Spinal anaesthesia. The initial set of 12 items met the unidimensionality criteria and was moderately scalable (H = 0.410; one item Hi < 0.3). Analysis of IIO revealed that after removing two items (SA3 and SA10), no violation of IIO was detected. The reduced set of ten items met the unidimensionality criteria and was moderately scalable (H = 0.439; no item Hi < 0.3). No serious violations of monotonicity or local dependencies were detected.

Regional anaesthesia. The initial set of six items met all but one unidimensionality criterion (scaled TLI) but was not scalable (H = 0.252; four items Hi < 0.3). To improve scalability, items RA4 and RA5 were removed. The reduced set of four items met the unidimensionality criteria and was weakly scalable (H = 0.349; one item Hi < 0.3). No serious violations of monotonicity, violations of IIO, or local dependencies were detected.

Epidural anaesthesia. The initial set of five items met the unidimensionality criteria and was weakly scalable (H = 0.391; one item Hi < 0.3). No serious violations of monotonicity, violations of IIO, or local dependencies were detected.

Procedural sedation and analgesia. The initial set of five items met the unidimensionality criteria and was weakly scalable (H = 0.316; four items Hi < 0.3). No serious violations of monotonicity, violations of IIO, or local dependencies were detected.

2. IRT model fitting.

Table 7 shows the item parameters for the 1-PL and 2-PL models fitted to the data. S3 Table shows comparisons between the 1-PL, 2-PL, and 3-PL models for each scale. The 1-PL model was the best-fitting model for the Generic items, General anaesthesia–II, and Regional anaesthesia scales. The 2-PL model was the best-fitting model for the General anaesthesia–I, Spinal anaesthesia, Epidural anaesthesia, and PSA scales. Table 7 also shows the item fit per scale for the best-fitting model. No item misfit was detected. Fig 2 shows the item characteristic curves (ICC) per scale for the best-fitting model.

Table 7. IRT-model item parameters and item fit (p-values).

https://doi.org/10.1371/journal.pone.0299052.t007

3. Differential item functioning.

Table 8 shows which items displayed DIF and the magnitude and type of DIF (only McFadden’s pseudo R² ≥ 0.02 is shown). Overall, no items were flagged for DIF regarding sex, seven items were flagged for DIF regarding age, one item regarding anaesthesia technique, four items regarding the sample hospital, and four items regarding the level of education. The impact of DIF on the total scores for each scale was well below the mean SEM and therefore negligible. Of the items that were slightly modified between the hospitals (GA2 and GA3), only GA2 was flagged for DIF regarding the sample hospital, but its impact was negligible. A more detailed exploration of DIF within the scales is shown in the Online Supporting Information (S2 Document in S1 File).

Table 8. Differential item functioning (pseudo-R2) per model.

https://doi.org/10.1371/journal.pone.0299052.t008

4. Information.

The Test Information Curves and SE(theta) for each scale are shown in Fig 3. The order of the scales regarding difficulty (i.e. the thetas at which the scale is most informative), from most difficult to least difficult, was: Regional anaesthesia, Epidural anaesthesia, Spinal anaesthesia, PSA, General anaesthesia I, General anaesthesia II, and Generic items.

Fig 3. Test Information (I) and Standard Errors (SE) Curves of the seven scales.

https://doi.org/10.1371/journal.pone.0299052.g003

Final questionnaires

The resulting scales from phase I and II each form a separate questionnaire within the RAKQ. S4 Table shows the English translation of these seven final questionnaires on preoperative knowledge on anaesthesia techniques.

Discussion

This is the first comprehensive effort to develop a validated questionnaire that covers the main domains of patient knowledge in anaesthesia in adult care and to evaluate its psychometric properties. The RAKQ is a set of seven questionnaires: one generic questionnaire and six questionnaires covering five different anaesthesia techniques, with general anaesthesia divided into two questionnaires. In total, the RAKQ contains 40 multiple-choice items. Through psychometric evaluation, we created unidimensional questionnaires and provided IRT models for each scale.

Several questionnaires have been developed in the past to measure knowledge on anaesthetic topics [2–5]. However, the RAKQ is the first questionnaire to cover all clinically relevant anaesthesia techniques, such as spinal, epidural, and regional anaesthesia, and PSA. Furthermore, the RAKQ has undergone extensive psychometric evaluation, using IRT for the first time in a preoperative knowledge questionnaire. Trustworthy alternatives to in-person patient education on all topics of anaesthesia can only be developed using a validated instrument, such as the RAKQ.

Methodological considerations

A deductive approach was used to ensure content and face validity in the item development phase (Phase I). This implies that the generation of items was based on predefined domains formulated by experts. Since this was a first endeavour in constructing a comprehensive knowledge questionnaire on anaesthesia, we conducted exploratory factor analysis to explore underlying latent constructs that might have been less obvious beforehand. During exploratory factor analysis, we found that some items deemed important by experts had low factor loadings (<|0.4|). This could mean that different underlying constructs can indeed be identified, hindering our efforts to create unidimensional scales within predefined knowledge domains. Nevertheless, based on discussions within the team and the fact that the data were collected from a large multicentre sample [11], we proceeded with a lower threshold (>|0.25|). In future studies, it could be valuable to also include the patient perspective in developing items, which could enhance the unidimensionality of the scales. Subsequently, the resulting lists of items were evaluated using confirmatory factor analysis, which confirmed the unidimensionality of the individual scales. The resulting set of questionnaires can inform the clinician about patients’ knowledge on these domains, while also helping to shape and refine the constructs.

Scalability is also important during the development of a valid scale using IRT [7]. We found weak to moderate scalability for all scales, indicating how difficult it is to scale knowledge of relatively broad subjects, such as anaesthesia techniques. Generating additional items that align with the same construct could improve scalability, but could also result in a division of the current scales into more scales, each focusing on a narrower knowledge domain, rather than the broader coverage provided by the current questionnaires in the RAKQ. This approach would consequently require a larger number of questionnaires to achieve the same level of coverage as the current set of RAKQ questionnaires, which would impair user-friendliness. Still, we believe that improving scalability would be the first step in accurately measuring patients' knowledge, since we aim for tailored consultation and education based on the questionnaire.

Given the exploratory nature of our study, we compared the relative fit of 1-, 2- and 3-PL models to assess which model showed the best fit to the data. The 3-PL models never proved to be an improvement over the 2-PL models, in line with our approach of reducing the element of guessing by adding an additional response option, "I don't know". It should be noted that when employing the questionnaire, a choice must be made between a 1-PL and a 2-PL model. This decision should not only be based on the fit of the model to the data, but also on more fundamental considerations. An argument for using a 1-PL model is the premise that every question is of equal importance in estimating the knowledge level.

Conversely, a 2-PL model provides more opportunity to differentiate between patients with sufficient and insufficient knowledge, because items then contribute differently to the estimation of the knowledge level.
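The three logistic models differ only in which item parameters are left free. A compact sketch of the item characteristic curve (θ the latent ability, b item difficulty, a discrimination, c the pseudo-guessing lower asymptote):

```python
import math

def irt_prob(theta, b, a=1.0, c=0.0):
    """P(correct | theta) under the 3-PL model.

    a = 1, c = 0  -> 1-PL (Rasch): only difficulty b varies per item.
    c = 0         -> 2-PL: discrimination a also varies per item.
    full model    -> 3-PL: c adds a lower asymptote for guessing.
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Without guessing, the curve passes through 0.5 at theta == b
print(irt_prob(0.0, b=0.0))              # 0.5
# A more discriminating item separates nearby abilities more sharply
print(irt_prob(1.0, b=0.0, a=2.0))       # ~0.88
print(irt_prob(1.0, b=0.0, a=0.5))       # ~0.62
```

The "I don't know" option targets exactly the c parameter: when respondents no longer guess, c stays near zero and the 3-PL model adds nothing over the 2-PL model, consistent with the fit comparison above.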

Future directions

The RAKQ can be used to assess the level of knowledge of patients regarding the perioperative process, risks, and preoperative preparations in both clinical practice and research settings. When applied after digital education and before consultation, consulting anaesthetists can pay particular attention to gaps in knowledge. Moreover, patients can automatically be offered additional information even before preoperative consultation. Furthermore, when a fully digital preoperative screening is considered for selected (low-risk) patients, the RAKQ can indicate whether patients have truly understood the information or whether additional education is needed for well-informed consent. The RAKQ can also be used as a research tool to compare new methods of patient education with traditional methods. The practical application of the RAKQ and its acceptance by patients and anaesthetists as a tool to optimise preoperative education should be evaluated in clinical settings.

As a next step, a computerised-adaptive test (CAT) can be developed with the difficulty and discrimination parameters provided by IRT modelling. With a sufficiently large pool of items that are sufficiently discriminatory and that differ in difficulty, a CAT can reliably measure the level of knowledge with a minimal, individualised subset of items, reducing the burden on patients. As can be deduced from the information curves, our questionnaires differ in their overall difficulty. The item sets require further diversification in item difficulty to facilitate meaningful CAT, such that better discrimination between an acceptable and an unacceptable level of knowledge is possible. Furthermore, conducting reliability testing on the validated questionnaires before developing a CAT is warranted to ensure the stability and precision of the IRT parameters across different ability levels. Additionally, targeting populations educated on the specific anaesthesia technique assessed in each questionnaire, and providing standardised instructions to the anaesthetist educating the patients, would enhance the development of the questionnaires and, consequently, of the CAT. This approach would result in a more sensitive and reliable assessment of the participants' abilities within the respective domains.
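The CAT selection step follows directly from the IRT parameters: under a 2-PL model an item's Fisher information at ability θ is a²·P(θ)·(1−P(θ)), and the next item administered is the unused one with maximum information at the current ability estimate. A minimal sketch with hypothetical item parameters (not the RAKQ's calibrated values):

```python
import math

def p2pl(theta, a, b):
    """2-PL probability of a correct response."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2-PL item at ability theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1 - p)

def next_item(theta_hat, items, administered):
    """Maximum-information selection: return the index of the unused
    item that is most informative at the current ability estimate."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *items[i]))

# Hypothetical (a, b) parameters for four items
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 2.0)]
# For an average examinee (theta = 0), the highly discriminating item
# with difficulty near theta carries the most information
print(next_item(0.0, items, administered=set()))  # 2
```

Because information peaks at θ = b and scales with a², an item pool whose difficulties cluster in one region (as the current information curves suggest) leaves other ability regions poorly measured, which is why diversifying item difficulty is a prerequisite for a useful CAT.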

In summary, the set of questionnaires presented in this paper is the first to cover multiple commonly used anaesthesia techniques and the first to be psychometrically validated using IRT. We believe that these questionnaires are a solid foundation on which to further develop knowledge scales and explore computerised-adaptive testing. This can pave the way for trustworthy digital informed consent, which could reduce patient burden and optimise the efficiency of preoperative care.

Supporting information

S1 Fig. Scree plots following modified parallel analysis.

https://doi.org/10.1371/journal.pone.0299052.s001

(PDF)

S1 File. Alterations in the items between the two study sites.

https://doi.org/10.1371/journal.pone.0299052.s002

(DOCX)

S2 Table. Comparison of nested multidimensional Item response theory models for exploratory factor analysis.

https://doi.org/10.1371/journal.pone.0299052.s005

(DOCX)

S3 Table. Comparison of 1-, 2- and 3-PL item response theory models.

https://doi.org/10.1371/journal.pone.0299052.s006

(DOCX)

Acknowledgments

The authors would like to thank Dr. M. Vereen and Dr. E. Galvin for the English translation of the RAKQ.

References

  1. Kamdar NV, Huverserian A, Jalilian L, Thi W, Duval V, Beck L, et al. Development, Implementation, and Evaluation of a Telemedicine Preoperative Evaluation Initiative at a Major Academic Medical Center. Anesth Analg. 2020;131(6):1647–56. pmid:32841990.
  2. Kakinuma A, Nagatani H, Otake H, Mizuno J, Nakata Y. The effects of short interactive animation video information on preanesthetic anxiety, knowledge, and interview time: a randomized controlled trial. Anesth Analg. 2011;112(6):1314–8. pmid:21346166.
  3. Miller KM, Wysocki T, Cassady JF, Cancel D, Izenberg N. Validation of Measures of Parents' Preoperative Anxiety and Anesthesia Knowledge. Anesth Analg. 1999;88(2):251–7. pmid:9972736.
  4. Snyder-Ramos SA, Seintsch H, Bottiger BW, Motsch J, Martin E, Bauer M. [Development of a questionnaire to assess the quality of the preanesthetic visit]. Anaesthesist. 2003;52(9):818–29. pmid:14504809.
  5. Zvara DA, Mathes DD, Brooker RF, McKinley AC. Video as a patient teaching tool: does it add to the preoperative anesthetic visit? Anesth Analg. 1996;82(5):1065–8. pmid:8610869.
  6. Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, Young SL. Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health. 2018;6:149. pmid:29942800; PMCID: PMC6004510.
  7. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(5 Suppl 1):S22–31. pmid:17443115.
  8. Council of Europe. Common European Framework of Reference for Languages (CEFR). 2023 [cited 14-11-2023]. Available from: https://www.coe.int/en/web/common-european-framework-reference-languages/table-1-cefr-3.3-common-reference-levels-global-scale.
  9. Nunnally JC. Psychometric theory. 2nd ed. New York: McGraw-Hill; 1978.
  10. Garnier-Villarreal M, Merkle EC, Magnus BE. Between-Item Multidimensional IRT: How Far Can the Estimation Methods Go? Psych. 2021;3(3):404–21.
  11. Worthington RL, Whittaker TA. Scale Development Research: A Content Analysis and Recommendations for Best Practices. The Counseling Psychologist. 2006;34(6):806–38.
  12. Pituch KA, Stevens JP. Applied Multivariate Statistics for the Social Sciences: Analyses with SAS and IBM's SPSS. 6th ed. New York: Routledge; 2016.
  13. McDonald RP. Test theory: a unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates; 1999.
  14. Kline RB. Principles and practice of structural equation modeling. New York: Guilford Press; 2016.
  15. Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107(2):238–46. pmid:2320703.
  16. West SG, Finch JF, Curran PJ. SEM with nonnormal variables. In: Hoyle RH, editor. Structural equation modeling: concepts, issues, and applications. Thousand Oaks, CA: Sage; 1995. p. 56–75.
  17. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 1999;6(1):1–55.
  18. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, editors. Testing structural equation models. Newbury Park, CA: Sage Publications; 1993.
  19. Drasgow F, Lissak RI. Modified parallel analysis: A procedure for examining the latent dimensionality of dichotomously scored item responses. Journal of Applied Psychology. 1983;68(3):363–73.
  20. Wind S. Examining the Psychometric Quality of Multiple-Choice Assessment Items using Mokken Scale Analysis. Journal of Applied Measurement. 2016;17:142–65. pmid:28009581.
  21. van der Ark LA. Mokken Scale Analysis in R. Journal of Statistical Software. 2007;20(11):1–19.
  22. van Schuur W. Ordinal Item Response Theory: Mokken Scale Analysis. Thousand Oaks, CA: Sage; 2011.
  23. Chalmers RP. mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software. 2012;48(6):1–29.
  24. Luijten MAJ, van Litsenburg RRL, Terwee CB, Grootenhuis MA, Haverman L. Psychometric properties of the Patient-Reported Outcomes Measurement Information System (PROMIS®) pediatric item bank peer relationships in the Dutch general population. Quality of Life Research. 2021;30(7):2061–70. pmid:33606180.
  25. Orlando M, Thissen D. Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement. 2003;27(4):289–98.
  26. Crins MHP, Terwee CB, Ogreden O, Schuller W, Dekker P, Flens G, et al. Differential item functioning of the PROMIS physical function, pain interference, and pain behavior item banks across patients with different musculoskeletal disorders and persons from the general population. Quality of Life Research. 2019;28(5):1231–43. pmid:30600494.
  27. Kleinman M, Teresi JA. Differential item functioning magnitude and impact measures from item response theory models. Psychol Test Assess Model. 2016;58(1):79–98. pmid:28706769; PMCID: PMC5505278.
  28. Klaufus LH, Luijten MAJ, Verlinden E, van der Wal MF, Haverman L, Cuijpers P, et al. Psychometric properties of the Dutch-Flemish PROMIS® pediatric item banks Anxiety and Depressive Symptoms in a general population. Quality of Life Research. 2021;30(9):2683–95. pmid:33983618.
  29. LimeSurvey GmbH. LimeSurvey: An Open Source survey tool. Hamburg, Germany; 2020.
  30. Erasmus MC and Equipe Zorgbedrijven. GemsTracker©. Version 1.9.0; 2011.
  31. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022.
  32. Rosseel Y. lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software. 2012;48(2):1–36.
  33. Rizopoulos D. ltm: An R Package for Latent Variable Modeling and Item Response Analysis. Journal of Statistical Software. 2006;17(5):1–25.
  34. van der Ark LA. New Developments in Mokken Scale Analysis in R. Journal of Statistical Software. 2012;48(5):1–27.
  35. Choi SW, Gibbons LE, Crane PK. lordif: An R Package for Detecting Differential Item Functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. Journal of Statistical Software. 2011;39(8):1–30. pmid:21572908; PMCID: PMC3093114.
  36. Immekus JC, Snyder KE, Ralston PA. Multidimensional Item Response Theory for Factor Structure Assessment in Educational Psychology Research. Frontiers in Education. 2019;4.
  37. Overheid.nl. Wet medisch-wetenschappelijk onderzoek met mensen [Medical Research Involving Human Subjects Act] [cited 14-11-2023]. Available from: https://wetten.overheid.nl/BWBR0009408/2022-07-01.
  38. World Medical Association. WMA Declaration of Helsinki - Ethical principles for medical research involving human subjects. 2022 [updated 06-09-2022; cited 14-11-2023]. Available from: https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/.