Development and pre-testing of the Patient Engagement In Research Scale (PEIRS) to assess the quality of engagement from a patient perspective

Objectives To develop and examine the content and face validity of the Patient Engagement In Research Scale (PEIRS) for assessing the quality of patient engagement in research projects from a patient partner perspective. Methods Our team of researchers and patient partners conducted a mixed qualitative and quantitative study in three phases. Participants were English-speaking adult patients (including informal caregivers, family members, and friends) with varying experiences as partners in research projects in Canada. 1) Questionnaire items were generated following thematic analysis of in-depth interviews and published literature. 2) A three-round e-Delphi survey process via email correspondence was undertaken to refine and select the items for a provisional PEIRS. 3) Two rounds of cognitive interviewing elicited participants’ understanding and opinions of each item and the structure of the PEIRS. Results One hundred and twenty items were generated from 18 interviews and organized across eight themes of meaningful engagement of patients in health research to form an initial questionnaire. The e-Delphi survey and cognitive interviewing each included 12 participants with a range of self-reported diseases, health-related conditions, and use of healthcare services. The e-Delphi survey yielded a 43-item provisional PEIRS. The PEIRS was then reduced to 37 items organized across seven themes after 1) refinement of problems in its instructions and items, and 2) the combining of two themes into one. Conclusions We developed a 37-item self-reported questionnaire that has demonstrated preliminary content and face validity for assessing the quality of patient engagement in research.

Introduction
knowledge, it has not undergone any validation process to evaluate the quality of patients' engagement throughout the research process. [19] To advance the science and practice of patient engagement, it is critical to have valid and reliable measures to test the effectiveness of patient engagement interventions, such as training workshops, from the patients' perspective. Hence, this study sought to develop and examine the content and face validity of a novel outcome measure for assessing the quality of patient engagement in research projects from a patient partner perspective.

Methods
We used a mixed qualitative and quantitative study design. As depicted in Fig 1, our collaborative team of researchers (CBH, TA, NK, CLB, LCL) and four experienced patient partners (AMH, AMM, KE, SM) conducted this study in three phases: 1) item generation, 2) item selection, and 3) pretesting of the resulting questionnaire.

Item generation
Items were generated through a secondary qualitative analysis previously described for the development of the Patient Engagement In Research (PEIR) Framework (Fig 2). [6,35] The PEIR Framework has eight organizing themes for the elements of meaningful patient engagement in research projects from patients' perspectives. Briefly, it was developed through thematic analysis of 18 transcripts of one-to-one in-depth interviews in a study exploring the experiences and views of patients with arthritis regarding their engagement in health research and supplemented by 18 publications related to public and patient engagement. [6,35] The lower-level themes underlying the eight organizing themes of the Framework were restructured into statements and used as the initial questionnaire items.

Item selection
We conducted a three-round modified Delphi survey process via email correspondence to select and refine the questionnaire items (Fig 1). [36,37] The purpose of the e-Delphi survey was to build consensus among the target users about the items for the emergent measure. Teleconferences among participants were incorporated to enhance the participants' assessment of items by providing a medium to discuss their positions on items. [38]
Settings and participants: e-Delphi survey. We sought to recruit a minimum of 10 participants. [36] Eligible individuals were adults (≥18 years old) who: 1) had engaged in research projects in Canada within the last three years and had reliable internet access; and 2) identified themselves as patients or informal caregivers, family members, or friends of patients. Recruitment was through email, websites, and social media accounts of health-related research networks or organizations and patient engagement organizations (e.g., Patient Voices Network) using a digital recruitment poster. We also recruited from our personal contacts and by word-of-mouth through study participants.
Data collection: e-Delphi survey. Before round one, our research team iteratively refined the initial questionnaire through discussions in a team meeting and subsequent email correspondence to remove redundancy and improve the comprehension and format of the questionnaire. The FluidSurveys (http://fluidsurveys.com) platform at the University of British Columbia hosted the questionnaire, and unique links were emailed to each participant. In round one, participants: 1) rated the quality of wording of each item (1-4; higher = better quality), 2) rated the level of importance (1-4; higher = more important), 3) provided alternative wording for the items (optional), and 4) suggested additional items (optional). The questionnaire was revised based on the results. In round two, individuals reviewed a personalized anonymized summary of their ratings compared against those of all other participants, and they then repeated steps 2 to 4 with the revised questionnaire. In addition, they reviewed items that did not meet all of the selection criteria (see below) and indicated which should be kept and which should be removed. Steps 2 to 4 were repeated in round three.
Between rounds two and three, each participant took part in one of two teleconference discussions co-facilitated by CBH and TA. Participants discussed certain items, chosen by our team, that had only partially passed the selection criteria. The facilitators took notes on suggestions for refining or removing each item.
Data analysis: e-Delphi survey. Across the three rounds, we applied two quantitative selection criteria to each item: 1) a median rating of ≥3.25 for level of importance, and 2) a rating of 3 or higher for level of importance by ≥70% of the e-Delphi survey participants. [39] We had a supplementary third criterion, the comments on an item's wording and importance, when available. An item passed the selection criteria when both 1 and 2 were met, partially passed when only either 1 or 2 was met, and failed when neither was met. These criteria guided our decision to retain an item when it passed; revise when it passed, partially passed, or failed; or remove when it partially passed or failed. The supplementary criterion and ratings on the quality of wording in round one informed the revision of an item. In round two, to be selected, >50% of participants needed to respond 'Yes' to keeping certain items that partially passed round one.
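The two quantitative selection criteria can be illustrated with a minimal Python sketch. This is not the analysis code used in the study (the calculations were run in SAS); the function name `classify_item`, its parameters, the handling of missing ratings, and the example rating vectors are all hypothetical and chosen only to reproduce the kind of summary statistics reported for individual items.

```python
from statistics import median


def classify_item(importance_ratings, median_cutoff=3.25, share_cutoff=0.70):
    """Classify one item as 'passed', 'partially passed', or 'failed'.

    Criterion 1: median importance rating >= median_cutoff.
    Criterion 2: share of ratings that are 3 or higher >= share_cutoff.
    Assumption (not stated in the source): missing ratings, given as None,
    are dropped before both criteria are computed.
    """
    ratings = [r for r in importance_ratings if r is not None]
    if not ratings:
        raise ValueError("no ratings available for this item")

    meets_median = median(ratings) >= median_cutoff
    share_high = sum(r >= 3 for r in ratings) / len(ratings)
    meets_share = share_high >= share_cutoff

    if meets_median and meets_share:
        return "passed"
    if meets_median or meets_share:
        return "partially passed"
    return "failed"


# Hypothetical ratings from 11 respondents: median 3, 10 of 11 rated >= 3.
high_share_low_median = [3] * 6 + [4] * 4 + [2]
# Hypothetical ratings from 11 respondents: median 4, 7 of 11 rated >= 3.
high_median_low_share = [4] * 6 + [3] + [2, 2, 1, 1]

print(classify_item(high_share_low_median))
print(classify_item(high_median_low_share))
```

Both hypothetical items meet only one criterion, so each is classified as partially passed. Under the decision rules described above, such items would be candidates for revision, removal, or teleconference discussion rather than automatic retention.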
Two researchers (CBH and TA) analysed and interpreted round one data to revise the questionnaire. After rounds two and three, the respective data were analysed and interpreted by our patient-researcher partnership team members (except CLB and NK) in a two-hour team meeting. Team members received a data summary one to two weeks before the meeting. CBH presented the summary of the data for each item during the meeting and made the agreed-upon recommended changes. Subsequently, the questionnaire was iteratively refined through email and in-person communication between CBH and individual team members. Data from round two and the teleconferences were combined to refine the questionnaire for round three. Finally, our team used round three data to build consensus for the items to include in the provisional measure. All calculations were performed using the SAS software package, version 9.4 (Cary, North Carolina, USA).

Pretesting of questionnaire
Using the provisional measure, we conducted two rounds of cognitive interviewing to proactively identify and fix potential problems that would contribute to measurement error in order to establish the face and content validity. [40] Verbal information from respondents completing the provisional measure was used to evaluate the quality and respondents' interpretation of each item. [41] Our approach was guided by the Cognitive Interviewing Reporting Framework, which provides direction on high-quality reporting. [42]
Settings and participants: Cognitive interviewing. Cognitive interviewing has no agreed-upon minimum sample size, [40,43] but the number of participants typically varies between 5 and 15 per round. [41] Eligibility was consistent with the item selection phase, except that internet access was not required and individuals had to be available for an in-person interview in or near Vancouver, British Columbia. The recruitment strategy mirrored the item selection phase. Additionally, we contacted patient partners on email lists from research conferences and contacted participatory research teams.
Data collection: Cognitive interviewing. We constructed an interview guide (see S1 File). Three authors (CBH, AMM, LCL) used the Question Appraisal System (QAS-99) to independently review each item to identify possible problems to ask about. [40,44] QAS-99 is an eight-step checklist for identifying and fixing problems in questionnaire items. [40,44] The main probes were organized into five categories. Four were common cognitive processing problems for questionnaires (comprehension, information retrieval, judgment/estimation, reporting) as articulated in the Cognitive Aspects of Survey Methodology (CASM) model and one involved logical/structural problems. [45,46] All interviews were scheduled for one hour at a mutually convenient location. In one-to-one interviews by 'concurrent verbal probing,' CBH asked each participant about an item immediately after or when responding to it, and about the instructions, structure, and usefulness of the questionnaire. [40] The interviewer attempted to selectively probe at least two participants' views on each item. [40] Interviews were audio-recorded and transcribed verbatim for analysis. CBH has experience in psychometric evaluation of patient-reported measures and has published on the engagement of patients in research. He was trained by a senior qualitative researcher (CLB) on interview skills for this study.
Data analysis: Cognitive interviewing. After each round, two researchers (CBH and NK) segmented and reduced the interview data to match corresponding items and questionnaire instructions. They then coded for potential questionnaire problems. A list of the items and corresponding potential problems was emailed to the research team. Subsequently, the research team met, discussed, and agreed by consensus on a solution to keep, modify, or remove each potentially defective item. The changes were made by CBH, and further feedback requested from the team by email. Our collaborative approach sought to ensure the issues addressed were real rather than errors arising from using a researcher perspective exclusively. [41] Problems were addressed based on their logical merit as decided by our research team, rather than on their frequency. When a problem identified was applicable to other items, we made the appropriate changes. The data analysis was performed using NVivo software (version 11, QSR International Pty Ltd, Burlington, MA).

Ethical considerations
Each participant in the item generation phase gave consent. Participants were offered an honorarium of $60 for the item selection and $40 for the pretesting phase. They gave informed written consent and indicated whether they wanted to be explicitly acknowledged in this paper. This study was approved by the Behavioural Research Ethics Board at the University of British Columbia (H16-02337).

Overview of patient engagement
This was a researcher-initiated study. [21] Our collaborative research team included four patient partners, all Caucasian women with an arthritis diagnosis and previous experience of engaging in health research. They were members of the Arthritis Patient Advisory Board of Arthritis Research Canada. Patient partners were engaged from the preparatory phase (refining the research question and grant application) through to the ongoing translation phase. They contributed by reviewing and commenting on study documents through email, research team meeting discussions (whether in-person or remotely via teleconference and videoconference), recruitment of participants, presentation of study findings at conferences, and writing of this manuscript. Patient partners on our research team were vital for the stage of interpreting the participants' comments. They contributed to subjective modifications of the questionnaire that reflected on the perspectives of the study participants, to reach our final decision on each item and the overall questionnaire.

Item generation
We created 120 items. The details of the sample used for item generation are published elsewhere. [6] Notably, of the 18 participants, 17 were women, all were diagnosed with arthritis, and 12 had concurrent health conditions/diseases. These 120 items were divided into eight themes [38]: Procedural Requirements (n = 43), Convenience (n = 9), Contributions (n = 16), Research Environment (n = 5), Team Interaction (n = 12), Support (n = 7), Feel Valued (n = 12), and Benefits (n = 16). Table 1 presents the demographic characteristics of the 12 participants in the e-Delphi survey. Most participants were women (83%), Caucasian (92%), and aged over 45 years (75%). They represented a variety of diseases, health-related conditions, and use of healthcare services. Their highest formal education ranged from high school diploma (n = 1) to master's degree (n = 2). Two participants were recruited from the authors' personal contacts. All participants completed each round, except one participant who missed round two because of an environmental disaster.

Item selection
In round one of the e-Delphi survey, which took three and a half weeks, 65 items (54%) passed the quantitative selection criteria. Only four items were missing one rating each for level of importance. For quality of wording, most items had one to three missing responses. The majority of the items (n = 87) had a median rating of 3 ('good') for quality of wording, while 30 had a median rating of >3, and three had a median rating of <3. Fifty items were revised, and all 120 items were included in the refined questionnaire for round two.
In round two, which took four weeks, 38 items (32%) passed the quantitative selection criteria plus one item that >50% of participants rated should be kept. (See S1 Appendix for a per-item summary.) An example of an item discussed is "The project matched my interests," which had a median of 3 and 91% of participants rated it ≥3 for level of importance. During the first teleconference, the word 'suited' was suggested to replace 'matched,' but 'piqued' was later suggested in the second teleconference. The item was modified accordingly, but eventually removed after the final round because it failed the selection criteria. A second example, the item "I was paid for my contributions," had a median of 4, but only 64% of participants rated it as ≥3 (moderately or extremely important). This item was extensively discussed during the teleconferences. The first teleconference resulted in "I received sufficient payment for my contributions (. . .)," which was changed to "I was offered sufficient payment for my contributions (. . .)" during the second teleconference. It was important to include the word 'offered' because, for various reasons, some patient partners would not accept compensation. A total of 37 items were modified, 23 of which had passed the quantitative selection criteria.
Of the 57 items in round three, which took three weeks, 34 passed and nine partially passed the two quantitative selection criteria (see S2 Appendix). Twenty-three items were subsequently modified. The remaining 43 items were divided across the eight themes, from three items each for Research Environment and Team Interaction to 16 items for Procedural Requirements.
Self-reported diseases, health-related conditions, and healthcare use listed for participants (Table 1): Alzheimer's disease, bursitis, cancer, cerebral palsy, Crohn's disease, diabetes, multiple sclerosis, rheumatoid arthritis, alcohol abuse disorder, compression fracture, depression/anxiety, fragility, hearing loss, hepatitis C, HIV, leg amputation, osteoporosis, spinal cord injury, stroke, vertigo, "alcoholic/addict in early recovery", care failure resulting in death, and lung transplant.
The questionnaire had 120 items in round one, 120 items in round two (86 rated for level of importance and 34 rated on whether or not they should be kept), and 57 items in round three. Table 2 shows distributions of the items with respect to meeting the two quantitative selection criteria within each round. During the two teleconferences, in July 2017, eight participants discussed four items in the first teleconference, and four participants discussed 11 items in the second.

Pretesting of questionnaire
Cognitive interviewing included 12 participants (round one: n = 5; round two: n = 7) between January and April 2018 (Table 1). One man had participated in the e-Delphi survey. Most participants were men (75%), Caucasian (83%), and aged over 45 years (67%). All identified as patients, and three also identified as both family members and informal caregivers. They reported having a variety of diseases or health-related conditions. Their highest formal education ranged from some college (n = 3) to master's degree (n = 1). One participant was recruited from the authors' personal contacts. The cognitive interviews lasted between 24 and 77 minutes. In round one, each item had comments from 2 to 5 participants, except for two items that had one and no comments each. In round two, each item had comments from 2 to 7 participants.
Potential problems were identified in 32 items, and in the general and theme-specific instructions (Table 3). Five of these items were addressed because of a potential problem identified within one item during round two. We identified 27 potential problems in round one and 32 in round two. We applied the CASM model and logical problems scheme 54 times across the items: comprehension (n = 26), retrieval (n = 2), judgment (n = 13), reporting (n = 4), and logical (n = 9). Six items were removed across the two rounds. Four of the six items removed were too similar to other items, and two of them were integrated with other items. The fifth item, "I had the option of joining meetings remotely," was removed after it was deemed by our team to not be broadly applicable, and the sixth item was considered redundant.
One participant's recommendation, subsequently affirmed by other participants, informed the modification of the general instruction section of the measure to include information about its expected completion time. The theme-specific instructions were modified to clearly indicate that respondents should consider their entire experience throughout a research project when responding to each item.
In round one, the number of themes in the PEIRS was reduced from eight to seven after Research Environment and Team Interaction were combined into Team Environment and Interaction. The original themes had overlapping constructs, and our team decided that the small number of items could be represented by a single theme. Across the two rounds, the item "I was offered sufficient compensation for my contributions (. . .)," in the Feel Valued theme, stood out as a contentious item. Participants had diverse opinions about its inclusion. In round two, one participant noted that the term 'compensation' often has a financial connotation, and suggested that the word 'recognition' be included in parentheses to circumvent potential ambiguity. Our research team replaced the word 'compensation' with 'recognition,' noting that it was a more comprehensive description of the ways in which patient partners are shown appreciation for their contributions to a research project. CBH observed no differences in views on items by gender or type of patient partner.

Patient Engagement In Research Scale (PEIRS)
The resulting measure, the Patient Engagement In Research Scale (PEIRS), contains 37 items distributed across seven themes of meaningful engagement of patients in research that could potentially operate as subscales (see S2 File). There are three to five items in six of the themes, and 14 in the Procedural Requirements theme. The PEIRS currently uses a five-point Likert scale to rate each item, and likely takes 10 to 15 minutes to be completed. Numbers were not added within the response categories of the Likert scale, because labelling the Likert scale with numbers rather than adjectives might entail different cognitive processing (for example, 4/5 versus Agree/Strongly Agree) when a patient partner is estimating their experiences as captured by each item. Total scores generated by the PEIRS will be interpretable only after its measurement properties have been determined in a subsequent study. (See S3 File for a guide to calculating the total scores.) Finally, the PEIRS achieved a Flesch Reading Ease score of 71.2 in Microsoft Word 2016, indicating it is suitable for reading at a 7th-grade level or higher.
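For readers unfamiliar with the readability statistic, the standard Flesch Reading Ease formula is reproduced below as a reference. The source reports only the resulting score; the word, sentence, and syllable counts behind it are not given, so the formula is shown in general form.

```latex
\mathrm{FRE} = 206.835
  - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right)
  - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)
```

Higher scores indicate easier text. Scores between 70 and 80 are conventionally read as "fairly easy" and correspond to roughly a 7th-grade reading level, which is the band containing the reported score of 71.2.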

Discussion
There is increased utilization of participatory research approaches that include patient partners on health research project teams. [47] This study developed and pretested a self-administered questionnaire for patient partners to self-report the quality of their engagement in research projects. To our knowledge, the PEIRS is the first measure developed to assess the quality of patient engagement in research in a comprehensive way, [48] and the first to be built primarily from the perspectives of patient partners. This study is, in part, a response to the reported need for such a measure to determine effective methods for engaging patients in research. [3,15,32] Instead of focusing on the dimensions of process, context, and impact outlined by Esmail et al., [32] we had a broader focus on the quality of engagement. Meaningful engagement as the construct underlying the quality of engagement encapsulates aspects of those dimensions. [6] A main strength of this study is our detailed approach to ensure the PEIRS was grounded in the experiences and views of patients (inclusive of their informal caregivers, family members, and friends) who engage in research project teams.
Both the e-Delphi survey process and cognitive interviewing process (i.e., cognitive testing) provided content validation for the generated items. This demonstrated that the participants viewed the items as comprehensible and acceptable for capturing degrees of meaningful patient engagement in research. [36,49] Through the e-Delphi survey, we determined the highly important items for capturing meaningful engagement. Both processes endorsed the items' placements within the themes of the PEIR Framework. No additional items were added during the e-Delphi survey, which suggests the PEIRS is a comprehensive way to capture what patient partners value as the essential elements of meaningful engagement in research. The e-Delphi survey built anonymous consensus among the study participants using their independent ratings, while limiting any bias arising from the influence of any participant. [36,38]
In addition to content validation, the cognitive interviewing process helped to establish face validity of the PEIRS. This demonstrated that participants subjectively endorsed the PEIRS as appropriate for capturing meaningful engagement of patients in research. [49] Through the interviews, participants affirmed the acceptability of the items included in the PEIRS. Furthermore, we corrected potential problems in the questionnaire to ensure that its content is clearly presented and understandable, and free of ambiguity and redundancy. Theoretically, addressing those problems mitigates potential measurement errors in the scores that will be generated by the PEIRS.
This study has limitations. The sample of the e-Delphi survey consisted mainly of English-speaking Caucasian women. The views of other ethnic groups and gender identities might have been inadequately reflected in the provisional PEIRS. The cognitive interviewing increased the contribution of the perspectives of men and non-Caucasian ethnicities to the content of the PEIRS. Overall, the ideas within each item of the PEIRS were endorsed by participants with varied experiences as patient partners and with a variety of ethnic, gender, and other demographic characteristics. Finally, our research team determined the final content of the PEIRS, rather than performing cognitive interviews until no additional potential problems were identified. The inclusion of experienced patient partners on our team made this process credible, although the fact that they were all women with arthritis could be perceived as a limitation.

Context and interpretation
The construct underlying the PEIRS is deemed multidimensional and accounts for experiences throughout the entirety of a patient-researcher partnership. [6] The PEIRS is designed for an adult patient partner to complete about their own perspective of being engaged in a research project. Its readability meets recently published criteria, [48] which indicates that it is appropriate even for individuals with low levels of English reading skills. Its administration would be appropriate after a research project team has had sufficient activities for a patient partner to have experiences to reflect upon. Individuals could complete it at multiple points throughout the life cycle of a research project. The PEIRS is designed to test patient engagement methods/interventions in cross-sectional and longitudinal analyses. Finally, it is intended for both individual- and group-level evaluations.

Future directions
The PEIRS is currently undergoing psychometric testing to establish its measurement properties for descriptive and evaluative applications. This follow-up online survey study seeks to establish reliability/reproducibility, construct validity, and interpretability of the scores generated by the PEIRS within a broad range of adult patient partners in different age categories, diseases/conditions and healthcare services, and locations across Canada. [49] This could lead to further modification of the PEIRS. A subsequent study should investigate the responsiveness of the validated PEIRS. [49] Individual-level assessments, as opposed to group-level, might be needed when monitoring and evaluating research projects that engage few patient partners.
Future studies could conduct cross-cultural adaptation of the PEIRS. Studies could also investigate the validity of the PEIRS for use with children who are patients, the public in general, or specific populations, such as Indigenous Peoples, who engage on health research teams.