Measuring patient centeredness with German language Patient-Reported Experience Measures (PREM)–A systematic review and qualitative analysis according to COSMIN

Background Patient centeredness is an integral part of the quality of care. Patient-reported experience measures (PREMs) are assumed to be an appropriate tool to assess patient-centredness. An evaluation of German-language PREMs is lacking. Objective To perform a systematic review and qualitative analysis of psychometric measurement qualities of German-language PREMs using for the first time a comprehensive framework of patient centredness. Methods A systematic literature search was performed in Medline, PsycInfo, CINAHL, Embase and the Cochrane database (last search 9th November 2021) for studies describing generic, surgery- or cancer care-specific PREMs. All questionnaires that were developed in or translated into German were included. The content of the included PREMs was evaluated using a comprehensive framework of patient centredness covering 16 domains. Baseline data of all PREM studies were extracted by two independent reviewers. Psychometric measurement qualities of the PREMs were assessed using current COSMIN guidelines. Results After removal of duplicates 3,457 abstracts were screened, of which 3,345 were excluded. The remaining 112 articles contained 51 PREMs, of which 12 were either developed in (4 PREMs) or translated into German (8 PREMs). Eight PREMs were generic (NORPEQ, PPE-15, PEACS, HCAHPS, QPPS, DUQUE, PEQ-G, Schoenfelder et al.), 4 cancer care-specific (EORTC IN-PATSAT32, PSCC-G, Danish National Cancer Questionnaire, SCCC) and none was surgery-specific. None of the PREMs covered all domains of patient-centeredness. Overall rating of structural validity was adequate only for PEACS and HCAHPS. High ratings for internal consistency were given for NORPEQ, Schoenfelder et al., PSCC-G and the SCCC. Cross-cultural validity for translated questionnaires was adequate only for the PSCC-G, while reliability was adequately assessed only for the EORTC IN-PATSAT32. Due to a lack of measurement gold standard and minimal important change, criterion validity and measurement invariance could not be assessed for any of the PREMs. Conclusion This is the first systematic review using a comprehensive framework of patient centredness and shows that none of the included PREMs, even those translated from other languages into German, cover all aspects of patient centredness. Furthermore, all included PREMs show deficits in the results or evaluation of psychometric measurement properties. Nonetheless, based on the results, the EORTC IN-PATSAT32 and PSCC-G can be recommended for use in cancer patients in the German-language region, while the German versions of the HCAHPS, NORPEQ, PPE-15 and PEACS can be recommended as generic PREMs. Trial registration PROSPERO CRD42021276827.


Introduction
Improving the patient centredness (PC) of healthcare has been a main objective of healthcare policy over the last decades, including in German-speaking countries [1,2]. PC has been defined as one of six domains of the quality of care by the Institute of Medicine (IOM), next to safety, effectiveness, timeliness, efficiency and equitability [3] (S1 Fig). However, patients frequently experience a lack of PC in many fields of healthcare [4].
Furthermore, the dimensions of PC have not been clearly defined and several models have been proposed in the past (S1 Table). A systematic review has identified 15 dimensions of PC [5]: patient as a unique person, biopsychosocial perspective, essential characteristics of the clinician, patient involvement in care, involvement of family and friends, physical support, emotional support, clinician-patient communication, patient empowerment, patient information, access to care, integration of medical and non-medical care, coordination and continuity of care, teamwork and teambuilding, clinician-patient relationship. The influential Picker model contains an additional dimension termed "effective treatment by trustworthy and qualified personnel" [6] (S1 Table).
Several methods have been proposed to measure PC in clinical practice [7]. Assessment via questionnaires termed Patient-Reported Experience Measures (PREMs) is most frequent, as they permit a standardized appraisal of PC. PREMs aim to measure PC via the experience of patients in a certain healthcare context. Depending on this healthcare context, four different categories of PREMs can be distinguished, although they partially overlap:
a. generic PREMs measure general aspects of PC and can be applied across multiple healthcare settings and disciplines.
b. discipline-specific PREMs assess the PC within a certain discipline. For example, surgery-specific PREMs measure, among others, aspects of PC specific to surgical disciplines, e.g., pain.
c. healthcare pathway-specific PREMs measure the PC across a specific healthcare pathway. For example, a cancer care-specific PREM measures aspects of PC important to cancer patients irrespective of treatment (surgery, chemotherapy, radiotherapy) and healthcare setting (in-hospital or as outpatient).
5. Studies containing PREMs that were not generic, surgery-specific or cancer healthcare pathway-specific. Therefore, disease-specific PREMs were excluded.

Information sources
The following information sources were searched:

Search
The search algorithm is described in detail in supplement 4 (S1 File). It is an adaptation of the search algorithm described by Bull et al. [8]. Additional studies were identified by reference searching and full text reading.

Study selection
The references of all generic, surgery-specific or cancer healthcare pathway-specific PREMs were imported into the citation program Zotero (www.zotero.org; Version 5.0.96.2). Duplicates were identified and merged either with the find duplicates function in Zotero or by hand. Titles and abstracts of all articles were read by two reviewers (AMi, CDH) and those studies not fulfilling eligibility criteria were removed. In a next step, the full-text articles of all remaining studies were read to decide which articles fulfilled the eligibility criteria. Full texts as well as references were screened to identify additional PREMs. For all non-German PREMs fulfilling the eligibility criteria, additional searches were performed in the above-mentioned databases to identify German translations. In addition, Google and Google Scholar were searched with the name of the PREM in combination with "German translation" or "cross-cultural validation" to identify German translations. Only PREMs developed in German or for which a German-language translation existed were considered for further analyses.
The evaluation of psychometric properties was done according to current COSMIN guidelines [14,15] (www.cosmin.nl). The evaluation of content validity was done according to Terwee et al. [11]. We used the 16 PC dimensions described in S1 File to assess content validity. The following psychometric properties were analysed: (1) content validity; (2) structural validity; (3) internal consistency; (4) measurement invariance / cross-cultural validity; (5) reliability; (6) measurement error; (7) criterion validity; (8) hypothesis testing for construct validity. Table 1 shows the COSMIN assessment criteria used in this study. For overall evaluation, "+" marks adequate, "-" inadequate and "?" unclear psychometric properties. A detailed description of the methods can be found in S2 File.

Content validity
Content validity was evaluated using the 16 dimensions of PC outlined in the introduction (Table 3). None of the questionnaires covers all aspects of PC. An overview of the methodological quality of studies analyzing the content validity according to COSMIN can be found in Table 4. All 12 included PREMs exhibit sufficient relevance. Because content dimensions are missing, all questionnaires have deficits in comprehensiveness (Table 4). Understandability of questionnaires is adequate in most cases. However, the methodological quality of content validity studies varies widely from low (Schoenfelder, DUQUE) to high (Picker, HCAHPS, PEQ, EORTC IN-PATSAT32, PSCC-G) (Table 4).

Structural validity
"Structural validity refers to the degree to which the scores of a PROM (or PREM) are an adequate reflection of the dimensionality of the construct to be measured" [11]. A summary of the findings on structural validity can be found in Table 5. Not all necessary data on confirmatory factor analyses according to current COSMIN guidelines were available to rate the NORPEQ or the questionnaire by Schoenfelder et al. (Table 5). The PPE-15 received an insufficient rating for structural validity due to the inadequate design of the underlying studies [16]. Similarly, the SCCC showed insufficient structural validity. No data on structural validity could be obtained for the German PEQ, neither in the original publication nor in additional studies [25]. The same was true for the Danish National Cancer Questionnaire. PEACS and HCAHPS received a sufficient rating (Table 5). For the EORTC IN-PATSAT32 there is a meta-analysis summarizing the data on structural validity [29]. There is a detailed analysis of structural validity for the PSCC-G [22], which shows mixed results.

Internal consistency
Table 6 shows the results of internal consistency studies of German-language PREMs. Cronbach's alpha for the NORPEQ (containing only one scale) is 0.85 [20]. A single validation study is available for the PPE-15 [16]. It is unclear whether the consistency statistics contained in this study are calculations of Cronbach's alpha or not.
The internal consistency of the EORTC IN-PATSAT32 has been analyzed in a recent meta-analysis [29]. According to this meta-analysis, 5 studies are available on the internal consistency of the IN-PATSAT32, of which 3, however, are of poor methodological quality. Analyses are available for all subscales of the questionnaire, with Cronbach's alpha ≥0.70 across all scales, except for the subscale hospital access (Cronbach's alpha <0.70). The overall Cronbach's alpha for the entire PSCC-G is reported to be 0.92 [22].
As the German version of the PSCC (PSCC-G) consists of the German translation of the English PSCC [30] as well as of translations of parts of the French REPERES (subscale "Information") [31] and German PASQOC [32] (subscales family and friends, shared-decision making and nursing staff), the original questionnaires can be analyzed for internal consistency as well. For all subscales Cronbach's alpha is reported to be ≥0.70.
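As an illustration of the statistic behind these internal consistency ratings, the following is a minimal sketch of how Cronbach's alpha can be computed for a single scale and compared against the 0.70 threshold used in this review. The function names and the toy response matrix are hypothetical and are not taken from any of the included studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one scale.

    scores: list of respondents, each a list of item scores (same items, same order).
    Uses sample variances (n - 1 in the denominator).
    """
    n_items = len(scores[0])
    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def rate_internal_consistency(alpha, threshold=0.70):
    """Return "+" if alpha reaches the threshold, otherwise "-" (simplified rating)."""
    return "+" if alpha >= threshold else "-"


# Hypothetical responses of five patients to a four-item scale (1-5 Likert items).
toy_scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
]
print(rate_internal_consistency(cronbach_alpha(toy_scores)))
```

The simplified rating only checks the alpha threshold; the full COSMIN rating additionally takes the evidence for structural validity into account.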

Measurement invariance / cross-cultural validity
"Cross-cultural validity refers to the degree to which the performance of the items on a translated or culturally adapted instrument are an adequate reflection of the performance of the items of the original version of the instrument" [11]. It is particularly important for PREM translations into German. Results are depicted in Table 7. No data on cross-cultural validity could be found for the DUQUE questionnaire. A cross-cultural validation study for the NORPEQ is available for several Scandinavian countries, but not for German. The PPE-15 underwent validation in 5 countries including German-speaking countries (Switzerland and Germany) [16]. The patient cohorts were comparable with respect to sex and age, but not with respect to indications (elective vs. emergency). No further details about the patient cohorts were collected. Furthermore, no adequate study could be identified that compares the English original questionnaire with the German translation [16]. Such a study exists for the Spanish language [33]. This study shows that an expansion of the questionnaire from 15 to 33 questions (PPE-33) is necessary to preserve psychometric properties [33]. There is a cross-cultural validation study for the HCAHPS PREM [34]. In this study it is unclear whether patient cohorts were comparable. Furthermore, only 10 German-speaking patients were included [34].

Reliability
An overview of the results of the reliability analyses can be found in Table 8. One study measured test-retest reliability of the NORPEQ [20], for which 68 of 244 patients were sent the questionnaire again within 5-6 days. The intraclass correlation coefficient (ICC) was between 0.45 (nurse professional skills) and 0.83 (doctors understandable). Four of the eight subscales exhibited an ICC <0.70. The test-retest reliability for the total score was 0.88 [20]. Consequently, the overall reliability rating for the NORPEQ was +/-. Noest et al. analyzed reliability for the PEACS questionnaire [24]. Test-retest reliability was measured via weighted kappa. With the exception of the subscale institutional treatment and transition (weighted kappa 0.671), the weighted kappa was ≥0.70 for all subscales. Thus, the overall rating was +. The study by Keller et al. analyzed hospital-level reliability [19]. No data on test-retest reliability could be identified for the HCAHPS. Hospital-level reliability assumes that recurrent measurements (retesting) of patients in the  [25]. Logistic regression analyses were used to find out whether the questionnaire can distinguish between patient cohorts from different hospitals. However, no specific results are reported except for ". . .none of the instruments showed significant results". Further reliability data for the PEQ could not be identified. No reliability study could be identified for the PPE-15, QPPS or the QPP, DUQUE, PSCC-G, the Danish National Cancer Patient Questionnaire or the SCCC.
For the cancer care-specific PREM EORTC IN-PATSAT32, two studies investigated test-retest reliability [38,39]. An appraisal of reliability in these two studies has already been done by Neijenhuijs et al. [29]. The ICC was ≥0.70 for all subscales of the EORTC IN-PATSAT32 in the study by Pishkuhi et al. In the study by Obtel et al., all scales except doctor availability (correlation coefficient 0.64) and overall satisfaction (correlation coefficient 0.67) showed a correlation coefficient ≥0.70. Consequently, the overall reliability rating was +. However, both studies showed methodological weaknesses, as the time interval between test and retest was too short (30 minutes) [39] or it was unclear which type of correlation coefficient had been used [38].
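To make the test-retest statistic concrete, the following is a minimal sketch of an intraclass correlation coefficient for a test-retest design. It assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the included studies do not all state which ICC form they used, so this choice, the function name and the example data are assumptions for illustration only.

```python
def icc_agreement(measurements):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    measurements: list of patients, each a list of scores, one per occasion
    (for test-retest: [test_score, retest_score]).
    """
    n = len(measurements)        # number of patients
    k = len(measurements[0])     # number of occasions
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    col_means = [sum(row[j] for row in measurements) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between occasions
    ss_total = sum((v - grand) ** 2 for row in measurements for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical subscale scores of four patients at test and retest.
identical = [[1, 1], [2, 2], [3, 3], [4, 4]]   # perfect agreement
shifted = [[1, 2], [2, 3], [3, 4], [4, 5]]     # every retest score one point higher
print(icc_agreement(identical), icc_agreement(shifted))
```

In the second example the ranking of patients is fully preserved, yet the ICC drops below 1 because the absolute-agreement form penalizes the systematic shift between test and retest. A plain correlation coefficient would not detect this shift, which is one reason why an unspecified "correlation coefficient" is considered a methodological weakness.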

Analysis of measurement error
An overview of the results for the analysis of measurement error can be found in Table 9. Measurement error could not be rated, as no minimal important change has been defined for any of the German PREMs so far. Therefore, all PREMs received a "?" rating. Only for the HCAHPS has the standard error of measurement (SEM) been calculated. For the EORTC IN-PATSAT32, the SEM and the smallest detectable change (SDC) can be calculated from the studies by Obtel et al. [39] and Pishkuhi et al. [38], as has been shown by Neijenhuijs et al. [29]. However, as no minimal important change has been defined for the EORTC IN-PATSAT32, an overall rating of the measurement error is not possible.
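The relationship between these quantities can be sketched as follows, using the commonly applied formulas SEM = SD × √(1 − reliability) and SDC = 1.96 × √2 × SEM. The standard deviation and ICC in the example are hypothetical values, not results from the cited studies, and the rating function is a simplified reading of the COSMIN criterion (SDC smaller than the minimal important change, MIC).

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient (e.g. ICC)."""
    return sd * sqrt(1.0 - reliability)


def smallest_detectable_change(sem):
    """SDC at the individual level with 95% confidence."""
    return 1.96 * sqrt(2.0) * sem


def rate_measurement_error(sdc, mic=None):
    """Simplified rating: "?" without a defined MIC, "+" if SDC < MIC, otherwise "-"."""
    if mic is None:
        return "?"
    return "+" if sdc < mic else "-"


# Hypothetical subscale: SD of 10 points, test-retest ICC of 0.75.
sem = standard_error_of_measurement(sd=10.0, reliability=0.75)
sdc = smallest_detectable_change(sem)
print(round(sem, 2), round(sdc, 2), rate_measurement_error(sdc))
```

Because no MIC is passed in the example, the rating is "?", which mirrors the situation described above: SEM and SDC can be computed, but cannot be judged without a defined minimal important change.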

Analysis of criterion validity
As no gold standard for the measurement of patient-centeredness has yet been defined, overall criterion validity cannot be analyzed for PREMs. However, for some PREM subscales, gold standards for measurement are available. Consequently, criterion validity for these subscales can be analyzed. No data on criterion validity was found for NORPEQ, PPE-15, HCAHPS,     [41], respectively. Both subscales showed a very high (SDM-Q-9; r = 0.814, p < 0.001) or high correlation (CTM-3; r = 0.511, p<0.001) with the respective gold standard [24].

Analysis of hypothesis testing for construct validity
Hypothesis testing for construct validity describes the degree to which PREM results are consistent with an a priori hypothesis. The hypothesis to be tested can either be a comparison of PREM results between two clinically defined patient groups (known-groups validity) or a comparison of PREM subscales with another known measurement tool (convergent validity). No data on hypothesis testing for construct validity could be found for PEACS, DUQUE, PEQ-G and the Danish National Cancer Patient Questionnaire. Results for all other German PREMs can be found in Table 10. Most hypotheses tested positive, i.e., the results confirm the a priori formulated hypothesis. However, all PREMs except for the PSCC-G also show negative results. For the PSCC-G only positive test results could be found.

Discussion
In the current study numerous German-language PREMs could be identified that were not contained in previous publications [8,42]. This was due to publications in recent years as well as to the difficulty of identifying PREMs by database searches alone. Many German PREMs were found by hand-searches. The current study uses for the first time the current COSMIN guidelines for the assessment of PREMs [15]. Furthermore, by using a comprehensive framework of PC covering all dimensions of PC (S1 Table), a thorough analysis of content validity of PREMs was possible for the first time. The results show the lack of patient-relevant content domains in all 12 PREMs, not only for those developed in German, but also for commonly used international PREMs (Table 3). In addition, all included PREMs show deficits in the results or evaluation of psychometric measurement properties according to current COSMIN guidelines. Based on these results, context-specific application of German PREMs is mandatory and several recommendations can be made.

Recommendations
Two out of the 12 PREMs cannot be recommended for use in German because of a lack of validation of psychometric properties: the DUQUE questionnaire [27], as well as the German translation of the Danish National Cancer Patient Questionnaire used by Rudolph et al. [21]. Depending on the intended use, one of the remaining ten PREMs can be selected. Fig 2 shows a schematic representation of the remaining PREMs within their intended area of use. The figure can facilitate preliminary PREM selection. In a next step, the results of this systematic review can be used to select a PREM with sufficient psychometric properties (Tables 5-10) and the necessary content (Tables 3 and 4). For example, for cancer care the PSCC-G has significantly better psychometric properties than the SCCC with its insufficient structural validity and lack of assessment in many psychometric domains. Furthermore, when selecting a PREM the intended area of application should be considered (see introduction) [8]: is it intended as a reflection instrument for patients, as a provider-specific evaluation instrument for internal use, or as a benchmarking instrument to compare different providers? In each case different content dimensions (Table 3) and lengths of questionnaires (Table 2) are of interest.
The following generic PREMs have been sufficiently evaluated in German: HCAHPS, NORPEQ, PPE-15 and PEACS. For cancer care the EORTC IN-PATSAT32 and PSCC-G have been adequately assessed and can currently be recommended. We were unable to identify a surgery-specific PREM in the German language. However, even for the above-mentioned generic and cancer care-specific PREMs certain deficits need to be considered before use. The HCAHPS, for example, although showing sufficient psychometric properties in many areas, exhibits deficits in its cross-cultural validation into German (Table 7) [34]. Some of its demographic questions like "What is your race? Please select at least one." were rated poorly by native German speakers [34] and reflect its development in a different sociocultural context. Therefore, an adaptation to the German-speaking sociocultural context seems necessary. The PPE-15, one of the most frequently used PREMs worldwide, exhibits poor structural validity, while covering many dimensions of PC (sufficient content validity). In addition, for many psychometric properties of the PPE-15 no data could be found. We cannot rule out that such data exist but have not been published by the Picker institutes or were not identified by our search. The NORPEQ is an extensively studied PREM with adequate psychometric properties. However, cross-cultural validation studies only exist for languages other than German, although it has been used in a non-validated German translation [43]. One of the most extensively evaluated generic PREMs is the PEACS questionnaire, which has been developed in German with involvement of patients. It is a comprehensive questionnaire with more than 50 questions covering many aspects of PC. Because of its length (Table 2), its intended use is as a reflection instrument for patients and as an assessment tool for providers rather than as a benchmarking instrument. It is the only German generic PREM that covers not only in-hospital aspects of patient experience, but also the transition into out-patient care. Although the PEACS has sufficient psychometric properties in many areas, there is a lack of data for test-retest, inter-rater and intra-rater reliability. The PSCC-G is a cancer care-specific PREM that covers transition aspects of care. It has been built from a validated German translation of an English questionnaire with additional questions from other languages (Table 2). The PSCC-G scored adequately in many psychometric domains, but data on test-retest, inter-rater and intra-rater reliability are lacking.

Limitations
The study has several limitations. First, the search was limited to generic, surgery- and cancer care-specific PREMs, i.e., PREMs for other disciplines (e.g., internal medicine) as well as PREMs for specific diseases were excluded. These PREMs can be found in the excluded full-text list (S4 Table). Another limitation could have been the search algorithm. The fact that many German-language PREMs were identified by hand-searching rather than by the database search could be a hint that the search algorithm was not specific enough. However, the large number of identified and screened articles indicates that our search was broad. Furthermore, we were able to identify significantly more German-language PREMs than previous reviews [8,42]. Identifying PREMs in scientific databases is not easy. In contrast to PROMs, no PREM-specific taxonomy (e.g., MeSH term) exists in common medical databases. As pointed out, patient centredness and patient experience are only beginning to be clearly defined (S1 Fig and S2 Table). The delineation from other concepts like patient satisfaction is not always clear-cut, which makes building a search algorithm more difficult.
A main finding of the study is the lack of psychometric data for many of the included PREMs. Frequently we were unable to find appropriate studies in accessible databases. However, many PREMs have been developed and are implemented by independent or commercial institutions or healthcare agencies. These institutions are often not scientifically driven and might not publish all available psychometric data. Exceptions are the transparent development and publication of data by the U.S. Agency for Healthcare Research and Quality (AHRQ) (www.ahrq.gov/cahps/about-cahps/index.html) and the Swedish PREMs [18,44,45].

Future research
If PC is supposed to be more than a declaration of intent of healthcare politicians, it will require the implementation of PREMs into everyday clinical practice via the following measures:
• As shown in our study, there is no comprehensive modular PREM system in the German language comparable to other countries [23]. Many areas of healthcare (e.g., surgery) are not covered with available German-language PREMs. Consequently, the development, translation and testing of new PREMs is necessary.
• The missing psychometric properties of currently available German-language PREMs need to be evaluated.
• Most of the PREMs currently available are paper-based versions (Table 2). For broad implementation and timely assessment in hospitals and doctors' offices, electronic PREMs (ePREMs) seem necessary. For this purpose, paper-based PREMs will need to be evaluated as digital versions, and electronic systems will have to be developed and implemented that adhere to local data safety regulations. An integration into available hospital information systems is desirable to facilitate the use in everyday clinical practice.
• It is unclear which conclusions should be drawn from the results of PREM (sub)scales. If providers adapt their service based on PREM results, there is little evidence base to guide such changes [46]. Individualized local measures may be implemented, but there may also be standardized interventions that improve aspects of PC and subsequently PREM results and that can be tested in randomized controlled trials. More research is needed in this field.
There are two projects that should be mentioned in this context. First, the EORTC is currently developing and testing the PATSAT-33, a cancer care-specific PREM that will not only cover in-hospital patients, but also aspects of PC in out-patient settings as well as the transitional period [47]. A phase IV validation study in several European countries, including German-speaking countries, is underway. Second, the Hamburg-based ASPIRED project [48] is currently developing a German-language PREM that will cover all aspects of PC according to Scholl et al. [5]. Both projects will close important evidence gaps.

Conclusions
This is the first systematic review using a comprehensive framework of patient centredness and shows that none of the included PREMs, even those translated from other languages into German, cover all aspects of patient centredness. Furthermore, all included PREMs show deficits in the results or evaluation of psychometric measurement properties. Nonetheless, based on the results, the EORTC IN-PATSAT32 and PSCC-G can be recommended for use in cancer patients in the German-language region, while the German versions of the HCAHPS, NORPEQ, PPE-15 and PEACS can be recommended as generic PREMs.