Measurement of Physician-Patient Communication—A Systematic Review

Background: Effective communication with health care providers has been found to be relevant to physical and psychological health outcomes as well as to patients' adherence. However, the validity of such findings depends on the quality of the measures applied. This study aimed to provide an overview of measures of physician-patient communication, to evaluate the methodological quality of psychometric studies, and to assess the quality of the psychometric properties of the identified measures.

Methods: A systematic review was performed to identify psychometrically tested instruments that measure physician-patient communication. The search strategy included three databases (EMBASE, PsycINFO, PubMed), reference and citation tracking, and personal knowledge. Studies that reported the psychometric properties of physician-patient communication measures were included. Two independent raters assessed the methodological quality of the selected studies with the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) checklist. The quality of psychometric properties was evaluated with the quality criteria of Terwee and colleagues.

Results: Data from 25 studies on 20 measures of physician-patient communication were extracted, mainly from primary care samples in Europe and the USA. Included studies reported a median of three of the nine COSMIN criteria. Scores for internal consistency and content validity were mainly fair or poor. Reliability and structural validity were rated mainly as fair. Hypotheses testing was mostly rated poor. The quality of the psychometric properties, evaluated with Terwee et al.'s criteria, was rated mainly intermediate or positive.

Discussion: This systematic review identified a number of measures of physician-patient communication. However, further psychometric evaluation of the measures is strongly recommended. The application of quality criteria such as the COSMIN checklist could improve the methodological quality of studies on psychometric properties as well as the comparability of their results.

Introduction
… measures published between 1986 and 1996 that measure physician-patient interaction. They found 44 instruments, which were reviewed for reliability and validity. Most instruments were reliable and had been designed for teaching and medical education. An up-to-date comparison and evaluation of existing instruments based on clearly defined quality criteria is still missing, but it is necessary to a) choose the most appropriate instrument for a specific research purpose, b) facilitate the comparison and appraisal of different intervention studies, and c) clarify further research needs, e.g. the (re-)development of measurement instruments. Hence, this study seeks to 1) provide a systematic overview of generic measures of physician-patient communication, 2) evaluate the quality of the design, methods and reporting of studies that present psychometric properties of measures, and 3) determine the quality of the psychometric properties of the identified measures.

Methods
The systematic review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; registration code: CRD42013005687).

Eligibility criteria
Peer-reviewed studies published in English or German were retrieved. We included studies that tested psychometric properties (e.g. validity, reliability) of instruments measuring the construct of physician-patient communication. We adopted a broad definition of communication comprising verbal or non-verbal behavior and sets of communication, interaction or interpersonal skills. We included studies on communication between physicians and adult patients (≥18 years). We excluded studies that reported communication only in a subscale of a broader construct and studies that were limited to the medical education setting. Only generic instruments (i.e. applicable to a broad range of health conditions, groups of patients and settings) were included, because we found that specific measures (e.g. those measuring only end-of-life care) were less comparable with one another than generic measures were. The applied inclusion and exclusion criteria are displayed in Table 1.

Search strategy
We searched the databases PubMed, PsycINFO and EMBASE for all articles from their inception to August 15, 2013. For each database, a specific search strategy was developed based on a combination of Medical Subject Headings (MeSH) and free-text terms in five domains: (i) patient, (ii) physician, (iii) communication, (iv) measurement and (v) psychometrics (see S1 File). Furthermore, we used the PubMed search filter developed by Terwee et al. [14] for finding studies on the psychometric properties of measures. This filter was developed by a multidisciplinary team of experts in the field of health status measurement instruments, also known as the COSMIN group (www.cosmin.nl), to facilitate the selection of studies on the measurement properties of measurement instruments. We also conducted a secondary search, tracking all reference lists and citations of the included full texts for further studies of potential relevance, and included articles known to the authors.
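The domain-based strategy can be illustrated with a short Python sketch: synonyms within a domain are joined with OR, and the five domain blocks are joined with AND, so a record must match at least one term from every domain. The terms, the `or_block` and `build_query` helpers, and the plain-text query syntax are hypothetical placeholders for illustration only; the authors' actual strategies, including the MeSH terms, are given in S1 File.

```python
# Illustrative sketch of combining domain term blocks into one Boolean query.
# The terms below are hypothetical placeholders, NOT the authors' search strategy.


def or_block(terms: list[str]) -> str:
    """Join the synonyms of one domain with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(domains: dict[str, list[str]]) -> str:
    """AND the domain blocks together so a record must match every domain."""
    return " AND ".join(or_block(terms) for terms in domains.values())


# Hypothetical free-text terms for the five domains (i) to (v).
domains = {
    "patient": ["patient*"],
    "physician": ["physician*", "doctor*"],
    "communication": ["communicat*", "interaction"],
    "measurement": ["instrument*", "questionnaire*"],
    "psychometrics": ["reliab*", "valid*"],
}

query = build_query(domains)
print(query)
```

Adding synonyms to a block broadens recall, whereas adding a further AND-ed block (for example a psychometric filter) narrows the result set.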

Study selection
For an initial screening, all search results were imported into reference management software (EndNote) and duplicates were removed. First, titles and abstracts were assessed to exclude clearly irrelevant records. Second, the remaining full texts were assessed for eligibility. All steps were performed independently by two team members (EC and one of JZ, IS or JD), who jointly decided upon inclusion. Disagreements between reviewers were resolved in discussion with a third team member (JZ, IS or JD). The reviewers were not blinded to the authors, date and journal of publication.

Data extraction and quality assessment
Three reviewers (EC, JZ and EM) extracted data from the included studies on measures of physician-patient communication using data extraction sheets. To reduce bias that may arise when only one reviewer assesses a study, one study was assessed independently by all three reviewers (EC, JZ and EM). As recommended by Mokkink et al. [15], the reviewers completed a self-training to ensure that all of them applied the COSMIN checklist (see section 2.4.2) and Terwee et al.'s criteria (see section 2.4.3) correctly. For another five studies, independent double assessment was performed (either JZ and EM or EC and EM). Initial ambiguities in the rating procedure were discussed between the reviewers and within the research team. After this set of five studies, no further questions arose, and the data extraction and quality rating were performed by one reviewer (either JZ or EC). We sought information on (1) descriptive data of measures and studies, (2) the quality of design, methods and reporting, and (3) the quality of the psychometric properties of the included studies.

Table 1. Inclusion and Exclusion criteria.
Inclusion criteria
(1) The full text is accessible
(2) The language of the publication is English or German
(3) The article is published in a peer-reviewed journal
(4) The aim of the study is to test psychometric properties of an instrument
(5) The measured construct is communication*
(6) The target group is adult patients
(7) The communication partners are patient and physician

Descriptive data
The following descriptive data were extracted for the measures: name of the instrument, authors, year, language, perspective (e.g. patient- or physician-reported outcome, or observer rating or coding), dimensions, number of items and response scale. Furthermore, we extracted study characteristics (e.g. setting, sample, country).
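The extraction fields listed above can be pictured as a simple record type. The Python sketch below is a hypothetical layout, not the authors' actual extraction sheet: the class names are invented, and `perspectives` is a list so that a measure combining patient and physician reports can be represented.

```python
from dataclasses import dataclass
from enum import Enum


class Perspective(Enum):
    """Rating perspective of a measure (categories taken from the text above)."""
    PATIENT_REPORTED = "patient-reported"
    PHYSICIAN_REPORTED = "physician-reported"
    OBSERVER = "observer rating or coding"


@dataclass
class MeasureRecord:
    """Descriptive fields extracted for each measure (hypothetical layout)."""
    name: str
    authors: list[str]
    year: int
    language: str
    perspectives: list[Perspective]
    dimensions: list[str]
    n_items: int
    response_scale: str


@dataclass
class StudyRecord:
    """Study characteristics extracted alongside the measure."""
    measure: MeasureRecord
    setting: str
    sample: str
    country: str


# Placeholder values for illustration only; not data from the review.
record = StudyRecord(
    measure=MeasureRecord(
        name="Example Communication Scale",
        authors=["Author A"],
        year=2010,
        language="English",
        perspectives=[Perspective.PATIENT_REPORTED, Perspective.PHYSICIAN_REPORTED],
        dimensions=["dimension 1", "dimension 2"],
        n_items=12,
        response_scale="5-point Likert scale",
    ),
    setting="primary care",
    sample="adult patients",
    country="Germany",
)
```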

Assessment of the methodological quality
The Centre for Reviews and Dissemination and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement recommend using checklists to appraise study quality (http://www.prisma-statement.org/). We undertook two quality assessments: one of the methodological quality of the included studies and one of the quality of the psychometric properties they report. For the assessment of the methodological quality of the included studies, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist [16][17][18] was applied. The COSMIN checklist was developed in an international Delphi study that sought consensus on definitions and assessments of measurement properties [17]. For systematic reviews, the four-point rating scale has been found to be an appropriate assessment method [18]. The COSMIN checklist consists of twelve boxes. Nine of these boxes refer to methodological standards for studies on measurement properties: A) internal consistency, B) reliability, C) measurement error, D) content validity, E) structural validity, F) hypotheses testing, G) cross-cultural validity, H) criterion validity and I) responsiveness. Box J) contains two standards for the interpretability of patient-reported outcomes. Furthermore, the COSMIN checklist provides evaluation standards for articles that use Item Response Theory (IRT box) and for the generalizability of the results (Generalizability box). Each of the boxes A) to I) and the IRT box consists of several items concerning design requirements and statistical analyses. The items are scored on the four-point rating scale as poor (0), fair (+), good (++) or excellent (+++). The overall quality score for each psychometric property is defined as the lowest score of any item within the box, following the "worst score counts" method.
Data extraction and evaluation were performed for all COSMIN boxes, but we limited the description to the four-point ratings of the psychometric property boxes A) to I), since the Generalizability box and the Interpretability box (J) add little information beyond our extraction of descriptive data of the studies. On the COSMIN website (www.cosmin.nl), the authors point out that the checklist focuses mainly on standards for studies that examine psychometric properties of Health-Related Patient-Reported Outcomes (HR-PROs). In this study, we included patient-reported and physician self-reported measures as well as observer-based measures of physician-patient communication. Nevertheless, prior studies have applied the COSMIN criteria to a range of measures [19,20]. For instruments that use observer codings or ratings, the items of the COSMIN checklist are not always applicable (e.g. the design requirement on how to handle missing items does not apply to observer coding systems). Such items were coded as "not applicable" (n/a) for the observer tools.
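The "worst score counts" rule, combined with the n/a coding for observer tools, can be sketched in a few lines of Python. This is a minimal illustration assuming that n/a items are simply skipped before the minimum is taken; the `Cosmin` enum and the `box_score` function are hypothetical names, not part of any official COSMIN software.

```python
from enum import IntEnum
from typing import Optional


class Cosmin(IntEnum):
    """Four-point COSMIN item rating; a higher value means better quality."""
    POOR = 0       # "0"
    FAIR = 1       # "+"
    GOOD = 2       # "++"
    EXCELLENT = 3  # "+++"


def box_score(item_ratings: list[Optional[Cosmin]]) -> Optional[Cosmin]:
    """Overall box score under "worst score counts".

    Items coded n/a are represented as None and skipped; if every item
    is n/a, the box receives no score.
    """
    applicable = [r for r in item_ratings if r is not None]
    return min(applicable) if applicable else None


# Hypothetical item ratings for one box; the last item concerns missing items.
observer_tool = [Cosmin.EXCELLENT, Cosmin.GOOD, None]           # n/a for observer coding
patient_report = [Cosmin.EXCELLENT, Cosmin.GOOD, Cosmin.POOR]   # rated for self-reports

print(box_score(observer_tool).name)   # GOOD
print(box_score(patient_report).name)  # POOR
```

Because a single poor item caps the whole box, items that apply only to self-reported measures can pull their box scores below those of observer tools with otherwise identical item ratings.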

Quality rating of psychometric properties
To evaluate and compare the quality of the psychometric properties reported in the included studies, the criteria developed by Terwee et al. [21] were applied. The criteria refer to the following psychometric properties: content validity, internal consistency, criterion validity, construct validity, reproducibility (agreement and reliability), responsiveness, floor and ceiling effects, and interpretability. Each property is evaluated on a single item as positive (+), intermediate (?), negative (-) or no information available (0). These criteria have been applied successfully in prior reviews [22].
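As an illustration, two of these criteria can be written as decision rules. The thresholds used below (Cronbach's alpha between 0.70 and 0.95, with factor analysis on a sample of at least 7 × the number of items and at least 100; ICC or weighted kappa of at least 0.70) reflect our reading of Terwee et al. [21] and are not restated in this review. The judgement of whether a design is adequate is reduced to a boolean for simplicity, and all function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Terwee(Enum):
    POSITIVE = "+"
    INTERMEDIATE = "?"
    NEGATIVE = "-"
    NO_INFO = "0"


def rate_internal_consistency(alphas: Optional[list[float]], n_items: int,
                              sample_size: int, factor_analysis: bool) -> Terwee:
    """Rate internal consistency (assumed thresholds, see lead-in)."""
    if not alphas:
        return Terwee.NO_INFO
    adequate_sample = sample_size >= max(7 * n_items, 100)
    if not (factor_analysis and adequate_sample):
        return Terwee.INTERMEDIATE  # no factor analysis or doubtful design
    if all(0.70 <= a <= 0.95 for a in alphas):
        return Terwee.POSITIVE
    return Terwee.NEGATIVE  # alpha outside 0.70-0.95 despite adequate design


def rate_reliability(coefficient: Optional[float], adequate_design: bool) -> Terwee:
    """Rate reliability from an ICC or weighted kappa (assumed cut-off 0.70)."""
    if coefficient is None:
        return Terwee.NO_INFO
    if not adequate_design:
        return Terwee.INTERMEDIATE  # doubtful design or method
    return Terwee.POSITIVE if coefficient >= 0.70 else Terwee.NEGATIVE


# Hypothetical inputs for illustration.
print(rate_internal_consistency([0.82, 0.77], n_items=20, sample_size=250,
                                factor_analysis=True).value)  # +
print(rate_reliability(0.65, adequate_design=True).value)     # -
```

Encoding the criteria this way makes explicit that an intermediate rating can stem from the design alone, even when the reported coefficient itself would pass the threshold.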

Data analysis and synthesis of results
The key characteristics of the studies, the assessment of methodological quality and the quality rating of psychometric properties were combined in a narrative summary. For the methodological quality assessment, we report the median number of COSMIN criteria (boxes A to I) assessed per study. In addition, an overview of the results is given in two tables.
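The summary statistic for the methodological assessment can be sketched as follows: count how many of the nine property boxes each study addressed, then take the median across studies. The study labels and box letters in the example are hypothetical placeholders, not data from this review.

```python
from statistics import median

# The nine measurement-property boxes A) to I) summarized in this review;
# Box J, the IRT box and the Generalizability box are deliberately excluded.
PROPERTY_BOXES = frozenset("ABCDEFGHI")


def boxes_assessed(study_boxes: set[str]) -> int:
    """Count how many of the nine property boxes a study addressed."""
    return len(study_boxes & PROPERTY_BOXES)


def median_boxes_assessed(studies: dict[str, set[str]]) -> float:
    """Median number of property boxes assessed across studies."""
    return float(median([boxes_assessed(boxes) for boxes in studies.values()]))


# Hypothetical input: labels and box letters are placeholders.
example_studies = {
    "study_1": {"A", "E", "F"},
    "study_2": {"B"},
    "study_3": {"A", "B", "D", "E", "F"},
    "study_4": {"A", "B", "J"},  # Box J is ignored
}

print(median_boxes_assessed(example_studies))  # 2.5
```

With an even number of studies the median is the mean of the two middle counts, which is why a fractional value can appear.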

Results

Literature search and study selection
Electronic searches identified 7508 records. The secondary search yielded another 94 records: 92 studies were identified through citation and reference tracking and two studies from the authors' personal knowledge. After duplicates were removed, 5765 of the 6001 remaining records were excluded based on title and abstract screening. The full texts of 245 records were assessed for eligibility. Of these, 219 records did not fulfill the inclusion criteria (see Table 1) and were excluded, leading to the inclusion of 26 studies. The main reasons for exclusion were that the measured construct was not communication (N = 67), that the aim of the study was not to test psychometric properties of an instrument (N = 51), or that communication skills were measured within a medical education setting (N = 51). The study selection procedure and reasons for exclusion are displayed in Fig. 1.
The initial development studies of three instruments [23][24][25] could not be included in this review. For the Classification System of Byrne and Long and the Roter Interaction Analysis System (RIAS) [26], no study on the original development was published in a peer-reviewed journal, and the publication of the original development study of the VR-MICS was available only in Italian [27]. For three studies, we extracted only one part of the study, since these articles described more than the physician version of the measure [28][29][30]. For one study [31], no data extraction was conducted because its structure was not transparent and neither COSMIN nor the criteria of Terwee et al. could be applied. Therefore, data on methodological quality and quality of psychometric properties were extracted for only 25 studies.

Characteristics of included instruments
In total, we included 20 measures in the review. Four measures were not clearly named by their authors; we therefore used the description from the title or abstract to abbreviate these instruments: the Physician-patient communication patterns (PPCP) [46], the Classification System of Byrne and Long (CSBL) [23], the Matched-pair instrument (MPI) [48] and the Generic peer feedback instrument (GPFI) [46]. Eleven measures use observer coding or rating systems [23-25, 29, 30, 34, 40, 42, 43, 46, 47]. Five measures are patient-reported [32,33,39,45,50]. Another two instruments use both physician- and patient-reports [44,48]. Only one measure relies solely on physician ratings [28], and the remaining measure is a computer-based analysis [49]. Characteristics of the identified measures are displayed in Table 3.
Cross-cultural validity (Box G) was assessed in only one study [35] and was rated as poor. Three studies [39,41,50] translated instruments but did not assess cross-cultural validity; for these studies, the translation procedure was rated with items 4 to 11 of Box G. Criterion validity (Box H) and responsiveness (Box I) were not analyzed in any of the studies. The detailed COSMIN ratings on item level are shown in S1

Quality of psychometric properties
The quality of the psychometric properties of the identified measures was evaluated with the criteria of Terwee et al.; results are shown in Table 5. Content validity received a positive score in eight studies [28,29,32,33,40,42,44,45]. Four studies [25,46,48,50] were rated as intermediate, and six studies [30,34,39,43,47,49] received a negative rating. The other studies did not give any information on content validity. For internal consistency, positive ratings were found for seven studies [32,33,37,39,44,45,50], six studies received intermediate ratings [28,41,43,46-48] and one study received a negative score [42]. For half of the studies, no information on internal consistency was available. The majority of the studies also did not provide information on construct validity. Nevertheless, five studies received a positive score [28,33,37,42,45] and five studies an intermediate score [24,39,43,47,49]. Two studies scored negative on construct validity [25,48]. Reproducibility (reliability) was rated as positive for five studies [29,32,35,38,41], intermediate for ten studies [23,25,30,33,34,39,43,46,47,50] and negative for one study [45]. The study on the LIV-MAAS [36] was rated both positive and intermediate, because it examined reliability in two different samples and one rating was given per sample. The study on the PBCI [42] received both positive and negative scores owing to its two dimensions (1. facilitating, 2. inhibiting). For interpretability, ratings were either intermediate or "no information available". None of the studies gave information on criterion validity, reproducibility (agreement), responsiveness or floor and ceiling effects.

Discussion
This review sought to systematically examine studies on psychometric properties of measures on physician-patient communication, to investigate the methodological quality of these studies and to evaluate the quality of the psychometric properties of the identified measures. We extracted data from 25 studies examining 20 measures of physician-patient communication.
Regarding the methodological quality of the studies, the results revealed a heterogeneous picture. Only two studies received an excellent or good score on internal consistency [33,42]. For reliability, the best rating was good, achieved by four studies [29,36,41,42]. Six studies [29,36,38,40,41,43] showed conflicting results. For content validity, two studies received an excellent or good score [29,44]. Of the studies that investigated structural validity, three were rated as good or excellent [24,33,42]. For hypotheses testing, only one study [42] received a good score. Cross-cultural validity was examined for only one measure [35], which scored poor. In summary, three of the instruments received poor scores on the overall COSMIN rating. The study on the PBCI [42] tested the most psychometric properties and was the only study that achieved two excellent and two good scores.
Remarkably, none of the studies on patient- or physician-reported measures received an excellent score on any psychometric property, whereas three of the observer ratings did. However, when ratings are examined for each study per item (see S1 Table and S2 Table in S2 File), most of the studies received more excellent and good scores. Furthermore, the items concerning the handling of missing items were rated for the patient- or physician-reported measures but coded as not applicable for the observer tools. Because COSMIN counts the worst score per box, a low rating on these items might make the final results of the patient- or physician-reported measures lower than those of the observer rating systems.
Ratings with Terwee et al.'s criteria [21] were available for content validity, internal consistency, construct validity, reproducibility (reliability) and interpretability. None of the studies reported information on criterion validity, reproducibility (agreement), responsiveness, or floor and ceiling effects. For measures that record the absence or presence of certain communication aspects, reporting floor and ceiling effects might not be appropriate at the item level. Where information was available, psychometric properties scored mostly positive or intermediate. Negative ratings for content validity were found for only three studies [30,47,49]. Although some of the measures scored well on the methodological rating with COSMIN, the evaluation with Terwee et al.'s criteria [21] clearly showed that the quality of the results was not always sufficient. For example, the study on the PBCI [42] scored excellent in the methodological assessment of internal consistency, but this psychometric property was rated negative with Terwee et al.'s criteria. Similar findings were observed for construct validity in the study on the MPI [48].
In summary, the methodological quality assessment showed that studies reported on a median of three of the nine COSMIN criteria, and it revealed several flaws in both the methodological quality and the quality of the psychometric properties. Content validity and hypotheses testing were of rather poor methodological quality, and measurement error, criterion validity and responsiveness were hardly considered; these should be addressed in future psychometric studies. The quality rating with Terwee et al.'s criteria showed that some measures received positive scores even though the methodological procedure was not always adequate. When the COSMIN ratings and Terwee et al.'s criteria were combined, the best results were obtained for the studies on the following measures: the SEGUE framework [29], the PBCI [42] and the QQPPI [33]. Each achieved at least two excellent or good ratings on COSMIN and at least two positive ratings on Terwee et al.'s criteria. The studies on the TCom-skill GP scale [32] and the PHCPCS [45] received three positive ratings on Terwee et al.'s criteria, but scored only good and fair on COSMIN.
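The rule used to single out the best-performing measures (at least two excellent or good COSMIN box scores and at least two positive ratings on Terwee et al.'s criteria) amounts to a simple conjunctive filter. The Python sketch below assumes each study is summarized as a list of COSMIN box scores and a list of Terwee symbols; `meets_combined_rule` and its default thresholds are hypothetical names chosen for illustration.

```python
# Labels follow the four-point COSMIN scale and the Terwee et al. symbols.
GOOD_OR_BETTER = {"good", "excellent"}


def meets_combined_rule(cosmin_scores: list[str], terwee_ratings: list[str],
                        min_cosmin: int = 2, min_terwee: int = 2) -> bool:
    """True if a study has enough good/excellent COSMIN boxes AND enough '+' ratings."""
    strong_boxes = sum(score in GOOD_OR_BETTER for score in cosmin_scores)
    positives = sum(rating == "+" for rating in terwee_ratings)
    return strong_boxes >= min_cosmin and positives >= min_terwee


# Hypothetical studies; ratings are placeholders, not taken from the review's tables.
print(meets_combined_rule(["excellent", "good", "fair"], ["+", "+", "?"]))  # True
print(meets_combined_rule(["good", "fair"], ["+", "+", "+"]))               # False
```

The second example shows why many positive Terwee ratings alone do not suffice: because the rule is a conjunction, a study must clear both thresholds.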
Our results can hardly be compared with those of the previous reviews by Ong et al. [12] and Boon and Stewart [13]. Ong et al. [12] mainly presented an overview of measures of physician-patient communication without evaluating their psychometric properties. Boon and Stewart [13] included instruments developed for use in medical education settings, as well as manuals of measures without a validation study published in a peer-reviewed journal; therefore, almost none of the measures included in that review were included in the current review. Moreover, several new instruments for examining physician-patient communication have been developed since then; these could not be considered in those reviews but were evaluated in this study.
Based on our results, we suggest further evaluating the psychometric properties of existing measures of physician-patient communication using more rigorous methodological designs, in particular to assess properties that have not yet been tested (e.g. responsiveness). Nevertheless, the results of this study can help researchers select the most appropriate measure for a study on physician-patient communication. Since the included measures have different rating perspectives, the choice of one measure over another will be driven by the study aim and by feasibility in a given study setting.

Strengths and limitations of the study
A strength of this review is the detailed electronic search strategy, which was based on the COSMIN filter [14]. Moreover, two researchers independently assessed all records and full texts, and quality was further ensured by double or triple assessment of some studies involving a third reviewer. Another notable strength is that quality assessments were conducted using both the COSMIN checklist and the quality criteria for good psychometric properties developed by Terwee et al. [21]. To our knowledge, no systematic review on physician-patient communication to date has provided an elaborated judgment of both the methodological quality of the studies and their final results on psychometric properties, following the recommendation of Mokkink and colleagues [17].
However, the current review has several limitations. First, our review includes only generic measures for reasons of feasibility; measures developed specifically for the medical education context or for specific indications were beyond its scope. Second, our search was limited to studies published in German or English, so studies published in other languages may have been missed. Third, although we believe our search strategy was very sensitive and guided by methodological recommendations for systematic reviews [14,51], not all studies were identified by the electronic search, and some were added from the authors' personal knowledge. However, the importance of personal knowledge as a valid source has been described in the literature [51]. Fourth, due to the lag between search completion and publication of the final manuscript, we may have missed recently published studies on this topic.

Conclusion
This systematic review provides an overview of measures of physician-patient communication and helps researchers to identify the appropriate instrument for their research purpose. Moreover, our study highlights current gaps in the methodological quality of studies on psychometric properties and in the quality of their results. We recommend that future evaluation studies on psychometric properties apply standards such as the COSMIN checklist to enhance the quality of the studies and to improve the comparability of their results.