Assessment of Trust in Physician: A Systematic Review of Measures

Over the last decades, trust in physician has gained in importance. Studies have shown that trust in physician is associated with positive health behaviors in patients. However, the validity of empirical findings fundamentally depends on the quality of the measures in use. Our aim was to provide an overview of trust in physician measures and to evaluate the methodological quality of the psychometric studies and the quality of psychometric properties of identified measures. We conducted an electronic search in three databases (Medline, EMBASE and PsycInfo). The secondary search strategy included reference and citation tracking of included full texts and consultation of experts in the field. Retrieved records were screened independently by two reviewers. Full texts that reported on testing of psychometric properties of trust in physician measures were included in the review. Study characteristics and psychometric properties were extracted. We evaluated the quality of design, methods and reporting of studies with the COnsensus based Standards for the selection of health status Measurement INstruments (COSMIN) checklist. The quality of psychometric properties was assessed with Terwee’s 2007 quality criteria. After screening 3284 records and assessing 169 full texts for eligibility, fourteen studies on seven trust in physician measures were included. Most of the studies were conducted in the USA and used English measures. All but one measure were generic. Sample sizes range from 25 to 1199 participants, recruited in very heterogeneous settings. Quality assessments revealed several flaws in the methodological quality of studies. COSMIN scores were mainly fair or poor. The overall quality of measures’ psychometric properties was intermediate. Several trust in physician measures have been developed over the last years, but further psychometric evaluation of these measures is strongly recommended. 
The methodological quality of psychometric property studies could be improved by adhering to quality criteria like the COSMIN checklist.


Introduction
Patient-centeredness has gained importance in research, health policy and clinical practice. Trust is considered a central factor in determining a positive patient-physician relationship [1][2][3], which is an important dimension of patient-centeredness [4]. Trust in the context of healthcare has received increasing attention in the last two decades [5]. This is partly due to concerns voiced about the effects of organizational changes in the healthcare system on patients' trust in their healthcare professionals, healthcare institutions and the healthcare system itself [6,7]. Patients' trust is a particularly delicate matter, as patients who are ill and may have to face high risks regarding their health find themselves in an extremely vulnerable situation. Reliance on patients' individual physicians and the healthcare system is often inevitable [6,8]. The patient-physician relationship is characterized by a knowledge and power imbalance in which patients depend on the physicians' expertise and execution of treatments to solve their health problems [6,8,9]. Hence, trust in physician plays an important role and has been studied extensively.
Trust in physician can be defined as the patient's optimistic acceptance of a vulnerable situation and the belief that the physician will care for the patient's interests [2]. Empirical studies have revealed that patients' trust in physician is associated with patient satisfaction [10], continuity of care [11] and adherence to treatment [12]. Trust in physician facilitates access to healthcare and disclosure of relevant information, and thereby supports accurate and timely diagnosis [8]. Trust in physician is also associated with self-reported health improvement [13] and patients' self-reported ability to manage their chronic disease [14]. As the body of work increases, the question of how to measure trust in physician gains importance. The validity of empirical findings is fundamentally dependent on the quality of the measures in use. Therefore, the selection of a measure should be carefully considered and based on the measure's psychometric properties. Some studies have addressed the quality of trust in physician measures [5,7,15], but no systematic review on trust in physician measures and their psychometric properties has been published to date. A thorough overview and comparison of different validated measures is needed a) to facilitate the choice of an appropriate instrument in accordance with the individual research purpose, b) to identify research gaps and needs for further psychometric testing of instruments and c) to inspire new measurement developments, if necessary.
Thus, the aims of this systematic review of measures on trust in the physician are 1) to identify existing psychometrically tested measures of trust in physician, 2) to determine the methodological quality of the studies that report on psychometric properties of measures, and 3) to evaluate the quality of identified measures based on their psychometric properties.

Registration and search strategy
The protocol for this systematic review was registered in the International prospective register of systematic reviews PROSPERO [16] with the registration code CRD42013005048. We performed an electronic literature search using the Medline, EMBASE and PsycInfo databases (via OVID). We identified relevant articles published between January 1979, the year of the first known measure of trust in physician [11], and the 21st of June, 2013, when we ran the electronic literature search. For this purpose, we developed a detailed search strategy for each database (see Appendix S1). We considered a combination of the following four aspects appropriate: Trust AND the context of patient-physician interaction AND measurement AND psychometric properties. We adapted terms and keywords for each database and limited all searches to publications concerning adult, middle-aged or aged humans, published in either English or German. The complete electronic database search strategy is documented in Appendix S1. Furthermore, we combined the electronic database search with a secondary search including reference and citation tracking of included full texts and consultation of experts in the field of research. Additionally, we screened references of a recently published review on trust in the health system [5].

Study selection
Two reviewers (EM and JZ) independently screened titles and abstracts of the identified records for possible inclusion in the study and independently assessed full texts for eligibility by applying exclusion criteria (see Table 1). We resolved differences concerning exclusion criteria by discussion until we reached consensus. If consensus could not be reached, the final decision was made by a third reviewer (IS).

Data extraction and quality assessments
We used data extraction sheets to collect study data and to make quality assessments. Data extraction sheets were pilot-tested and adjusted. Data extraction sheets comprised descriptive data of included studies and identified measures, and data on which quality assessments are based. We assessed the quality of design, methods and reporting of included studies on psychometric properties with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist with a 4-point scale [17][18][19]. Furthermore, we evaluated the psychometric properties of identified measures with the quality criteria for good psychometric properties developed by Terwee et al. [20]. The quality criteria developed by Terwee [20] and the COSMIN checklist are described below. One reviewer (EM) performed data extraction and quality assessments. At the beginning of the quality rating, a double assessment of two studies was conducted by a second reviewer (IS) with whom ambiguities were discussed and resolved. The second reviewer (IS) further assisted with any questions occurring in the process of data extraction and quality evaluation.
Quality of design, methods and reporting. The COSMIN checklist is based on an international Delphi study in which 57 experts found consensus on the definitions and assessments of measurement properties [17,18]. The checklist rates the design, methodological and reporting quality of studies on measurement properties. Two rating versions of the COSMIN checklist exist: a dichotomous yes/no rating scale and a 4-point scale. The latter has been recommended for use in systematic reviews [19]. The COSMIN checklist comprises twelve boxes and assesses the following psychometric properties: A) internal consistency, B) reliability, C) measurement error, D) content validity, E) structural validity, F) hypotheses testing, G) cross-cultural validity, H) criterion validity, I) responsiveness and J) interpretability. For studies using item response theory methods, the IRT box provides evaluation. Sample data is extracted for each psychometric property separately with the generalizability box G. The IRT box and psychometric property boxes A to I can be evaluated with the 4-point scale. We performed data extraction and evaluation for the complete COSMIN checklist, but limit our presentation to the concise results of the 4-point scale ratings per psychometric property box. Item scores are excellent (+++), good (++), fair (+) or poor (0). The overall score for each box is determined by the lowest item score. Detailed information on the COSMIN checklist and the 4-point scale can be found on the COSMIN website [21].
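The rule that the lowest item score determines the box score can be illustrated with a minimal Python sketch. The function name `box_score` and the example item ratings are hypothetical and chosen for illustration only; they are not taken from the COSMIN materials or from any of the included studies.

```python
# Rating levels ordered from worst to best, following the 4-point scale.
SCALE = ["poor", "fair", "good", "excellent"]


def box_score(item_scores):
    """Return the overall score of one COSMIN box: the lowest item score."""
    if not item_scores:
        raise ValueError("a box needs at least one rated item")
    # min() with the scale position as key picks the worst rating.
    return min(item_scores, key=SCALE.index)


# Hypothetical item ratings for a single box (illustration only).
example_items = ["excellent", "excellent", "good", "fair"]
print(box_score(example_items))  # fair
```

In this sketch a single fair item outweighs two excellent items, which is why box scores can look less favorable than the item-level ratings.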
Quality of psychometric properties. The quality criteria for psychometric properties proposed by Terwee and colleagues [20] provide a condensed evaluation of measures' psychometric properties and have been used in previous systematic reviews [22]. The Terwee criteria apply to the following properties: content validity, internal consistency, criterion validity, construct validity, reproducibility (agreement and reliability), responsiveness, floor and ceiling effects and interpretability. All properties are represented by one item that can be rated as positive (+), intermediate (?), negative (-) or no information available (0). We rated psychometric properties for each study separately, as they report on different study populations and results differ. For the exact definitions of psychometric properties and scoring criteria see the original publication [20].

Literature search and study selection
The electronic database search identified 5090 records. We found an additional 29 records through the secondary search. After removal of duplicates, the total search comprised 3284 records. We excluded 3115 records based on title and abstract screening. Of the remaining 169 full texts, 155 were excluded by applying exclusion criteria (see Table 1). The majority of full texts were excluded because the aim of the study was not to test psychometric properties of a scale on trust in physician. We included 14 studies in this review. The process of study selection is shown in Figure 1. We excluded some known measures of trust in physician such as the Kao scale [23] and the Safran scale [10]. They were excluded either because psychometric testing was not reported in peer-reviewed journal articles [23,24] or because the trust in physician measures were subscales of instruments assessing a broader construct [10,25,26,27,28].
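The record counts reported above can be checked with simple arithmetic. The following Python sketch is only a consistency check. The variable names are our own, and the number of removed duplicates is derived from the reported figures, not stated in the text.

```python
# Record counts as reported in the text.
database_records = 5090
secondary_records = 29
after_deduplication = 3284
excluded_on_screening = 3115
full_texts_excluded = 155

# Derived quantities.
identified = database_records + secondary_records
duplicates_removed = identified - after_deduplication  # derived, not reported
full_texts_assessed = after_deduplication - excluded_on_screening
included = full_texts_assessed - full_texts_excluded

print(identified, duplicates_removed, full_texts_assessed, included)
# 5119 1835 169 14
```

The derived numbers of 169 assessed full texts and 14 included studies agree with the figures given in the text.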

Description of included studies and measures
Most of the studies were conducted in the USA and used English measures. Sample sizes ranged from 25 to 1199 participants. The majority of study samples included patients who were recruited in very heterogeneous settings. Most studies were based on outpatient samples [1,11,12,29,30,31,32,33,34] with a variety of health issues. Included studies reported on psychometric properties of the following seven measures of trust in physician: the Trust in Physician Scale (TiPS), the Trust Scale for the Patient-Physician Dyad (TSPPD), the Wake Forest Physician Trust Scale (WFPTS) and a short form of the WFPTS, the Abbreviated Wake Forest Physician Trust Scale (A-WFPTS), the Health Care Relationship Trust Scale (HCRTS) and the further developed Health Care Relationship Trust Scale Revised (HCRTS-R), and the Trust in Oncologist Scale (TiOS). The TiOS, which was developed on the basis of the WFPTS, is the only population-specific measure and assesses cancer patients' trust in their oncologists [35]. All measures are unidimensional and use a 5-point Likert response scale, except for the TSPPD. The TSPPD comprises the two dimensions of benevolence and technical competence and is rated on a 7-point Likert scale [33]. Descriptive data of included studies and identified measures are presented in Table 2.

Quality of design, methods and reporting
Assessments of the quality of design, methods and reporting of psychometric property studies with the COSMIN checklist are shown in Table 3. All included studies reported on internal consistency (Box A), and COSMIN ratings could be applied. Studies on the TiPS received three poor [29,34,36], one fair [37] and one good [32] score for internal consistency. The study on the TSPPD [33] received a poor score. The WFPTS shows mixed results with one good study rating [1] and two fair ratings [11,38] for internal consistency. The internal consistency scores for studies on the A-WFPTS [12], HCRTS [30] and HCRTS-R [31] were good. Studies on the TiOS received one good [35] and one fair [39] rating for internal consistency. Few studies assessed reliability (Box B), and ratings could be applied to five studies. Scores were either fair or poor. Studies reporting on the reliability of the TiPS [34] and the TiOS [35] were rated as fair. Studies assessing reliability of the WFPTS [11,38] and the HCRTS [30] received poor scores. None of the studies reported on the psychometric property measurement error (Box C). Ratings for content validity (Box D) were made for studies reporting on the initial development of measures. Scores were good for the TiPS [29], WFPTS [11], HCRTS [30] and TiOS [35], but the study on the TSPPD [33] received a poor score for content validity. Structural validity (Box E) was assessed by most studies, and the majority scored fair or good. Structural validity assessments of the TiPS [32,37] were rated as fair, whereas the study on the TSPPD [33] scored poorly. Results for studies on the WFPTS and TiOS were mixed for structural validity. Studies on the WFPTS scored good [1] and fair [11,38]. Reports on the structural validity of the TiOS were rated as good [35] and fair [39]. Structural validity ratings were good for studies reporting on the A-WFPTS [12], HCRTS [30] and HCRTS-R [31]. Hypotheses testing ratings (Box F) applied to all studies. Results were either fair or poor. One study each on the TiPS [32] and the WFPTS [1], as well as the studies reporting on the A-WFPTS [12] and HCRTS-R [31], scored fair. Cross-cultural validity (Box G) was assessed by four studies. Ratings applied to studies on the TiPS [36,37], WFPTS [1] and TiOS [39]. All studies received poor ratings for cross-cultural validity. The measurement properties criterion validity (Box H) and responsiveness (Box I) were not assessed by any of the studies. Detailed results for COSMIN ratings on item level are shown in Appendix S2.

Quality of psychometric properties
Quality ratings of measures' psychometric properties assessed with the Terwee criteria are presented in Table 4. Studies reporting on the initial development of measures [11,29,30,35] received positive scores for content validity, except for the study reporting on the development of the TSPPD [33]. Scores for internal consistency were all positive for studies on the WFPTS [1,11,38], the A-WFPTS [12], and the TiOS [35,39]. Studies on the TiPS received positive [32,37] and intermediate [29,34,36] scores. The TSPPD [33] and the HCRTS [30] scored intermediately. The HCRTS-R [31] received the only negative score for internal consistency. Criterion validity was not assessed by any of the studies. Construct validity was mainly rated as intermediate [12,30,31,33]. The TiPS received one positive [37] and three intermediate ratings [29,32,34]. Similarly, the WFPTS scored intermediately twice [11,38] and positive once [1]. Construct validity scores of the TiOS were mixed with a positive [35] and negative [39] rating each. Few studies provided data on the measurement property reproducibility. The reproducibility aspect agreement was not assessed by any of the studies, whereas some studies present data on the reproducibility aspect reliability. The single study that assessed reliability for the TiPS [34] scored positively. Reliability of the WFPTS [11,38], HCRTS [30] and TiOS [35] was rated as intermediate.  [1,11,38], A-WFPTS [12], HCRTS-R [31] and TiOS [35,39].
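Because ratings were made per study, a tally per measure can help when comparing instruments. The following Python sketch counts the internal consistency ratings reported in this section. The dictionary layout and the function name `tally` are our own illustrative choices and are not part of the Terwee criteria.

```python
from collections import Counter

# Internal consistency ratings per study as reported in the text above
# (+ positive, ? intermediate, - negative); keys are reference numbers.
internal_consistency = {
    "TiPS": {32: "+", 37: "+", 29: "?", 34: "?", 36: "?"},
    "TSPPD": {33: "?"},
    "WFPTS": {1: "+", 11: "+", 38: "+"},
    "A-WFPTS": {12: "+"},
    "HCRTS": {30: "?"},
    "HCRTS-R": {31: "-"},
    "TiOS": {35: "+", 39: "+"},
}


def tally(ratings_by_measure):
    """Count how often each rating occurs for every measure."""
    return {m: Counter(r.values()) for m, r in ratings_by_measure.items()}


summary = tally(internal_consistency)
for measure, counts in summary.items():
    print(measure, dict(counts))

# Totals across all fourteen studies.
overall = sum(summary.values(), Counter())
```

Summed over all fourteen studies, this gives eight positive, five intermediate and one negative rating for internal consistency.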

Discussion
This systematic review included fourteen studies on seven measures of trust in physician. Most studies were conducted in the USA and reported on psychometric properties of the TiPS or the WFPTS and its abbreviated version. Samples varied enormously in size and participants' characteristics. Quality assessments with the COSMIN checklist and the Terwee criteria revealed a heterogeneous picture of the methodological quality of included studies and the quality of psychometric properties of identified measures.
Regarding the results of the COSMIN rating for the design, methods and reporting of psychometric studies, several research gaps became apparent. With a total of five different studies [29,32,34,36,37], the TiPS is the measure that has been most extensively tested. However, the majority of studies on the TiPS were rated poor for internal consistency [29,34,36]. Only two of the studies on the TiPS assessed structural validity [32,37], and the quality of these assessments was rated as fair. COSMIN results for all psychometric studies reveal that only a selection of psychometric properties was reported and ratings were mainly fair or poor. Internal consistency and hypotheses testing were addressed in all of the studies, but quality ratings with the COSMIN checklist revealed serious flaws in more than 70% of the studies' reports on hypotheses testing [11,29,30,33,34,35,36,37,38,39]. Few studies assessed reliability [11,30,34,35,38] or cross-cultural validity [1,36,37,39], and the quality of these assessments was rated as poor, except for two studies rated as fair [34,35]. The psychometric properties measurement error, criterion validity and responsiveness were not addressed in any of the studies. Looking at the COSMIN ratings per study, two studies received poor scores for all reported psychometric properties. These studies are the measure development study of the TSPPD [33] and a cross-cultural validation study of the TiPS [36]. The measure development study of the TiOS [35] had the best quality regarding the design, methods and reporting of psychometric property assessment, closely followed by the study on the HCRTS [30]. Remarkably, none of the studies scored excellent on any psychometric property in the COSMIN evaluation. Looking at the results of COSMIN items (see Appendix B), studies scored excellent in many respects. Yet, this is not reflected in COSMIN scores for psychometric properties. The "worst score counts" policy of COSMIN leads to a negatively biased view on the studies' design, methods and reporting. However, as all items represent aspects considered very important by the COSMIN Delphi panel, poor ratings for any of the items should be considered as serious flaws [19]. Overall, the results of this review show that the methodological quality of psychometric property studies on trust in physician is not satisfactory in many respects. However, the more recently published measure development studies [30,35] better met the COSMIN criteria and had reasonably good results for most reported psychometric properties.
To give an overview of the quality of psychometric properties assessed with the Terwee criteria, we composed a table (see Table 4) with quality ratings presented for each study individually.
Overall, the quality of psychometric properties of trust in physician measures was intermediate. For some measures, psychometric properties were assessed in a variety of study populations and quality judgments per measure differ. For example, the TiPS had positive ratings for floor and ceiling effects in two studies of the English version [32,34], whereas floor and ceiling effects of the German version [37] were judged negatively. Content validity ratings were positive for all measure development studies [11,29,30,35], except for the development study of the TSPPD [33]. The use of a measure is only recommended if content validity is adequate [20]. Looking at the quality judgments of measures per study, the TSPPD [33] had the worst quality. Consequently, the TSPPD cannot be recommended for use without further psychometric evaluation. The measure development study of the TiOS [35] received the best quality ratings for psychometric properties.
However, our results concerning the quality of psychometric properties evaluated with the Terwee criteria need to be considered carefully. The assessment of the methodological quality of studies with the COSMIN checklist indicated that many studies lack quality of design, methods and reporting. Judgment on the quality of a measure can only be as good as the basis for evaluation [20]. In this review, the basis for evaluation is the studies' reports of psychometric property assessments and outcomes. Hence, some of the measures evaluated here may have received worse quality judgments for psychometric properties due to flaws in the studies' reporting. Viewing the quality of psychometric properties in the light of the studies' quality of design, methods and reporting, the TiOS is the measure with the best psychometric properties evaluated in the methodologically best study.
The results of this review can be used to assist researchers in choosing a measure optimal for their individual research purpose. However, it is important to note that a measure's psychometric properties need to be re-established for any new setting, sample or cultural context [40].
The present systematic review has several strengths. First, we used a complex and detailed search strategy in the electronic database search to retrieve all records relevant to our purpose. Second, two reviewers independently assessed records and full texts for possible inclusion in the study. Third, we performed two quality assessments by using both the COSMIN checklist with 4-point scale rating and the quality criteria for good psychometric properties developed by Terwee et al. [20]. This combination has been recommended for the separate evaluation of the methodological quality of studies and the quality of their results [17]. Judgment on the quality of studies provides the background for the interpretation of psychometric properties reported in the studies. Thus, a strength of this review is that it supplies a condensed evaluation of both the quality of studies and the quality of their results.
This review also has several limitations. First, our search was limited to studies published from 1979 onwards, limited to English and German, and we searched only three databases. As a consequence, we might have missed relevant publications. However, we carried out a thorough secondary search to limit this possibility to a minimum. Second, data extraction and quality evaluation of included studies were performed by one reviewer only. This may have led to a biased assessment of included studies and measures' psychometric properties. However, we performed a double assessment of two studies at the beginning of the quality assessments and discussed any ambiguities occurring in the process of quality assessments to reduce this bias. Furthermore, as with every systematic review, our results are limited by our inclusion and exclusion criteria and we might have missed certain interesting scales, e.g. a paper on the Spanish version of the WFPTS that did not aim to test psychometric properties [41] and a paper on a measure that assesses trust in physicians in general [42].
In this review, we identified seven psychometrically evaluated measures of trust in physician. These measures cover a multitude of research needs, as they are mainly generic and include short as well as long scales validated in diverse study populations. Hence, the development of new measures does not seem necessary. However, the mixed results of the Terwee quality criteria for psychometric properties in different studies indicate that further psychometric evaluation is strongly recommended. The quality assessment of psychometric studies with the COSMIN checklist revealed several research gaps. Content areas like measurement error, criterion validity and responsiveness have been neglected in the studies to date and should be addressed in future psychometric studies. The results of the COSMIN checklist for hypotheses testing indicate serious flaws in the methodological quality of present evaluation studies. Hence, hypotheses testing should receive special attention in future psychometric evaluation studies. Cross-cultural validity was addressed in only four studies [1,36,37,39] and the methodological quality of these studies was rated as poor. However, translations of measures are needed to support research on trust in physician worldwide. The applicability of translated measures should be assessed in cross-cultural validity studies for different languages and cultural contexts [43]. Moreover, investigation of psychometric properties should adhere to standards for assessing psychometric properties like the COSMIN checklist in order to contribute to the quality of future studies and facilitate the comparison of their results.
In conclusion, this systematic review identified several trust in physician measures and serious gaps in the psychometric property evaluation of some of these measures. Good quality measures are needed to assess trust in physician in empirical studies in the context of healthcare.