Validation of the Patient-Doctor-Relationship Questionnaire (PDRQ-9) in a Representative Cross-Sectional German Population Survey

The patient-doctor relationship (PDR) as perceived by the patient is an important concept in primary care and psychotherapy. The PDR Questionnaire (PDRQ-9) provides a brief measure of the therapeutic aspects of the PDR in primary care. We assessed the internal and external validity of the German version of the PDRQ-9 in a representative cross-sectional German population survey that included 2,275 persons aged ≥14 years who reported consulting with a primary care physician (PCP). The acceptance of the German version of this questionnaire was good. Confirmatory factor analysis demonstrated that the PDRQ-9 was unidimensional. The internal reliability (Cronbach's α) of the total score was .95. The corrected item-total correlations were ≥ .94. The mean satisfaction index of persons with a probable depressive disorder was lower than that of persons without a probable depressive disorder, indicating good discriminative concurrent criterion validity. The correlation coefficient between satisfaction with the PDR and satisfaction with pain therapy was r = .51 in 489 persons who reported chronic pain, indicating good convergent validity. Despite the limitation of low variance in the PDRQ-9 total scores, the results indicate that the German version of the PDRQ-9 is a brief questionnaire with good psychometric properties for assessing German patients' perceived therapeutic alliance with PCPs in public health research.


Introduction
The patient-doctor relationship (PDR) is an important concept in health care. A good physician-patient relationship is associated with better treatment adherence, higher patient satisfaction, and a better prognosis [1][2][3][4]. Several aspects of the PDR have commonalities with the helping alliance in psychotherapy, i.e., high levels of trust, helpfulness, empathic understanding, and interpersonal openness [5]. Both the patient's and the physician's perspectives must be considered to understand the PDR [6]. Substantial efforts have been made to develop instruments that assess the PDR from the patient's point of view. A systematic review found 19 instruments that assess the PDR. These instruments covered a variety of dimensions and used diverse conceptual models of the PDR [7]. The authors stated that in the primary care setting, a research instrument should preferably be concise and easy to use. They recommended the Patient-Doctor Relationship Questionnaire (PDRQ-9), a brief 9-item questionnaire with excellent overall internal consistency [7].
The Patient-Doctor Relationship Questionnaire (PDRQ-9) was originally developed in the Netherlands as a short assessment of the relationship between the primary care physician (PCP) and the patient from the patient's perspective [8]. It adapted an existing instrument from psychotherapeutic research on therapeutic alliance, the Helping Alliance Questionnaire (HAQ) [9], for use in primary care and public health research. The HAQ contains 11 items and served as the basis for item creation and selection in the PDRQ. In adapting the instrument to the needs of primary care, some strongly psychotherapeutic aspects (e.g., gaining new insight) were omitted or rephrased, and other aspects (e.g., 'My PCP has enough time for me', 'My PCP is dedicated to help me') were added. This procedure resulted in the first, 15-item version of the PDRQ. The psychometric properties of the PDRQ were initially tested in a rather small sample of 110 general practice patients and 55 patients in an epilepsy clinic [8]. In this validation study, a principal component factor analysis with varimax rotation of the 15 items yielded 2 factors. The first factor focused on the empathic style and availability of the doctor and explained 58% of the total variance. The second factor focused on the patients' medical symptoms and explained 9% of the total variance. The internal consistency of the first factor was high and that of the second was moderate. To focus the instrument on the empathic style and availability of the doctor, the second factor was eliminated. This resulted in the final, unidimensional 9-item PDRQ-9, with all 9 items loading onto 1 common factor [8]. A mean satisfaction index can be calculated across all 9 items [8].
Validation studies of a Spanish version comprised 188 patients of 6 internal medicine physicians of a university hospital [10] and 405 patients of 6 primary health care centers [11]. A validation study of a Turkish version was performed with 405 patients of a family medicine outpatient center [12].
To date, the psychometric properties of the PDRQ-9 have not been tested in a larger sample of the general population within the setting of public health research. Furthermore, a version for German-speaking patients has not yet been validated. Therefore, the aim of the present study was to test the internal and external validity of the German version of the PDRQ-9 in a representative general population sample.

Ethics statement
All participants were informed of the study procedures, data collection and anonymization of all personal data. Furthermore, a detailed data privacy statement was delivered by the study assistant. The present study posed a low risk to the participants, as procedures such as medical treatments, invasive diagnostics or procedures causing psychological, spiritual or social harm were not included. Therefore, in accordance with German law, all participants provided verbal informed consent, which was noted by the trained interviewer before the survey started. The additional informed consent of a parent was not required for participants aged 14 or older. The study and procedure, including the consent procedure, were approved by the institutional ethics review board of the University of Leipzig (Az 092-12-05032012). Furthermore, the study adhered to the guidelines of the ICC/ESOMAR International Code of Marketing and Social Research Practice.

Linguistic adaptation
The PDRQ-9 was first developed in Dutch. As in the Spanish [11] and Turkish [12] validation studies, the PDRQ-9 was adapted to German by translating the English version, which is the version that was primarily published [8] and used. The adaptation to German was performed according to the state-of-the-art procedure of forward-backward translation [13] by 2 medical doctors and 1 English-German bilingual translator. Two forward translations into German were independently completed by 2 medical doctors, both native speakers of German who are fluent in English. The 2 German versions were compared, and an updated German forward version was compiled. This version was translated back into English by a professional translator (a native speaker of English who is fluent in German) with experience in medical translation. This translator had not been involved in the forward translation. The primarily published version and the back-translated version, both in English, were compared by the 2 medical doctors and the expert translator, yielding an optimized German version. Additionally, this optimized German version was compared with the original Dutch instrument by the German-speaking first author of the PDRQ-9 (van der Feltz-Cornelis), whose native language is Dutch. In a final reconciliation process, the final German version (PDRQ-9 German, see Appendix S1) was generated and approved by all parties. All comparisons between the different versions were conducted item by item on 2 dimensions: similarity of language (literal translation) and comparability of interpretation (cultural adaptation). Discrepancies and discussions mainly concerned 2 items. For item 6, the consensus was to translate "nature" as "Wesen" (rather than "Natur"). For item 9, the consensus was to translate "easy accessible" as "leicht zu erreichen" to emphasize organizational rather than emotional accessibility.
The measure was not pilot tested before being employed in the full study, as such testing is not a typical step in forward-backward translation.

Design and participants
The current study was part of the 2013 annual representative general population survey that was conducted by the University of Leipzig. This survey assessed political and religious attitudes as well as health topics.
A representative sample of the German population was selected with the assistance of a demographic consulting company (USUMA, Berlin, Germany). The random selection was based on multistage sampling. First, 258 sample point regions, covering rural and urban areas from all regions in Germany, were randomly drawn from the most recent political election register. The second stage was a random selection of households using the random route procedure (based on a starting address). The third stage was a random selection of household respondents using the Kish selection grid. The aim of the sampling procedure was to obtain a sample that was representative of the German population in terms of age, gender, and education. The inclusion criteria for the study were age ≥14 years and the ability to read and understand the German language.
All subjects were visited by a trained study assistant and informed about the investigation. The subjects were provided with self-rating questionnaires. The survey included several questionnaires on somatic and psychological features (health survey) as well as questionnaires on eating behavior, political attitudes and media use. The survey also asked the participants whether they had a PCP. In the case of a positive response to this question, the person was asked to complete the PDRQ-9. The assistant was available while the participants answered all of the questionnaires and offered help if persons did not understand the meaning of any question. Regarding the questionnaires used in the current study, the trained assistants did not report any systematic misunderstanding of the items.

Validation methods and hypotheses
The methods used to validate the PDRQ-9 German were as follows:
a) Acceptance was assessed according to the proportion of missing or invalid items.
b) Data quality was assessed using the mean, median and extent of ceiling and floor effects. Floor and ceiling effects between 1% and 15% were defined as optimal [14].
c) Reliability was assessed as internal consistency (Cronbach's α), which reflects the overall correlation between items within a scale. A level of .7 or higher is considered desirable [15].
d) Factorial structure was tested using confirmatory factor analysis (CFA).
e) Convergent validity was determined by correlating the mean satisfaction index of the PDRQ-9 with the treatment satisfaction ratings of persons in the general population with chronic pain [16]. We expected a positive correlation between these 2 satisfaction indices. Convergent validity is considered fulfilled if the scale scores for related concepts show an acceptable correlation (Spearman rank correlation coefficient > .4) [15].
f) Discriminative concurrent criterion validity was tested by comparing the PDRQ-9 total score of persons in the general population with a probable depressive disorder (PHQ-2 ≥3) to that of persons without a probable depressive disorder. We predicted that participants with a probable depressive disorder would report a lower mean satisfaction index than persons without a probable depressive disorder [17]. This hypothesis was based on the cognitive theory of depression. The cognitive triad of depression is characterized by dysfunctional negative views of oneself, one's life experience (and the world in general), and one's future [18]. We assumed that this negative view would also apply to the PDR.
g) Potential associations with socioeconomic variables (age, gender, education, and household income) were tested using multiple linear regression analysis.
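Criteria b) and c) are simple computations on the item scores. As a minimal Python sketch (assuming complete responses stored as one list of scores per item; the function names are hypothetical and do not reproduce the SPSS procedures used in the study), Cronbach's α and the floor and ceiling percentages could be computed as follows:

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for k items; `items` holds one list of scores per item."""
    k = len(items)
    n = len(items[0])
    # Total scale score of each respondent
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_variances = sum(pvariance(item) for item in items)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))


def floor_ceiling_percentages(scores, lowest, highest):
    """Percentage of respondents at the lowest (floor) and highest (ceiling) possible score."""
    n = len(scores)
    floor_pct = 100 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct
```

Because the same variance convention appears in the numerator and the denominator, switching to sample variances leaves α unchanged. The returned percentages can then be compared with the 1% to 15% range given in criterion b).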

Validation instruments
5.1 Demographic questionnaire. Age, gender, partnership status, educational level, employment status, and net family income per month were assessed via a standardized questionnaire that was previously used in German health surveys [19].
5.2 Chronic pain questionnaire. Individuals with chronic non-cancer pain were identified by screening questions based on the International Association for the Study of Pain (IASP) definition of chronic pain [16], as follows: "Did you have constant or frequently recurring pain during the last 3 months?" In the case of self-reported current treatment of chronic pain, participants were asked to report their satisfaction with pain treatment (1 = very unsatisfied, 2 = unsatisfied, 3 = satisfied, 4 = very satisfied).
5.3 Depression screening questionnaire. The Patient Health Questionnaire-2 (PHQ-2) assesses the 2 core DSM-IV criteria of major depression (depressed mood and loss of interest or pleasure), each rated on a scale from "0" (not at all) to "3" (nearly every day), yielding a total score from 0 to 6 [20]. A score ≥3 represents a reasonable cut-off for identifying potential cases of major depression or other depressive disorders. At this cut-off, the PHQ-2 has a sensitivity of 82.9% and a specificity of 90% for the diagnosis of major depression and a sensitivity of 62.3% and a specificity of 94% for the diagnosis of any depressive disorder. We used the validated German version of the PHQ-2 [21].
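The PHQ-2 screening rule and the accuracy figures above rest on two short definitions. The following Python sketch assumes item scores already coded 0 to 3; the function names and any counts used to try it are hypothetical, and the sketch does not replace the validated German PHQ-2 [21].

```python
def phq2_screen(interest_score, mood_score, cutoff=3):
    """Return the PHQ-2 total (0-6) and whether it reaches the screening cut-off."""
    for score in (interest_score, mood_score):
        if score not in (0, 1, 2, 3):
            raise ValueError("each PHQ-2 item is scored 0 (not at all) to 3 (nearly every day)")
    total = interest_score + mood_score
    return total, total >= cutoff


def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)
```

For example, item scores of 1 and 2 give a total of 3 and are flagged as a probable depressive disorder, whereas 1 and 1 are not.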

Statistical analyses
We prespecified that up to 2 missing items on an individual's PDRQ-9 would be replaced by the rounded mean of the answered items. If more than 2 items of the scale remained unanswered, the respective person was excluded from further analyses. In addition, descriptive statistics were computed to determine whether a specific item of the German version had many missing values, because this might indicate insufficient understanding of the translation of that item.
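The prespecified missing-data rule and the resulting mean satisfaction index could look like this in Python. This is a minimal sketch, assuming items coded 1 to 5 with None for an unanswered item; rounding halves upward is our assumption, since the rule above only says "rounded mean", and the function name is hypothetical.

```python
import math


def mean_satisfaction_index(responses, max_missing=2):
    """Mean of the 9 PDRQ-9 items (each 1-5); None marks an unanswered item.

    Up to `max_missing` missing items are replaced by the rounded mean of the
    answered items; with more missing items the person is excluded (None).
    """
    answered = [r for r in responses if r is not None]
    if len(responses) - len(answered) > max_missing:
        return None
    # Round half up (assumption: the text does not state the rounding rule)
    fill = math.floor(sum(answered) / len(answered) + 0.5)
    completed = [fill if r is None else r for r in responses]
    return sum(completed) / len(completed)
```

For example, a respondent answering 5, 4, 5, 4, 5, 4, 5, 4 with one item missing has the missing item filled with 5 (the rounded mean of 4.5) and receives an index of 41/9 ≈ 4.56.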
Because Cronbach's α represents a lower-bound estimate of reliability, a composite reliability (CR) score and the average variance extracted (AVE) were also calculated according to Fornell and Larcker [22].
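For a single-factor model with standardized loadings λ_i (i = 1, ..., k) and uncorrelated measurement errors, the Fornell-Larcker quantities are CR = (Σλ_i)² / ((Σλ_i)² + Σ(1 - λ_i²)) and AVE = Σλ_i² / k. A minimal Python sketch under exactly those assumptions (the function names are hypothetical):

```python
def composite_reliability(loadings):
    """Fornell-Larcker composite reliability from standardized loadings."""
    sum_loadings = sum(loadings)
    # Error variance of each standardized indicator is 1 - loading^2
    error_variance = sum(1 - lam ** 2 for lam in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)


def average_variance_extracted(loadings):
    """Average variance extracted: mean squared standardized loading."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)
```

With 9 illustrative loadings of .8 (not the study's estimates), AVE = .64 and CR ≈ .94.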
The factorial structure was tested using CFA, which was computed with the statistical program AMOS 20 (IBM SPSS Inc., Chicago, IL, 2011). The model was tested using covariance matrices and estimated with the maximum likelihood approach. A one-factor model was estimated. The following model fit indices were used: the minimum discrepancy divided by its degrees of freedom (CMIN/DF); the goodness-of-fit index (GFI); the normed fit index (NFI); the comparative fit index (CFI); the Tucker-Lewis Index (TLI); the standardized root mean square residual (SRMR); and the root mean square error of approximation (RMSEA). For a good model fit, the CMIN/DF ratio should be as small as possible [23,24] and the CFI should range between .97 and 1 [24]. Furthermore, GFI, NFI and TLI values near .95 or higher are indicative of a good model fit [24,25]. An SRMR value smaller than .05 [23,24] and an RMSEA value of .08 or smaller indicate an adequate fit [24]. Additional analyses were conducted to test the invariance of the model across gender and different age groups using multi-group CFA. Age groups were defined by decade, provided that subsample sizes were sufficient for the analyses; therefore, participants aged 14 to 30 years were combined into a single age group. Measurement invariance was tested in 4 steps: the configural model (no constraints), followed by a metric invariant model (with item loadings constrained to be equal across groups), a scalar invariant model (with item loadings and item intercepts simultaneously constrained to be equal across groups), and a model of strict factorial invariance (with error variances constrained to be equal across groups in addition to the conditions mentioned above) [26].
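Several of the listed indices are closed-form functions of the model χ², the baseline (independence) model χ², their degrees of freedom and N. The Python sketch below uses common textbook formulas; GFI and SRMR are omitted because they require the residual covariance matrix, and programs differ in details (e.g., N versus N - 1 in RMSEA), so it approximates rather than reproduces the AMOS output. The function name is hypothetical. For a one-factor model with 9 indicators and uncorrelated errors, the model has 45 - 18 = 27 degrees of freedom (45 variances and covariances; 9 loadings and 9 error variances, with the factor variance fixed), and the baseline model has 45 - 9 = 36.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Common CFA fit indices from the model and baseline (independence) chi-square."""
    cmin_df = chi2_model / df_model
    nfi = (chi2_base - chi2_model) / chi2_base
    tli = (chi2_base / df_base - chi2_model / df_model) / (chi2_base / df_base - 1)
    # Noncentrality estimates, truncated at zero
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    cfi = 1.0 - d_model / d_base if d_base > 0 else 1.0
    # RMSEA with N - 1 in the denominator (conventions differ between programs)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"CMIN/DF": cmin_df, "NFI": nfi, "TLI": tli, "CFI": cfi, "RMSEA": rmsea}
```

The χ² values passed in would come from the CFA output; any numbers used to try the function are hypothetical.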
Following the hierarchy of these nested and increasingly restrictive models, the models were compared with each other based on ΔCFI and ΔRMSEA, because the χ² statistic has often been criticized for its sensitivity to sample size. Differences smaller than .01 indicate invariance [27]. These invariance tests are a statistical prerequisite for testing mean differences between the defined subgroups [26].
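The stepwise comparison of nested models can be expressed as a small decision procedure. This Python sketch assumes fit statistics have already been obtained for each step and applies the < .01 rule to the absolute changes in CFI and RMSEA; the function name and data layout are hypothetical.

```python
STEPS = ["configural", "metric", "scalar", "strict"]


def highest_invariance_level(fits, threshold=0.01):
    """Walk the nested models and return the most restrictive level still supported.

    `fits` maps each step name to {"cfi": ..., "rmsea": ...}. A step is accepted
    when both |ΔCFI| and |ΔRMSEA| relative to the previous step are < threshold.
    """
    reached = STEPS[0]
    for previous, current in zip(STEPS, STEPS[1:]):
        delta_cfi = abs(fits[current]["cfi"] - fits[previous]["cfi"])
        delta_rmsea = abs(fits[current]["rmsea"] - fits[previous]["rmsea"])
        if delta_cfi >= threshold or delta_rmsea >= threshold:
            break
        reached = current
    return reached
```

With hypothetical fits in which CFI drops by .016 from the scalar to the strict model, the function stops at "scalar".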
The remaining statistical analyses were conducted using IBM SPSS version 20. Group comparisons were performed using ANOVAs and ANCOVAs. The ANCOVA effect sizes were expressed as partial η², which was interpreted as a small effect size when ≥ .01, a medium effect size when ≥ .06 and a large effect size when ≥ .13. Partial η² describes the proportion of variation attributable to a factor after the variation explained by the other factors has been removed from the total [28]. The data are available upon request.
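Partial η² equals SS_effect / (SS_effect + SS_error), which can also be recovered from an F statistic and its degrees of freedom. A minimal Python sketch with the benchmarks stated above (function names hypothetical):

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form computed from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def effect_size_label(eta_sq):
    """Benchmarks used in the text: .01 small, .06 medium, .13 large."""
    if eta_sq >= 0.13:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "below small"
```

As an illustration, assuming the depression effect reported in the results was tested on 1 and 2,245 degrees of freedom (2,248 participants minus the intercept, group and age terms; the error degrees of freedom are not reported, so this is our assumption), F = 119 gives partial η² ≈ .050, consistent with the reported value of .05.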

Sample recruitment and response rate
Data were collected between May and June 2013. Initial contact was attempted at 4,360 addresses, and 2,508 (57.5%) persons participated in this self-report survey. The inclusion and exclusion of participants for the final analyses are shown in the flow chart (Figure 1). Overall, 2,275 (52.2%) persons were included in the final analyses.
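The percentages in this paragraph, and the 42.5% non-response mentioned in the limitations, all use the 4,360 contacted addresses as the denominator. A quick Python check of the arithmetic:

```python
addresses_contacted = 4360
participants = 2508
included_in_analysis = 2275

response_rate = 100 * participants / addresses_contacted           # about 57.5%
inclusion_rate = 100 * included_in_analysis / addresses_contacted  # about 52.2%
non_response_rate = 100 - response_rate                            # about 42.5%
print(f"{response_rate:.1f}% {inclusion_rate:.1f}% {non_response_rate:.1f}%")
# prints: 57.5% 52.2% 42.5%
```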

Sample characteristics
The demographic characteristics of the study population are presented in Table 1. The study sample displayed age groups, sex ratio and educational levels that were comparable to those of the general German population, as assessed by the German population census in 2011 [29].

Validity
3.1 Acceptance. The acceptance was high. Only 23 items were left unanswered, each by a different participant (1.0% of participants); no participant had more than 1 missing item, and no item was predominantly missing.
3.2 Data quality. The means and standard deviations of all items are shown in Table 2. Additionally, supplemental materials on the item score frequency (Table S1) and frequency distribution of the PDRQ-9 total scores (Table S2) are provided.
The mean satisfaction index was 4.12 (SD = .70) on a scale from 1 (the worst) to 5 (the best possible satisfaction), with a median of 4.78 (interquartile range 4.00-5.00). Four of every 10 subjects expressed the maximum possible satisfaction (ceiling effect), far above the 15% upper bound defined as optimal. This ceiling effect is also reflected in the negative skewness of the items (Table 2): negative values indicate a left-skewed distribution, with most values concentrated to the right of the mean.
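To illustrate how a ceiling effect shows up as negative skewness, here is a small Python sketch. It computes the moment coefficient g1 and a small-sample adjusted version commonly reported by statistical packages; which variant underlies Table 2 is our assumption, and the function name is hypothetical.

```python
def skewness(values, adjusted=True):
    """Sample skewness; negative values indicate a left-skewed distribution.

    adjusted=False gives the moment coefficient g1 = m3 / m2**1.5;
    adjusted=True applies the correction G1 = g1 * sqrt(n(n-1)) / (n-2).
    Requires at least 3 values that are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    if not adjusted:
        return g1
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)
```

For example, a toy sample with most answers at the top of the scale, [1, 5, 5, 5, 5], has g1 = -1.5.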

3.3 Internal reliability. The corrected item-total correlation coefficients indicated that all items accounted for a substantial amount of the variance of the total scale and were of similar magnitude. Furthermore, the internal consistency was high (Cronbach's α = .95). In total, the explained variance was 73.4% and the CR was .96, indicating good internal consistency of the PDRQ-9 German.
3.4 Factorial structure of the PDRQ-9 German. All items of the PDRQ-9 German were positively correlated, and the correlation coefficients were substantial (Table 3).
Only the CMIN/DF indicated a relevant deviation between the data and the model, as only values of about 3 or smaller indicate an appropriate model. This coefficient is sensitive to sample size. Thus, in line with Jöreskog and Sörbom (1993), we focused on the model fit indices described above (GFI, NFI, CFI, TLI, SRMR, RMSEA), which are generally independent of sample size.
The standardized regression coefficients of the latent variable ''satisfaction with the patient-doctor relationship'' varied between .72 and .88 (Table 3), indicating substantial relationships between the latent variable and each of the 9 items of the PDRQ-9.
Furthermore, the model was tested for invariance across gender and age. As shown in Table 4, the multi-group analyses demonstrated invariance across gender and age, as the differences in CFI and RMSEA between the hierarchical nested models were < .01. The χ² test was significant for several invariance tests between different subgroups. As mentioned above, this test is sensitive to sample size. Thus, the other fit indices were used to confirm the scalar invariance across gender and age.
3.5 Convergent validity. The Spearman rank correlation between the mean satisfaction index and the satisfaction with pain treatment of 489 participants who reported chronic pain and pain treatment was r = .51. This result demonstrates acceptable convergent validity for this subsample.

3.6 Discriminative concurrent criterion validity.
In an ANCOVA adjusting for age, the mean satisfaction index of participants with a potential depressive disorder (N = 218) was 3.66 (SD = .86), and that of participants without a potential depressive disorder (N = 2,030) was 4.12 (SD = .66) (F = 65.8, p < .001). Potential depressive disorder primarily accounted for the group difference in mean satisfaction index (F = 119, p < .0001), with a small effect size (partial η² = .05). The partial η² of age was .007 (F = 7.1, p < .001). This result demonstrates acceptable discriminative concurrent criterion validity.

Associations of the PDRQ-9 German total score and socioeconomic variables
To examine the influence of socioeconomic variables on PDRQ-9 German total scores, a simultaneous multiple linear regression analysis was conducted, with age (as a continuous variable), gender, education, and household income (variables coded according to the groups presented in Table 1) as predictors. The results are presented in Table 5. The only significant predictors were age and income, with a higher satisfaction index among older patients and those with higher household income. However, the amount of explained variance due to these variables was small (1.2%).
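The explained variance reported here is the R² of an ordinary least-squares fit. A dependency-free Python sketch, assuming predictors are already numerically coded (e.g., dummy variables for the groups in Table 1; the coding and function names are hypothetical). It solves the normal equations, which is adequate for illustration but less numerically robust than the QR-based solvers used by statistical packages.

```python
def solve_linear_system(a, b):
    """Gauss-Jordan elimination with partial pivoting; returns x with a x = b."""
    size = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, size + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][size] / m[i][i] for i in range(size)]


def ols_r_squared(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot
```

An R² of .012 would correspond to the 1.2% of explained variance reported above.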

Discussion
Summary of the main findings: We examined the internal and external validity of the PDRQ-9 German in a representative cross-sectional German population survey. We focused on participants who reported that they consulted with a PCP. The internal and external validity of the PDRQ-9 German were good.
Acceptance: The acceptance of the PDRQ-9 German was good, as only a few items were missing in the total sample. The acceptance rate of 99% is similar to those that were found with similar questionnaires in previous population surveys (e.g., 99.3%) [19].
Data quality: Similar to the Dutch [8] and Spanish studies [10,11], ceiling effects were detected in the German PDRQ-9. The ability of the PDRQ-9 to discriminate within the upper region of satisfaction with the PDR is insufficient [8,11]. However, ceiling effects are inherent in all instruments that measure satisfaction with the PDR [7]. Nevertheless, this problem should be noted. Furthermore, the results must be interpreted with caution, as the results of the CFA, multigroup analyses and correlation coefficients may be biased by the low variability in the PDRQ-9 scores found in the present study. When evaluating a questionnaire on the patient's perception of the helping attitude of his/her PCP, one should be aware that patients may give socially desirable answers [8]. We attempted to eliminate this problem by assuring patients' anonymity and incorporating the PDRQ-9 into a survey without a specific focus. However, patients for whom a less positive doctor-patient relationship was expected (potential depressive disorder) showed significantly less satisfaction. This suggests that the PDRQ-9 might be able to discriminate between good and moderate doctor-patient relationships [8].
Reliability: The internal consistency of the PDRQ-9 German was high (α = .95), as it was in the Dutch (α = .94) [8], Spanish (α = .92 and .95) [10,11] and Turkish (α = .91) [12] validation studies. Further, the psychometric properties of the German PDRQ-9 were very good with regard to the average variance extracted. From a statistical perspective, the corrected item-total correlations were very high (≥ .94). This raises the question of whether 9 different items are useful or whether 1 item might be sufficient to measure the patient-doctor relationship. On the other hand, the use of more than 1 item to measure a latent construct helps even out the measurement error of every single item. Additionally, the items address several related but distinct topics (for example, a trustful atmosphere, the helping attitude of the physician, and the time provided for consultations). Given that these are important aspects of the patient-doctor relationship, separate assessments are warranted.
Table 3. Standardized factor loadings and item correlation coefficients of the PDRQ-9 German.
Factorial structure: The present CFA confirmed the factorial structure that van der Feltz-Cornelis et al. [8] and Mingote et al. [11] found using exploratory factor analysis. The PDRQ-9 German was shown to be unidimensional. The model fit indices showed that a unidimensional model fit the empirical data very well, with 1 exception: the CMIN/DF value indicated a relevant deviation between the empirical data and the model. This measure is sensitive to sample size; in large samples, even a small misspecification of the model can lead to its rejection. Therefore, we based our conclusion on the fit indices that are independent of sample size, as described above (GFI, NFI, CFI, TLI, SRMR, and RMSEA). Additionally, the multigroup CFA revealed strict factorial invariance of the model across men and women and across age groups. Thus, the factor and observed mean scores as well as the observed variances and covariances of these subgroups can be meaningfully compared [26].
Construct validity: We confirmed our hypotheses concerning convergent and discriminative concurrent criterion validity. There was a moderate correlation between the mean PDRQ-9 satisfaction index and satisfaction with pain treatment in persons with chronic pain, indicating convergent validity in this subsample. The Turkish study found a moderate correlation of the PDRQ-9 Turkish total score with a generic instrument of patient satisfaction [12]. In testing the ability of the PDRQ-9 to discriminate between groups, the Dutch study found higher total scores in primary care patients than in patients from an epilepsy clinic [8]. The present finding of lower satisfaction with the PDR in depressed than in non-depressed persons is in line with the results of the Heart and Soul study. Specifically, in outpatients with chronic coronary heart disease, depressive symptoms were associated with perceived deficits in doctor-patient communication, whereas medical comorbidities and disease severity were not [17].
Associations of the PDRQ-9 and socioeconomic variables: The PDRQ-9 total scores increased slightly with age and household income. We found no gender differences. Similar to the present study, the validation study of the Spanish version found no gender differences and a higher mean satisfaction index among older people (aged > 65 years) [10]. We speculate that seniors' greater satisfaction with the PDR might reflect a more traditional role concept and/or a greater need for PCP consultation due to increasing morbidity. Additionally, we assume that participants with a higher income are more likely to be privately insured and, thus, may receive more attention (time, examinations) from their PCP. However, the impacts of age and income on the satisfaction index were very small.
Limitations: Although the response rate (57.5%) was comparable to those of other German health surveys [19], 42.5% of the persons who were addressed were non-responders. We do not have data to determine whether there were relevant differences between the participants of the survey and those who refused to participate, because German data protection laws do not allow the assessment of the demographic data of non-responders. Additionally, our conclusions regarding the convergent validity of the PDRQ-9 are based on a special subsample (people with chronic pain). Further empirical evidence is needed to support this assumption and to generalize the results of the present study. Another limitation is that discriminant validity was not assessed in the present study. Furthermore, we did not control for social desirability with a dedicated questionnaire when administering the PDRQ-9 German. Therefore, it remains possible that patients were biased toward a positive judgment in the assessment of their PCP.
Conclusions: Despite the limitation of the low variability in the PDRQ-9 scores, the German version of the PDRQ-9 is a brief and useful measure of the doctor-patient relationship from the patient's perspective. It has good psychometric properties and can be used for research in primary care, public health research and population surveys.