Development of a universal short patient satisfaction questionnaire on the basis of SERVQUAL: Psychometric analyses with data of diabetes and stroke patients from six different European countries

Objective A short questionnaire which can be applied for assessing patient satisfaction in different contexts and different countries is to be developed.

Methods Six items addressing tangibles, reliability, responsiveness, assurance, empathy, and communication were analysed. The first five items stem from SERVQUAL (SERVice QUALity), the last stems from the discussion about SERVQUAL. The analyses were performed with data from 12 surveys conducted in six different countries (England, Finland, Germany, Greece, the Netherlands, Spain) covering two different conditions (type 2 diabetes, stroke). Sample sizes for included participants are 247 in England, 160 in Finland, 231 in Germany, 152 in Greece, 316 in the Netherlands and 96 in Spain for the diabetes surveys; and 101 in England, 139 in Finland, 107 in Germany, 58 in Greece, 185 in the Netherlands, and 92 in Spain for the stroke surveys. The items were tested by (1) bivariate correlations between the items and an item addressing ‘general satisfaction’, (2) multivariate regression analyses with ‘general satisfaction’ as criterion and the items as predictors, and (3) bivariate correlations between sum scores and ‘general satisfaction’.

Results The correlations with ‘general satisfaction’ are 0.48 for tangibles, 0.56 for reliability, 0.58 for responsiveness, 0.47 for assurance, 0.53 for empathy, and 0.56 for communication. In the multivariate regression analysis, the regression coefficient for assurance is significantly negative while all other regression coefficients are significantly positive. In a multivariate regression analysis without the item ‘assurance’ all regression coefficients are positive. The correlation between the sum score and ‘general satisfaction’ is 0.608 for all six items and 0.618 for the finally remaining five items. The country specific results are similar.

Conclusions The five items which remain after removing ‘assurance’, i.e. the SERVQUAL-MOD-5, constitute a short patient satisfaction index which can usefully be applied for different medical conditions and in different countries.


Introduction
The primary outcome of any health care is patients' health. In addition, patient satisfaction is an important outcome because it can affect the extent to which patients adhere to their health care and/or to their health care providers. Moreover, it has a value in itself. Hence, there are good reasons to design health care in such a way that patients are satisfied. For this purpose, adequate questionnaires for assessing patient satisfaction are required. Ideally, these questionnaires should be indices in the sense of Streiner [1]. This means that the individual questionnaire items should address those characteristics of the health care which can be assumed to affect satisfaction, and that a total value reflecting patient satisfaction should be formed by aggregating the values for the individual items. Such indices of patient satisfaction not only make it possible to estimate the level of satisfaction; they also provide starting points for improving satisfaction. Specifically, those characteristics which are perceived as least sufficient are the first candidates for modification.
For many research purposes, patient satisfaction questionnaires are needed which have properties beyond that of being a satisfaction index. One of these properties is that the questionnaire is as universal as possible, i.e. that it can be applied to all kinds of care and all kinds of care providers and in all cultural contexts. Such a universal satisfaction questionnaire would make it possible to investigate cultural differences in valuing different aspects of care and to compare different kinds of care and different kinds of care providers in different cultural contexts. This, in turn, would enhance the possibility of learning between different settings. A further property which is essential in many research contexts is that the questionnaire is short. This distinctly enhances patients' willingness to complete the questionnaire, especially when variables other than patient satisfaction are also being assessed.
There are numerous examples of questionnaires which constitute indices of patient satisfaction. These indices themselves are quite diverse. Some address satisfaction with a very specific kind of care such as neonatal intensive care [15] or psychiatric care for outpatients [22]. Other indices have a broader scope such as satisfaction with inpatient care in general [9,16,18,28,29]. However, some of the instruments with a broader scope are designed for a specific cultural context [14,17,28] and there are only a few attempts for providing universal produced five different components: 'Tangibles', 'Reliability', 'Responsiveness', 'Assurance' and 'Empathy' [30]. As the SERVQUAL items address possible causes of satisfaction and not its effects, the component structure is not implied by the construct measured, i.e. satisfaction, but by the characteristics of the services investigated. Correspondingly, the component structure cannot be seen as a characteristic of the measurement instrument and can, therefore, not be expected to be stable across different contexts [45][46][47]. However, those features which highly correlate for the services investigated in one study will presumably also correlate highly for different services. Hence, those SERVQUAL items which best reflect a component structure which has already been found are also likely to reflect the component structures in different contexts quite well. Accordingly, for each of the five components found by the SERVQUAL developers that item with the highest loading on this component was selected for the basic item set investigated here. The final basic item set resulted by adding an item addressing 'carefulness of communication' (see Table 1).
The basic item set was first formulated in English and then translated into the other five study languages. Following the rules of cultural adaptation, the translations were performed in four steps: (1) two professional interpreters who were native speakers of the target language translated the English original independently of each other into the target language; (2) a member of the study team in the respective country discussed differences between the two translations with both interpreters and constructed one single version which could be approved by both interpreters; (3) a professional interpreter with English as their native language translated the resulting version back into English; (4) a member of the study team in the respective country discussed possible differences between the back translation and the original version with the back interpreter and, in case of essential differences, modified the target language version until the back interpreter judged that his or her back translation of the modified version would have been close enough to the original version.

Study settings and study participants
The basic item set was applied in two different surveys, one with type 2 diabetes patients and one with stroke patients.
The diabetes survey was performed for six different networks of providers of type 2 diabetes care, one for each study country. These networks were: the London Borough of Tower Hamlets in England; the region of Keski-Suomi in Finland; the city and rural district of Bamberg in Germany; the regional unit of Herakleion on the island of Crete in Greece; the region Nieuwe Waterweg Noord en Delft Westland Oostland in the Netherlands; and Valencia-La Fe Health Department in Spain. In England seven general physician practices associated with the Tower Hamlets Primary Care Trust were investigated; in Finland the health centers of eight municipalities within Keski-Suomi; in Germany the practices of one general physician and one diabetologist in the city of Bamberg, and of two general physicians and one diabetologist in the rural district of Bamberg; in Greece, five different institutions providing outpatient care for diabetes; in the Netherlands, five general practitioner health centres; and, in Spain, one primary healthcare area [48].
The stroke survey was performed similarly for six different networks of providers of stroke care, one for each study country. The core of each of these networks was a hospital with a stroke unit. The investigated hospitals were the Brighton and Sussex University Hospitals in England, Keski-Suomi Central Hospital in Finland, the neurological hospital at the University Medical Center of Erlangen in Germany, the General Hospital of Athens 'Alexandra' in Greece, TweeSteden Ziekenhuis and St. Elisabeth Ziekenhuis in Tilburg, which are now merged into ElisabethTweesteden Ziekenhuis, in the Netherlands, and Valencia-La Fe Health Department in Spain.

Table 1. Items of the basic item set as worded in the diabetes survey.
Tangibles: The diabetes-related services have up-to-date equipment.
Reliability: The diabetes-related services provide their service at the time they promise to do so.
Responsiveness: Personnel of the diabetes-related services react promptly to my requests.
Assurance: Personnel of the diabetes-related services are polite.
Empathy: Personnel of the diabetes-related services give me personal attention.

Both surveys were performed with the assistance of the care providers investigated. These providers selected the patients to be approached for participation according to criteria defined by the researchers. Inclusion criteria for participants of the diabetes survey were 1) that they were being treated for type 2 diabetes by the health providers investigated in the project and 2) that they were at least 18 years old [48]. Inclusion criteria for participants of the stroke survey were 1) that they had been treated for stroke by the health providers investigated in the project in the year 2010 and 2) that they were at least 18 years old. The patients were contacted either by post or directly given the questionnaire when visiting their health care provider. The patients who participated in the survey completed their questionnaires on their own without any intervention by personnel from the service provider or research team. Depending on the most feasible method for the particular provider, the participants returned their completed questionnaires either by mail directly to the local project study centres, or to the care provider who then passed them on to the study centres. Data for the diabetes survey were collected between October 2011 and March 2012 [48], those for the stroke survey between September 2011 and February 2012.

Ethics statement
The English diabetes survey was approved by the NHS National Research Ethics Service. The English stroke survey was performed as part of a service development exercise and therefore did not require ethics committee approval. The Finnish surveys were approved by the Ethics Committee of the Central Finland Health Care District. The German surveys were approved by the Ethics Committee of the Medical Faculty of the Friedrich-Alexander University in Erlangen-Nürnberg. The Greek diabetes survey was approved by the Scientific Committee of the hospital in Herakleion and the Greek stroke survey by the Ethics Committee of the hospital Alexandra. The Dutch diabetes survey was approved by the board of directors of the Primary Care Group ZEL and the stroke survey by the Ethics Committee of the St. Elisabeth Hospital in Tilburg. The Spanish surveys were approved by the Hospital La Fe Ethical Committee.
Permission for use of data was received from the NHS National Research Ethics Service (statistical data and access of patient records through the clinicians of the local diabetes research network), the Ethics Committee of the Central Finland Health Care District (statistical data at aggregate level), the Ethics Committee of the Medical Faculty of the Friedrich-Alexander University in Erlangen-Nürnberg (statistical data at aggregate level), the Scientific Committee of the hospital in Herakleion (statistical data and access to patient records), the Ethics Committee of the hospital Alexandra (statistical data and access to patient records), the

The survey questionnaires
Both survey questionnaires contained the basic item set. In the diabetes survey the items referred to the type 2 diabetes-related services (see Table 1), in the stroke survey to the hospital in which the patients had been treated. Accordingly, in the stroke surveys the items were formulated in the past tense whereas they were formulated in present tense in the diabetes surveys. In addition to the basic item set both questionnaires contained several further questions (most of which are not relevant for the analyses presented here). Those questions which are relevant, in both questionnaires, are those addressing age, gender, educational attainment, mastery of the language in which the questionnaire was formulated and the 'general satisfaction' with the entity which was referred to by the basic item set. Educational attainment was assessed by asking participants whether they had left school at the minimum school leaving age of their country. Those answering 'yes' were classified as having a lower level of educational attainment than those who answered 'no'. Mastery of the questionnaire language was assessed via two questions. In the English version of the questionnaire the first question was 'What is your first language?' and the categories 'English' and 'Other, please specify' were given as answer options. The second question was 'If English is not your first language, how well do you master it?' with the answer options 'Not at all', 'Poorly', 'Moderately', 'Well' and 'Perfectly'. In the other language versions the word 'English' was replaced with the word for the language in which the questionnaire was formulated [49]. 'General satisfaction' was assessed with one question. In the diabetes survey this question was: 'How satisfied are you with the supply of diabetes-related services you have experienced?'. In the stroke survey it was: 'How satisfied were you with the hospital in which you were treated because of your stroke?'. 
In both surveys, a seven-category scale was provided for answering the question, with the lowest category labelled 'Extremely dissatisfied' and the highest category labelled 'Extremely satisfied'.

Statistical analyses
Not all study participants returning a questionnaire were included in the analyses. One exclusion criterion was that the questionnaire language was not the respondent's first language and that the respondent mastered the questionnaire language only moderately or worse. A further exclusion criterion was that data for the basic item set or for the 'general satisfaction' question were missing.
As a prerequisite for the statistical analyses the six basic items and the 'general satisfaction' item were coded numerically with -3 for the lowest category and +3 for the highest category. The six basic items were then aggregated into a sum score. To get a general impression of the study participants, descriptive statistics for age, gender, educational attainment, the six basic items, the sum scores for the six basic items and the 'general satisfaction' item were computed. These descriptive statistics were mean, standard deviation, minimum and maximum for age, the six basic items, the sum scores and the 'general satisfaction' item; and relative frequencies for gender and educational attainment. The analyses were performed for all relevant partitions of the sample, i.e. separately for each combination of medical condition and country, for each medical condition with countries pooled, for each country with medical conditions pooled and for the total sample with countries and medical conditions pooled.
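The coding and aggregation step can be illustrated with a minimal Python sketch. It assumes that every response is stored as a category position from 1 (lowest) to 7 (highest), so that seven equally spaced categories map onto -3 to +3. The item keys and the function names `code_category` and `sum_score` are hypothetical and not taken from the study materials.

```python
# Hypothetical keys for the six basic items.
ITEMS = ["tangibles", "reliability", "responsiveness",
         "assurance", "empathy", "communication"]


def code_category(position):
    """Map a category position 1..7 to the numeric code -3..+3.

    Assumption: seven equally spaced answer categories.
    """
    if not 1 <= position <= 7:
        raise ValueError("category position must be between 1 and 7")
    return position - 4


def sum_score(responses, items=ITEMS):
    """Aggregate the coded item responses of one respondent into a sum score."""
    return sum(code_category(responses[item]) for item in items)


# Hypothetical respondent who chose the highest category for every item.
example = {item: 7 for item in ITEMS}
print(sum_score(example))
```

With six items, the sum score in this sketch ranges from -18 to +18; passing a five-item list as `items` gives a range from -15 to +15.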
Differences with regard to age, the six basic items, the sum scores and the 'general satisfaction' item were tested using t-tests when medical conditions were compared and using analyses of variance when countries were compared. Differences with regard to gender and educational attainment were tested using Fisher's exact test when medical conditions were compared and chi-square tests for contingency tables when countries were compared. As the questionnaire items are bounded on both sides, violations of the normality assumption must be expected. Therefore, differences with regard to the six basic items, the sum scores, and the 'general satisfaction' item were also tested with distribution-free tests. These were the Mann-Whitney-U-test for comparisons between medical conditions and the Kruskal-Wallis-test for comparisons between countries. In this way, 186 different significance tests were performed. However, this was only done in order to give an impression of the specific features of the study samples and not for substantiating any general statements about the six study countries or the two medical conditions. Therefore, no control for multiple testing was performed.
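The two distribution-free tests are both based on ranks of the pooled observations. The following Python sketch computes only the test statistics (the Mann-Whitney U and the Kruskal-Wallis H) using mid-ranks for ties. It is an illustration under simplifying assumptions: no tie correction is applied to H, no p-values are computed, and the function names are hypothetical. It is not the software used in the study.

```python
from itertools import chain


def midranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic of sample x: rank sum of x minus its minimum possible value."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(*groups):
    """H statistic without tie correction."""
    pooled = list(chain.from_iterable(groups))
    ranks = midranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical coded responses (-3..+3) for two conditions.
print(mann_whitney_u([1, 2, 3, 3], [-1, 0, 2, 2]))
```

Because the items have only seven categories, ties are frequent in data of this kind, so a full analysis would use a tie-corrected H and a tie-adjusted variance for U.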
The psychometric analyses performed here are strictly based on the idea that the items constitute an index, i.e. that the items describe causes and not effects of the variable to be measured. This implies that the correlational structure between the items is not determined by the variable to be measured. This, in turn, implies that this correlational structure must be expected to be different within different contexts and that, for this reason, neither this structure nor statistics based upon this structure can be interpreted as a feature of the measurement instrument [1,45,46,47]. For this reason several analyses which have previously often been performed with patient questionnaires are not adequate. This includes analyses with models of item-response-theory, as for example the Rasch-model, and attempts to estimate the sum score's reliability using Cronbach's alpha. Accordingly, such analyses were not performed here.
However, although the correlations between the individual items are not primarily determined by the quantity to be measured, they nevertheless reflect important aspects of the contexts in which the surveys were performed. Therefore, the inter-item correlations were computed for all relevant partitions of the sample. Differences between the corresponding variance-covariance-matrices of different medical conditions or, respectively, different countries were tested. For this purpose, the variance-covariance-matrices determined under the assumption that the matrices are equal for the different countries or, respectively, medical conditions were compared with the empirically found variance-covariance-matrices using the chi-square test provided by the statistical package AMOS in SPSS.
In addition to the statistical test, a descriptive measure for the similarity between the item-inter-correlation-matrices was also determined. This measure was developed specifically for the analyses presented here and will be referred to as the Normed Euclidean Distance Coefficient (NEDC) in the following text. This measure is

$$\mathrm{NEDC} = 1 - \frac{\left(\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}\left(r_{ij1}-r_{ij2}\right)^{2}\right)^{1/2}}{\left(m(m-1)/2\right)^{1/2}}$$

with $m$ the number of items, $r_{ij1}$ the correlation between items $i$ and $j$ in matrix 1, and $r_{ij2}$ the correlation between items $i$ and $j$ in matrix 2. Note that $\left(\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}(r_{ij1}-r_{ij2})^{2}\right)^{1/2}$ is the Euclidean distance between the upper right off-diagonal triangles of the two matrices, whereas $(m(m-1)/2)^{1/2}$ is the Euclidean distance between the upper right off-diagonal triangles of two matrices of the same size with one matrix only containing zero correlations and the other only correlations equal to one. In other words, the term subtracted from one is equal to the Euclidean distance between the two investigated matrices standardized with regard to a reference distance. This reference distance, in turn, is equal to the Euclidean distance between a matrix with only zero correlations in the off-diagonal cells and a matrix with only correlations equal to one. Correspondingly, the NEDC is equal to one when both matrices to be compared are equal; on the other hand, the NEDC is equal to zero when the Euclidean distance between the two matrices equals the reference distance. Matrices belonging to the two different medical conditions were directly compared using the NEDC. For matrices belonging to the six different study countries the means of the NEDCs determined over all 15 different pairs of countries were applied.
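As an illustration of the verbal definition, the NEDC can be computed as one minus the Euclidean distance between the upper off-diagonal triangles of two correlation matrices, divided by the reference distance (m(m-1)/2)^(1/2). The Python sketch below follows that description. The function names `nedc` and `mean_pairwise_nedc` and the nested-list representation of the matrices are assumptions made for this example.

```python
import math
from itertools import combinations


def nedc(r1, r2):
    """Normed Euclidean Distance Coefficient between two m x m correlation matrices."""
    m = len(r1)
    if m < 2 or len(r2) != m:
        raise ValueError("both matrices must be m x m with m >= 2")
    # Squared Euclidean distance over the upper right off-diagonal triangle.
    squared = sum((r1[i][j] - r2[i][j]) ** 2
                  for i in range(m) for j in range(i + 1, m))
    # Reference distance: all-zero versus all-one off-diagonal correlations.
    reference = math.sqrt(m * (m - 1) / 2)
    return 1 - math.sqrt(squared) / reference


def mean_pairwise_nedc(matrices):
    """Mean NEDC over all unordered pairs of matrices (15 pairs for six countries)."""
    pairs = list(combinations(matrices, 2))
    return sum(nedc(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 3 x 3 matrices, used only to show the two boundary cases.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
all_ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(nedc(identity, identity), nedc(identity, all_ones))
```

Since correlations can be negative, distances larger than the reference distance are possible in principle, in which case the coefficient would fall below zero; with uniformly positive inter-item correlations this does not occur.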
As a first step for testing the validity of the individual six basic items their correlations with 'general satisfaction' with the health care or, respectively, health care provider were computed. The 'general satisfaction' item addresses exactly that construct which is intended to be measured by the patient satisfaction index; however, it is presumed to be less reliable than the sum score because the sum score is based on several items. The correlations with 'general satisfaction' were computed for all relevant partitions of the sample.
As a second step for testing the validity of the individual items, cumulative logistic regression analyses with the items as predictors and 'general satisfaction' as the criterion with enforced equal distance between the categories were computed. Cumulative logistic regression rather than linear regression was applied because the basic assumptions of the linear regression model are necessarily violated when the criterion variable is bounded on both sides (as holds true for the 'general satisfaction' item). The regression analyses were performed separately for each combination of medical condition and country, for each medical condition with countries pooled, for each country with medical conditions pooled and for the total sample with countries and medical conditions pooled. Study participants with the same medical condition or from the same country might be more similar to each other than participants with different medical conditions or from different countries. For this reason, descriptive and inferential statistics might be distorted. To cope with this possibility, dummy variables for each combination of medical condition and country (except for one reference combination) were added when more than one combination was considered in the same analysis. Where an item was consistently shown to have a statistically significant negative contribution to the prediction of 'general satisfaction', this item was removed from the item set. The multivariate analyses just described were then repeated with the remaining items.
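The dummy coding of the combinations of medical condition and country can be sketched as follows. This is a minimal Python illustration only: the function name `dummy_code`, the column naming scheme and the choice of reference combination are assumptions, and the sketch does not fit the regression model itself.

```python
from itertools import product

CONDITIONS = ["diabetes", "stroke"]
COUNTRIES = ["England", "Finland", "Germany", "Greece", "Netherlands", "Spain"]


def dummy_code(rows, reference):
    """Create 0/1 indicator columns for every combination except the reference.

    rows: list of (condition, country) tuples, one per respondent.
    Returns the column names and one indicator row per respondent.
    """
    levels = sorted({row for row in rows if row != reference})
    columns = [f"{condition}_{country}" for condition, country in levels]
    matrix = [[1 if row == level else 0 for level in levels] for row in rows]
    return columns, matrix


# One hypothetical respondent per combination; reference chosen arbitrarily.
rows = list(product(CONDITIONS, COUNTRIES))
reference = ("diabetes", "England")
columns, matrix = dummy_code(rows, reference)
print(len(columns))
```

For the total sample with its 12 combinations this yields 11 indicator columns; a respondent from the reference combination has zeros in all of them.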
For the final item set differences between regression coefficients from different countries or medical conditions were also tested. For this purpose, regression analyses with interaction terms between items and countries or, respectively, medical conditions were computed and compared with regression analyses without such interaction terms. A statistically significant decrease of deviance due to adding the interaction terms was interpreted as evidence for differences between the regression coefficients belonging to different countries or, respectively, different medical conditions. Moreover, to judge the extent to which the SERVQUAL-items predict general satisfaction, a specific kind of Nagelkerke's pseudo R-square was computed for each partition of the data. The specific characteristic of these R-squares was their basis model, i.e. the model with which the regression model is compared. Usually, the predictions of the regression model are only compared with the relative frequency of the criterion in the total sample. Instead, in the analyses presented here, the model including the SERVQUAL-items was compared with a model without the SERVQUAL-items but with all further predictor variables included in the model with the SERVQUAL-items.
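One possible reading of this modified pseudo R-square is sketched below in Python. The sketch uses the standard Nagelkerke formula but substitutes the log-likelihood of the basis model (without the SERVQUAL-items) for that of the intercept-only model, both in the numerator and in the normalising maximum. Whether the normalisation was handled in exactly this way is an assumption; the function name and the example values are hypothetical.

```python
import math


def nagelkerke_r2(ll_base, ll_full, n):
    """Nagelkerke-type pseudo R-square relative to a chosen basis model.

    ll_base: log-likelihood of the basis model (here: without the SERVQUAL-items)
    ll_full: log-likelihood of the model including the SERVQUAL-items
    n:       number of observations
    Assumption: the basis model also defines the maximum attainable value.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if ll_base >= 0:
        raise ValueError("the basis model log-likelihood must be negative")
    # Cox-Snell-type improvement of the full model over the basis model.
    improvement = 1 - math.exp(2 * (ll_base - ll_full) / n)
    # Largest improvement attainable (full model with log-likelihood zero).
    maximum = 1 - math.exp(2 * ll_base / n)
    return improvement / maximum


# Hypothetical log-likelihoods, for illustration only.
print(nagelkerke_r2(ll_base=-100.0, ll_full=-80.0, n=50))
```

The coefficient is zero when adding the SERVQUAL-items does not improve the log-likelihood and one when the full model fits perfectly. The decrease of deviance used for the interaction tests corresponds to 2 * (ll_full - ll_base) for the two nested models.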
The validity of the sum scores of all item sets emerging in the process just described was also tested. This was performed via the correlations with the item addressing 'general satisfaction'. These correlations were computed for all relevant partitions of the sample.
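The bivariate correlations used in this step and in the item-level validity check can be illustrated with a short Python sketch of the product-moment correlation. The sketch assumes that the correlations are Pearson correlations of the numerically coded values; the function name and the example data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sum scores and coded 'general satisfaction' values.
sum_scores = [-6, 0, 4, 9, 12, 15]
general_satisfaction = [-1, 0, 1, 1, 2, 3]
print(pearson_r(sum_scores, general_satisfaction))
```

The same function applies to a single coded item in place of the sum score, which corresponds to the first validity step.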

Results
In the diabetes survey, 6245 questionnaires were distributed of which 1638 were returned and 1202 met the inclusion criteria (see Table 2). The proportion of excluded questionnaires was largest in England (48.0%). This was due to the fact that about 40% of all respondents in this sample were of Bangladeshi ethnicity and, because of lower levels of stated proficiency in the English language, did not meet the inclusion criteria for this analysis. Altogether, 19.2% of the questionnaires distributed in the diabetes survey were included in the final analyses, with the inclusion proportions varying from 7.4% for England to 50.0% for Germany. In the stroke survey, 2369 questionnaires were distributed of which 826 were returned and 682 met the inclusion criteria (see Table 2). In the stroke survey nearly all respondents had sufficient proficiency in the questionnaire language so that only very few respondents had to be excluded due to insufficient proficiency. Altogether, 28.8% of the questionnaires distributed in the stroke survey were included in the final analyses, with the inclusion proportions ranging from 23.2% for Finland to 46.0% for Greece. For both surveys together the proportion of finally included questionnaires in relation to the questionnaires distributed is 21.9% (see Table 2).
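The overall inclusion proportions follow directly from the counts given above, and the country-specific sample sizes reported in the abstract add up to the same totals. The following Python lines reproduce this arithmetic; the variable and function names are chosen for this illustration only.

```python
# Counts of distributed and included questionnaires as reported in the text.
diabetes = {"distributed": 6245, "included": 1202}
stroke = {"distributed": 2369, "included": 682}

# Included participants per country as listed in the abstract
# (England, Finland, Germany, Greece, the Netherlands, Spain).
diabetes_by_country = [247, 160, 231, 152, 316, 96]
stroke_by_country = [101, 139, 107, 58, 185, 92]


def inclusion_percent(included, distributed):
    """Percentage of distributed questionnaires included in the analyses."""
    return round(100 * included / distributed, 1)


diabetes_pct = inclusion_percent(diabetes["included"], diabetes["distributed"])
stroke_pct = inclusion_percent(stroke["included"], stroke["distributed"])
total_pct = inclusion_percent(
    diabetes["included"] + stroke["included"],
    diabetes["distributed"] + stroke["distributed"],
)
print(diabetes_pct, stroke_pct, total_pct)  # 19.2 28.8 21.9
```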
The respondents tended to be older, with a mean age of 66.6 in the total sample. The majority were male and had a higher level of educational attainment (see Table 3). Educational attainment differs substantially between the countries, both for the two medical conditions separately and for the total sample. There are also statistically significant differences between the countries with regard to age within the two medical condition specific sub-samples, but these differences level out in the total sample. The two medical condition specific sub-samples differ distinctly with regard to age, with the members of the stroke sub-sample being older than those of the diabetes sub-sample (see Table 3). The average values for the six basic items, the corresponding sum score, and the 'general satisfaction' item are all in the positive half of the measurement range (see Table 4). The two significance tests which have both been applied for testing the same differences, i.e. a test presupposing a normal distribution and a distribution-free test, mostly yield the same results.
Most of the differences between the countries and several of the differences between the medical conditions are statistically significant (see Table 4). All six basic items correlate positively with each other in all investigated partitions of the data set (see Table 5). With one exception, i.e. the correlation between tangibles and assurance in the Greek stroke survey, the deviation from zero is statistically significant for all correlations. All investigated differences between variance-covariance-matrices belonging to the item-inter-correlation-matrices are statistically significant (see Table 5). In spite of these statistically significant differences, the NEDCs show much similarity between the item-inter-correlation-matrices. This similarity, however, is higher between matrices belonging to different medical conditions than between matrices belonging to different countries. In all partitions of the data, all items correlate positively with 'general satisfaction'. With two exceptions, the deviations of these correlations from zero are statistically significant. The two exceptions are the correlations of 'general satisfaction' with tangibles and with assurance, both in the stroke survey in Spain. In the total sample, the correlations are 0.48 for tangibles, 0.56 for reliability, 0.58 for responsiveness, 0.47 for assurance, 0.53 for empathy, and 0.56 for communication.

Notes to Table 4: (b) Differences between medical conditions: two-tailed t-tests for independent samples with unequal variances (two-tailed Mann-Whitney-U-test). (c) Differences between countries: analyses of variance (Kruskal-Wallis-test). https://doi.org/10.1371/journal.pone.0197924.t004

In the regression analysis performed for the total sample with 'general satisfaction' as criterion and the six basic items as predictors the regression coefficients are 0.143 for tangibles, 0.183 for reliability, 0.319 for responsiveness, -0.209 for assurance, 0.208 for empathy, and 0.257 for communication. For all coefficients, the deviations from zero are statistically significant. This means that five of the six items actually contribute positively to the prediction of satisfaction, but one, i.e. assurance, contributes negatively. This effect also exists in both medical condition specific analyses with all countries pooled and in three of the six country specific analyses with medical conditions pooled. For the other three countries, there is no statistically significant effect, but a negative tendency for the assurance item. The assurance item also contributes negatively to the prediction of 'general satisfaction' in seven of the 12 regression analyses performed for the individual combinations of medical condition and country. In six of these seven cases this contribution is statistically significant, whereas there is no statistically significant effect in the five analyses in which assurance contributes positively to predicting 'general satisfaction'.
Following the results just described, the assurance item was removed from the item set and the regression analyses were repeated with the remaining five items. In the analysis for the total sample, the regression coefficients of all five items are positive and their deviation from zero is statistically significant (see Table 6). There are strong differences between the regression coefficients obtained for the different countries and slight differences between the coefficients obtained for the different medical conditions. With one exception, i.e. the differences associated with medical conditions in England, all differences are statistically significant (see Table 6). Eleven of the 60 regression coefficients computed for the individual combinations of medical condition and country are negative and, in three of these cases, the deviation from zero is statistically significant. However, the negative coefficients are distributed over four of the five items with communication being the exception (see Table 6). Hence, there seems to be no need for removing a further item.
In the total sample the correlation between the sum score of the six basic items and 'general satisfaction' is 0.608. The correlations for the individual combinations of country and medical condition are shown in Table 7. After removing the assurance item, the correlations for the sum scores for the remaining five items increase in all partitions of the data except for the diabetes survey in Spain and the stroke surveys in England and Germany. In these cases, the decrease is very small. In the total sample the correlation between the sum of the five included items and 'general satisfaction' increases to 0.618 (see Table 7).

Assets and limitations of the study
The study presented here has both assets and limitations. An important asset is that the study has been conducted with regard to the care for two different medical conditions and in six different countries. Such a study design provides evidence as to how the results differ between different contexts and, thereby, to which extent they can be generalised. Hitherto no study has been published in which a patient satisfaction questionnaire has been investigated with a comparable study design. Hence, the study presented here not only provides new information about the specific questionnaire investigated here but also new information about the generalisability of results pertaining to patient satisfaction questionnaires in general. One limitation of the study is that the investigated medical conditions and countries have not been selected at random from the universe of all medical conditions and countries. Hence, it is difficult to judge to which extent and in which way the results found here can be generalised. A further limitation of the study is that only 21.9% of the persons approached for participation could be included in the final analyses. Such a low inclusion rate constitutes a high risk that percentages and means determined from these data deviate from those which would have been obtained if all persons approached had been included. However, relationships between variables can often be expected to be similar for responders and non-responders. Hence, the low inclusion rate will most probably not constitute a great danger for the validity of the analyses regarding the central research questions considered here.

Relationships between the SERVQUAL items
A major part of the analyses presented here addresses the relationships between the SERVQUAL items. All six basic items correlate positively with each other in all investigated partitions of the data set (see Table 5). Considering that in an ideal index measurement instrument all items should be independent of each other [49], the correlational pattern found here is not desirable. One reason for the high positive inter-correlations might be that all health care providers will, if possible, try to influence all satisfaction-relevant characteristics alike. Hence, these characteristics usually correlate with each other because they are affected by common third variables. This effect will presumably always be present and, thereby, preclude achieving independence between the items. Perhaps, because of this effect, a much lower degree of dependence than that found here will hardly be achievable.
A second reason for the lack of independence between the items might be that, although the items describe possible causes of patient satisfaction, there can also be a causal effect of patient satisfaction on the responses to the items. There might be a so-called 'halo effect'. The most frequent expression of this effect is that persons with a generally positive feeling towards a given object usually bias their judgments of specific characteristics of this object in a positive direction, whereas persons with a generally negative feeling towards this object do the opposite. This effect produces positive correlations. In index measurement, halo effects are not welcome as they reduce the extent to which the responses to the items give information about the objective characteristics. Therefore, the items of patient satisfaction indices should be formulated so clearly that they can be answered without resorting to general impressions. This would reduce halo effects, although it is unlikely to avoid them completely. For this reason, they should be taken into consideration when data are interpreted.
The correlations between the six basic items contain some evidence that the responses to the items are not only produced by halo-effects, but that they actually reflect the characteristics to be judged. Those items which address closely associated characteristics correlate more with each other than items which do not have such closely associated characteristics. For example, empathy and communication are two characteristics which usually are very closely associated. People who feel empathy towards their interaction partner will try to communicate as correctly as possible and, on the other hand, this type of communication presupposes a certain degree of empathy. This relationship corresponds very well to the correlational patterns. The correlation between empathy and communication is highest not only within the total sample but also within nine of the 12 combinations of medical condition and country (see Table 4). On the other hand, the way in which persons interact with each other is only determined by the physical environment to a moderate degree whereas the different aspects of the interaction mostly depend on each other. This also corresponds very well to the correlational patterns. The correlations of assurance, empathy and communication with tangibles are not only the lowest in the total sample; they all also belong to the five lowest correlations in 10 of the 12 combinations of medical condition and country.
The NEDCs reveal that the different item inter-correlation matrices are by and large very similar. This is in line with the different effects just discussed. On the other hand, the variance-covariance matrices which belong to the item inter-correlation matrices all differ from each other with a very high level of statistical significance. This reflects that the items relate to each other in different ways in the different contexts. The NEDCs suggest that the differences between the health care given in different countries for the same medical condition are larger than the differences between the health care given for different medical conditions within the same country. This holds true even when these medical conditions have such different characteristics as diabetes (a chronic medical condition requiring long-term care) and stroke (a sudden traumatic event requiring a direct and fast reaction). This finding suggests that the constraints imposed by the country-specific health care systems and health care cultures are stronger than the constraints imposed by the medical conditions to be cared for.
Altogether, the pattern of similarities found here suggests that item inter-correlation matrices for different medical conditions and/or in different countries with a Western health system culture will differ slightly from the matrices found here, but that there will be large similarities. These similarities will presumably be larger between different medical conditions in the same country than between the care given in different countries for the same medical condition.

Relationships of the SERVQUAL-items with general satisfaction
A further key component of the analyses presented here addresses the relationships of the SERVQUAL-items with 'general satisfaction'. When 'general satisfaction' is regressed on all six basic items in a multivariate regression analysis, five of these six items have a statistically significant positive regression coefficient whereas one item, i.e. assurance, has a statistically significant negative regression coefficient. The latter holds true although the bivariate correlation between assurance and 'general satisfaction' is positive. Presumably, this pattern of results is mainly an effect of the collinearity of the predictors. This collinearity causes so-called suppressor effects, in which the regression coefficient of a predictor can take a sign opposite to that of its bivariate correlation with the criterion.
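The sign reversal can be made plausible with the closed-form solution for a regression with only two standardised predictors. The following sketch is a simplification of the six-predictor model analysed here: it uses the total-sample correlations of assurance (0.47) and empathy (0.53) with 'general satisfaction', whereas the correlations between the two predictors are hypothetical values chosen for illustration, and the function name is our own.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardised coefficients for a regression with two predictors.

    r_y1, r_y2: correlations of predictor 1 and 2 with the criterion.
    r_12: correlation between the two predictors (|r_12| must be < 1).
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly collinear")
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Correlations with 'general satisfaction' reported for the total sample.
R_ASSURANCE, R_EMPATHY = 0.47, 0.53

if __name__ == "__main__":
    # Hypothetical predictor inter-correlations, NOT values from the study.
    for r_12 in (0.50, 0.80, 0.90, 0.95):
        b_assurance, b_empathy = standardized_betas(R_ASSURANCE, R_EMPATHY, r_12)
        print(f"r_12={r_12:.2f}  beta_assurance={b_assurance:+.3f}  "
              f"beta_empathy={b_empathy:+.3f}")
```

In this two-predictor case the coefficient of the weaker predictor becomes negative as soon as the inter-correlation exceeds the ratio of the two criterion correlations (here 0.47/0.53, roughly 0.89), although both bivariate correlations with the criterion are positive. With six predictors the condition is more involved, but the mechanism is the same.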
To investigate how the collinearity influences the pattern of regression coefficients in the multivariate regression analysis additional computations were performed. To be specific, instead of the assurance item, the items most closely correlated with it were removed in a stepwise fashion. In the order of their correlation with the item 'assurance' these were: 'empathy', 'communication', and 'responsiveness'. When the item 'empathy' is removed the regression coefficient for the item 'assurance' in the complete sample remains negative and the deviation from zero remains statistically significant, but the regression coefficient is much closer to zero than when all six items are included. When additionally the item 'communication' is removed, the regression coefficient for the item 'assurance' becomes slightly positive without deviating from zero in a statistically significant manner. When additionally the item 'responsiveness' is removed, the regression coefficient for the item 'assurance' is positive and the deviation from zero is statistically significant.
The results just reported suggest that the item 'assurance' has at least two components. One of these components is, by and large, the same as the core meaning of the items 'empathy', 'communication' and 'responsiveness'; the other component reflects whether the respondents overrate the different characteristics addressed by the different items in comparison with their judgments of 'general satisfaction'. The items 'empathy', 'communication' and 'responsiveness' seem to cover the first meaning component better than the item 'assurance' and therefore obtain positive regression coefficients in the regression analysis, whereas the item 'assurance' obtains a negative coefficient because mainly its second meaning component becomes effective. Altogether, these results suggest that the item 'assurance' should not be applied together with the other five items in a common index measurement instrument.
When 'general satisfaction' is regressed on the five items which remain after the item 'assurance' has been removed, all regression coefficients obtained in the total sample are positive and differ from zero in a statistically significant manner (see Table 6). This result suggests that no further items should be removed. The regression analyses performed with the five remaining items for the individual combinations of medical condition and country show that there are slight differences between the regression coefficients for the two medical conditions and quite remarkable differences between the regression coefficients for the six countries (see Table 6). This suggests that the individual characteristics of the health care or, respectively, the health care provider are valued differently by people with different medical conditions and, especially, from different countries. For example, tangibles seem to have a huge impact on the 'general satisfaction' of the Greek patients whereas this item only produces a suppressor effect for the Spanish patients. On the other hand, reliability only produces a suppressor effect in Greece, while it is the second strongest predictor of 'general satisfaction' of the Spanish patients.
In the total sample, the correlation between the sum score for the included five items and 'general satisfaction' is 0.618. The corresponding statistics for the individual combinations of country and medical condition range from 0.323 for the stroke survey in Spain to 0.790 for the stroke survey in Greece. To evaluate these results, it is helpful to compare them with the results of the few studies in which the correlation between a sum score and 'general satisfaction' was reported [4,17,26]. Albashayreh et al. [2] found a correlation of 0.72 with perception of nursing care quality and of 0.82 with the overall quality of care in the hospital using a sum score based on 17 items, Cimas et al. [4] found a correlation of 0.70 with a sum score based on 10 items, Milutinovic et al. [17] found a correlation of 0.75 with a sum score based on 19 items, and Tso et al. [26] found a correlation of 0.85 with a sum score based on nine items.
All correlations just reported are higher than the correlation found for the total sample in the study presented here. However, in all these cases the sum score is based on more than five items. Accordingly, in all these cases more relevant characteristics could have been addressed by the sum score. Hence, taking the results from these studies as a benchmark for the results obtained with the five-item sum score presented here may be regarded as slightly unfair. In any case, the correlation found for this five-item sum score suggests that this score already covers essential determinants of satisfaction, whereas the comparison with the results from the literature suggests that there might still be further determinants which are not addressed by this score.

Conclusion
All in all, the empirical evidence presented here suggests that the item set which results when the item 'assurance' is removed constitutes a quite acceptable universal short patient satisfaction questionnaire. With its five items, it is definitely very short and, in spite of its shortness, it possesses quite acceptable validity. The latter holds not only for the total sample but also, more or less, for the different country-specific samples (with perhaps less convincing results for the Spanish surveys). Nevertheless, the results for the other five investigated countries justify considering the index based upon the selected five items as universal. However, the fact that the regression coefficients differ between the medical conditions and differ even more strongly between the countries means that the sum score should, if possible, not be applied without additional analyses. As soon as the investigated sample is large enough, regression analyses with an item addressing general satisfaction should also be performed. Moreover, the means and standard deviations of the individual items should also be considered. All this information will give more detailed suggestions as to which components of the care should be changed in order to improve satisfaction.
There might, of course, be a better five-item set than that identified here. This would be an item set for which the corresponding sum score correlates more strongly with general satisfaction for all medical conditions and in all countries, and perhaps an item set for which the regression coefficients differ less between medical conditions and countries than in the study presented here. However, finding such an item set requires much further research. As long as no five-item selection with a more valid sum score is available, the five-item selection found here could and should be used when only a very short instrument can be applied. This five-item selection should then be referred to as the SERVQUAL-MOD-5, with 'MOD' meaning 'modified' and '5' referring to the number of items.