Inequalities in health system responsiveness among asylum seekers and refugees: A population-based, cross-sectional study in Germany

Global migration has sparked renewed interest in Universal Health Coverage in high-income countries. However, quality of care has received little attention. This study uses the concept of responsiveness to study quality of care for asylum seekers and refugees (ASR) in Germany and identify inequalities among this group. We report results from a population-based, cross-sectional health monitoring survey in Germany’s third-largest federal state using random sampling methods. Established instruments were used to measure responsiveness, health status and socio-demographic factors. Data were weighted and adjusted logistic regression models applied to identify inequalities related to health status, structural and socio-demographic factors. N = 344 survey participants were included in the analysis (response rate 39.2%). Combined responsiveness was 77% (95%CI: 68%; 83%) but varied between domains. Responsiveness was poor for individuals with symptoms of anxiety (OR 0.35, 95%CI 0.13,0.99), longstanding illness (OR:0.42, 95%CI:0.17,1.06) and diminished health-related quality of life (OR:0.24, 95%CI:0.06,0.95). Individuals from Southern Asia (OR: 0.24, 95%CI: 0.07,0.86) and young participants (OR:0.31, 95%CI:0.12,0.82) also reported less responsive care. Unique patterns of explanatory factors were identified within each responsiveness domain. We found important differences in responsiveness related to health, socio-demographic and structural factors, both in combined responsiveness and in individual domains. Inequalities related to health status factors are particularly concerning given the potential implications for equity of access. Future research should explore responsiveness for different sectors, include individuals who have not utilised healthcare and allow for the adjustment of differential expectations of care between population groups.


Introduction
The "summer of migration" in 2015 and its aftermath saw large numbers of asylum seekers and refugees (ASR) being displaced across the Middle East, Europe and elsewhere, with an estimated 25.9 million ASR globally in 2018 [1]. Germany alone received over 1
Keeping this in mind, the objective of the current analysis is twofold. First, we aim to assess the overall level of responsiveness of health services for ASR in Germany. Secondly, we aim to assess inequalities in responsiveness with regard to sociodemographic, structural and health status factors.

Data collection
Data were collected in ASR reception centres and regional accommodation centres in Baden-Württemberg, Germany's third largest federal state, which received 17 055 first-time applications for asylum in 2021 [16]. Data were collected in 2018 using rigorously tested and translated questionnaires in nine languages. The questionnaire development process included pretesting, cognitive interviews, and professional translation processes, which has been extensively described elsewhere [15]. The questionnaires covered items relating to socio-demographic characteristics, the asylum process, health status, healthcare utilization and quality of care, including the short-form WHO Responsiveness Instrument [9]. The questions were adjusted slightly in format in response to a qualitative, cognitive pre-test carried out prior to data collection [17]. Responsiveness questions were specifically aimed at healthcare received in Germany, but did not specify whether this was out-patient or in-patient care. The domain "Social Support" was omitted from the responsiveness questionnaire due to this joint assessment of both out-patient and in-patient care, resulting in a responsiveness instrument with a total of seven questions (see Box 1).
Box 1. Questions included in the RESPOND questionnaire to capture responsiveness. Answer options to all questions were "very good", "good", "moderate", "bad", "very bad" and "cannot say".
We are interested in hearing about your experience with healthcare services in Germany. We would like you to think about the last time you went to visit a doctor or another healthcare provider. How would you rate. . .
In accommodation centres, random sampling was carried out on the basis of 1938 accommodation centres, aiming to include 1% of the ASR population resident in state accommodation centres. Sampling was balanced on the number of ASR per region and the size of the accommodation centre. All residents in the chosen 65 accommodation centres were approached for data collection. For the data collection in reception centres, six out of nine facilities were sampled purposively to select large centres in a variety of administrative districts. Based on administrative lists of occupied rooms in each centre, 25% of rooms were randomly selected and all residents of these rooms approached for data collection. The sampling approach for reception and accommodation centres has been extensively described elsewhere [15,18].
Data collection was carried out in a "door-to-door" approach, recruiting individuals personally with multi-lingual field teams [15,18].

Data management
For this analysis, we included individuals with at least one response to one of the seven responsiveness questions. We excluded all individuals who indicated they had never utilised any health services (GP, specialist, dentist or psychologist), as the measurement of responsiveness specifically relates to a previous healthcare encounter. We treated all "cannot say" or "don't know" answers as missing.
The 5-point answer scale of responsiveness was dichotomised for each dimension, using "good" or "very good" answers as an indicator for responsiveness [9]. A combined responsiveness scale was calculated by averaging responsiveness scores across domains and setting the cut-off for a "responsive" rating for values averaging above 2.5 (analogous to a dichotomization at the cut-off between "good" and "moderate").
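The scoring rule described above can be illustrated with a minimal sketch. The numeric coding of the answer scale is not stated in the text, so the sketch assumes 0 = "very bad" to 4 = "very good", under which a mean above 2.5 falls between "moderate" and "good". The function names are hypothetical, and averaging over the available answers is likewise an assumption.

```python
from statistics import mean

# ASSUMED coding (not given in the text): 0 = very bad ... 4 = very good.
SCORES = {"very bad": 0, "bad": 1, "moderate": 2, "good": 3, "very good": 4}
# Answers treated as missing, as described in the data management section.
MISSING = {"cannot say", "don't know"}


def domain_responsive(answer):
    """Dichotomise one domain: True for "good"/"very good", None if missing."""
    if answer is None or answer in MISSING:
        return None
    return SCORES[answer] >= SCORES["good"]


def combined_responsive(answers, cutoff=2.5):
    """Average the non-missing domain scores and compare with the cut-off."""
    scores = [SCORES[a] for a in answers if a is not None and a not in MISSING]
    if not scores:
        return None
    return mean(scores) > cutoff


# Hypothetical respondent with seven domain answers, one of them missing.
example = ["very good", "good", "moderate", "good", "cannot say", "bad", "good"]
print([domain_responsive(a) for a in example])
print(combined_responsive(example))
```

In this sketch a respondent with missing domains is scored on the remaining answers; an analysis restricted to complete responders would instead return a missing value whenever any domain is missing.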
Selection of independent variables (Table 1) was guided by the available literature on factors influencing responsiveness, including socio-demographic variables such as age, sex, educational status, social status and country of origin [19][20][21] and health variables relating to physical and psychological well-being [22]. Education was included as an ordinal variable ranging from 1 (least educated) to 3 (most educated) based on educational attainment in school and on professional qualification (S1 Table). Measurement of subjective social status in Germany was based on an adapted version of the MacArthur Scale [23] and categorized to represent high, medium and low subjective social status [24]. To avoid issues with empty cells, data on nationality was grouped into regions based on the UN Geoscheme [25]. The three regions with the most participants (Western Africa, Southern Asia and Western Asia) were then included for analysis as binary variables. Age was included as a linear variable if it was treated as a covariate, and as a binary variable (30 years or younger/ 31 years or older) if it was treated as an independent variable in order to facilitate comparisons across factors. Health status variables covering general health status, health-related limitations, longstanding illnesses [26], Patient Health Questionnaire 2-item version (PHQ2) [27] and Generalised Anxiety Disorder 2-item version (GAD2) [28] were included as binary variables as reported in Table 1, except for health-related quality of life (HRQoL; EUROHIS-QOL) [29], which was divided into tertiles as no accepted thresholds for categorisation currently exist.
In addition to socio-demographic and health status differences in responsiveness, we also analysed structural factors relating to the specific situation of ASR, hypothesizing that conditions of the living environment and the asylum process may also influence the experience of responsiveness. These factors included residence status, type of accommodation (reception centre vs. regional accommodation centre) and urban-rural characteristics of the residential facility. Furthermore, ASR typically receive a health insurance card when a positive decision has been made on the asylum claim or after 18 months (15 months at the time of data collection), whichever comes first [7]. This has the potential to substantially reduce bureaucratic barriers in the health-seeking process, and therefore was also included as a structural factor.

Survey weighting
Adjustments for sampling frame were made by weighting for the probability of selection of each participant within the sampling design (S1 Text). The sample was calibrated using the population characteristics of ASR from the years 2016, 2017 and 2018 [30], using sex, age group and region of origin variables. To enable calibration with a full data matrix, missing values were imputed using the single imputation technique in the R-package mice [31] (S1 Text).

Statistical analysis
For the purposes of logistic regression, missing values were imputed using multiple and multivariate chained equations (Tables A and B in S1 Text), under the assumption that missing data are missing at random. Proportions of missing values per variable ranged from 0% (responsiveness: timeliness domain) to 29.4% (responsiveness: choice domain; Table B in S1 Text). The effect of imputation on the variance of each model was assessed using the fraction of missing information (FMI).
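As an illustration of how estimates from multiply imputed datasets are usually combined, the following sketch applies Rubin's rules to one coefficient. It is not the study's code: the function name and input values are hypothetical, and the quantity returned is the large-sample proportion of total variance attributable to missing data, which approximates the FMI. Statistical software typically reports the FMI with an additional degrees-of-freedom adjustment.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: point estimates (e.g. log-odds), one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance, share of variance due to missingness).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance
    lam = (1 + 1 / m) * b / t        # large-sample approximation to the FMI
    return q_bar, t, lam


# Hypothetical log-odds estimates and variances from five imputed datasets.
q, t, lam = pool_rubin([-0.9, -1.0, -1.1, -0.8, -1.2], [0.16, 0.17, 0.15, 0.16, 0.18])
print(q, t, lam)
```

A value of the returned share close to zero indicates that imputation adds little uncertainty to the coefficient, whereas larger values indicate that the estimate depends more heavily on the imputed values.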
To address the first research aim, descriptive analysis was carried out without imputation of missing values. In the descriptive analysis, combined responsiveness scores were calculated only for those who had completed all questions of the responsiveness questionnaire (n = 207). Key sociodemographic, structural and health factors were tabulated for all included participants by sex. Proportions of unweighted and weighted responsiveness scores were calculated for each domain and for the overall responsiveness score with 95% confidence intervals. The design effect (DEFF) [32] was calculated to quantify the increase in variance due to weighting.
To address the second research aim, odds ratios (OR) and corresponding 95% confidence intervals (CI) for responsiveness were modelled using logistic regression on the fully weighted and imputed dataset. For the multiple logistic regression, combined responsiveness scores were calculated for all individuals included in the analysis (n = 344) using the fully imputed values. Combined responsiveness and each domain functioned as dependent variables in separate models. Independent variables representing key socio-demographic, health and structural factors (Table 1) were included in these models one at a time, adjusting for variables defined a priori as potential confounders, namely age (linear), sex and education. All independent variables were included in a final, multiple logistic regression model for combined responsiveness and each domain. Certainty of observed effects was assessed at the p ≤ 0.05 significance level.
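For readers less familiar with how odds ratios and their intervals are derived from a logistic model, the sketch below converts a log-odds coefficient and its standard error into an OR with a Wald-type 95% CI. It uses a normal approximation (z = 1.96), which is an assumption: survey-weighted models may use a t-distribution with design-based degrees of freedom. The function names and the example coefficient are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def ci_geometric_midpoint(lower, upper):
    """A Wald CI is symmetric on the log scale, so sqrt(lower * upper) recovers the OR."""
    return math.sqrt(lower * upper)


# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_ci(beta=math.log(0.5), se=0.35)
print(f"OR {or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")

# Consistency check on an interval reported in this paper for participants
# under 31 years of age in the full model (95% CI 0.12, 0.82).
print(round(ci_geometric_midpoint(0.12, 0.82), 2))
```

The second helper gives a quick plausibility check for published results: the geometric midpoint of the interval 0.12 to 0.82 rounds to 0.31, which matches the OR reported for that group.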
Multicollinearity was assessed for all independent variables on the fully imputed dataset, calculating the Variance Inflation Factor (VIF) and Condition Index (S1 Text). VIFs were low, ranging between 1.09 and 1.69 for independent variables (Table 1), while the Condition number was moderate at 28.14. No variables were excluded from the model due to multicollinearity.
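The VIF for each independent variable can be obtained from the diagonal of the inverse correlation matrix of the predictors, which is equivalent to computing 1/(1 - R²) from a regression of each predictor on all others. The following sketch, which assumes NumPy is available and uses hypothetical data, illustrates the calculation. It does not reproduce the study's analysis, and it does not cover the Condition Index.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = observations).

    Uses the identity VIF_j = j-th diagonal element of the inverse
    correlation matrix of the predictors.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # predictor correlation matrix
    return np.diag(np.linalg.inv(corr))


# Hypothetical predictors: two uncorrelated columns give VIFs of 1.
uncorrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print(variance_inflation_factors(uncorrelated))

# Hypothetical predictors: two strongly correlated columns give large VIFs.
correlated = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
print(variance_inflation_factors(correlated))
```

Values close to 1, such as the range of 1.09 to 1.69 reported here, indicate that each independent variable shares little variance with the others.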
Model fit was assessed using the model F-statistic, testing the hypothesis that all coefficients are equal to zero in each logistic model.
Both descriptive analysis and logistic regression were carried out in STATA version 15.

Ethics statement
Ethical approval was obtained from the ethics committee of the Medical Faculty Heidelberg on 12.10.2017 (S-516/2017). Participants were informed verbally and in writing about the aims of the study and the handling of their data. Participants gave consent to participation in the study by virtue of returning their questionnaires to us.

Participants
A total of 560 ASR took part in the survey, representing a response rate of 39.2% (S1 Fig). Of these, n = 344 individuals provided accurate information on responsiveness and were included in the analysis.
Participants were predominantly male (66.1%) and young, with 47.3% of participants being under the age of 31. Educational status was varied, with just under half of participants (44.9%) reporting a medium educational status. The three most frequently reported regions of origin were Western Asia (26.4%), Southern Asia (26.8%) and Western Africa (20.4%). While most participants reported not having received a decision on their asylum application yet (59.1%), 61.7% of participants reported arriving in Germany over a year ago. The majority of participants (77.1%) reported living in collective accommodation centres, with the remaining participants living in reception facilities. While only 19.0% of participants reported being in bad or very bad health, a larger proportion reported longstanding illness (46.9%), symptoms of depression (PHQ2) (49.3%) and symptoms of anxiety (GAD2) (48.5%) (Table 2).
Weighted and unweighted estimates do not appear to differ substantially, with differences in point estimates ranging from 0 percentage points (timeliness domain) to 5 percentage points (choice domain). The effects of weighting on the variance of estimates (DEFF) are moderate, with design effects ranging from 0.6 to 2.1 when the sampling design is taken into account (Table 3).

Inequalities in combined responsiveness
Multiple logistic regression models show that poor health status is a predictor of low combined responsiveness once age, sex and educational status have been adjusted for. In particular, lower adjusted odds for "(very) good" responsiveness can be observed for individuals with a longstanding illness compared to those without a longstanding illness (OR: 0.47, 95%CI: 0.23; 0.98), for individuals with high PHQ2 scores compared to those with low PHQ2 scores (OR: 0.33, 95%CI: 0.16; 0.68) and for individuals with high GAD2 scores compared to those with low GAD2 scores (OR: 0.45, 95%CI: 0.11; 0.46). Furthermore, lower adjusted odds of "(very) good" responsiveness can be observed for those individuals with worse HRQoL compared to those with the best HRQoL (third tertile): an adjusted OR of 0.14 (95%CI:0.04; 0.51) for the first tertile, and an adjusted OR of 0.24 (95%CI: 0.07; 0.82) for the second tertile. A borderline significance can also be observed for the adjusted odds of "(very) good" responsiveness in individuals with a "(very) bad" health status compared to those in very good to moderate health (OR: 0.43, 95%CI: 0.17, 1.07) (S2 Table).
With regard to other factors, individuals from Southern Asia had lower adjusted odds of reporting "(very) good" combined responsiveness compared to individuals from other regions (OR:0.41, 95%CI: 0.19,0.89). However, no further substantial differences in combined responsiveness rating were found across other socio-demographic and structural variables in the models adjusted for a priori confounders (S3 and S4 Tables).
When all factors were included in the full model, lower adjusted odds for "(very) good" responsiveness could still be observed for those with longstanding illness (OR:0.42, 95% CI:0.17,1.06), those with a lower HRQoL (lowest tertile-OR:0.23, 95%CI:0.05,1.12; medium tertile-OR:0.24, 95%CI:0.06,0.95), those with high GAD2 scores (OR 0.35, 95%CI 0.13,0.99) and individuals from Southern Asia (OR: 0.24, 95%CI: 0.07,0.86). In the case of longstanding illness and the lowest HRQoL tertile, these effects were only borderline significant in the full model. While no substantial effects are observed for individuals with "(very) bad" general health or those with a high PHQ2 score in the full model, lower odds for "(very) good" responsiveness could be observed in those participants under 31 years of age (OR:0.31, 95% CI:0.12,0.82) once all other factors had been adjusted for (Table 4).

Inequalities across domains
Results from fully adjusted logistic regression models for individual domains diverged partly from the results of combined responsiveness (Table 4). With regard to the health factors, significantly lower adjusted odds of reporting "(very) good" responsiveness were observed for participants with a high GAD2 score across the domains of timeliness (OR:0.33, 95%CI: 0. 16
Although structural factors did not show effects on responsiveness in the combined model, residence status and type of accommodation demonstrate important effects in individual domains. Residents in reception centres have lower adjusted odds of "(very) good" responsiveness in the choice (OR:0.23, 95%CI:0.08,0.66) and cleanliness (OR:0.35, 95%CI:0.12,1.01) domains when compared to individuals in regional accommodation centres. Participants with an asylum rejection or a toleration have lower adjusted odds of "(very) good" responsiveness
P-values of the F-test ranged from p<0.001 to 0.199 (Table 4), indicating that the variables included in the full models were generally well suited to assessing responsiveness outcomes. Only two models, namely for the confidentiality (p = 0.129) and cleanliness (p = 0.199) domains, did not have sufficient certainty of model fit at the p ≤ 0.05 level.

Discussion
Overall, over three-quarters of ASR reported "(very) good" responsiveness of the health system in Germany, but ratings differ markedly between domains, with choice and timeliness domains receiving particularly poor results. Inequalities in combined responsiveness could be observed in particular in relation to health status factors, with lower responsiveness reported by those with worse mental health, worse HRQoL and a longstanding illness. Being young and having a South Asian nationality emerged as important socio-demographic predictors of low combined responsiveness, while inequalities related to structural factors emerged only in particular responsiveness domains. Participants from both reception centres and regional accommodation centres reported low scores for the choice and timeliness domains. These, together with the cleanliness domain, represent "client orientation" domains, as opposed to the "respect for persons" domains of respect, communication, autonomy and confidentiality [33]. Features of the health system that lead to delays and a lack of choice in patient care may be expected in the setting of reception centres, where a single walk-in clinic often provides care for many residents [34]. However, these results are unexpected for ASR residing in regional accommodation centres, as they have formal access to regular health service structures. Further research is required to understand why these individuals report poor choice and timeliness in order to improve the responsiveness of care for this population.
The "respect for persons" domains were rated comparatively well, indicating the efforts made by frontline staff to continue the same quality of services despite the unusual care setting and the potential challenges in the healthcare encounter. However, communication proved to be a difficult issue, ranking third from last among the domains. This may be expected given the pervasive issue of language barriers in the provision of health services for ASR [34]. However, it is unclear whether this rating relates to the lack of adequate interpreting services or the communication abilities of the physicians.
Our study found inequalities in responsiveness related to health status factors. These may result in horizontal inequities in health system access if low responsiveness impedes further engagement with the healthcare system: "Responsiveness that is systematically worse for certain social groups with the same or greater needs than other social groups could lead to inequities in access" [9]. Thus, individuals with pre-existing conditions and illnesses, who may be most in need of responsive health services, are currently being sold short. This finding is particularly striking given the common assumption that responsiveness ratings improve with continued engagement and interaction with the health system [33]. Further research is required to understand the particular reasons for worse responsiveness rating for individuals in poor health and the development of system-based interventions to improve quality of care for this patient group.
Furthermore, lower responsiveness can also be observed for younger individuals and those with a South Asian nationality. This may indicate discrimination against these population groups within the health system. However, the observed differences may also be grounded in different expectations of what the health system should be able to deliver along responsiveness domains. Responsiveness has been recognized as a health system outcome which is strongly influenced by differential rating according to underlying health system expectations [19].
Thus, further studies should investigate whether the observed inequalities remain stable once differential expectations have been adjusted for, for example through the use of vignettes.
The factors found to be relevant in predicting combined "(very) good" responsiveness were not entirely consistent across different domains. This highlights the added value of creating a combined responsiveness score. While in some cases (e.g. age and high GAD2 score), relationships with responsiveness could be observed in several domains, in other cases (e.g. South Asian nationality and HRQoL), important effects may not have been picked up by analysing only the individual domains. Consistently low scores in each domain, although not necessarily relevant by themselves, resulted in strong predictions in the combined model. However, there is no commonly accepted and validated method of combining results across responsiveness domains, and the method used for this analysis should be assessed for reliability and validity in further research.
While there is no agreed cut-off for the level of responsiveness health systems should achieve, comparison with a recent population-based survey on responsiveness among German patients provides a good reference for putting these figures in context [35]. German patients report responsiveness ratings between 54.2% and 96.2% across domains, with higher responsiveness ratings than our study in every domain. This comparison is suggestive of inequalities in responsiveness between ASR and the resident German population, and should be further explored in future studies.
A key feature of this study is that it is the first to apply the WHO Responsiveness survey in a diverse ASR population [36]. The comprehensiveness of the WHO Responsiveness instrument makes it a good tool to survey patient-rated quality of care and identify areas for improvement. The analysis of individual domains revealed complex patterns, reflecting both the heterogeneity of the ASR population and the complexity of the responsiveness concept. This study cannot comprehensively assess all dimensions encompassed in the responsiveness concept, but gives an overview of potential issues and topics for further inquiry. Further research into specific domains, including the use of qualitative methods, should be conducted to understand reasons behind responsiveness answers and differences in particular population groups. For example, recently published qualitative research into the "patient journey" of ASR following arrival in Germany shows that positive responsiveness ratings may be attributed to a high level of generalised trust in the healthcare profession [37].
This study benefits from a population-based, random sampling approach, a personal data collection approach [15,18] and rigorous weighting and imputation prior to analysis. A limitation of this study was a result of our data collection approach: in order to keep the questionnaire short only one set of responsiveness questions was included for all healthcare sectors. This makes it impossible to provide responsiveness results for ambulatory and in-patient care separately, as was done by the WHO. Future studies should consider asking for the sector which was being rated if the inclusion of two separate questionnaires proves difficult.
A further limitation is the potential non-response bias introduced by both the linguistic diversity and the literacy of the sample population. Non-response may also be an issue for the analysis of responsiveness itself, as individuals who did not visit a healthcare provider in the last 12 months are excluded from the measurement by design. As has been noted previously, this may lead to an upward bias in results if individuals who do not come into contact with the health system due to low responsiveness of care are not captured [9].
Finally, due to the different requirements of the imputation for the weighting and the analysis steps, two imputations were performed. This may mean that weights were calculated on the basis of slightly different data than was used for the final analysis. However, given the relatively small effects of the weighting approach we do not expect that this substantially affected the results of the final models.

Conclusion
This analysis found important differences in responsiveness related to health, socio-demographic and structural factors both in combined responsiveness and in individual domains among a diverse population of ASR. Inequalities related to health status factors are particularly concerning given the potential implications for equity in access to health care. The WHO responsiveness survey proved to be a useful concept to quantify quality of care among ASR and map existing inequalities from the patient perspective. This study therefore makes a novel contribution to the current literature on Universal Health Coverage for ASR by introducing aspects of the quality of care to existing analyses of eligibility and accessibility. Further research is needed to understand the relationships between responsiveness and the socio-demographic, health and structural factors explored in this analysis in more detail and to derive relevant system-level interventions to improve quality of care.