Measuring Quality of Maternal and Newborn Care in Developing Countries Using Demographic and Health Surveys

Background One of the greatest obstacles facing efforts to address quality of care in low and middle income countries is the absence of relevant and reliable data. This article proposes a methodology for creating a single “Quality Index” (QI) representing quality of maternal and neonatal health care based upon data collected as part of the Demographic and Health Survey (DHS) program. Methods Using the 2012 Indonesian Demographic and Health Survey dataset, indicators of quality of care were identified based on the recommended guidelines outlined in the WHO Integrated Management of Pregnancy and Childbirth. Two sets of indicators were created; one set only including indicators available in the standard DHS questionnaire and the other including all indicators identified in the Indonesian dataset. For each indicator set composite indices were created using Principal Components Analysis and a modified form of Equal Weighting. These indices were tested for internal coherence and robustness, as well as their comparability with each other. Finally a single QI was chosen to explore the variation in index scores across a number of known equity markers in Indonesia including wealth, urban rural status and geographical region. Results The process of creating quality indexes from standard DHS data was proven to be feasible, and initial results from Indonesia indicate particular disparities in the quality of care received by the poor as well as those living in outlying regions. Conclusions The QI represents an important step forward in efforts to understand, measure and improve quality of MNCH care in developing countries.


Background
Poor quality of care is a major impediment to efforts aimed at improving the health of populations in developing countries, particularly with respect to Maternal, Neonatal and Child Health (MNCH) [1,2]. Most recently, poor quality of care has been implicated in the disappointing outcomes of large scale programs aimed at increasing the coverage of maternal health services in developing countries, including the Janani Suraksha Yojana (JSY) conditional cash transfer program in India [3] and the Jamkesmas social insurance program in Indonesia [4]. It is suspected that while coverage of health services for these populations has increased, the quality of care provided is substandard [5]. Development of suitable measurement tools is necessary to support improvements in quality of care and improve population health outcomes.
One of the greatest obstacles facing such efforts, however, is the current lack of data relating to quality, especially in low and middle income countries (LMICs) [6]. In the absence of fully functional health information systems, evidence on quality of care is scarce, and is often only available when specialised studies are conducted [7]. Additionally, there are multiple definitions of health care quality encompassing a multitude of dimensions ranging from efficacy and patient safety through to system efficiency and cultural appropriateness of care [8]. There is as yet no international standard for how to measure quality of care, and as a result, differing definitions and choices of indicators limit comparability between studies [7].
The existing measures of quality of maternal care are typically focused on high level facility based care with caesarean and episiotomy rates [9], maternal near miss events [10], and maternal mortality commonly reported. Even fewer measures of quality of care exist for neonates, and the few that are commonly reported also emphasise tertiary level care. Consequently, existing measures tend to exclude women who deliver at home or in smaller clinics; in many developing contexts this can represent the majority of the population. The availability of data may also be hindered by the existence of largely unregulated private sectors that provide a high proportion of maternal and neonatal health services [11][12][13][14].
Given this situation, there has been interest in the use of surveys to collect information on people's experiences of MNCH services. The collection of detailed population level data relating to multiple dimensions of quality of care, such as provider actions and patient satisfaction, has been conducted using specially constructed surveys over small populations [15][16][17][18], but at the same time the availability of quality related measures in large scale population surveys has been limited. Attempts to use such surveys to report population level indicators of quality have been almost solely based on the basic coverage of antenatal care as reported by country level Demographic and Health Surveys (DHS) [19][20][21]. There is potential to increase the number of quality related indicators included in such surveys [22,23], and more recent surveys do collect additional data related to care during and after delivery, but as yet few studies have utilised these indicators within the context of examining quality of care.
This article proposes a methodology for creating a single "Quality Index" (QI) representing quality of maternal and neonatal health care based upon data collected as part of the DHS program and standard econometric techniques used to create composite indexes in other areas of development studies. As a result of the standardised, modular nature of all surveys conducted through the DHS program, existing health indicators generated by DHS surveys can be reliably compared across countries, and, in countries that regularly conduct these surveys every three to five years, over time. DHS surveys thus have great potential in relation to the estimation and monitoring of quality of care should it be feasible to derive such estimates from the available data. The 2012 Indonesia DHS [24] will be used as an example to demonstrate the viability of the method, as well as the potential for analysis of differences in quality of care.

Ethics approval and consent to participate
This research used pre-anonymized quantitative datasets, for which ethical approval was given at the time of data collection. The data was collected with the intention of being used for future research, and only quantitative responses to pre-approved questions are recorded in the dataset.

Data sources
With the introduction of the Phase 6 DHS questionnaire in 2008, several additional variables related to the timing and content of particular actions during pregnancy and in the immediate postnatal period were included in the survey design. These questions were asked with regard to the last pregnancy experienced by all women with a live birth in the past five years, providing an opportunity to measure indicators associated with the quality of routine pregnancy and delivery care. The Indonesia 2012 dataset was chosen as it represents a large Phase 6 survey that included a number of questions relating to the content and timing of pregnancy and delivery care. Comparison of measures derived from the standard DHS questions and those derived from all available questions would thus provide a test of the adequacy of the existing DHS questions in creating indicators of quality care.

Selection of indicators
The Donabedian conceptual framework considers "quality" as the combination of structural elements (affecting the context in which care is delivered), process elements (all the actions that make up health care) and outcome elements (the effects of healthcare on the population) [25]. However, as standard DHS do not contain questions related to patient satisfaction, or to health inputs or outcomes, the definition of quality to be used for this analysis by necessity must be based on process indicators representing actions taken during contact with the health services in question. As a result, indicators were identified based on the recommended actions outlined in the WHO's Integrated Management of Pregnancy and Childbirth (IMPAC) guidelines [26]. These guidelines are designed to outline essential practices by front line workers that address key areas of maternal and perinatal health programs. As such, they provide an objective, albeit heavily service oriented, framework on which to base indicator selection.
Based on these guidelines, both the standard DHS questionnaire and the Indonesia 2012 questionnaire were examined for the presence of questions that could be used to construct relevant indicators of quality care. Table 1 shows the final set of indicators chosen for this analysis; a full list of all the potential indicators identified, including those not available in the Indonesia 2012 dataset, is provided in S1 File with a rationale for each indicator's use.

Data preparation
The sample was first limited to women of reproductive age with at least one live birth in the past five years. Due to difficulties in reconciling different populations at risk, childhood healthcare was omitted from the analysis, and the unit of observation was the mother and her lastborn child (the postnatal experience of the child was considered as a continuation of the mother's experience during pregnancy and birth). Where possible, indicators were transformed into binary variables taking a value of either 0 (not present) or 1 (present). Observations with missing data for any of the indicators were excluded from the analysis. However, in order to minimise the impact of missing observations, particularly from under-sampled areas, the following assumptions were made prior to data being dropped. Firstly, for variables related to yes/no questions, a response of "don't know" was treated the same as a "no" response. This assumption does potentially increase the risk of recall bias affecting the sample, and creates a more conservative estimate; however, unless this response is prevalent in a large proportion of cases, it is unlikely to have a major effect on the overall validity of the sample. Secondly, for indicators where a quantitative value such as timing or quantity of service provided is missing or coded as "don't know", but other variables indicate that the service did occur, the observation was given the sample mean value of the quantitative variable. This approach is less likely to exclude observations for which recall bias hinders accurate quantification and is unlikely to be problematic unless a large proportion of observations are missing in this data.
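The two imputation rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the response codes ("yes", "no", "dk"), the field names, and the choice to compute the mean over women who received the service and reported a known quantity are all hypothetical.

```python
from statistics import mean

def recode_yes_no(value):
    """Map a yes/no response to 1/0. "Don't know" ("dk", a hypothetical code)
    is treated as "no". A truly missing response (None) stays missing."""
    if value is None:
        return None
    return 1 if value == "yes" else 0

def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def impute_quantity(records, received_key, quantity_key):
    """Where the service was received but its quantity is missing or "dk",
    fill in the mean of the known quantities among recipients (assumption)."""
    known = [r[quantity_key] for r in records
             if r[received_key] == 1 and is_number(r[quantity_key])]
    fill = mean(known)
    result = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        if r[received_key] == 1 and not is_number(r[quantity_key]):
            r[quantity_key] = fill
        result.append(r)
    return result

def drop_incomplete(records, keys):
    """Exclude observations still missing any indicator after imputation."""
    return [r for r in records if all(r[k] is not None for k in keys)]
```

Applying `impute_quantity` before `drop_incomplete` mirrors the order described in the text: assumptions are applied first, and only observations that remain incomplete are excluded.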

Index construction
One of the most important considerations in the construction of any composite index is the use of indicator weights to determine the final score. The simplest option is to apply equal weighting, where all indicators contribute equally to the index and the final score is a simple average of all indicators. For example, the "Skilled Attendance Index" proposed by Hussein and colleagues [27] consisted of a score representing the simple percentage of 43 predetermined criteria met by that delivery (based on facility records). However, this example also demonstrates one of the major limitations of equal weighting: the provision of routine oxytocics contributed the same amount to the index as recording whether or not the patient started labour.
Another method of deriving weights is through the use of a statistical analysis of the dataset itself. The most commonly used technique is Principal Components Analysis (PCA), a multivariate statistical technique that uses the correlation between multiple variables to determine the presence of coherent subsets of variables that may collectively represent an underlying factor (such as household wealth or social development) that cannot be directly measured [28]. These underlying factors, or "components", are ordered such that the first component explains the largest possible amount of variation in the sample, the second (uncorrelated) component explains additional variation, and further components explain progressively less variation. Examples of indexes using PCA derived weights include the Wealth Index [29] and the Indices of Social Development [30].
The most direct method of creating weights from the results of PCA is to assume that the first component corresponds to the underlying process that the index is attempting to measure [31,32].
An index is then created by calculating a score for each observation consisting of the sum of the variable values multiplied by the calculated weight. The index produced by this method will be a relative one: as the index is based on the unique properties of the dataset itself, the resulting scores are not comparable between datasets. Likewise, it is possible that the principal components may vary between subgroups within the dataset; for example, rural populations may have a different asset profile to those in urban areas. PCA derived indexes may therefore be of limited use in producing cross country comparisons, but are well suited for examining within country differences. In contrast, the use of equal or theoretically derived weights provides a clearly understood measure that can be compared over different datasets; however, the index will not be sensitive to changes in the relative importance of different variables in different contexts. For this reason, two methods of indicator weighting were chosen for use in this analysis: one based on PCA derived weights, and a second based on a modified version of equal weighting.
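The first-component scoring described above can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation. It assumes that indicators are standardised before weighting (a common convention for PCA based indexes, not something stated in the text), that every indicator varies in the sample, and it uses power iteration on the correlation matrix as a simple stand-in for a full PCA routine. All function names are hypothetical.

```python
from math import sqrt

def standardise(rows):
    """Column-wise z-scores (population SD). Assumes no constant columns."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardised data."""
    n, k = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / n for b in range(k)] for a in range(k)]

def first_component(corr, iterations=500):
    """Leading eigenvector of the correlation matrix by power iteration.
    The uniform starting vector must not be orthogonal to the leading
    eigenvector, which holds for typical positively correlated indicators."""
    k = len(corr)
    v = [1.0 / sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # the sign of an eigenvector is arbitrary; fix it
        v = [-x for x in v]
    return v

def pca_scores(rows):
    """Return (weights, scores): each score is the weighted sum of an
    observation's standardised indicator values."""
    z = standardise(rows)
    weights = first_component(correlation_matrix(z))
    scores = [sum(w * x for w, x in zip(weights, r)) for r in z]
    return weights, scores
```

Because the weights are estimated from the dataset itself, rerunning `pca_scores` on a different survey will generally give different weights, which is why the resulting scores are relative rather than comparable across datasets.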
The Equal Weight (EW) indices use a slight modification to equal weighting, similar to the theoretical component method used by the Human Development Index [30]. All original indicators carried equal weight in the final index; however indicators that did not take a binary form (that is, indicators where multiple levels of quality may exist) were treated as if made up of equally weighted subcomponents. This allowed for some level of discrimination between different levels of coverage for given indicators, while keeping to the equal weighting principle.
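One plausible reading of this modified equal weighting is sketched below in Python. The interpretation is an assumption, not a description of the authors' code: each indicator contributes the fraction of its quality levels attained (so a binary indicator contributes 0 or 1, and a three-level indicator contributes 0, 0.5 or 1), and the index is the simple average of these contributions. The function name and level coding are hypothetical.

```python
def equal_weight_score(levels, max_levels):
    """Average, over indicators, of the fraction of quality levels attained.

    levels[i]     -- level attained on indicator i (0 means no quality)
    max_levels[i] -- highest level for indicator i (1 for a binary indicator)
    """
    if len(levels) != len(max_levels) or not levels:
        raise ValueError("levels and max_levels must be non-empty and equal in length")
    parts = []
    for level, top in zip(levels, max_levels):
        if not 0 <= level <= top:
            raise ValueError("level outside the range defined for the indicator")
        # Each indicator carries equal weight; its levels act as
        # equally weighted subcomponents of that single weight.
        parts.append(level / top)
    return sum(parts) / len(parts)
```

For example, a woman who meets one binary indicator, misses another, and has partial coverage on a three-level indicator would score `equal_weight_score([1, 0, 1], [1, 1, 2])`, the average of 1, 0 and 0.5.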

Testing of indices
Four QIs were constructed as a starting point of the analysis. These indices were based on the complete and DHS standard sets of indicators and employed both the EW and PCA weighting methodologies. Additional indices were then created to determine if the number of categories for non-binary indicators, or the presence or absence of particular indicators, affected the robustness of the results. The indices were tested for internal coherence and robustness, as well as their comparability with each other, using the example set by Pritchett and Filmer [31] with regard to the development of the wealth index. In addition, Cronbach's alpha was calculated as a measure of internal consistency for each set of indicators, with a coefficient of 0.7 or above considered to be acceptable [33]. Finally, a single QI was chosen to explore the variation in index scores across a number of known equity markers in Indonesia, including wealth, urban rural status and geographical region.
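For reference, Cronbach's alpha for a set of k indicators can be computed from the item variances and the variance of the summed score. The Python sketch below uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total), with sample variances. The function name and data layout are illustrative assumptions.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for observations given as rows of item scores.

    Requires at least two items, at least two observations, and a
    non-zero variance in the summed score.
    """
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

A value at or above the 0.7 threshold mentioned in the text would be read as acceptable internal consistency for that indicator set.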

Results
The 2012 Indonesia DHS originally provided a sample size of 15,262 women who had had at least one live birth in the five years prior to the survey. Following the initial construction of the indicator variables and the application of the stated imputation processes to deal with missing values, a total of 14,864 observations were included in the final analysis. Table 2 provides a breakdown of observations with no missing data, observations with at least one imputed variable and observations that were dropped due to missing data, by selected demographic factors. Two-proportion z-tests were used to compare the imputed and missing observations to those with no missing data; there are no significant differences between the non-missing and dropped observations with the exception of wealth, with the dropped observations containing a higher proportion of observations from the poorest wealth quintile. In contrast, the imputed observations do appear to vary substantially from the non-missing observations with regard to urban rural residence, education and wealth. As these observations account for nearly 13% of the total sample, their omission from the remainder of the analysis might affect the representative ability of the dataset as a whole. Following a sensitivity analysis (S2 File), a decision was made to include these observations in the remainder of the analysis. Table 3 provides a full list of the initial categorisation used to create variables for index construction. The table also reports the mean and standard deviation of each variable. Coverage of different indicators varied substantially; some, such as stomach examination and blood pressure measurement, were over 90%, while others, such as discussion about blood donors, were quite low. In general, however, coverage of the indicators was high enough to allow for meaningful differentiation between high and low levels of quality.
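A two-proportion z-test of the kind used to compare these groups can be sketched as follows. This is a generic pooled-variance version written in Python for illustration. Whether the original analysis used pooled variance or accounted for survey design is not stated, and any counts passed to the function here would be hypothetical rather than taken from Table 2.

```python
from math import erf, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    x1, x2 -- number of observations with the characteristic in each group
    n1, n2 -- group sizes
    Returns (z, two-sided p-value under the standard normal distribution).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)
```

For instance, `two_proportion_z(60, 100, 40, 100)` compares a hypothetical 60% share in one group against 40% in another.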
Table 4 reports the PCA derived variable weights for a number of scenarios in which the included indicators, and the number of quality levels for each indicator, differ. In the initial scenario (column 1), all potential indicators were included and up to five categories of quality were available for each indicator. The results are as expected: variables such as "No ANC visits in third trimester" and "No Tetanus Protection" are strongly negative, while their counterparts thought to represent a high level of quality ("2+ ANC visits in third trimester" and "Full Tetanus Protection") score quite positively. Interestingly, the variables with the strongest effect on the final quality score are those related to interpersonal communication during pregnancy; advice on pregnancy complications scored particularly highly, as did discussion of transportation, place and payment for delivery. The results of the same PCA process carried out only on core DHS indicators are shown in column 2. The driving variables remain roughly the same; however, the reduction in the number of variables has increased the magnitude of the weights assigned to the remaining indicators. Pregnancy complication advice remains the highest scored variable, but the influence of urine and blood tests during ANC and provision of timely postnatal care becomes more apparent.
The variable representing no prelacteal feeding (believed to be an indicator of good quality care) has a negative score, reflecting the known decrease in exclusive breastfeeding in wealthier and more urbanised populations [34]. Columns 3 and 4 report the results of the previous scenarios with the removal of prelacteal feeding as an indicator. There are no major changes in the weights assigned to other variables, and the overall proportion of variance explained by the first component increased only slightly, suggesting that the omission of this indicator did not overly affect the index.
Because it is possible that the number of categories used to define quality within a given indicator may affect overall representation of the indicator within the dataset, two additional scenarios were included, in which the levels of quality allowed for each indicator were limited to "Full", "Partial" and "None" (columns 5 and 6). The change in classification only affected three indicators: iron supplementation during pregnancy, maternal PNC and neonatal PNC. For both PNC indicators the consolidation of the partial quality variables had relatively little effect. However, while having no iron supplementation carries a strongly negative weight as expected, "partial" iron supplementation carries a much higher positive weight than "full" iron supplementation. As the magnitude of the partial and no supplementation variables is considerable, complete exclusion of this variable was inappropriate and might reduce the explanatory ability of the index. As such, a scenario was created in which the indicator was replaced by one representing the presence of iron supplementation rather than its duration. The results of this scenario can be seen in the last two columns in Table 4. Again, this change resulted in only minor increases in the variance explained by the principal component. This consistency in weights and variance explained by the principal component was also seen during sensitivity testing involving the recreation of a single QI (Scenario 5) using random subsamples of the dataset (see S2 File). With regard to internal consistency, only the DHS based index that included prelacteal feeding (in Column 2) had an unacceptably low alpha coefficient. This is understandable, as Cronbach's alpha is designed to reflect the degree to which the indicator set reflects a single, unidimensional construct, and prelacteal feeding has already been identified as an outlier with regard to the other indicators.
Overall these results suggest that the PCA based QIs were not overly sensitive to minor variation in the choice and classification of indicators; the greatest differences in results occurred as a result of the reduced number of variables in the standard DHS indicator set.
To test that this robustness extended also to indices created through the EW method, observations were ranked according to their scores measured using the PCA and EW indices for the scenarios mentioned above and divided into quintiles. The mean value for each variable was then compared according to each index; as an example, Table 5 shows the mean value for each variable according to the PCA and EW index quintiles for the first scenario. A full table of these results is provided in S2 File. In general the results appear to be robust, with "positive" variables such as blood testing during ANC having higher means in the higher quintiles and "negative" variables such as no tetanus protection having higher means in the lower quintiles. Table 6 compares the correlation between quintiles created using QIs that utilise different indicator sets, categorisation and weighting methods. Despite the considerable variation in content, the classification of observations into quintiles is relatively stable; even scenarios in which the underlying indicator set and weighting methodology differ have over 75% correlation. This suggests that the indices are reflecting differences in the underlying quality of care rather than differences in a subset of dominant indicators. For the purpose of summarising differences in quality of care within Indonesia, a single QI was chosen. From a policy perspective, relative variation in quality of care is important in order to provide an overview of which groups are doing better or worse than their peers; for this reason the following sections utilise the PCA index based on all available indicators using simplified categorisation (Scenario 5 from Table 4).
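The quintile comparison described above can be sketched in Python as follows. This is an illustrative outline only: it ignores survey sampling weights, breaks ties by original order, and uses a Pearson correlation between quintile labels, none of which is confirmed as the approach taken in the analysis. All function names are hypothetical.

```python
from math import sqrt

def quintiles(scores):
    """Assign each observation a quintile from 1 (lowest) to 5 (highest)
    by rank. Ties are broken by original order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels

def mean_by_quintile(values, labels):
    """Mean of an indicator within each quintile, as a {quintile: mean} dict."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in sorted(groups.items())}

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Given PCA and EW scores for the same observations, `pearson(quintiles(pca), quintiles(ew))` gives one measure of how similarly the two indices classify observations, and `mean_by_quintile` reproduces the kind of per-quintile indicator means reported in Table 5.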
Urban areas have a substantially higher score compared to rural areas (Fig 1). Given that rural areas are known to have issues with access to services [35], it is likely that these areas also have issues with key inputs that limit the level of quality available to the population. Wealth also appears to have an effect on quality: Fig 2 shows the mean scores by wealth quintile.
Indonesia is a large and diverse country, and health outcomes have been known to vary substantially between different geographic regions; as seen in Fig 3, quality of care is no exception. The impressively high score in Yogyakarta is overwhelmed by the massively negative score seen in Papua. Provinces in the Java/Bali region have relatively good scores, while outlying regions such as Sulawesi and Maluku are generally negative.
Examining these provincial scores by urban rural status makes the situation even clearer. Fig 4 shows that in the provinces that are doing well there are only minor differences between rural and urban populations, while the rural population in the outlying regions appears to be receiving much worse care than its urban counterpart. As it is possible that this difference in urban/rural outcomes is due to differences in wealth between the two populations, Fig 5 illustrates the mean quality score for each wealth quintile by region. Perhaps unsurprisingly, the provinces showing great differences between urban and rural populations also show large differentials between the poorest and wealthiest quintiles. It is interesting, however, that even the wealthiest quintile in remote regions does not score highly; conversely, in high performing regions such as Yogyakarta even the poorest appear to be receiving a high level of care.

Discussion
This is the first study to utilise PCA based techniques to attempt to quantify variation in quality of care using DHS data. As one of the major assumptions of the PCA process is that the resulting index is capturing an underlying dimension of quality, it is heartening that both equal weighting and PCA weighting techniques produced indexes that were internally consistent and robust to the inclusion or exclusion of individual indicators. Similarly, the similarities between the results of the country-specific and DHS indicator sets suggest that the process may create similarly robust results in other countries where only the standard set of DHS questions was included.
There is concern, however, that the current methodology creates an index that is more reflective of access to care than of the quality of care provided; a woman without access to health services might score the same as a woman with access to only very poor quality services. From a systemic perspective, lack of services may in fact be an indicator of poor quality, but from a policy perspective it is far more useful to quantify quality within the scope of those who do receive care in some form. The difficulty lies in defining who does and does not have access to care: should availability of services be considered or only usage? Should the types of services available matter or is having access to any form of care enough? These are issues that should be addressed as the methodology is further refined.
Such refinement would be highly beneficial, as the use of a single composite index to allow the direct comparison of different population subgroups has allowed for greater understanding of the factors driving inequities in MNCH care. In particular, the differing patterns of quality scores seen between and within regions in Indonesia suggest that place of residence may have a greater influence on determining the quality of maternal and neonatal care than wealth or urban-rural status alone. This is perhaps unsurprising given the heavily decentralised nature of Indonesia's health system; significant regional variation has been noted for other health related measures such as child mortality [36,37], and observational data suggest that the performance of local health systems varies substantially from district to district. Countries with more highly centralised health systems would be expected to demonstrate different patterns of variation. Importantly, as the questions used to construct the QI score are based around actions that should be undertaken as part of routine care by health providers, it is possible that these results may provide a mechanism through which poorly performing systems may be identified for policy intervention. For example, the positive association between having over a month of iron supplementation and other quality markers, but lack of association with having over six months of iron supplementation, suggests that the continuity of supplementation is an area in need of further attention.
This does, however, reveal another limitation of the current methodology; without indicators related to the types of health messages provided by health staff, it is difficult to determine why existing practices are ineffective and thus design an appropriate health system intervention to address the problem. In terms of health education, advice about recognising pregnancy complications during ANC is the sole indicator available as part of the standard DHS questionnaire, and while there was a very strong association between the QI and this indicator, as well as the Indonesia specific indicators relating to birth preparedness (discussion of which should occur as part of standard ANC), it remains unknown whether other critical health messages are communicated to the client. Given the acknowledged importance of interpersonal interaction with the health provider in encouraging continuance of care, through health promotion and education as well as patient satisfaction [38,39], future QIs would benefit from questions addressing these issues.
Another limitation of the current indicators is the lack of coverage for routine interventions such as clean delivery, thermal management and active management of the third stage of labour. Similarly, there is scant information relating to postnatal care, both in terms of content and timing of follow up visits. As the major benefits of PNC come from the prompt identification of issues that require additional care [40,41], this represents an area that should ideally be included in any holistic measure of MNCH quality. Despite concerns regarding recall bias, questions relating to these interventions have occasionally been included in individual countries' DHSs; for example, Nepal 2011 [42] included a question about oxytocin use during delivery, Philippines 2013 [43] included questions about the type of examinations given during maternal PNC and Bangladesh 2011 [44] included questions about cord care and temperature management (albeit only for home deliveries). Similarly, questions about advice and counselling for specific MNCH issues have been included in a number of other country surveys [45]. While there is need for a consensus as to the most appropriate indicators to be used, it does appear that such questions could feasibly be included as part of the standard DHS questionnaire.
One inherent issue with the use of standard DHS methodology, however, is that it is reliant upon self-reporting for all variables related to pregnancy and childbirth, with a recall period of up to five years. In addition to the known variability of recall bias in general [46], validation studies comparing self-reported coverage of MNCH indicators to either health care records [47] or direct observation [48] suggest that the sensitivity and specificity of self-reported coverage can vary substantially, both between indicators and between contexts. Care should thus be taken in interpreting the QI, especially with regard to the potential for social desirability and recall bias to affect self-reporting of actions taken during the antenatal, delivery and postpartum period.
Similarly, while provision of emergency obstetric care can have a large impact on both maternal and neonatal mortality rates [49,50], survivorship bias precludes the DHS from providing reliable measures relating to the treatment of potentially fatal conditions. Additionally, the variable weights produced by the PCA process do not reflect the relative importance of any given intervention in preventing death or disability. As a result, we can only consider the QI to be a partial indicator of true quality of maternal and neonatal care. However, given the lack of regularly available measures in the existing literature, even this imperfect measure may provide significant benefits to our understanding of quality of care in LMICs. Ideally, these findings will stimulate further research into the inclusion of a more diverse range of quality indicators in standard DHS based surveys. With further refinements, it is possible that the QI might be used to compare quality between countries, and provide an additional tool for researchers and policymakers to investigate the effect of different health system elements on quality of care.

Conclusions
As demonstrated using the Indonesia 2012 DHS, the Quality Index provides a method through which data collected as part of routine DHS programs may be used to examine disparities in the quality of maternal and neonatal health care in LMICs. The resulting analysis can provide important insights into both the current state of quality of care and the potential avenues for health system intervention. In Indonesia, for example, the analysis noted particular disparities in the quality of care received by the poor as well as those living in outlying provinces. The QI thus represents an important step forward in efforts to understand and improve quality of MNCH care in developing countries. It allows policymakers and development partners to measure, track progress and compare the quality of MNCH care across countries and within target populations.