Abstract
Background
Worse health outcomes including higher morbidity and mortality are most often observed among the poorest fractions of a population. In this paper we present and validate national, regional and state-level distributions of national wealth index scores, for urban and rural populations, derived from household asset data collected in six survey rounds in India between 1992–3 and 2007–8. These new indices and their sub-national distributions allow for comparative analyses of a standardized measure of wealth across time and at various levels of population aggregation in India.
Methods
Indices were derived through principal components analysis (PCA) performed using standardized variables from a correlation matrix to minimize differences in variance. Valid and simple indices were constructed with the minimum number of assets needed to produce scores with enough variability to allow definition of unique decile cut-off points in each urban and rural area of all states.
Results
For all indices, the first PCA components explained between 36% and 43% of the variance in household assets. Using sub-national distributions of national wealth index scores, mean height-for-age z-scores increased from the poorest to the richest wealth quintiles for all surveys, and stunting prevalence was higher among the poorest and lower among the wealthiest. Urban and rural decile cut-off values for India, for the six regions and for the 24 major states revealed large variability in wealth by geographical area and level, and rural wealth score gaps exceeded those observed in urban areas.
Conclusions
The large variability in sub-national distributions of national wealth index scores indicates the importance of accounting for such variation when constructing wealth indices and deriving score distribution cut-off points. Such an approach allows for proper within-sample economic classification, resulting in scores that are valid indicators of wealth and correlate well with health outcomes, and enables wealth-related analyses at whichever geographical area and level may be most informative for policy-making processes.
Citation: Bassani DG, Corsi DJ, Gaffey MF, Barros AJD (2014) Local Distributions of Wealth to Describe Health Inequalities in India: A New Approach for Analyzing Nationally Representative Household Survey Data, 1992–2008. PLoS ONE 9(10): e110694. https://doi.org/10.1371/journal.pone.0110694
Editor: Koustuv Dalal, Örebro University, Sweden
Received: March 10, 2014; Accepted: September 25, 2014; Published: October 30, 2014
Copyright: © 2014 Bassani et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that, for approved reasons, some access restrictions apply to the data underlying the findings. NFHS data are available on request from the Demographic & Health Surveys Program (www.dhsprogram.com). DLHS data are available on request from the International Institute for Population Sciences (www.iipsindia.org).
Funding: The authors have no support or funding to report.
Competing interests: The authors have declared that no competing interests exist.
Background
Worse health outcomes including higher morbidity and mortality are most often observed among the poorest fractions of the population [1]. This is in part due to lower health service use, more limited access to health interventions and poorer nutritional status [1], [2], but health inequalities are a consequence of complex processes including multidimensional drivers reflecting differences in economic status and social characteristics such as gender and ethnicity. The growing need to better understand the influence of poverty on health has dramatically increased the interest in research [3], [4] and also the programmatic attention on health inequalities in low- and middle-income countries [2], [5], [6], [7].
The development of new methods for estimating household economic status has facilitated new research on the effects of wealth disparities on health [8], [9]. Preferred measures of economic status require data on household income or consumption, but these indicators are hard to define in some settings, difficult to collect on a large scale and prone to misclassification [10]. Conversely, data on ownership of durable goods, housing characteristics and access to infrastructure are easier to measure and commonly available from household surveys, and these can be used compositely to classify households’ relative wealth [11]. As such, asset-based wealth indices derived through principal components analysis are increasingly being used to characterize economic status in household survey analyses of health inequalities [12], [13]. In addition, such surveys are often repeated periodically in a given population, allowing indices to be updated as needed to ensure the most relevant assets are included.
While asset-based wealth indices are typically constructed at the national level, the use of national wealth score distributions for sub-national analyses is problematic [14], [15], [16]. For example, ignoring the wealth score distribution at the geographic level of interest (e.g. district, state, or region) may result in a large proportion of one population (e.g. state) being assigned to the top or bottom of the wealth distribution of another population (e.g. region), thereby hiding level-specific wealth gradients. The use of geographical-level wealth distributions allows one to correctly classify households according to the most appropriate wealth score distribution, enabling proper comparisons across different states, regions or countries and across different geographical levels.
In India, due to the large, socio-economically diverse population and the decentralized decision-making and policy-setting structures, the use of wealth distributions at multiple geographic levels is especially important for analyzing and addressing health inequalities. However, while national and sub-national wealth distributions in India have been devised and employed previously [17], [18], [19], a comprehensive set of wealth distributions at multiple geographic levels in India has not been made available in the literature before now. In this paper we present national, regional and state-level distributions of national wealth index scores, for urban and rural populations separately, derived from household asset data collected in three rounds of the Demographic and Health Survey, known as the National Family Health Survey (NFHS) in India [17], [18], [20], and in three rounds of the District Level Household Survey (DLHS) [21], [22], [23]. The six surveys cover the period between 1992–3 and 2007–8 and allow for a standardized measure of wealth that can be used in survey-specific analyses as well as for comparisons across surveys and time points. We validate our indices by analyzing height-for-age as one example of a health outcome that has previously been shown to differ markedly by wealth quintile [24]. Further, we illustrate the substantial misclassification of households that may result from sub-national analyses that use national wealth distributions. We propose that the urban and rural wealth score decile cut-off values that we present for different geographical levels can be used to improve future analyses of health inequalities in India and ultimately inform the decentralized policy-making processes by which such inequalities can be effectively addressed.
Methods
Ethics statement
This secondary analysis of anonymized survey data available in the public domain did not require prior approval from an ethics review board. The original surveys received approval by the relevant ethics review boards.
Data
The National Family Health Survey (NFHS) is a large-scale, nationally representative survey of Indian households providing state- and national-level estimates of key demographic and health indicators. Three rounds of the survey have been conducted to date (NFHS-1 in 1992–3, NFHS-2 in 1998–9 and NFHS-3 in 2005–6), each using an equivalent multi-stage sampling approach and including more than 85,000 households, with an overall response rate above 98%. Sampling design, sample size and response rate details are published in the round-specific survey reports [17], [18], [20]. In addition to demographic and health information, the NFHS collects data on household socioeconomic characteristics, including ownership of various assets, housing construction materials, and access to electricity. The assets included in the survey questionnaire vary between rounds.
The District Level Household Survey (DLHS) has been conducted in four rounds to date: DLHS-1 in 1998–9, DLHS-2 in 2002–4, DLHS-3 in 2007–8 and DLHS-4 in 2012–13. This survey collects information similar to the NFHS surveys but uses a sampling frame tailored to be representative at the district level [21], [22], [23]. Here we use data from the first three DLHS rounds, as datasets from the most recent round are not yet available in the public domain.
Wealth indices
We initially constructed separate indices for urban and rural settings in each survey, using different lists of assets. While there are fundamental differences in infrastructure and lifestyle between urban and rural areas, our comparison of the separate indices to a single national index revealed that the national index performed as well as the separate urban and rural indices in all states, with the advantage of being simpler to develop and implement in future research. However, because the assets on which data were collected through the surveys differed over time, a separate national index was constructed for each of the six surveys.
We derived our indices through principal components analysis (PCA) using Stata 12 [25]. PCA is a multivariate statistical technique for reducing a larger number of variables to a smaller number of dimensions [26]. PCA can summarize the variance of different types of variables with no specific distribution, generating a score that captures, in its first component, the greatest amount of data variability explained by one linear combination of variables. This approach is well-suited for handling the mixture of discrete and continuous data typically collected in household surveys [13]. The use of variables measured on different scales can result in different variances and this may produce quite different results in the PCA depending on whether one uses covariance or correlation matrices for the calculations. Large variances will dominate the first principal component if covariance matrices are used. For this reason, the PCA was performed using standardized variables from a correlation matrix, which minimizes the differences in variance. To generate valid indices that were as simple as possible, each index was constructed with the minimum number of variables/assets that would produce scores with enough variability to allow us to define unique cut-off points for each urban and rural area of all states.
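The choice of a correlation matrix can be illustrated with a minimal sketch. The code below is not the Stata procedure used for the indices: it standardizes a few hypothetical coded asset variables, builds their correlation matrix, and extracts the first principal component by power iteration (a technique swapped in here so that the example needs only the Python standard library). The variable names and the eight-household data set are assumptions made for illustration.

```python
import math
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        # Standardizing gives every variable unit variance, so no variable
        # dominates merely because it is measured on a wider scale.
        standardized.append([(x - mean) / sd for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in standardized]
            for zi in standardized]


def first_component(matrix, iterations=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    product = [sum(row[j] * vec[j] for j in range(k)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return vec, eigenvalue


# Hypothetical coded asset data for eight households (not survey data).
electricity = [0, 0, 1, 1, 1, 1, 0, 1]
television = [0, 0, 0, 1, 1, 1, 0, 1]
bedrooms = [1, 1, 2, 2, 3, 4, 1, 3]

corr = correlation_matrix([electricity, television, bedrooms])
eigenvector, eigenvalue = first_component(corr)
# The trace of a correlation matrix equals the number of variables, so the
# share of variance explained by the first component is eigenvalue / k.
explained_share = eigenvalue / len(corr)
print([round(w, 3) for w in eigenvector], round(explained_share, 3))
```

With a covariance matrix instead, the bedrooms variable (coded 1 to 4) would carry more variance than the binary items and would pull the first component towards itself, which is the behaviour the standardization is meant to avoid.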
The indices include 16 assets for NFHS-3 (2005–6), 14 assets for NFHS-2 (1998–9) and 11 assets for NFHS-1 (1992–3). The index for DLHS-3 (2007–8) includes 14 assets, the index for DLHS-2 (2002–4) includes 10 assets and the index for DLHS-1 (1998–9) includes 9 assets (Tables 1–6). Binary coding (i.e. yes/no) was applied to all but two assets; highest education level achieved by the household head was categorized as none/primary/secondary/higher than secondary, while the number of bedrooms in the dwelling was categorized as one/two/three/four or more (thereby ensuring that at least 5% of households were included in the highest category).
An index coefficient $c_i$ was calculated for each asset and rounded to the nearest integer (Tables 1–6). The wealth score for each household was then calculated as $\sum_i c_i v_i$, where $c_i$ represents the index coefficient and $v_i$ the coded value of the $i$th asset.
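As an illustration, the score calculation can be sketched as a weighted sum of coded asset values. This reading is an assumption based on the definitions of the index coefficient and the coded value given above. The coefficients, asset names and numeric codes below are hypothetical and are not the published values, which are listed in Tables 1–6.

```python
def wealth_score(coefficients, coded_values):
    """Weighted sum of coded asset values (assumed form of the score)."""
    return sum(coefficients[asset] * coded_values[asset] for asset in coefficients)


# Hypothetical integer coefficients for four assets (illustration only).
example_coefficients = {
    "electricity": 60,
    "television": 85,
    "education_head": 40,   # assumed coding: 0 none .. 3 higher than secondary
    "bedrooms": 30,         # assumed coding: 1 one .. 4 four or more
}

# One hypothetical household: electricity, no television,
# head with secondary education, three bedrooms.
example_household = {
    "electricity": 1,
    "television": 0,
    "education_head": 2,
    "bedrooms": 3,
}

example_score = wealth_score(example_coefficients, example_household)
print(example_score)
```

Because the coefficients are integers, the score can be computed by hand during fieldwork, which is one practical reason for rounding them.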
From the resulting score assigned to each household, the national, regional and state score distributions were derived for each survey round, for urban and rural areas separately, and the score value for each stratum-specific decile was then identified. To account for the complex survey design, the sampling weights provided with the survey datasets were used for all analyses.
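The derivation of stratum-specific decile cut-offs can be sketched as follows. This is a simplified illustration, not the procedure run in Stata: it treats the sampling weight as a frequency weight and takes each cut-off as the lowest score at which the cumulative weight share reaches 10%, 20%, and so on up to 90%. The field names (`state`, `area`, `score`, `weight`) and the data are hypothetical.

```python
from collections import defaultdict


def weighted_decile_cutoffs(scores, weights):
    """Nine score values marking the weighted 10th to 90th percentiles."""
    pairs = sorted(zip(scores, weights))
    total = sum(weight for _, weight in pairs)
    cutoffs, cumulative, decile = [], 0, 1
    for score, weight in pairs:
        cumulative += weight
        # One score closes every decile whose weight share it reaches.
        while decile <= 9 and cumulative * 10 >= decile * total:
            cutoffs.append(score)
            decile += 1
    return cutoffs


def cutoffs_by_stratum(households):
    """Decile cut-offs for each (state, urban/rural area) stratum."""
    grouped = defaultdict(lambda: ([], []))
    for h in households:
        scores, weights = grouped[(h["state"], h["area"])]
        scores.append(h["score"])
        weights.append(h["weight"])
    return {key: weighted_decile_cutoffs(s, w) for key, (s, w) in grouped.items()}


# Hypothetical households in two strata (illustration only).
households = [
    {"state": "A", "area": "rural", "score": s, "weight": 1}
    for s in range(100, 1100, 100)
] + [
    {"state": "A", "area": "urban", "score": 100, "weight": 9},
    {"state": "A", "area": "urban", "score": 200, "weight": 1},
]

stratum_cutoffs = cutoffs_by_stratum(households)
print(stratum_cutoffs)
```

In the second stratum all nine cut-offs coincide, because one score value carries 90% of the weight. Repeated cut-offs of this kind indicate that an index does not have enough variability to define unique deciles in that stratum.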
Results
Tables 1–6 give the indexed variables for each survey respectively, with their factor loadings, standard deviations and index coefficients. For the NFHS-3 (Table 1), the first component explained 38.5% of the data variability. The first component explained 43.1% of the variability in the NFHS-2 data (Table 2), and 40.0% of the variability in the NFHS-1 data (Table 3). For the DLHS surveys, the first component explained 39.8% of the data variability in DLHS-3 (Table 4), 36.0% of the variability in the DLHS-2 data (Table 5), and 37% of the variability in DLHS-1 (Table 6).
Using the 2006 WHO growth standards [27] we analyzed the distribution of the mean height-for-age z-scores across wealth quintiles defined by local reference cut-off points. As expected, the mean height-for-age z-score increased from the poorest to the richest wealth quintiles, and similarly, prevalence of stunting was higher among the poorest and lower among the wealthiest. These trends were consistent for all three rounds of the NFHS. We calculated the Pearson correlation between the continuous wealth score and height-for-age z-score for all children under age 5. The values were 0.23 (p-value<0.0001) in urban areas and 0.18 (p-value<0.0001) in rural areas (NFHS-3). Spearman rank correlations had very similar results: 0.27 in urban areas and 0.20 in rural areas (p-value<0.001). These values were similar to correlations obtained between height-for-age z-score and the originally constructed NFHS-3 wealth index based on the DHS methodology [28]. The Pearson correlations were 0.25 and 0.19 (p-value<0.001) and the Spearman rank correlations were 0.28 and 0.21 (p-value<0.001) in urban and rural areas, respectively.
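For readers who want to reproduce this kind of check, the two correlation measures can be sketched as below. The sketch is unweighted, whereas the survey analyses used sampling weights, and the six pairs of wealth scores and height-for-age z-scores are hypothetical values chosen only to show the calculation.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical wealth scores and height-for-age z-scores (illustration only).
wealth_scores = [120, 250, 310, 480, 620, 700]
haz = [-2.9, -2.4, -2.5, -1.6, -1.2, -0.8]

pearson_r = pearson(wealth_scores, haz)
spearman_r = spearman(wealth_scores, haz)
print(round(pearson_r, 3), round(spearman_r, 3))
```

Reporting both measures is useful because the Spearman coefficient depends only on the ordering of households, so close agreement between the two suggests the association is not driven by a few extreme scores.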
In Figures 1 and 2 we present state-level analyses for Kerala and Uttar Pradesh in 2005–6 showing mean height-for-age z-score (Figure 1) and stunting prevalence (Figure 2) by wealth quintile, and comparing estimates for locally defined quintiles with estimates for the national quintiles originally defined in the NFHS-3. Kerala and Uttar Pradesh were chosen to represent the diverse levels of economic development and health indicators. Kerala is among the richest states in India and ranks highest in terms of conventional measures of health and economic development, while Uttar Pradesh is one of the poorest states and ranks among the worst on infant mortality rate, literacy, and per capita income [18], [29], [30]. Based on the original NFHS-3 national quintiles, nearly 50% of children with survey height measurements in Kerala are classified in the richest quintile, whereas local cut-offs result in a much more even distribution of children across quintiles. The wealth gradient for child linear growth in Kerala appears steeper when the national quintiles are used than when the locally defined quintiles are used. The strength of this relationship is likely overstated because, with fewer individuals classified in the poorest quintiles based on the national cut-offs, there is additional uncertainty in estimating the mean height-for-age in these groups. This exaggeration of the state-specific wealth gradient when using national quintiles is similarly shown in Uttar Pradesh, where only 10% of children were classified in the richest national quintile.
For analyzing health inequalities, the importance of using reference distributions from the most appropriate geographical level is further illustrated in Figure 3. We compare the wealth score distributions of a sub-sample of eight rural villages in Himachal Pradesh with the full rural distribution for Himachal Pradesh (top panel) and with the rural distribution for all of India (bottom panel). If the sub-sampled villages had a similar wealth distribution to that of the state, all bars in the upper histogram (representing each quintile) would include approximately 20% of the sub-sampled village households. However, the sub-sample distribution is in fact heavily skewed towards the lowest state-specific wealth quintile. Conversely, when compared to the rural wealth distribution of the whole country, the sub-sample distribution is skewed towards the higher national quintiles.
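The effect of the reference distribution on classification can be sketched with a small example. All numbers below are hypothetical: a set of ten household scores from a relatively wealthy state is classified once with assumed national quintile cut-offs and once with assumed state-specific cut-offs, with each cut-off treated as the inclusive upper bound of its quintile.

```python
from bisect import bisect_left
from collections import Counter


def assign_quintile(score, cutoffs):
    """Quintile (1 to 5) for a score, given four ascending cut-off values."""
    return bisect_left(cutoffs, score) + 1


# Hypothetical cut-offs (20th, 40th, 60th and 80th percentiles).
national_cutoffs = [150, 250, 400, 600]
state_cutoffs = [450, 560, 650, 720]

# Hypothetical household scores from one relatively wealthy state.
state_scores = [400, 450, 500, 560, 610, 650, 690, 720, 760, 800]

national_counts = Counter(assign_quintile(s, national_cutoffs) for s in state_scores)
local_counts = Counter(assign_quintile(s, state_cutoffs) for s in state_scores)

for q in range(1, 6):
    print(q, national_counts[q], local_counts[q])
```

Under the national cut-offs most of these households fall in the top quintile and none in the bottom two, so any within-state gradient has to be estimated from very small groups. The state-specific cut-offs spread the same households evenly across all five quintiles.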
The plotted distributions of household wealth scores by urban and rural areas for each survey round are given in Figures 4 and 5. For the most recent round of the NFHS, in 2005–6, score values for urban households across India ranged from 20 to 955, with a mean score of 547 (standard deviation of 230) and a median score of 552. In rural India, the mean score was 317 (standard deviation of 221) and the median score was 268.
Urban and rural decile cut-off values for India, for the six regions and for the 24 major states are presented by survey round in Tables 7–12. These wealth score distributions reveal large variability between states, regions, and urban and rural areas. For the most recent NFHS round (2005–6), median scores in urban areas were highest in Delhi (665), followed by Goa (663), Uttaranchal (653), Himachal Pradesh (649), and Punjab (648), with all but Goa located in the North region. The North region’s median score (646) is similar to or higher than the seventh decile cut-off values of all other regions. The poorest urban areas were in the states of Tamil Nadu (382) and Andhra Pradesh (394), with median scores that are lower than the third decile cut-off values of 11 other states.
Rural wealth score gaps are even larger than those observed in urban areas. Median scores in rural areas for 2005–6 (NFHS) were highest in Delhi (608), followed by Punjab (597), Goa (595) and Kerala (588), with very low median scores in the Eastern states of Jharkhand (114), Orissa (114) and Bihar (124) and in the Central states of Madhya Pradesh (145) and Uttar Pradesh (144). The median scores of Jharkhand and Orissa are lower than the first decile cut-off values for 11 other states, and lower than the second decile cut-off values for 17 other states. The region with the richest rural areas is the North, with a median score of 496, followed at a considerable distance by the West region with a median score of 291.
Discussion
PCA has been previously evaluated and used for the development of wealth scores based on household asset data, including the NFHS itself [17], [18], [20]. The present analysis adds a careful consideration of the sub-national variation in wealth and the differences in wealth index scores and components by rural and urban areas. As has been shown previously in Brazil [15], there is large variability in sub-national distributions of scores and there are many benefits in taking the variation into account. For example, it allows for within-sample economic classification, and for comparisons across geographical and urban/rural distributions.
Unlike the original PCA-based wealth score that is made available with the NFHS datasets, which has a common number of items and scores for a national distribution, our indices were constructed so as to allow for the identification of regional and state-level decile cut-off points for urban and rural households separately. This enables the scores to be used for comparisons at different levels of aggregation, and the importance of local distribution cut-off points is illustrated by the state-level examples in Figures 1 and 2. In addition, changes in the index components and in their coefficients over time, for example from 1992–3 (NFHS-1) to 2005–6 (NFHS-3), illustrate the need to revise indices periodically.
The items that we included in the national wealth indices are relatively simple to measure in population surveys, and are limited to 15 or fewer assets, thereby reducing the time needed to collect wealth data during a household interview. In addition, future analyses of the NFHS datasets can take direct advantage of the wealth indices and the sub-national score distributions presented here. Other variables that were available in the NFHS surveys were not included because they did not contribute importantly to the score and/or were not required to improve the distribution. Radio and bed are two examples of items that had a lower loading (less than 0.2) and were kept in the calculations to improve the distribution of the score by avoiding accumulation of households in a specific decile (a function of having too many households reporting ownership of a very limited number of items, especially in rural areas).
The resulting scores are valid indicators of wealth that correlate well with health outcomes, as seen by the variation in the mean height-for-age scores (Figure 1) and in stunting prevalence across the wealth score quintiles (Figure 2). The proportion of the total variability explained by the first component of the urban (ranging from 39.5% to 41.4%) and rural scores (ranging from 33.6% to 33.8%) can be considered high given the size of India’s population and its income inequality (Gini index: 33.9 in 2010 [31]).
In summary, we constructed valid asset-based wealth indices from six nationally representative surveys of households in India conducted between 1992–3 and 2007–8, and we present the regional and state-level distributions of these wealth scores for urban and rural areas separately. These scores can be used for analyses within the source surveys to understand differences within and across geographical levels, and for ecological analyses that combine the source surveys with other datasets. In addition to the wide variety of scenarios in which these indices can be currently applied, they are also based on data that could be collected relatively easily in future studies.
Author Contributions
Analyzed the data: DGB DJC. Wrote the paper: DGB MFG. Developed the research question and study methodology: DGB AJDB.
References
- 1. Wagstaff A (2002) Poverty and health sector inequalities. Bulletin of the World Health Organization 80: 97–105.
- 2. Victora CG, Wagstaff A, Schellenberg JA, Gwatkin D, Claeson M, et al. (2003) Applying an equity lens to child health and mortality: more of the same is not enough. Lancet 362: 233–241.
- 3. Boerma JT, Bryce J, Kinfu Y, Axelson H, Victora CG (2008) Mind the gap: equity and trends in coverage of maternal, newborn, and child health services in 54 Countdown countries. Lancet 371: 1259–1267.
- 4. Howe LD, Galobardes B, Matijasevich A, Gordon D, Johnston D, et al. (2012) Measuring socio-economic position for epidemiological studies in low- and middle-income countries: a methods of measurement in epidemiology paper. International Journal of Epidemiology 41: 871–886.
- 5. Gwatkin DR (2000) Health inequalities and the health of the poor: what do we know? What can we do? Bulletin of the World Health Organization 78: 3–18.
- 6. McNeil M, Yazbeck AS, Gwatkin DR, Excobar ML, Coady DP, et al. (2005) Putting knowledge to work for development. Washington DC: The World Bank.
- 7. Barros AJ, Ronsmans C, Axelson H, Loaiza E, Bertoldi AD, et al. (2012) Equity in maternal, newborn, and child health interventions in Countdown to 2015: a retrospective review of survey data from 54 countries. Lancet 379: 1225–1233.
- 8. Gwatkin DR, Rutstein S, Johnson K, Suliman E, Wagstaff A, et al. (2007) Socio-economic differences in health, nutrition, and population within developing countries: an overview. Washington, DC: The World Bank.
- 9. Houweling TA, Kunst AE (2010) Socio-economic inequalities in childhood mortality in low- and middle-income countries: a review of the international evidence. British Medical Bulletin 93: 7–26.
- 10. Rutstein S, Johnson K (2004) DHS Comparative Reports No. 6: The DHS Wealth Index. Calverton, Maryland: ORC Macro.
- 11. Filmer D, Pritchett LH (2001) Estimating wealth effects without expenditure data–or tears: an application to educational enrollments in states of India. Demography 38: 115–132.
- 12. Vyas S, Kumaranayake L (2006) Constructing socio-economic status indices: how to use principal components analysis. Health Policy and Planning 21: 459–468.
- 13. Howe LD, Hargreaves JR, Huttly SR (2008) Issues in the construction of wealth indices for the measurement of socio-economic position in low-income countries. Emerging Themes in Epidemiology 5: 3.
- 14. Pande RP, Yazbeck AS (2003) What's in a country average? Wealth, gender, and regional inequalities in immunization in India. Social Science & Medicine 57: 2075–2088.
- 15. Barros AJ, Victora CG (2005) [A nationwide wealth score based on the 2000 Brazilian demographic census]. Revista de saude publica 39: 523–529.
- 16. Pathak PK, Singh A (2011) Trends in malnutrition among children in India: Growing inequalities across different economic groups. Social Science & Medicine 73: 576–585.
- 17. International Institute for Population Sciences (IIPS) (1995) National Family Health Survey (MCH and Family Planning), India 1992–93. Mumbai: IIPS.
- 18. International Institute for Population Sciences (IIPS), Macro International (2007) National Family Health Survey (NFHS-3), 2005–06: India: Volume II. Mumbai: IIPS.
- 19. Mohanty SK (2009) Alternative wealth indices and health estimates in India. Genus LXV: 25.
- 20. International Institute for Population Sciences (IIPS), ORC Macro (2000) National Family Health Survey (NFHS-2), 1998–99: India. Mumbai: IIPS.
- 21. International Institute for Population Sciences (IIPS) (2001) Reproductive and Child Health Project, Rapid Household Survey (Phase I & II) 1998–1999. Mumbai: IIPS.
- 22. International Institute for Population Sciences (IIPS) (2006) District Level Household Survey (DLHS-2), 2002–04. Mumbai: IIPS.
- 23. International Institute for Population Sciences (IIPS) (2010) District Level Household and Facility Survey (DLHS-3), 2007–08: India. Mumbai: IIPS.
- 24. Subramanyam MA, Kawachi I, Berkman LF, Subramanian SV (2010) Socioeconomic inequalities in childhood undernutrition in India: analyzing trends between 1992 and 2005. PLoS One 5: e11392.
- 25. StataCorp (2011) Stata Statistical Software Release 12.0. College Station, Texas.
- 26. Dunteman GH (1985) Principal Component Analysis. Newbury Park, California: Sage University.
- 27. WHO Multicentre Growth Reference Study Group (2006) Assessment of differences in linear growth among populations in the WHO Multicentre Growth Reference Study. Acta Paediatrica 450: 56–65.
- 28. International Institute for Population Sciences (IIPS), Macro International (2007) National Family Health Survey (NFHS-3), 2005–06: India: Volume I. Mumbai: IIPS.
- 29. India Planning Commission (2002) National Human Development Report, India. New Delhi: Oxford University Press.
- 30. Bassani DG, Kumar R, Awasthi S, Morris SK, Paul VK, et al. (2010) Causes of neonatal and child mortality in India: a nationally representative mortality survey. Lancet 376: 1853–1860.
- 31. World Bank (2013) World Development Indicators 2013. Washington DC: World Bank.