Local Distributions of Wealth to Describe Health Inequalities in India: A New Approach for Analyzing Nationally Representative Household Survey Data, 1992–2008

Background: Worse health outcomes including higher morbidity and mortality are most often observed among the poorest fractions of a population. In this paper we present and validate national, regional and state-level distributions of national wealth index scores, for urban and rural populations, derived from household asset data collected in six survey rounds in India between 1992-3 and 2007-8. These new indices and their sub-national distributions allow for comparative analyses of a standardized measure of wealth across time and at various levels of population aggregation in India.
Methods: Indices were derived through principal components analysis (PCA) performed using standardized variables from a correlation matrix to minimize differences in variance. Valid and simple indices were constructed with the minimum number of assets needed to produce scores with enough variability to allow definition of unique decile cut-off points in each urban and rural area of all states.
Results: For all indices, the first PCA components explained between 36% and 43% of the variance in household assets. Using sub-national distributions of national wealth index scores, mean height-for-age z-scores increased from the poorest to the richest wealth quintiles for all surveys, and stunting prevalence was higher among the poorest and lower among the wealthiest. Urban and rural decile cut-off values for India, for the six regions and for the 24 major states revealed large variability in wealth by geographical area and level, and rural wealth score gaps exceeded those observed in urban areas.
Conclusions: The large variability in sub-national distributions of national wealth index scores indicates the importance of accounting for such variation when constructing wealth indices and deriving score distribution cut-off points. Such an approach allows for proper within-sample economic classification, resulting in scores that are valid indicators of wealth and correlate well with health outcomes, and enables wealth-related analyses at whichever geographical area and level may be most informative for policy-making processes.


Background
Worse health outcomes including higher morbidity and mortality are most often observed among the poorest fractions of the population [1]. This is in part due to lower health service use, more limited access to health interventions and poorer nutritional status [1,2], but health inequalities are a consequence of complex processes including multidimensional drivers reflecting differences in economic status and social characteristics such as gender and ethnicity. The growing need to better understand the influence of poverty on health has dramatically increased the interest in research [3,4] and also the programmatic attention on health inequalities in low- and middle-income countries [2,5,6,7].
The development of new methods for estimating household economic status has facilitated new research on the effects of wealth disparities on health [8,9]. Preferred measures of economic status require data on household income or consumption, but these indicators are hard to define in some settings, difficult to collect on a large scale and prone to misclassification [10]. Conversely, data on ownership of durable goods, housing characteristics and access to infrastructure are easier to measure and commonly available from household surveys, and these can be used compositely to classify households' relative wealth [11]. As such, asset-based wealth indices derived through principal components analysis are increasingly being used to characterize economic status in household survey analyses of health inequalities [12,13]. In addition, such surveys are often repeated periodically in a given population, allowing indices to be updated as needed to ensure the most relevant assets are included.
While asset-based wealth indices are typically constructed at the national level, the use of national wealth score distributions for sub-national analyses is problematic [14,15,16]. For example, ignoring the wealth score distribution at the geographic level of interest (e.g. district, state, or region) may result in a large proportion of one population (e.g. state) being assigned to the top or bottom of the wealth distribution of another population (e.g. region), thereby hiding level-specific wealth gradients. The use of geographical-level wealth distributions allows one to correctly classify households according to the most appropriate wealth score distribution, enabling proper comparisons across different states, regions or countries and across different geographical levels.
In India, due to the large, socio-economically diverse population and the decentralized decision-making and policy-setting structures, the use of wealth distributions at multiple geographic levels is especially important for analyzing and addressing health inequalities. However, while national and sub-national wealth distributions in India have been devised and employed previously [17,18,19], a comprehensive set of wealth distributions at multiple geographic levels in India has not been made available in the literature before now. In this paper we present national, regional and state-level distributions of national wealth index scores, for urban and rural populations separately, derived from household asset data collected in the three rounds of the Demographic and Health Survey, known as the National Family Health Survey (NFHS) in India [17,18,20], and in three rounds of the District Level Household Survey (DLHS) [21,22,23]. The six surveys cover a period between 1992-3 and 2007-8 and allow for a standardized measure of wealth that can be used in survey-specific analyses as well as for comparisons across surveys and time-points. We validate our indices by analyzing height-for-age as one example of a health outcome that has previously been shown to differ markedly by wealth quintile [24]. Further, we illustrate the substantial misclassification of households that may result from sub-national analyses that use national wealth distributions. We propose that the urban and rural wealth score decile cut-off values that we present for different geographical levels can be used to improve future analyses of health inequalities in India and ultimately inform the decentralized policy-making processes by which such inequalities can be effectively addressed.

Ethics statement
This secondary analysis of anonymized survey data available in the public domain did not require prior approval from an ethics review board. The original surveys received approval by the relevant ethics review boards.

Data
The National Family Health Survey (NFHS) is a large-scale, nationally representative survey of Indian households providing state- and national-level estimates of key demographic and health indicators. Three rounds of the survey have been conducted to date (NFHS-1 in 1992-3, NFHS-2 in 1998-9 and NFHS-3 in 2005-6), each using an equivalent multi-stage sampling approach and including more than 85,000 households, with an overall response rate above 98%. Sampling design, sample size and response rate details are published in the round-specific survey reports [17,18,20]. In addition to demographic and health information, the NFHS collects data on household socioeconomic characteristics, including ownership of various assets, housing

Wealth indices
We initially constructed separate indices for urban and rural settings in each survey, using different lists of assets. Although there are fundamental differences in infrastructure and lifestyle between urban and rural areas, our comparison of the separate indices to a single national index revealed that the national index performed as well as the separate urban and rural indices in all states, with the advantage of being simpler to develop and implement in future research. However, because the assets on which data were collected through the surveys differed over time, a separate national index was constructed for each of the six surveys.
We derived our indices through principal components analysis (PCA) using Stata 12 [25]. PCA is a multivariate statistical technique for reducing a larger number of variables to a smaller number of dimensions [26]. PCA can summarize the variance of different types of variables with no specific distribution, generating a score that captures, in its first component, the greatest amount of data variability explained by one linear combination of variables. This approach is well-suited for handling the mixture of discrete and continuous data typically collected in household surveys [13]. The use of variables measured on different scales can result in different variances, and this may produce quite different results in the PCA depending on whether one uses covariance or correlation matrices for the calculations. Large variances will dominate the first principal component if covariance matrices are used. For this reason, the PCA was performed using standardized variables from a correlation matrix, which minimizes the differences in variance.
Table 6. Assets selected to create a national wealth index from the DLHS-1 (1998-9) survey, with coding definitions.
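The choice of a correlation matrix over a covariance matrix can be illustrated with a minimal NumPy sketch. It is not the Stata procedure used in the paper: the function name `first_component`, the simulated asset data, the use of unit-length eigenvector entries as "loadings", and the omission of sampling weights are all assumptions made for illustration.

```python
import numpy as np

def first_component(assets):
    """First principal component of the correlation matrix of coded assets.

    assets: 2-D array, one row per household, one column per coded asset.
    Returns (loadings, explained_share), where loadings are the entries of
    the unit-length leading eigenvector (an assumed convention).
    """
    assets = np.asarray(assets, dtype=float)
    # The correlation matrix equals the covariance matrix of standardized
    # variables, so no asset dominates because of its measurement scale.
    corr = np.corrcoef(assets, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # symmetric matrix
    top = np.argmax(eigenvalues)
    loadings = eigenvectors[:, top]
    # Eigenvector sign is arbitrary; orient it so higher means wealthier.
    if loadings.sum() < 0:
        loadings = -loadings
    explained_share = eigenvalues[top] / eigenvalues.sum()
    return loadings, explained_share

# Hypothetical data: three binary assets and one four-level variable,
# all driven by a shared latent "wealth" factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
assets = np.column_stack([
    (latent + rng.normal(size=500) > 0.0).astype(float),
    (latent + rng.normal(size=500) > 0.5).astype(float),
    (latent + rng.normal(size=500) > -0.5).astype(float),
    np.clip(np.round(latent + rng.normal(size=500) + 2), 1, 4),
])
loadings, explained_share = first_component(assets)
print(loadings, explained_share)
```

Because the calculation starts from correlations, multiplying any column by a constant (for example, recording a variable in different units) leaves the loadings and the explained share unchanged, which is the property the paragraph above relies on.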
To generate valid indices that were as simple as possible, each index was constructed with the minimum number of variables/assets that would produce scores with enough variability to allow us to define unique cut-off points for each urban and rural area of all states. The indices include 16 assets for NFHS-3 (2005-6), 14 assets for NFHS-2 (1998-9) and 11 assets for NFHS-1 (1992-3). The index for DLHS-3 (2007-8) includes 14 assets, the index for DLHS-2 (2002-4) includes 10 assets and the index for DLHS-1 (1998-9) includes 9 assets (Tables 1-6). Binary coding (i.e. yes/no) was applied to all but two assets; highest education level achieved by the household head was categorized as none/primary/secondary/higher than secondary, while the number of bedrooms in the dwelling was categorized as one/two/three/four or more (thereby ensuring that at least 5% of households were included in the highest category).
An index coefficient $c$ for each asset was calculated using the expression $c = (\text{loading} / \text{s.d.}) \times 100$, rounded to the nearest integer. The wealth score for each household was then calculated using the expression $\sum_i c_i v_i$, where $c_i$ represents the index coefficient and $v_i$ the coded value of the $i$th asset.
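As a worked illustration of the scoring step, the sketch below computes integer coefficients from loadings and standard deviations and then sums them over coded asset values. The loadings, standard deviations and household codings are made-up numbers, not values from Tables 1-6, and the function names are hypothetical.

```python
import numpy as np

def index_coefficients(loadings, standard_deviations):
    """Coefficient per asset: loading / s.d. x 100, rounded to an integer.

    Note: np.rint rounds exact halves to the nearest even integer, which
    may differ from the rounding rule used in the original analysis.
    """
    ratio = np.asarray(loadings, float) / np.asarray(standard_deviations, float)
    return np.rint(ratio * 100).astype(int)

def wealth_scores(coded_values, coefficients):
    """Score per household: sum over assets of coefficient x coded value."""
    return np.asarray(coded_values) @ np.asarray(coefficients)

# Hypothetical three-asset index.
coefficients = index_coefficients([0.30, 0.25, 0.12], [0.40, 0.50, 0.48])

# Rows are households, columns are coded asset values (1 = owns, 0 = not).
households = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])
scores = wealth_scores(households, coefficients)
print(coefficients, scores)
```

With these illustrative inputs the coefficients are 75, 50 and 25, so a household owning the first and third assets scores 100, one owning all three scores 150, and one owning none scores 0.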
From the resulting score assigned to each household, the national, regional and state score distributions were derived for each survey round, for urban and rural areas separately, and the score value for each stratum-specific decile was then identified. To account for the complex survey design, the sampling weights provided with the survey datasets were used for all analyses.
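One way to derive stratum-specific decile cut-offs with sampling weights is sketched below. The weighted-percentile rule used here (the smallest score at which the cumulative weight share reaches the target proportion) is an assumption, since the paper does not state which definition its software applied, and the function names, strata labels and data are hypothetical.

```python
import numpy as np

def weighted_cutoffs(scores, weights, proportions):
    """Smallest score whose cumulative weight share reaches each proportion."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores, kind="stable")
    sorted_scores = scores[order]
    cumulative_share = np.cumsum(weights[order]) / weights.sum()
    # First position where the cumulative share is >= the target proportion.
    positions = np.searchsorted(cumulative_share, proportions, side="left")
    positions = np.minimum(positions, len(sorted_scores) - 1)
    return sorted_scores[positions]

def decile_cutoffs_by_stratum(scores, weights, strata):
    """Nine decile cut-offs for each stratum label (e.g. state and area)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    proportions = np.arange(1, 10) / 10
    return {
        label: weighted_cutoffs(scores[strata == label],
                                weights[strata == label], proportions)
        for label in np.unique(strata)
    }

# Hypothetical example: ten equally weighted households in each of two strata.
scores = np.concatenate([np.arange(10, 101, 10), np.arange(300, 391, 10)])
weights = np.ones(20)
strata = np.array(["state A rural"] * 10 + ["state A urban"] * 10)
cutoffs = decile_cutoffs_by_stratum(scores, weights, strata)
print(cutoffs)
```

The same `weighted_cutoffs` function can be applied to a whole region or to the national sample by passing the corresponding subset of households, which yields reference distributions at each geographical level.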

Results
Tables 1-6 give the indexed variables for each survey, with their factor loadings, standard deviations and index coefficients. For the NFHS-3 (Table 1), the first component explained 38.5% of the data variability. The first component explained 43.1% of the variability in the NFHS-2 data (Table 2), and 40.0% of the variability in the NFHS-1 data (Table 3). For the DLHS surveys, the first component explained 39.8% of the data variability in DLHS-3 (Table 4), 36.0% of the variability in the DLHS-2 data (Table 5), and 37% of the variability in DLHS-1 (Table 6).
Using the 2006 WHO growth standards [27] we analyzed the distribution of the mean height-for-age z-scores across wealth quintiles defined by local reference cut-off points. As expected, the mean height-for-age z-score increased from the poorest to the richest wealth quintiles, and similarly, prevalence of stunting was higher among the poorest and lower among the wealthiest. These trends were consistent for all three rounds of the NFHS. We calculated the Pearson correlation between the continuous wealth score and height-for-age z-score for all children under age 5. [...] obtained between height-for-age z-score and the originally constructed NFHS-3 wealth index based on the DHS methodology [28]. The Pearson correlations were 0.25 and 0.19 (p-value < 0.001) and the Spearman rank correlations were 0.28 and 0.21 (p-value < 0.001) in urban and rural areas, respectively. In Figures 1 and 2 we present state-level analyses for Kerala and Uttar Pradesh in 2005-6 showing mean height-for-age z-score (Figure 1) and stunting prevalence (Figure 2) by wealth quintile, and comparing estimates for locally defined quintiles with estimates for the national quintiles originally defined in the NFHS-3. Kerala and Uttar Pradesh were chosen to represent the diverse levels of economic development and health indicators. Kerala is among the richest states in India and ranks highest in terms of conventional measures of health and economic development, while Uttar Pradesh is one of the poorest states and ranks among the lowest by infant mortality rate, literacy, and per capita income [18,29,30]. Based on the original NFHS-3 national quintiles, nearly 50% of children with survey height measurements in Kerala are classified in the richest quintile, whereas local cut-offs result in a much more even distribution of children across quintiles. The wealth gradient for child linear growth in Kerala appears steeper when the national quintiles are used compared to the locally defined quintiles. 
The strength of this relationship is likely overstated because, with fewer individuals classified in the poorest quintiles based on the national cut-offs, there is additional uncertainty in estimating the mean height-for-age in these groups. This exaggeration of the state-specific wealth gradient when using national quintiles is similarly shown in Uttar Pradesh, where only 10% of children were classified in the richest national quintile.
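The two correlation measures reported above can be sketched with NumPy alone: the Spearman coefficient is the Pearson coefficient computed on ranks, with tied values sharing their average rank. This sketch is unweighted and uses made-up numbers; the function names are hypothetical, and it does not reproduce the survey-weighted estimates or p-values in the text.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace each rank by the mean rank of its group of tied values.
    _, inverse, counts = np.unique(values, return_inverse=True,
                                   return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    return np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical wealth scores and height-for-age z-scores for five children.
wealth = [120, 250, 250, 400, 610]
haz = [-2.4, -1.9, -2.1, -1.2, -0.3]
print(pearson(wealth, haz), spearman(wealth, haz))
```

Reporting both measures is a reasonable check here because wealth scores built from a small number of coded assets take many tied values, and the rank-based measure is less sensitive to the shape of the score distribution.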
For analyzing health inequalities, the importance of using reference distributions from the most appropriate geographical level is further illustrated in Figure 3. We compare the wealth score distributions of a sub-sample of eight rural villages in Himachal Pradesh with the full rural distribution for Himachal Pradesh (top panel) and with the rural distribution for all of India (bottom panel). If the sub-sampled villages had a similar wealth distribution to that of the state, all bars in the upper histogram (representing each quintile) would include approximately 20% of the sub-sampled village households. However, the sub-sample distribution is in fact largely skewed towards the lowest state-specific wealth quintile. Conversely, when compared to the rural wealth distribution of the whole country, the sub-sample distribution is skewed towards the higher national quintiles.
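The comparison described above amounts to classifying the same households against two sets of reference cut-offs and tabulating the share that falls in each quintile. The following is a minimal sketch; the village scores and both sets of quintile cut-offs are invented for illustration and are not taken from Tables 7-12, and the function names are hypothetical.

```python
import numpy as np

def assign_group(scores, cutoffs):
    """Group number (1 = poorest) for each score.

    A score equal to a cut-off is placed in the lower group (an assumed
    convention).
    """
    cutoffs = np.asarray(cutoffs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return np.searchsorted(cutoffs, scores, side="left") + 1

def group_shares(scores, cutoffs, weights=None):
    """Weighted share of households falling in each reference group."""
    groups = assign_group(scores, cutoffs)
    if weights is None:
        weights = np.ones(len(groups))
    n_groups = len(cutoffs) + 1
    totals = np.bincount(groups, weights=weights, minlength=n_groups + 1)[1:]
    return totals / totals.sum()

# Hypothetical wealth scores for ten households in a village sub-sample.
village_scores = [120, 150, 180, 210, 260, 300, 340, 380, 420, 460]

# Hypothetical quintile cut-offs (20th, 40th, 60th and 80th percentiles).
state_rural_cutoffs = [250, 400, 520, 640]
national_rural_cutoffs = [100, 170, 260, 390]

shares_vs_state = group_shares(village_scores, state_rural_cutoffs)
shares_vs_nation = group_shares(village_scores, national_rural_cutoffs)
print(shares_vs_state)   # concentrated in the lowest state quintiles
print(shares_vs_nation)  # concentrated in the higher national quintiles
```

If the sub-sample resembled the reference population, each share would be close to 0.2. In this invented example the same households look poor relative to their state and comparatively well-off relative to the country, which mirrors the pattern described for Figure 3.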
The plotted distributions of household wealth scores by urban and rural areas for each survey round are given in Figures 4 and 5. For the most recent round of the NFHS, in 2005-6, score values for urban households across India ranged from 20 to 955, with a mean score of 547 (standard deviation of 230) and a median score of 552. In rural India, the mean score was 317 (standard deviation of 221) and the median score was 268. Urban and rural decile cut-off values for India, for the six regions and for the 24 major states are presented by survey round in Tables 7-12. These wealth score distributions reveal large variability between states, regions, and urban and rural areas. For the most recent NFHS round (2005-6), median scores in urban areas were highest in Delhi (665), followed by Goa (663), Uttaranchal (653), Himachal Pradesh (649), and Punjab (648), with all but Goa located in the North region. The North region's median score (646) is similar to or higher than the seventh decile cut-off values of all other regions. The poorest urban areas were in the states of Tamil Nadu (382) and Andhra Pradesh (394), with median scores that are lower than the third decile cut-off values of 11 other states.
Rural wealth score gaps are even larger than those observed in urban areas. Median scores in rural areas for 2005-6 (NFHS) were highest in Delhi (608), followed by Punjab (597), Goa (595) and Kerala (588), with very low median scores in the Eastern states of Jharkhand (114), Orissa (114) and Bihar (124) and the Central states of Madhya Pradesh (145) and Uttar Pradesh (144). The median scores of Jharkhand and Orissa are lower than the first decile cut-off values for 11 other states, and lower than the second decile cut-off values for 17 other states. The region with the richest rural areas is the North, with a median score of 496, followed at a considerable distance by the West region with a median score of 291.

Discussion
PCA has been previously evaluated and used for the development of wealth scores based on household asset data, including in the NFHS itself [17,18,20]. The present analysis adds a careful consideration of the sub-national variation in wealth and the differences in wealth index scores and components by rural and urban areas. As has been shown previously in Brazil [15], there is large variability in sub-national distributions of scores and there are many benefits in taking the variation into account. For example, it allows for within-sample economic classification, and for comparisons across geographical and urban/rural distributions.
Unlike the original PCA-based wealth score that is made available with the NFHS datasets, which has a common number of items and scores for a national distribution, our indices were constructed so as to allow for the identification of regional and state-level decile cut-off points for urban and rural households separately. This enables the scores to be used for comparisons at different levels of aggregation, and the importance of local distribution cut-off points is illustrated by the state-level examples in Figures 1 and 2. In addition, changes in the index components and in their coefficients over time, for example from 1992-3 (NFHS-1) to 2005-6 (NFHS-3), illustrate the need to revise indices periodically. The items that we included in the national wealth indices are relatively simple to measure in population surveys, and are limited to 15 or fewer assets, thereby limiting the time needed to collect wealth data during a household interview. In addition, future analyses of the NFHS datasets can take direct advantage of the wealth indices and the sub-national score distributions presented here. Other variables that were available in the NFHS surveys were not included because they did not contribute importantly to the score and/or were not required to improve the distribution. Radio and bed are two examples of items that had a lower loading (less than 0.2) and were kept in the calculations to improve the distribution of the score by avoiding accumulation of households in a specific decile (a function of having too many households reporting ownership of a very limited number of items, especially in rural areas). The resulting scores are valid indicators of wealth that correlate well with health outcomes, as seen by the variation in the mean height-for-age scores (Figure 1) and in stunting prevalence across the wealth score quintiles (Figure 2). 
The proportion of the total variability explained by the first component of the urban (ranging from 39.5% to 41.4%) and rural scores (ranging from 33.6% to 33.8%) can be considered high given the size of India's population and its income inequality (Gini index: 33.9 in 2010 [31]).
In summary, we constructed valid asset-based wealth indices from six nationally representative surveys of households in India conducted between 1992-3 and 2007-8, and we present the regional and state-level distributions of these wealth scores for urban and rural areas separately. These scores can be used for analyses within the source surveys to understand differences within and across geographical levels, and for ecological analyses that combine the source surveys with other datasets. In addition to the wide variety of scenarios in which these indices can be currently applied, they are also based on data that could be collected relatively easily in future studies.