Multidimensional Profiles of Health Status: An Application of the Grade of Membership Model to the World Health Survey

Background
The World Health Organization (WHO) conducted the World Health Survey (WHS) between 2002 and 2004 in 70 countries to provide cross-population comparable data on health, health-related outcomes and risk factors. The aim of this study was to apply Grade of Membership (GoM) modelling to condense extensive health information from the WHS into a set of easily understandable health profiles, and to assign the degree to which each individual belongs to each profile.
Principal Findings
This paper describes the application of GoM models to summarize population health status using World Health Survey data. Grade of Membership analysis is a flexible, non-parametric, multivariate method, used here to derive health profiles from WHS self-reported health state and health conditions. The WHS dataset was divided into four country economic categories based on the World Bank economic groupings (high, upper-middle, lower-middle and low income economies) for separate GoM analysis. Three main health profiles were produced for each of the four categories: I. Robust; II. Intermediate; III. Frail. In addition, population health, wealth and inequalities are described for countries in each economic category to put the health results into perspective.
Conclusions
These analyses have provided a robust method to better understand health profiles and the components that can help to identify healthy and non-healthy individuals. The profiles obtained describe concrete levels of health and clearly delineate the characteristics of healthy and non-healthy respondents. The GoM results provided both a usable way of summarizing complex individual health information and a selection of intermediate determinants that can be targeted by interventions to improve health.
As populations age, and with limited budgets for additional health care and social service costs, applying GoM methods may help to identify higher-risk profiles for decision-making and resource allocation.


Introduction
Currently, the concept of health as defined by the World Health Organization (WHO) is "a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity" [1]. Taking this perspective, one moves beyond defining health status as the absence of disease to a definition that incorporates complex perceptions about health and health conditions. Measuring the multidimensional character of perceived health status, and then using the results meaningfully, remains a challenge for policy and research purposes. Many of the available analytical techniques used to reduce variables make assumptions about distributions or use summary variables in the calculations. An alternative technique, the Grade of Membership (GoM) model, is presented in this paper.
GoM is a non-parametric method that identifies latent health profiles and the degree to which an individual fits these profiles. The GoM method has been applied in previous studies for depressive symptoms and personality disorders [2,3,4], older adult health status [5,6,7,8,9] and genetic health studies [10,11]. A method that helps to define and predict transitions from robust health to frailty or the reverse, as well as identify pre-disability states would be helpful in planning for an ageing population [12,13,14,15].
The WHO's World Health Survey (WHS) gathered data to quantify population health status in 70 countries based on WHO's definition of health. The main aim of this study was to use the Grade of Membership model to summarize the full set of health and health-related variables included in the WHS into a smaller set of meaningful health profiles [16]. To make these derived health profiles useful in informing health policy, WHS data were grouped into four economic areas according to the World Bank economic categories [17].
This paper is organized into three sections. First, a description of the data set is provided, which includes details of the survey design, socio-demographic characteristics of the sample and health data. Then the GoM procedure and results of the GoM analysis are described for each economic category. The final section summarizes the results.

Data
The WHS was conducted between 2002 and 2004 in 70 countries to establish levels of health and to develop methods to improve data comparability within and across countries [18,19]. The principal aim of the WHS was to provide valid, reliable and comparable information about population health status. It used a common survey instrument in nationally representative populations for assessing, amongst other issues, the health of individuals in eight of the 22 explicit health domains, health system responsiveness and household health care expenditures.
A probability sampling design was employed in each country using multi-stage, stratified, random cluster samples. The population included all selected persons aged 18 years and older who lived in selected households. Most of the countries had nationally representative survey samples, and each country decided which interview method to use: face-to-face interview, computer-assisted telephone interview (CATI) and/or computer-assisted personal interview (CAPI).
The WHS utilized two types of questionnaires: the Household Questionnaire (to describe health, economic and physical characteristics at the household level) and the Individual Questionnaire (to describe individual health status and well-being characteristics). To construct the final dataset, data were extracted from the WHS Individual Questionnaire. The overall dataset was then divided into four economic areas for the analyses based on the World Bank categories: high income, upper middle income, lower middle income and low income (Table 1).
The World Bank uses gross national income (GNI) per capita as its main criterion for classifying economies. Based on its 2006 GNI per capita, every country's economy was classified as low income, middle income (subdivided into lower middle and upper middle) or high income. The four groups are defined as: low income, $905 or less; lower middle income, $906-$3595; upper middle income, $3596-$11,115; and high income, $11,116 or more.
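The grouping rule can be written as a small lookup. This is a minimal sketch, assuming GNI per capita is expressed in whole 2006 US dollars; the function name `world_bank_category` is hypothetical and the thresholds are those listed above.

```python
def world_bank_category(gni_per_capita: float) -> str:
    """Map a 2006 GNI per capita (US$) to a World Bank income group.

    Hypothetical helper; thresholds taken from the 2006 cut-offs in the text.
    """
    if gni_per_capita <= 905:
        return "low income"
    if gni_per_capita <= 3595:
        return "lower middle income"
    if gni_per_capita <= 11115:
        return "upper middle income"
    return "high income"


print(world_bank_category(4000))  # falls in the $3596-$11,115 band
```

Checking the upper bound of each band in turn keeps the rule readable and makes the boundaries explicit.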

Physical and mental health
The dataset included self-reported diagnosis of three physical and one mental health condition (arthritis, angina pectoris, asthma and depression), self-reported difficulties in functioning in eight health domains (mobility, self-care, pain and discomfort, cognition, interpersonal relationships, vision, sleep and energy, and affect) plus one self-reported overall health question. Presence or absence of a diagnosed condition was based on self-report.

Grade of Membership method
The Grade of Membership (GoM) model [20] is a flexible, nonparametric, multivariate method designed to identify health profiles. In our work we used the self-reported health state and health conditions to determine the latent profiles (pure types) of health and the degree to which individuals correspond to the identified profiles (grade of membership). Briefly, as outlined in Manton et al. [20], the GoM model assumes that there are $K$ fuzzy states (pure types) to be defined. The study population consists of $I$ individuals with $J$ categorical variables, where the $j$th variable has $L_j$ response levels. Each of the $L_j$ responses is encoded as a binary variable $x_{ijl}$, so that $x_{ijl} = 1$ if the $i$th individual gives the $l$th response to the $j$th variable. The first coefficient, $\lambda_{kjl}$, is the probability of response $l$ to the $j$th question by an individual belonging to the $k$th health pure type; the second, $g_{ik}$, is a weight quantifying the degree of similarity between the health features of the $i$th individual and the characteristics of each of the $K$ pure types. The coefficients satisfy the constraints

$$0 \le \lambda_{kjl} \le 1, \qquad \sum_{l=1}^{L_j} \lambda_{kjl} = 1, \qquad 0 \le g_{ik} \le 1, \qquad \sum_{k=1}^{K} g_{ik} = 1.$$
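To make the data layout concrete, the following minimal NumPy sketch builds the binary indicators $x_{ijl}$ from categorical answers and checks the constraints on $g_{ik}$ and $\lambda_{kjl}$. It assumes complete responses coded as integers $0, \dots, L_j - 1$; the helper names `one_hot_responses` and `check_constraints` are hypothetical.

```python
import numpy as np


def one_hot_responses(responses, n_levels):
    """Encode an I x J array of categorical answers (coded 0..L_j-1)
    as a list of J binary matrices, where x[j][i, l] == 1 iff individual i
    gave response l to variable j."""
    responses = np.asarray(responses)
    I, J = responses.shape
    x = []
    for j in range(J):
        xj = np.zeros((I, n_levels[j]), dtype=int)
        xj[np.arange(I), responses[:, j]] = 1  # exactly one 1 per row
        x.append(xj)
    return x


def check_constraints(g, lam, tol=1e-8):
    """g: I x K membership scores; lam: list of J arrays of shape K x L_j.
    Each g row and each lam[j] row must lie in [0, 1] and sum to 1."""
    g = np.asarray(g, dtype=float)
    ok_g = np.all((g >= -tol) & (g <= 1 + tol)) and np.allclose(g.sum(axis=1), 1.0)
    ok_lam = all(
        np.all((np.asarray(l) >= -tol) & (np.asarray(l) <= 1 + tol))
        and np.allclose(np.asarray(l).sum(axis=1), 1.0)
        for l in lam
    )
    return bool(ok_g and ok_lam)
```

Storing one matrix per variable, rather than a single ragged array, accommodates variables with different numbers of response levels (five-point scales alongside yes/no conditions).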
Summing over all $K$ GoM health pure types gives the probability that the $i$th individual responds $l$ to question $j$:

$$\Pr(x_{ijl} = 1) = \sum_{k=1}^{K} g_{ik}\,\lambda_{kjl}.$$

Assuming independence of individual observations, the likelihood function for the GoM model is

$$L = \prod_{i=1}^{I} \prod_{j=1}^{J} \prod_{l=1}^{L_j} \left( \sum_{k=1}^{K} g_{ik}\,\lambda_{kjl} \right)^{x_{ijl}}.$$

We used the DSIGoM software [21] to estimate the GoM parameters. In particular, a modified Newton-Raphson algorithm was employed, in which the coefficients $g_{ik}$ and $\lambda_{kjl}$ are estimated jointly to maximize the likelihood function $L$. The parameters are estimated iteratively: $L$ is first maximized with $\lambda_{kjl}$ fixed, producing an initial estimate of all $g_{ik}$; then, using the obtained $g_{ik}$, $L$ is maximized to update the $\lambda_{kjl}$. This process is repeated until convergence, at which point within-group homogeneity is maximized and between-group homogeneity is minimized [20].
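The alternating estimation scheme can be illustrated with a short NumPy sketch. This is not the DSIGoM implementation. In place of DSIGoM's modified Newton-Raphson algorithm, it substitutes expectation-maximization (EM) style updates: one update of all $g_{ik}$ with $\lambda_{kjl}$ held fixed, then one update of all $\lambda_{kjl}$ with $g_{ik}$ held fixed. Neither step decreases the likelihood. The sketch assumes complete responses coded as integers $0, \dots, L_j - 1$ and random Dirichlet starting values. The function names are hypothetical.

```python
import numpy as np


def response_probs(R, g, lam):
    """p[i, j] = sum_k g[i, k] * lam[j][k, R[i, j]]: probability of each observed answer."""
    return np.column_stack(
        [(g * lam[j][:, R[:, j]].T).sum(axis=1) for j in range(R.shape[1])]
    )


def log_likelihood(R, g, lam):
    """GoM log-likelihood; only the observed response (x_ijl = 1) contributes."""
    return float(np.log(response_probs(R, g, lam)).sum())


def _posterior(R, g, lam, j):
    """Posterior share of each pure type k in individual i's answer to variable j."""
    a = g * lam[j][:, R[:, j]].T  # I x K, unnormalized
    return a / a.sum(axis=1, keepdims=True)


def fit_gom(R, n_levels, K, n_iter=500, tol=1e-8, seed=0):
    """Alternating EM-style fit (a hypothetical stand-in for DSIGoM's Newton-Raphson)."""
    R = np.asarray(R)
    I, J = R.shape
    rng = np.random.default_rng(seed)
    g = rng.dirichlet(np.ones(K), size=I)                         # rows sum to 1
    lam = [rng.dirichlet(np.ones(L), size=K) for L in n_levels]  # lam[j] is K x L_j
    ll = log_likelihood(R, g, lam)
    for _ in range(n_iter):
        ll_old = ll
        # Step 1: lambda fixed -> update every g_ik (mean posterior over the J variables)
        g = sum(_posterior(R, g, lam, j) for j in range(J)) / J
        # Step 2: g fixed -> update every lambda_kjl (posterior-weighted answer frequencies)
        for j in range(J):
            z = _posterior(R, g, lam, j)
            counts = np.stack(
                [z[R[:, j] == l].sum(axis=0) for l in range(n_levels[j])], axis=1
            )
            lam[j] = counts / counts.sum(axis=1, keepdims=True)
        ll = log_likelihood(R, g, lam)
        if ll - ll_old < tol:  # stop once the log-likelihood stops improving
            break
    return g, lam, ll
```

Like any iterative maximum-likelihood fit of this kind, the sketch can stop at a local maximum. Running it from several seeds and keeping the best log-likelihood is a common safeguard.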
The optimal number of profiles is established by performing a likelihood ratio test on the change in explanatory power between the $K$ and $K+1$ models. This ratio is $\chi^2$ distributed, with degrees of freedom equal to the difference in the number of parameters to be estimated between the models [10].
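A minimal sketch of this test follows. The parameter count is an assumption: $I(K-1)$ free membership scores plus $K\sum_j (L_j - 1)$ free $\lambda$ values. Under that count, moving from $K$ to $K+1$ adds $I + \sum_j (L_j - 1)$ parameters. For a dependency-free p-value, the sketch uses the Wilson-Hilferty normal approximation to the $\chi^2$ tail, which is accurate for the very large degrees of freedom involved here. Where SciPy is available, `scipy.stats.chi2.sf(x, df)` gives the exact tail instead. The function names are hypothetical.

```python
import math


def gom_n_params(I, n_levels, K):
    """Assumed free-parameter count: I*(K-1) memberships + K*sum(L_j - 1) lambdas."""
    return I * (K - 1) + K * sum(L - 1 for L in n_levels)


def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi2_df > x) via the Wilson-Hilferty cube-root normal approximation."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lr_test(ll_k, ll_k1, I, n_levels, K):
    """Compare the K and K+1 pure-type models from their maximized log-likelihoods."""
    stat = 2.0 * (ll_k1 - ll_k)
    df = gom_n_params(I, n_levels, K + 1) - gom_n_params(I, n_levels, K)
    return stat, df, chi2_sf_wilson_hilferty(max(stat, 0.0), df)
```

A small p-value favours the larger model. Here, the analogous comparison across 2, 3 and 4 pure types favoured three.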

Grade of Membership application
Prior to analyzing data, it was necessary to define the external and the internal variables [10]. External variables do not affect the definitions of the pure types and included five socio-demographic variables (age, sex, marital status, education and employment); however, the association between the pure types and the external variables provides valuable information about the relationship between empirically-derived pure types and demographic characteristics. Turkey was not included in the dataset because the external variable marital status was not available.
The continuous age variable was recoded into three categories: younger adult (18-29 years), adult (30-59 years) and older adult (60 years and older). Marital status was recoded into four categories: never married, currently married/cohabiting, separated/divorced and widowed. Sex, education level (highest level completed) and sector of current employment (governmental, nongovernmental, self-employed, employer, homemaker, unemployed, student, retired and other) were the remaining external variables.
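The age recoding is a simple banding step. A minimal sketch, assuming age is recorded in completed years; the function name `age_group` is hypothetical:

```python
def age_group(age: int) -> str:
    """Recode continuous age into the three bands used as an external variable."""
    if age < 18:
        raise ValueError("WHS respondents are aged 18 years and older")
    if age <= 29:
        return "younger adult"  # 18-29 years
    if age <= 59:
        return "adult"          # 30-59 years
    return "older adult"        # 60 years and older
```

Raising an error for ages under 18 flags records outside the survey population rather than silently assigning them to a band.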
Internal variables included self-reported overall health (based on a five point scale: very good, good, moderate, bad or very bad), scores from the eight health domains (none, mild, moderate, severe or extreme/cannot do), and the set of four reported conditions (yes, no).
For each of the four country categories, the GoM analysis was applied with 2, 3 and 4 pure types to determine the optimal number of pure types. The GoM parameters were estimated using the DSIGoM software. The log-likelihood ratio test indicated that three pure types provided the best description of the structure of the variables included in this analysis for each economic area. Each pure type was described by the values obtained for the $\lambda$ coefficients. In general, a value of $\lambda_{kjl} > 0.50$ was considered characteristic of a pure type, indicating that response $l$ was endorsed by more than 50% of individuals in that pure type. The lambda coefficients were produced for each of the external and internal variables. Additionally, the distribution of respondents' GoM scores ($g_{ik}$) was generated for each pure type and country category. The crude prevalence estimate for the $k$th pure type is the sum of individual memberships in that pure type divided by the total number of respondents,

$$P_k = \frac{1}{N} \sum_{i=1}^{N} g_{ik},$$

where $N$ is the total number of respondents and $g_{ik}$ is the GoM coefficient for the $i$th individual's degree of membership in the $k$th pure type [4]. To compare prevalence across the four groups, age-standardized prevalence estimates were calculated. For each pure type and economic category, age-specific (younger adult 18-29, adult 30-59 and older adult 60+) prevalence ratios were computed. To calculate age-adjusted prevalence rates, we used the direct standardization method with the WHO world standard population table [22].
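The summary statistics reported for each category can be sketched with NumPy as follows:

- crude prevalence (the mean of $g_{ik}$ over respondents);
- the share of respondents with a high grade of membership ($g_{ik} > 0.50$);
- the share belonging wholly to one pure type;
- a direct age standardization.

This is a minimal sketch. The function names are hypothetical, and the standard-population weights in the usage line are placeholders, not the WHO world standard population values.

```python
import numpy as np


def crude_prevalence(g):
    """Crude prevalence of each pure type: (1/N) * sum_i g_ik."""
    g = np.asarray(g, dtype=float)
    return g.sum(axis=0) / g.shape[0]


def membership_summary(g, threshold=0.5):
    """Share of respondents with g_ik > threshold (per pure type), and the share
    belonging exclusively to a single pure type (max_k g_ik == 1)."""
    g = np.asarray(g, dtype=float)
    high = (g > threshold).mean(axis=0)
    exclusive = float(np.isclose(g.max(axis=1), 1.0).mean())
    return high, exclusive


def age_standardized_prevalence(g, ages, std_weights):
    """Direct standardization: weight age-specific prevalences by a standard
    population. std_weights maps age-group label -> standard population share;
    every age group is assumed to contain at least one respondent."""
    g = np.asarray(g, dtype=float)
    ages = np.asarray(ages)
    groups = list(std_weights)
    w = np.array([std_weights[a] for a in groups], dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    rates = np.array([crude_prevalence(g[ages == a]) for a in groups])  # groups x K
    return w @ rates


# Placeholder weights for illustration only (NOT the WHO standard population):
example_weights = {"younger adult": 0.3, "adult": 0.5, "older adult": 0.2}
```

Because the same weights are applied to every economic category, differences in the standardized prevalences are not driven by differences in the age structure of the samples.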

Socio-demographic characteristics
The final dataset contained 217,472 respondents from 69 countries. These countries were grouped into the four World Bank income categories for analysis. The socio-demographic characteristics of the respondents are provided in Table 2.
Three country categories (high, upper-middle and lower-middle income) had more women than the fourth (low income), although the split between males and females was almost even in every category. More young respondents (aged 20-29 years) were noted in the upper-middle, lower-middle and low income categories. The higher income categories had more older respondents than the other groups. The low income category had a higher percentage (8.1%) of the youngest respondents (aged 18-19 years), while the high income category had the highest percentage of older adults, with 14.2% of individuals aged 70+ years. Respondents in the low income category were more likely to be currently married (70%) and had the highest proportion of respondents with no formal education (40.1%). The high income category had the highest education levels. Current employment sector differed by category: the two highest income categories (high and upper-middle income) had more respondents employed in the private sector (39.9% and 23.6%, respectively). The lower-middle income category had more homemakers and self-employed respondents (20.8% and 19.5%), while low income countries had higher levels of self-employment (45.8%). The category with the most retired respondents was high income (13.8%). An association between all socio-demographic characteristics and the four economic country categories was found (p < 0.0001).

Physical and mental health data
The internal variables in the GoM analysis included self-reported physical and mental health data. Descriptive statistics of these variables for the four economic categories are provided in Table 3.
The majority of respondents reported good or very good health on the self-reported overall general health question, with the lower-middle income category having the highest percentage of respondents reporting bad or very bad health. In all eight health domains (mobility, self-care, pain and discomfort, cognition, interpersonal relationships, vision, sleep and energy, and affect), the prevalence rates followed a positive trend. The majority of respondents (more than 50%) reported no difficulties with any of the physical or mental health issues, with the exception of the domain "pain and discomfort", where prevalence rates ranged from 44.6% in the lower-middle income category to 52.1% in the high income group. Finally, over 85% of respondents reported no diagnosed health conditions (arthritis, angina pectoris, asthma and depression). Among these conditions, it was noted that …

GoM parameters and pure type estimation
Table 4 provides a summary of the pure types/health profiles by World Bank category. The components of pure types I (ROBUST) and II (INTERMEDIATE) are very similar across all the categories. Moving from type I to type II was associated with increasing difficulty in some health domains, with respondents more likely to report "moderate difficulty" (INTERMEDIATE) than "no difficulty" (ROBUST) for the given health domain. The third health profile, FRAIL, represented a distinctly lower level of health, based on difficulties with the health domains and the presence of one or more of the health conditions.
Similarly, the lambda probabilities for each of the external variables by country category show discernible patterns for each of the health profiles (Table 5).
High income economies
Table 6 shows the distribution of individual GoM coefficients ($g_{ik}$) for the 26,358 high income respondents. Sixty-four percent …
Table 4. General characteristics of the internal variables by pure type and World Bank economic category (listing the predominant lambda probability $\lambda_{kjl}$ by variable; for more details see the appendix).
Table 7 shows the exact breakdown of the lambda probability values for each pure type. Respondents belonging to pure type I were equally distributed between men and women (lambda equal to 49.8% and 50.2%, respectively), mainly adults (62.6%), married or cohabiting (63.4%), had intermediate or higher education (30.1% and 31.1%, respectively) and were not government employed (37.8%). They reported good health status (55%), had no difficulties with physical and mental activities (100%) and reported none of the four health conditions (lambda equal to 100% and 94.9%).
Individuals in pure type II differed from those in pure type I in that they were mainly female (64.1%) and had some difficulty with physical and mental activities, especially moving around (53%), pain and discomfort (100%), concentration (100%), sleeping (81.3%) and feeling sad or depressed (100%).
Finally, respondents in pure type III were mainly female (74.7%), older adults (68.9%), married or cohabiting (50.2%), less educated (29.2%) and retired (52.9%). They reported moderate health status (68.4%) and had more difficulty with physical and mental activities, especially moving around (65.2%), pain and discomfort (65.3%), concentration (75.1%), sleeping (53.7%) and feeling sad or depressed (66.8%). Moreover, they reported having arthritis (61.5%) and depression (60.3%).

Upper-middle income economies
Table 8 includes the $g_{ik}$ coefficients for the 51,090 respondents in the upper-middle income category. Almost 69 percent (N = 35,209) had a high grade of membership ($g_{ik} > 0.50$) for pure type I (ROBUST). Over 27 percent of respondents (N = 13,940) belonged exclusively to a single pure type. Most respondents in this category (62.3%) belonged to the ROBUST health profile. Table 9 shows the lambda coefficient distributions of external and internal variables for each pure type. Respondents belonging to pure type I were mainly male (lambda equal to 63.8%), adults (61.2%), married or cohabiting (65.5%), had intermediate education levels (41.1%) and were not government employed (33.5%). They reported good health status (67.7%), had no difficulties with physical and mental activities (100%) and did not report any of the four health conditions (lambda equal to 100% and 98.2%).
Respondents in pure type III were mainly female (76.3%), older adults (55.3%), married or cohabiting (47.9%), had intermediate or lower education levels (24.2% and 25.3%, respectively) and were retired (46.5%). They reported moderate health status (62.7%) and had more difficulty with physical and mental activities, especially moving around (65.3%), pain and discomfort (66.6%), concentration …

Lower-middle income economies
Table 10 presents the distribution of $g_{ik}$ scores for the 58,799 respondents from the lower-middle income category. Almost 64 percent of respondents (N = 37,537) had a high grade of membership ($g_{ik} > 0.50$) for pure type I. Twenty-two percent (N = 13,019) belonged exclusively to one pure type, and 58.2 percent belonged to the ROBUST profile. Table 11 shows the exact breakdown of the lambda probability values for the external and internal variables in each pure type. Respondents who belonged to pure type I were mainly male (lambda equal to 53.7%), adults (56.4%), married or cohabiting (64.2%), had lower education levels (26.3%) and were not government employed (27.1%). They reported good health status (61.7%), no difficulties with physical or mental activities (lambda equal to 100% and 98.2%) and did not report any of the four health conditions (lambda equal to 100% and 98%).
Finally, respondents in pure type III were mainly female (70.3%), adults (47.6%), married or cohabiting (58.5%), had no formal education (34.4%) and were homemakers (28.6%). They reported moderate health status (56.6%) and had more difficulty with physical and mental activities, especially moving around (70.1%), pain and discomfort (62%), concentration (68%), personal relationships (43.9%), seeing and recognizing persons (42.6%), sleeping (63.8%) and feeling sad or depressed (63%), but did not report any of the four conditions.

Low income economies
Table 12 shows the distribution of $g_{ik}$ scores for the 81,225 low income category respondents. Sixty-five percent (N = 53,151) of respondents had a high grade of membership ($g_{ik} > 0.50$) in pure type I (ROBUST), and over 25 percent (N = 20,667) belonged exclusively to one pure type. Overall, 58 percent of low income respondents belonged to the ROBUST profile (pure type I). Table 13 shows the lambda coefficient distributions of external and internal variables for each pure type. Respondents belonging to pure type I were mainly male (lambda equal to 53.4%), younger adults or adults (lambda equal to 47.8% and 50%, respectively), married or cohabiting (70.7%), had no formal education (30.2%) and were self-employed (52.1%). They reported good health status (52.1%), no difficulties with physical and mental activities (100%) and none of the four health conditions (100%).
Age-standardized prevalence ratios of pure type I by economic category indicate similarity between the high and upper-middle income countries (both over 62%) and between the lower-middle and low income countries (both less than 59%). Likewise, the two higher income categories had less than 16% membership in the FRAIL pure type, whilst the lower-middle and low income countries had higher rates (21.8% and 19.2%, respectively).

Discussion
This paper described the application of the Grade of Membership models to summarize population health status using World Health Survey data. The GoM model provided a meaningful method to reduce and summarize health variables from health surveys.
A number of techniques have previously been applied to WHS data to summarize and report on health status [18]. Comparing health results obtained with Ustun's method to the GoM results indicated good face validity, with similar response patterns. Establishing comparable levels of health for different populations is extremely useful, but in addition, GoM provides a discrete set of profiles that may be easier to interpret and use for decision-making. The three health profiles, from higher to lower income countries, are easily understood and realistic groupings of functioning and well-being. If universal health coverage were rolled out or expanded for the older population in a country, a policy maker might choose a stepped strategy starting with characteristics common in the frail profile. For example, this might include improving identification and treatment of selected comorbidities such as depression and arthritis, along with measures that could improve functioning (such as addressing pain or sleep problems), to allow for ageing (well) in place.
The GoM procedure differs from other classification methods, such as Factor Analysis, which use indicators to calculate latent continuous variables representing one-dimensional constructs. Factor Analysis derives parameter values under the assumption of normally distributed data, whereas the GoM model is a non-parametric method in which identification of parameters does not rely on any distributional assumptions. Estimation of factor scores in Factor Analysis relies on distributional assumptions about the factor loadings [7]. GoM parameters are estimated iteratively: first, the likelihood function is maximized with $\lambda_{kjl}$ fixed, giving a first estimate of all $g_{ik}$; then, with $g_{ik}$ fixed, the likelihood is maximized to update the $\lambda_{kjl}$, and this is repeated until convergence.
Grade of Membership modelling shares similarities with other data reduction methods, such as Factor Analysis, Principal Component Analysis and Cluster Analysis. However, in the GoM model all parameters are identified simultaneously, while individual parameters in Factor Analysis and Principal Components methods are usually calculated using summary variables derived from within the dataset [7]. Additionally, in contrast to Factor Analysis and Principal Component methods, GoM is a classification methodology in which respondents are allocated to discrete and meaningful groups based on their grade of membership profile. Unlike other classification methodologies (such as Cluster Analysis), GoM does not generate groups of similar entities but considers individual heterogeneity [7]. GoM was, therefore, well suited for the planned analysis.
Grade of Membership analysis has previously been used to summarize health data from surveys for depressive symptoms and personality disorders, older people's health status and genetic studies of health. Woodbury et al. [11] employed GoM analysis in a clinical setting to determine whether the DSM-III-R personality disorder diagnostic criteria cluster into recognizable disorders; four pure types provided the most satisfactory solution. Portrait et al. [7] analyzed Longitudinal Aging Study Amsterdam data and identified six profiles to characterize health. Finally, Manton et al. [10] identified five profiles within the 1999 National Long Term Care Survey data (a national longitudinal survey based on a list sample of US Medicare enrollees aged 65 years and above), which were used to demonstrate the compression of morbidity in the United States.
In this study, the GoM model produced three pure types (health profiles) for each economic category. Each health profile described unique facets of physical and mental health (internal GoM model variables) plus differences in socio-demographic characteristics (external GoM model variables), with a clear economic gradient (lower education and employment sector) when moving from high to low economic categories within each profile. The Type I (ROBUST) and Type II (INTERMEDIATE) health profiles were more similar to each other in external and internal variables, as well as World Bank economic category, than either was to the Type III (FRAIL) profile. The frail profile clearly differed by external variables (older, more widowed, more retired or homemakers), internal variables (more difficulties across more of the eight health domains, plus a higher likelihood of a diagnosis of at least one of the health conditions) and economic category (the two lower economic categories had significantly higher rates of membership in the frail profile).
All four economic categories had broadly similar robust and intermediate health profiles (pure types I and II). The two higher economic categories had more respondents in the robust pure type (greater than 64%) than the lower economic categories (less than 59%). Likewise, the two lower economic categories had more respondents in the frail pure type (21.8% and 19.2%, respectively), with similar rates of membership in the intermediate profile across all four economic categories. The frail profile may provide a logical focus for attention at all levels of country wealth, with policies targeted, for example, at older widowed women with mobility, sleep and cognition problems.
These analyses have provided a robust method to better understand health status and the components that can help to identify healthy and non-healthy individuals. Three profiles (robust, intermediate and frail) were obtained for respondents in each of the four economic categories. These profiles describe concrete levels of health and clearly delineate the characteristics of healthy and non-healthy respondents. Areas for specific consideration include difficulties with sleep, mobility and depression, largely regardless of the presence of specific health conditions or country of residence. The GoM results provided both a usable summary health measure and a selection of intermediate determinants that can be targeted by interventions to improve health. With limited health budgets, these results can help decision-makers identify where health gains can be achieved. GoM can help to define specific characteristics within groups of individuals that can be targeted by health promotion efforts. For example, specific health policy targets could address unmet need in a subpopulation that shares components of the frail profile. This could include an education campaign encouraging health care professionals to look more closely at older married women reporting moderate health and problems with pain and sleep. These women are more likely to also have comorbidities, such as arthritis and depression, that need treatment and may be undertreated. Treatment of such individuals could be part of a more comprehensive package to address well-being at older ages.
In future, we plan to investigate transitions between health profiles, both improving and declining health, as well as the impact of the health-wealth relationship on shifts between profiles. We will also look at the use of frailty definitions and profiles across different settings and their impact on disability assessments. These analyses will provide a basis to inform policy on ageing populations and on measures to address the determinants of more vulnerable health profiles. To make results more comparable across countries, vignette adjustments would improve the ability to differentiate and correct for reporting bias across countries and categories. This adjustment would also likely reveal larger differences in health for respondents in lower income countries.