Applying Multivariate Clustering Techniques to Health Data: The 4 Types of Healthcare Utilization in the Paris Metropolitan Area

Background: Cost-containment policies and the need to satisfy patients' health needs and care expectations pose major challenges to healthcare systems. Identifying groups that are homogeneous in terms of healthcare utilisation could improve understanding of how to adjust healthcare provision to the needs of society and patients.
Methods: This study used data from the third wave of the SIRS cohort study, a representative, population-based, socio-epidemiological study set up in 2005 in the Paris metropolitan area, France. The data were analysed using a cross-sectional design. In 2010, 3000 individuals were interviewed in their homes. Non-conventional multivariate clustering techniques were used to identify homogeneous user groups in the data. Multinomial models assessed a wide range of potential associations between user characteristics and their pattern of healthcare utilisation.
Results: We identified four distinct patterns of healthcare use. Patterns of consumption and the socio-demographic characteristics of users differed qualitatively and quantitatively between these four profiles. Extensive and intensive use by older, wealthier and unhealthier people contrasted with narrow and parsimonious use by younger, socially deprived people and immigrants. Rare, intermittent use by young healthy men contrasted with regular targeted use by healthy and wealthy women.
Conclusion: An original technique of large-scale multivariate analysis allowed us to characterise different types of healthcare users in terms of both resource utilisation and socio-demographic variables. This method merits replication in different populations and healthcare systems.


Introduction
In the European context of cost-containment policies and the post-2008 economic and financial crisis [1], cost optimisation and, in some countries, reduction of public expenditure have become unavoidable, and the healthcare system is no exception. The healthcare system may therefore need to be adapted to cost-containment goals while meeting patients' needs and expectations as closely as possible. This requires, among other things, accurate characterisation of healthcare resource utilisation by the user population, as well as identification of the determinants of use.
Many studies have addressed the use of the healthcare system (either individual services or the system as a whole) by the general population or by specific population subgroups. For example, several studies have examined healthcare system utilisation from a systemic point of view or from a decision-making approach [2][3][4], or in subgroups of the population such as cancer survivors [5], migrants [6,7], or underserved and low-income people [8][9][10]. In addition, determinants of the utilisation of specific healthcare services have been investigated, including mental healthcare services [11], emergency care units [12], primary care resources [13], dental care [14] and specialist consultations [15]. Associations between health insurance and healthcare use have also been regularly documented [16].
It has been suggested that healthcare systems cannot be analysed through a classical reductionist approach but should instead be considered as complex systems [17] requiring analysis with non-conventional techniques. In particular, it could be useful to identify distinct groups of patients exhibiting different, internally homogeneous patterns of resource utilisation. If such groups can be identified, the factors associated with each utilisation profile can then be examined using conventional approaches [18][19][20].
Identifying such utilisation patterns requires particular multivariate techniques capable of taking into account a large number and variety of variables simultaneously, documented in the largest possible population. These techniques, particularly clustering techniques, have been applied and validated in a wide range of medical fields, including genetics [21][22][23], imaging [24][25][26], clinical medicine [27,28] and public health [29].
In this study, we aimed to identify and characterise distinct profiles of users of the French healthcare system in an urban environment, through analysis of data from a representative, population-based study in the Paris metropolitan area, using clustering techniques.

Methods
This work is based on the SIRS cohort study, which received legal authorisation from two French national authorities for non-biomedical research: the Comité consultatif sur le traitement de l'information en matière de recherche dans le domaine de la santé (CCTIRS) and the Commission nationale de l'informatique et des libertés (CNIL) [30]. The participants provided verbal informed consent. Written consent was not required because this survey did not fall into the category of biomedical research (as defined by French law).
This study represents a cross-sectional analysis of data collected in the SIRS cohort study in 2010 among a representative sample of 3,000 French-speaking adults in the Paris metropolitan area (Paris and its suburbs, a region with a population of 6.5 million).

The SIRS cohort
The SIRS cohort was constituted in 2005 using a three-level random sampling method. In the first stage, 50 census blocks (with about 2000 inhabitants each) were randomly selected after stratification on socioeconomic status and on whether or not they qualified as an "underprivileged urban area" according to the central government list. In the second stage, 60 households were randomly chosen from a complete list of households within each selected census block. In the final stage, one adult was randomly selected from each household by the birthday method. The refusal rate among newly contacted people was 29%. The methodology of the SIRS study and detailed characteristics of the study population have been described elsewhere, for example in [31].
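The three sampling stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the SIRS sampling code: the stratum names, the allocation of blocks to strata, the toy sampling frame and all function names are hypothetical, and the birthday method is implemented in its "next birthday" variant (the adult whose birthday comes soonest after the contact date is selected), which may differ from the variant used in the study.

```python
import random
from datetime import date


def days_to_next_birthday(birth: date, today: date) -> int:
    """Days from `today` until the next occurrence of `birth` (0 if today)."""
    for year in (today.year, today.year + 1):
        try:
            nxt = birth.replace(year=year)
        except ValueError:  # 29 February in a non-leap year (assumed to map to 28 February)
            nxt = date(year, 2, 28)
        if nxt >= today:
            return (nxt - today).days
    raise AssertionError("unreachable: next year's birthday is always in the future")


def three_stage_sample(blocks_by_stratum, allocation, households, adults,
                       today, n_households=60, seed=0):
    """Hypothetical three-stage design: blocks within strata, households, one adult."""
    rng = random.Random(seed)
    sample = []
    for stratum, n_blocks in allocation.items():
        # Stage 1: simple random sample of census blocks within each stratum.
        for block in rng.sample(blocks_by_stratum[stratum], n_blocks):
            # Stage 2: simple random sample of households from the block's list.
            for hh in rng.sample(households[block], n_households):
                # Stage 3: birthday method (adult with the next birthday).
                person_id, _ = min(adults[hh], key=lambda a: days_to_next_birthday(a[1], today))
                sample.append((stratum, block, hh, person_id))
    return sample


# Toy sampling frame (entirely hypothetical): 6 blocks of 80 households each.
frame_rng = random.Random(42)
blocks_by_stratum = {"deprived": ["b1", "b2", "b3"], "other": ["b4", "b5", "b6"]}
households = {b: [f"{b}-h{i}" for i in range(80)]
              for blocks in blocks_by_stratum.values() for b in blocks}
adults = {
    hh: [(f"{hh}-p{j}", date(1930 + frame_rng.randrange(70),
                             1 + frame_rng.randrange(12), 1 + frame_rng.randrange(28)))
         for j in range(1 + frame_rng.randrange(3))]
    for hh_list in households.values() for hh in hh_list
}
sample = three_stage_sample(blocks_by_stratum, {"deprived": 1, "other": 2},
                            households, adults, today=date(2010, 6, 1))
print(len(sample))  # 3 blocks x 60 households = 180 respondents
```

With the design described above (50 blocks x 60 households, one adult each), the same structure yields the 3000 respondents of the study.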

Characterisation of healthcare utilisation
A comprehensive, detailed profile of the French healthcare system is provided in reference [32]. Interviewees were asked in detail about their own use of healthcare services during the twelve months preceding the interview. All responses were coded as categorical variables, and the reference period was always the last twelve months. Resource use was grouped into the categories detailed below. Unless otherwise specified, all consultation frequencies fell into one of five categories (none, only once, only twice, 3-5 times, or ≥6 times).
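The five-level frequency coding can be written as a small mapping function. This sketch assumes raw visit counts were available (the questionnaire may instead have recorded the categories directly), and the dummy (one-hot) coding shown afterwards is only one possible way of turning such categories into numeric vectors for a Euclidean distance; the text does not state which encoding was used. Function names are illustrative.

```python
# The five ordered consultation-frequency categories used in the text.
FREQ_LABELS = ("none", "once", "twice", "3-5", ">=6")


def frequency_category(n_visits: int) -> str:
    """Map a twelve-month consultation count onto the five ordered categories."""
    if n_visits < 0:
        raise ValueError("visit count cannot be negative")
    if n_visits <= 2:
        return FREQ_LABELS[n_visits]  # 0 -> "none", 1 -> "once", 2 -> "twice"
    return "3-5" if n_visits <= 5 else ">=6"


def one_hot(value: str, levels=FREQ_LABELS) -> list:
    """Dummy-code one categorical answer (assumed encoding, for illustration)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels]


print([frequency_category(n) for n in (0, 1, 2, 4, 7)])
# ['none', 'once', 'twice', '3-5', '>=6']
print(one_hot(frequency_category(4)))  # [0, 0, 0, 1, 0]
```

An alternative would be to keep the ordinal codes 0-4 as integers, which preserves the ordering but treats the gaps between categories as equal.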
Primary care: French people seeking healthcare may consult a general practitioner (GP), either as an end in itself or as an entry point to specialists (the French system has adopted this gate-keeping model since 2004 [32]). Patients must follow this pathway in order to be reimbursed. Exceptions are made for four kinds of specialists who can be consulted directly, namely gynaecologists (who are mainly community-based in France), ophthalmologists, paediatricians and psychiatrists; these four specialities will henceforth be referred to as direct access specialists (DAS). We used six variables to characterise primary care utilisation: time since the latest dental consultation (4 categories: less than 2 years, between 2 and 3 years, more than 3 years, never), having declared a referring GP (yes/no), frequency of GP consultations, frequency of DAS consultations, having undergone a medical check-up in a dedicated social security centre (yes/no), and frequency of requests for medical advice from friends and relatives (4 categories: none, 1 or 2, 3 to 10, more than 10 times).
Indirect access specialists (IAS): this category covers all specialists other than DAS. Patients may consult them only when referred by their GP (or on their own initiative, but at full cost). A single variable was documented: the frequency of IAS consultations.
Paramedical or alternative care: two variables were considered: having consulted an acupuncturist or an osteopath (yes/no) and having consulted for non-conventional or alternative healthcare (yes/no). Traditional Chinese medicine fell into the latter category.
Site of healthcare consumption: in France, healthcare can be delivered in three principal settings: public hospitals or clinics, private hospitals or clinics, and community settings. Since the place of consultation was systematically documented for each medical consultation over the previous twelve months, six distinct variables were considered: having consulted (at least once) a GP in a public hospital or clinic, a private hospital or clinic, or in a community setting, and having consulted (at least once) a specialist in a public hospital or clinic, a private hospital or clinic, or in a community setting.
Emergency care: two variables documented healthcare utilisation in emergency situations, depending on the place where care was delivered; these were the frequency of home visits for emergency reasons and the frequency of consultations in an emergency unit.
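The six variables described under "Site of healthcare consumption" are simple "at least once" indicators over a cross of two kinds of professional and three settings. A minimal sketch, assuming each respondent's consultations over the twelve-month period are available as (professional, setting) pairs; the record layout, labels and function name are hypothetical.

```python
PROFESSIONALS = ("GP", "specialist")
SETTINGS = ("public", "private", "community")


def setting_indicators(consultations):
    """Return the six yes/no (1/0) indicators of having consulted at least once
    a GP or a specialist in a public, private or community setting."""
    seen = set()
    for professional, setting in consultations:
        if professional not in PROFESSIONALS or setting not in SETTINGS:
            raise ValueError(f"unexpected record: {(professional, setting)}")
        seen.add((professional, setting))
    return {f"{p}_{s}": int((p, s) in seen) for p in PROFESSIONALS for s in SETTINGS}


visits = [("GP", "community"), ("GP", "community"), ("specialist", "public")]
print(setting_indicators(visits))
# {'GP_public': 0, 'GP_private': 0, 'GP_community': 1,
#  'specialist_public': 1, 'specialist_private': 0, 'specialist_community': 0}
```

Repeated visits in the same setting collapse to a single 1, since frequency is captured separately by the consultation-frequency variables.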

Population characteristics and factors associated with healthcare utilisation
Five dimensions were explored as possibly associated with healthcare utilisation, each comprising three items (except socioeconomic status, which comprised four items).
Socioeconomic status: education level (none or primary/secondary/tertiary), employment status (employed, unemployed, inactive or retired), monthly household income per consumption unit (in quintiles; computed as the total household income divided by the number of consumption units according to the usual OECD-modified scale recommended by Eurostat [first adult: 1; each other household member aged ≥14 years: 0.5; each child aged <14 years: 0.3]), and health insurance status (full coverage by the statutory health insurance [SHI], SHI plus a voluntary health insurance [VHI], full coverage by a special insurance for the poor, partial coverage by the SHI only, and no insurance at all).
Stance regarding health and medicine: general attitude toward medical consultation (people were asked if they generally consult a doctor as a last resort, or as soon as they are not feeling well), having a relative or a friend suffering from a severe condition, and having medical professionals among relatives.
Social integration: feeling of isolation (very isolated, rather isolated, rather supported, very supported), level of social support (low, medium, high) and frequency of social contacts (quartiles), the latter two as described in [34].
Perceived health: as measured by the Minimum European Health Module [35,36], which combines global perceived health status (good, average, bad), the global activity limitation indicator (presence of a long-standing activity limitation in the previous six months), and the presence of a chronic or long-standing health problem over the previous twelve months.
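The income variable described under "Socioeconomic status" can be illustrated with the OECD-modified equivalence scale followed by a quintile assignment. This is a sketch under stated assumptions: the quintile cut-points are computed here without survey weights using Python's `statistics.quantiles`, whereas the study accounted for its sampling design, and the function names and example households are illustrative.

```python
import bisect
import statistics


def consumption_units(ages):
    """OECD-modified scale: 1 for the first adult, 0.5 for each other member
    aged 14 or over, 0.3 for each child under 14."""
    n_14_plus = sum(1 for age in ages if age >= 14)
    n_under_14 = len(ages) - n_14_plus
    if n_14_plus == 0:
        raise ValueError("household needs at least one member aged 14 or over")
    return 1.0 + 0.5 * (n_14_plus - 1) + 0.3 * n_under_14


def income_per_cu(monthly_income, ages):
    """Monthly household income divided by the number of consumption units."""
    return monthly_income / consumption_units(ages)


def income_quintile(value, cutpoints):
    """Quintile 1..5 given the four cut-points between quintiles."""
    return bisect.bisect_right(cutpoints, value) + 1


# Two adults, one teenager (15) and one child (10): 1 + 0.5 + 0.5 + 0.3 = 2.3 CU.
print(consumption_units([40, 38, 15, 10]))               # 2.3
print(round(income_per_cu(4600, [40, 38, 15, 10]), 2))   # 2000.0

# Unweighted quintile cut-points of a hypothetical income distribution.
cuts = statistics.quantiles(range(1, 101), n=5)
print(income_quintile(50, cuts))  # 3
```

Dividing by consumption units rather than by household size reflects economies of scale within the household: a second adult adds half the needs of the first.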
These dimensions are similar to those identified by Andersen in the late 1960s [37,38]. In his Behavioral Model of Health Service Use (BMHSU), Andersen distinguished between three classes of factors: predisposing factors (such as age, gender, education, occupation, social relationships, and attitudes and knowledge related to health services and professionals), enabling factors (such as income or health insurance), and need factors (such as perceived health status or functional disability). Within this framework, Andersen suggested that the respective roles of these factors may provide clues for measuring equity in service use. For example, if need factors are the main drivers of health service use, access can be considered equitable. Conversely, if the main drivers are social factors, beliefs and enabling factors, access can be considered inequitable.

Clustering methods
Clustering techniques available to health scientists have been described previously [39][40][41]. These techniques require the determination of some data-specific parameters, such as the number of groups to be retrieved. To identify the different types of healthcare system utilisation, we applied the partitioning around medoids (PAM) algorithm with the Euclidean distance to the healthcare system utilisation variables as the reference analysis [40]. A resampling-based scheme and cluster-robustness approach [42] was used to determine the key parameters of the algorithm, in particular the number of clusters. Other clustering methods and parameter sets were used for sensitivity analyses. The PAM algorithm was run with alternative distance or similarity measures (Manhattan distance and Gower measure [40]). A fuzzy version of PAM, the FANNY algorithm [40], was also applied to the data, with several values of the fuzziness parameter. All analyses were conducted in R 2.13.1 (R Foundation for Statistical Computing, 2012) with the clusterCons package.
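For readers unfamiliar with PAM, its two phases (a greedy BUILD step choosing initial medoids, then SWAP steps that exchange a medoid with a non-medoid while the total distance to the nearest medoid decreases) can be sketched in plain Python. This is a didactic, brute-force sketch that would be far too slow for 3,000 respondents and is not the R code used in the study; the toy points and the helper `manhattan` are hypothetical. Passing a different `dist` function mirrors the sensitivity analyses with alternative distances.

```python
import math


def total_cost(D, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(points, k, dist=math.dist):
    """Brute-force Partitioning Around Medoids (BUILD + best-improvement SWAP)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of points")
    D = [[dist(p, q) for q in points] for p in points]

    # BUILD: start from the most central point, then add the medoid that most
    # reduces the total cost until k medoids are chosen.
    medoids = [min(range(n), key=lambda c: sum(D[i][c] for i in range(n)))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(D, medoids + [c])))

    # SWAP: apply the single best medoid/non-medoid exchange while it lowers the cost.
    best = total_cost(D, medoids)
    while True:
        best_swap = None
        for j in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:j] + [h] + medoids[j + 1:]
                cost = total_cost(D, trial)
                if cost < best - 1e-12:
                    best, best_swap = cost, trial
        if best_swap is None:
            break
        medoids = best_swap

    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels, best


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, cost = pam(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Unlike k-means, PAM uses actual observations as cluster centres and only needs pairwise dissimilarities, which is what makes it usable with Manhattan or Gower measures.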

Statistical analyses
We accounted for the three-level sampling design of the SIRS cohort by using the survey command options of STATA IC 10 (STATA Corp, 2007) for descriptive statistics and multinomial models. Chi-square or Fisher's exact tests were used to compare population characteristics across types of care utilisation.
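As a reminder of what the classical test computes, the Pearson chi-square statistic for a contingency table (rows: categories of a characteristic; columns: the four utilisation types) can be sketched as follows. This unweighted version ignores the three-level sampling design; survey software instead applies design-based corrections (such as the Rao-Scott adjustment), and the p value would come from the chi-square distribution with the returned degrees of freedom. The counts are hypothetical.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts.
    Assumes no row or column total is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: a yes/no characteristic (rows) by four utilisation types (columns).
counts = [[30, 10, 25, 35],
          [20, 40, 25, 15]]
print(pearson_chi_square(counts))
```

When some expected counts are small (a common rule of thumb is below 5), Fisher's exact test is preferred, which is why both tests are mentioned above.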
We used multinomial regression models to investigate associations between the variables of each of the five dimensions studied and the type of healthcare utilisation as a categorical outcome. Adjusted odds ratios (OR) are reported with a p value for linear trend. Statistical significance was set at a two-sided p value < 0.05.
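In a multinomial logit model, each coefficient is a log odds ratio relative to the reference category (Type 4 in this study), so an adjusted OR and its 95% confidence interval follow from exponentiation. The sketch below assumes Wald intervals (the text does not specify the interval type) and back-calculates the standard error implied by one of the intervals reported for Type 3 (foreigners: OR = 1.95, 95% CI [1.30-2.93]), then checks that it reproduces the bounds. Function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% level


def odds_ratio_ci(beta, se, z=Z):
    """OR = exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z):
    """Standard error of log(OR) implied by an interval symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p_value(beta, se):
    """Two-sided p value for H0: beta = 0."""
    return 2 * (1 - NormalDist().cdf(abs(beta / se)))


se = se_from_ci(1.30, 2.93)
or_, lo, hi = odds_ratio_ci(math.log(1.95), se)
print(round(se, 3), round(or_, 2), round(lo, 2), round(hi, 2))  # 0.207 1.95 1.3 2.93
print(wald_p_value(math.log(1.95), se) < 0.05)  # True
```

Because the interval is symmetric on the log scale, the OR sits at the geometric (not arithmetic) midpoint of its bounds.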

Cluster identification
The optimal number of clusters was four. This was the value at which mean cluster robustness was maximal and the range of robustness values narrowest (Fig. 1). A cluster robustness of 1 indicates that no member of a given cluster was likely to be assigned to another cluster when the algorithm was rerun, and directly reflects the stability of the cluster. In our analysis, mean robustness for the four-cluster model was high (>0.99), indicating that very few individuals could not be assigned unequivocally to a given cluster.
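The robustness criterion can be illustrated with a consensus matrix built from repeated subsampling, in the spirit of the consensus-clustering approach implemented in clusterCons: for each pair of individuals, the consensus is the proportion of subsamples containing both in which they were clustered together, and the robustness of a cluster is the mean consensus over all pairs of its members, so a value of 1 means its members were always grouped together. To keep the block self-contained, a simple alternating k-medoids with farthest-point initialisation stands in for PAM; the 80% subsampling fraction, the number of resamples and the toy data are assumptions, not the study's settings.

```python
import math
import random


def kmedoids(points, k, max_iter=100):
    """Alternating k-medoids with farthest-point initialisation (stand-in for PAM)."""
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(points)),
                           key=lambda i: min(math.dist(points[i], points[m]) for m in medoids)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]])) for p in points]
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j] or [medoids[j]]
            new_medoids.append(min(members, key=lambda c: sum(
                math.dist(points[c], points[i]) for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=1):
    """M[i][j] = share of subsamples containing i and j in which they share a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    both_drawn = [[0] * n for _ in range(n)]
    size = min(n, max(k + 1, int(frac * n)))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmedoids([points[i] for i in idx], k)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                both_drawn[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / both_drawn[i][j] if both_drawn[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def cluster_robustness(M, labels):
    """Mean consensus over all pairs of members of each reference cluster."""
    result = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
        result[c] = sum(M[i][j] for i, j in pairs) / len(pairs) if pairs else 1.0
    return result


# Hypothetical data: two well-separated groups of five respondents.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
points = group_a + [(x + 10, y + 10) for x, y in group_a]
M = consensus_matrix(points, k=2)
reference = kmedoids(points, k=2)
print(cluster_robustness(M, reference))  # {0: 1.0, 1: 1.0}
```

Repeating this for several candidate numbers of clusters and comparing the mean and spread of the robustness values is the kind of comparison summarised in Fig. 1.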

Sensitivity analyses
Rerunning the algorithm with alternative distance or similarity measures revealed no qualitative differences in cluster distribution or robustness; all models yielded qualitatively identical findings (results not shown).
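The alternative measures used in the sensitivity analyses differ in how they aggregate differences between two respondents' answers. A small sketch, assuming answers are stored as integer codes (for example a 0-4 frequency code and a 0/1 indicator); the Gower measure is written here in a simplified form for numeric or ordinal codes (mean of range-normalised absolute differences), whereas R implementations apply their own rules to nominal and ordinal variables, so this approximates rather than reproduces the computation used. The respondents are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def gower(a, b, ranges):
    """Mean of |x - y| / range over variables (variables with zero range contribute 0)."""
    terms = [abs(x - y) / r if r else 0.0 for x, y, r in zip(a, b, ranges)]
    return sum(terms) / len(terms)


# Hypothetical respondents: (GP frequency code 0-4, DAS frequency code 0-4, referring GP 0/1).
resp_1 = (4, 1, 1)
resp_2 = (0, 0, 1)
ranges = (4, 4, 1)
print(round(euclidean(resp_1, resp_2), 3))      # 4.123
print(manhattan(resp_1, resp_2))                # 5
print(round(gower(resp_1, resp_2, ranges), 3))  # 0.417
```

Euclidean and Manhattan distances let variables with wide code ranges dominate, whereas Gower rescales every variable to [0, 1], so obtaining the same four clusters under all three is informative about robustness.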

The four types of healthcare utilisation
The four clusters identified in our data were associated with four distinct types of healthcare utilisation, accounting for 30.0% (Type 1), 21.0% (Type 2), 25.7% (Type 3) and 23.3% (Type 4) of the study population. Table 1 shows the contribution of each variable of healthcare utilisation to each Type.
Type 1 comprised the heaviest users of primary care. These individuals used all available resources, including GPs, social security centres for check-ups and medical advice from relatives, and used them extensively. For example, 71.6% had consulted their GP three times or more in the past twelve months, and they also consulted DAS extensively (50.7% with at least one visit). They were also the most frequent users of IAS: 100% had consulted at least one IAS in the last twelve months and 69.5% had done so more than twice. They were also the largest users of paramedical or alternative care. Type 1 individuals consulted in all settings (principally public hospitals for specialists and community care for GPs) and were the largest users of the private sector. Finally, they were also the principal users of emergency care, both at home (12.9%) and in emergency units (22.5%).
Type 2 was the mirror image of Type 1. Together with Type 3, individuals in Type 2 were the least frequent users of primary care. In Type 2, 28.5% of individuals had no referring GP and only 12.3% had consulted a DAS (in most cases only once). Although 18.9% had consulted an IAS in the previous year, only 4.9% had done so more than twice. These individuals rarely used paramedical or alternative resources. Whatever the setting, Type 2 users had the lowest rate of healthcare utilisation, especially in private hospitals and clinics. Type 2 individuals rarely required emergency care at home (3.5%) or in emergency units (9.8%) and, when they did, they usually consulted only once (9.7%) and hardly ever more than twice (0.1%).
Healthcare resource utilisation by Type 3 users was closer to that observed in Type 2 than to that in Types 1 or 4. Type 3 individuals were characterised by extensive recourse to GPs, with 20.3% having consulted six times or more in the last twelve months. In contrast, they rarely consulted DAS (6.4%) or IAS (only 21.6% had consulted an IAS, and never more than once). They seldom used paramedical or alternative care (9.4% and 3.3%, respectively). When consulting GPs, they tended to do so in community care, and they made little use of the public sector. Nonetheless, Type 3 users were the second most frequent users of emergency resources, especially emergency units (19.3% had consulted at least once in an emergency unit).
Type 4 shared similarities with Type 1, in that it consisted of heavy users of the healthcare system. Type 4 did not have a particularly high rate of GP consultations (ranking third) but had the highest use of DAS (100%). Type 4 individuals were also the second most frequent users of medical advice from relatives and of paramedical or alternative care. Type 4 was also the second highest user of IAS, although the frequency of consultation was relatively low (21.6% had consulted once and none more than once). Individuals in Type 4 used all settings when consulting specialists (public and private hospitals, as well as community care). For emergency care, Type 4 presented the lowest level of use compared with the other Types, with 17.0% having consulted in emergency units, 13.0% only once, and 1.6% between three and five times (the second highest proportion for this frequency).

Factors associated with healthcare utilisation
Univariate associations between independent variables and each of the four profiles of healthcare utilisation are presented in Table 2. Only three variables were not associated with significant differences between profiles, namely having medical professionals among relatives, feelings of isolation and frequency of social contacts. The five multivariate multinomial models are presented successively in Table 3, with Type 4 as the reference type. As in the univariate analysis, many significant associations were observed. For instance, the probability of belonging to Type 1 increased with age, and women were most likely to belong to Type 4. Foreigners were more likely to belong to Type 3 (OR = 1.95, 95% CI [1.30-2.93]), as were individuals with a low education level (primary school or none; OR = 2.26, 95% CI [1.39-3.69]). Inactive people and wealthy people were most likely to belong to Type 1 (OR = 2.08, 95% CI [1.59-2.71], and OR = 1.80, 95% CI [1.20-2.70], respectively). Individuals covered by the SHI only had the highest probability of belonging to Type 2. Consulting one's GP for the slightest health issue was an attitude associated with Type 1 (OR = 1.55, 95% CI [1.19-2.01]), while the opposite attitude was associated with Type 2. Among the three variables related to social integration, only the frequency of social contacts tended to be associated with the type of healthcare utilisation; although the association was not significant, the point estimate indicated that frequent social contacts were more characteristic of Type 4 people. In terms of health status, reporting a chronic condition was significantly associated with Types 1 and 3 (OR = 2.91, 95% CI [2.27-3.71], and OR = 1.34, 95% CI [1.02-1.76], respectively), while reporting good health status tended to be more frequent in Type 2 (OR = 0.53, 95% CI [0.33-0.84]).

Discussion
In this study, we used a database representative of the general population of French-speaking adults in the Paris metropolitan area. Because data were collected in face-to-face interviews, independently of medical registers or medical consumption records, our sample also includes non-users of healthcare. Social and subjective variables are particularly richly documented in the SIRS cohort, which was originally designed to study social inequalities in health and access to healthcare. We used an original and methodologically robust approach to identify homogeneous and consistent types of healthcare system users.
We identified four different types of healthcare users through this approach. The results of the cluster analysis were highly robust to parameter tuning and showed strong group stability. One type of user (Type 1) typically consisted of elderly individuals of French origin who were wealthy but unhealthy, inactive and socially isolated, who benefited from good health insurance coverage, and who made extensive use of all kinds of healthcare services. Type 4 typically consisted of young working women of French origin with a high educational level, who tended to be wealthy and healthy, socially integrated and supported, and fully insured. These users were the most likely to consult specialists in the community frequently and to make use of non-conventional care. A third type (Type 2) consisted of young men, frequently foreigners, who tended to be unemployed and rather poor, healthy but with poor access to health insurance, and who had the lowest utilisation of healthcare services. The last type (Type 3) consisted of people of diverse ages, often foreigners, with a low educational level and low incomes. These users were typically inactive, had limited health insurance coverage, were rather socially isolated and unhealthy, and principally used GP services or emergency care.
Our study has certain limitations. Firstly, we relied on self-reported data, without any linkage to medical records or objective measures, so we were unable to estimate possible reporting and recall biases. Our findings are also specific to the French healthcare system, and we can therefore neither compare them directly with nor extrapolate them to the healthcare systems of other countries, each of which has specific regulation policies (especially in terms of gate-keeping, extent of public and supplementary health insurance, and out-of-pocket payments) and provision of healthcare services. For example, the higher use of specialists in the Type 4 profile is partly explained by the fact that most of these individuals are women who consult a gynaecologist every year. In France, gynaecologists act as general "women's health" doctors, responsible for all aspects of gynaecological follow-up (including contraception and cervical smears), and are directly accessible without going through a GP. This role of the gynaecologist in the healthcare system is atypical and not found in many other countries. Also, despite the existence of universal basic health insurance, income and insurance status may still influence access to healthcare, which could account for the Type 3 profile. Apart from individuals with the lowest income levels and those suffering from costly chronic diseases, patients bear approximately 30% of their health expenses (copayment). These costs may be reimbursed by voluntary (supplementary) insurance, but sometimes only partially, depending on the contract. In many situations, people also have to pay upfront and are then reimbursed by the basic public health insurance. The gate-keeping system can be bypassed, although this incurs a higher cost or lower reimbursement for patients.
In addition, during an emergency unit consultation, patients can be referred to an IAS and obtain prescriptions for paraclinical tests such as imaging or laboratory analyses without consulting a GP (and can have these tests performed at the same time rather than in a second step after a GP consultation).
Moreover, even within France, the Paris metropolitan area is not representative of the whole country: it is highly urbanised, wealthier on average and has a higher density of medical provision than the rest of the country, but it also has much greater social inequalities and spatial segregation than other French regions [43].
Technically speaking, the robustness and stability of the four clusters could result, at least in part, from overly strong constraints in the analysis methods. In other words, the identified clusters may broadly reflect reality but lack precision. If so, this is likely to be linked to the intrinsic geometric assumptions underlying the clustering methods we used: the groups identified cannot represent complicated configurations, such as reticulated patterns, and the clusters generated by the model are spheroidal, with little scope for interpenetration. Moreover, we may have lacked statistical power for some underrepresented categories, such as the utilisation of emergency resources.
From the health inequality point of view, our results help clarify some of the differences in behaviours with respect to healthcare system use and opportunities for healthcare. If social inequalities in access to healthcare stemmed mainly from economic inequalities, as is usually assumed [44][45][46], the introduction of universal health insurance in France in 1999 would have been expected to remove such barriers [47]. However, this has clearly not happened systematically [16,43,48]. In fact, the social causes and processes underlying inequalities in access to healthcare are complex, going beyond purely economic or material factors and involving various psychosocial and behavioural factors as well [49,50]. Our study may help unravel some of this complexity and diversity. Indeed, we observed several associations between the type of healthcare resource utilisation and social factors previously found to be associated with healthcare indicators, such as objective or perceived health status [45,51] or, more broadly, health expectations and perceived needs [52], social capital [53] or social integration [54]. These determinants coexist but contribute to different extents to the four types of utilisation. For example, health status, measured by chronic diseases and functional limitations, was indeed associated with greater use of the healthcare system, together with higher educational level and higher income. Stance regarding healthcare was also found to influence the extent of healthcare system use, and previously established gender differences in healthcare use were observed. Among all the variables evaluated, only having medical professionals among relatives was not significantly associated with profiles; this may be because the question was too general, as it could be interpreted in many different ways with respect to closeness, confidence, availability and the professional skills involved.
With respect to social integration, the frequency of social contacts appeared to be poorly discriminant. In this context, we believe that the "crude" frequency of social contacts, without further details on their quality, context or content, is less meaningful than direct questioning about global and subjective feelings of isolation. For the latter, significant differences were found for people reporting a "very isolated" status in Types 1 and 3 (Table 3). Taken together, these results, which are consistent with the literature, lend some face validity to our typology.
As mentioned above, this typology can be interpreted in terms of equity in access to healthcare services according to Andersen's BMHSU. When looking at the variables that best discriminated the four types of user (OR ≥2 or ≤0.5), we observed that the most important predisposing factors for the pattern of healthcare utilisation were age, gender, origin, educational level, a feeling of social isolation and the general attitude toward medical consultation (Types 2 and 4 only), whereas the most prominent enabling factors were the feeling of social isolation and health insurance (Types 2 and 4 only).
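The screening rule used to select the most discriminating variables (OR ≥2 or ≤0.5, that is, at least a two-fold change in the odds in either direction) can be made explicit. A minimal sketch that applies it to the adjusted ORs quoted in the section on factors associated with healthcare utilisation; only those point estimates are available in the text (the full Table 3 is not reproduced here) and the labels are paraphrases, so the output is illustrative rather than a reproduction of the authors' selection.

```python
# Adjusted ORs (versus Type 4) as quoted in the text; labels paraphrased.
reported_ors = {
    ("Type 3", "foreigner"): 1.95,
    ("Type 3", "low education level"): 2.26,
    ("Type 1", "inactive"): 2.08,
    ("Type 1", "wealthy"): 1.80,
    ("Type 1", "consults GP for slightest issue"): 1.55,
    ("Type 1", "chronic condition"): 2.91,
    ("Type 3", "chronic condition"): 1.34,
    ("Type 2", "good health status"): 0.53,
}


def strongly_discriminating(ors, upper=2.0, lower=0.5):
    """Keep associations with at least a two-fold change in odds in either direction."""
    return {key: value for key, value in ors.items() if value >= upper or value <= lower}


for (profile, factor), value in strongly_discriminating(reported_ors).items():
    print(f"{profile}: {factor} (OR = {value})")
```

The thresholds are symmetric on the log scale (log 2 and -log 2), which is why 0.5 rather than, say, 0 is the lower cut-off; the OR of 0.53 falls just short of it.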
Individuals corresponding to the Type 1 profile need access to services because of a higher prevalence of chronic conditions and functional limitations, and they also use these services, both because they can afford to (they have adequate financial resources and more time to access services) and because they have habits and perceptions that make them prone to use them. Our study provides no information on whether their access to healthcare meets all their needs, but their pattern of healthcare use does not seem to be explained by predisposing factors such as gender or educational level. Individuals corresponding to the Type 2 profile, who are mostly young men, have a low level of healthcare utilisation, reflecting their low perceived needs. At the same time, these individuals have the highest proportion of basic-only insurance coverage. This is an obstacle to healthcare access that raises concern about equity in access to health services, particularly since young people may underestimate their real health needs. Individuals corresponding to the Type 3 profile have low rates of use and multiple negative predisposing factors: for example, they are more likely to be foreigners, with a lower educational level and low financial resources. On the other hand, they are likely to suffer from chronic conditions and functional limitations. It is for this profile that the healthcare system is most likely to be inequitable. Indeed, these individuals predominantly use services from GPs and emergency units, and far less often from specialists. Individuals corresponding to the Type 4 profile have few expressed needs but use services intensively, especially from GPs and DAS, preferring community care. These individuals have adequate resources to use services, both economically (income, health insurance) and culturally (female gender, higher educational level, higher social support).
In conclusion, of the four profiles described, the French healthcare (and insurance) system is most likely to be inequitable for Type 3, and possibly also for Type 2.
Finally, we demonstrated that the method used was able to reveal stable and meaningful structures in our data without resorting to the usual reductionism of classical studies on healthcare utilisation. For this reason, we think that a similar multivariate clustering method would merit replication in other datasets derived from other contexts, such as non-urban populations or countries with other healthcare systems, in order to confirm or refine our findings.