Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

There is still limited understanding of how chronic conditions co-occur in patients with multimorbidity and what the consequences are for patients and the health care system. Most reported clusters of conditions have not considered the demographic characteristics of these patients during the clustering process. The study used data for all registered patients who were resident in Fife or Tayside, Scotland and aged 25 years or more on 1st January 2000 and who were followed up until 31st December 2018. We used linked demographic information and secondary care electronic health records from 1st January 2000. Individuals with at least two of the 31 Elixhauser Comorbidity Index conditions were identified as having multimorbidity. Market basket analysis was used to cluster the conditions for the whole population and then repeatedly stratified by age, sex and deprivation. 318,235 individuals were included in the analysis, with 67,728 (21·3%) having multimorbidity. We identified five distinct clusters of conditions in the population with multimorbidity: alcohol misuse, cancer, obesity, renal failure, and heart failure. Clusters of long-term conditions differed by age, sex and socioeconomic deprivation, with some clusters not present for specific strata and others including additional conditions. These findings highlight the importance of considering demographic factors during both clustering analysis and intervention planning for individuals with multiple long-term conditions. By taking these factors into account, the healthcare system may be better equipped to develop tailored interventions that address the needs of complex patients.

Background
Multimorbidity, also known as multiple long-term conditions, is the co-existence of two or more long-term conditions within an individual [1]. It is now the norm in ageing populations, with this group of patients being inherently heterogeneous [2,3]. The estimated prevalence of multimorbidity varies considerably depending on the population studied, the specific list of conditions included in the analysis and the data sources used [4], but consistent findings show that multimorbidity is common and is more frequent in older people, women, and socioeconomically deprived populations [5,6]. The relationship between socioeconomic deprivation and multimorbidity is complex [7,8], but there is evidence that the less affluent have earlier onset and more rapid accumulation of conditions, resulting in widening inequalities into old age [9].
While most clinical guidelines focus on managing individual conditions, the number of individuals with multimorbidity is increasing, causing major concerns for the delivery of care in an already constrained healthcare system with competing needs [10,11]. Prioritising interventions for high-risk groups is vital as healthcare systems strive toward the sustainability of service delivery. There is considerable evidence suggesting that the current disease-based approach to managing individuals with multimorbidity is associated with a variety of poor outcomes, including inadequate preventative care and access to rehabilitation services [12], repeated referrals for specialist care [13] and increased healthcare costs [14].
Understanding how conditions cluster is a key element in unravelling determinants and the delivery of future healthcare. Little is known about how disease clusters contribute to multimorbidity and complex multimorbidity (defined as having four or more long-term conditions) [3] across age, sex, and socioeconomic deprivation of individuals [15]. Most studies that report clusters of conditions do so without considering the demographic characteristics of the patients, which could affect the nature of observed clusters [16,17]. Some approaches to clustering within the multimorbidity literature aim to classify patients based on their conditions and place them into similar groups, while other approaches aim to identify groups of conditions which are present in individuals more frequently than expected.
Recent work by Kuan et al. showed variability in the most common single conditions across the life course and also by sex [18]. Other studies have reported that socioeconomic status also impacts the development of multimorbidity during different periods [19]. A deeper understanding of how the different demographic characteristics are associated with and contribute to clusters of conditions in patients with multimorbidity is needed to help clinicians manage those patients and to prepare health systems to provide adequate care for these complex patients. This study aims to assess the prevalence of multimorbidity and complex multimorbidity by age, sex and area-level socioeconomic deprivation. We also aimed to identify the most common conditions among patients with multimorbidity, and key clusters of disease, stratified by age, sex and socioeconomic deprivation.

Study design and population
The population for this study were residents of Fife and Tayside, Scotland who were aged at least 25 years on 1st January 2000 and alive on 31st December 2018, when a cross-sectional analysis of all living patients was performed. The data were generated for a study exploring multimorbidity across different countries in the UK. Exploration of the data showed a strong socioeconomic and age gradient in terms of specific individual conditions and the prevalence of multimorbidity [20], which we felt warranted further exploration. We ascertained the dates of death from National Records of Scotland death certificates and the population register. The dataset used linked pseudonymised health and demographic data held by the Health Informatics Centre (HIC) at the University of Dundee.
Health care in Scotland is provided free at the point of care by the taxpayer-funded National Health Service (NHS). NHS Tayside and Fife are two separate Health Boards that provide specialist and secondary care services and contract with general practices that provide primary medical care to an approximate population of 800,000 individuals. The unique Community Health Index (CHI) number, allocated to individuals at the point of registration with a general practice (GP), is used across the NHS. Demographic information, hospital admission and daycase records, cancer registry and mental health inpatient records were linked to the death data from 1st January 2000 and to Emergency Department (ED) attendances from 1st January 2017 onwards.

Multimorbidity definitions
All hospital admissions, psychiatric hospital admissions, outpatients, cancer registry and emergency department records over the period were examined, and all the International Classification of Diseases (ICD)-10 codes were extracted. We identified the 31 conditions listed in the Elixhauser Comorbidity Index [21] (see S1 Table) based on the presence of ICD-10 codes relevant to each condition, and recorded the date of the first diagnosis during the study period. Depression and psychoses are examples of mental health long-term conditions included within the Elixhauser Index, whilst weight loss and cancer are some of the physical conditions listed (see S1 Table for a full list of conditions and related codes). The Elixhauser Index was chosen as previous reviews have suggested it is well established for use with electronic health records using ICD-10 codes [22]. All individuals with two or more conditions were defined as having multimorbidity, and those with four or more were identified as having complex multimorbidity.
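The two thresholds above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the patient identifiers and the example condition lists are hypothetical, and it assumes each patient's records have already been mapped from ICD-10 codes to Elixhauser condition names.

```python
def multimorbidity_flags(conditions, mm_min=2, complex_min=4):
    """Count distinct conditions and apply the two thresholds from the text.

    mm_min=2 and complex_min=4 follow the definitions in this section;
    the function itself is an illustrative assumption.
    """
    n = len(set(conditions))  # repeated records of one condition count once
    return {
        "n_conditions": n,
        "multimorbidity": n >= mm_min,
        "complex_multimorbidity": n >= complex_min,
    }


# Hypothetical patients, each with a list of recorded condition names.
patients = {
    "P1": ["depression", "depression"],
    "P2": ["obesity", "uncomplicated hypertension"],
    "P3": ["renal failure", "deficiency anaemia",
           "congestive heart failure", "cardiac arrhythmias"],
}

flags = {pid: multimorbidity_flags(conds) for pid, conds in patients.items()}
```

Under these definitions complex multimorbidity is a subset of multimorbidity, so a patient such as the hypothetical P3 is flagged for both.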

Explanatory variables
The patient's age was calculated on 31st December 2018 and grouped as 44-49, 50-59, 60-69, 70-79, and 80+ years. Sex was recorded as male/female, and socioeconomic status was measured by the quintiles of the Scottish Index of Multiple Deprivation (SIMD), a postcode-assigned measure of small area (data zone) deprivation. SIMD uses seven domains (income, employment, health, education, housing, crime, access to services) to score data zones on different aspects of deprivation; the data zones are then ranked and grouped into quintiles [23].
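As an illustration of the age grouping, the following Python sketch computes age in completed years on 31st December 2018 and assigns the bands listed above. The function names and the handling of ages below 44 (returned as None) are assumptions made for the example, not details taken from the study.

```python
from datetime import date

REFERENCE_DATE = date(2018, 12, 31)  # date on which age was calculated


def age_on(dob, ref=REFERENCE_DATE):
    """Age in completed years on the reference date."""
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)


def age_group(age):
    """Map an age in years to the bands used in the text."""
    if age < 44:
        return None  # outside the reported bands (assumption)
    if age < 50:
        return "44-49"
    if age < 60:
        return "50-59"
    if age < 70:
        return "60-69"
    if age < 80:
        return "70-79"
    return "80+"


# Hypothetical dates of birth.
example = age_group(age_on(date(1950, 6, 15)))
```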

Data management and statistical analysis
Frequencies and proportions of individuals with the conditions and the prevalence of multimorbidity and complex multimorbidity within each stratum (age, sex, deprivation) were reported, and relevant tests for differences were used. Analysis of clusters was performed in three phases: (i) the whole population with multimorbidity, (ii) by age, sex and socioeconomic deprivation, and (iii) by their interactions. The rate of co-occurrence between pairs of the long-term conditions is presented in S1 Fig. To allow for the clustering of conditions across the characteristics of the population, market basket analysis (MBA, also known as association rule mining) using the Apriori algorithm was used [24,25]. The dissimilarity command with the Jaccard option within the MBA was used to identify clusters among the conditions stratified by age, sex and SIMD [24,25]. A cluster is a group of conditions with shorter distances to each other than to other conditions in a binary matrix of conditions. The dissimilarity function organises the conditions by cluster so that the conditions within clusters are closer together than those in different clusters, and therefore more likely to co-occur. We used MBA because it has been reported as more efficient for binary (present/absent) outcomes than hierarchical cluster analysis, which was originally built for quantitative outcomes [24,25]. It also allows an individual to "belong" to more than one cluster if they have a large number of different conditions. The method computes and returns distances for binary data in transactions, which can be used for grouping and clustering [24]. The optimal number of clusters from the dissimilarity clustering was determined using the Elbow method [26]. For clustering, we considered only conditions that had at least 5% prevalence in the population with multimorbidity (see S2 Table). The clusters are summarised in Tables 2 and 3 and S3-S5 Tables.
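To make the distance-based grouping concrete, the sketch below computes the Jaccard dissimilarity between pairs of conditions, d(A, B) = 1 - |A ∩ B| / |A ∪ B|, where A and B are the sets of patients with each condition, and then merges the closest conditions step by step. It is a plain Python illustration, not the authors' R or Stata code. The average-linkage merging rule, the function names and the toy patient records are all assumptions, since the text does not state the linkage used, and the number of clusters is passed in directly rather than chosen with the Elbow method.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard dissimilarity between two sets of patient identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def condition_sets(records, min_prevalence=0.05):
    """Map each condition to the patients who have it, dropping
    conditions below the prevalence threshold (5% in the text)."""
    by_condition = {}
    for patient_id, conditions in records.items():
        for condition in conditions:
            by_condition.setdefault(condition, set()).add(patient_id)
    n = len(records)
    return {c: p for c, p in by_condition.items() if len(p) / n >= min_prevalence}


def cluster_conditions(sets, n_clusters):
    """Agglomerative clustering on Jaccard distances.
    Average linkage is an assumption made for this sketch."""
    dist = {
        frozenset(pair): jaccard_distance(sets[pair[0]], sets[pair[1]])
        for pair in combinations(sets, 2)
    }

    def between(x, y):
        total = sum(dist[frozenset((i, j))] for i in x for j in y)
        return total / (len(x) * len(y))

    clusters = [[c] for c in sorted(sets)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: between(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy records (hypothetical): patient id -> set of conditions.
records = {
    1: {"alcohol abuse", "depression"},
    2: {"alcohol abuse", "depression", "liver disease"},
    3: {"alcohol abuse", "depression"},
    4: {"congestive heart failure", "cardiac arrhythmias"},
    5: {"congestive heart failure", "cardiac arrhythmias", "valvular disease"},
    6: {"congestive heart failure", "cardiac arrhythmias"},
}

sets = condition_sets(records)
clusters = cluster_conditions(sets, n_clusters=2)
```

Because the clustering is over conditions rather than patients, a patient with conditions from several groups contributes to each of them, which is the sense in which an individual can "belong" to more than one cluster.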

Ethical approval
HIC provided a linked dataset within a Safe Haven environment for this study. The dataset was obtained under HIC Standard Operating Procedures (SOP). NHS Tayside Research Ethics Committee have approved these SOPs (18/ES/0126). The School of Medicine Ethics Committee, acting on behalf of the University of St Andrews Teaching and Research Ethics Committee, approved the project (UTREC MD15619 approved 30th June 2021). As the study data are deidentified, consent from individual patients was not required.

Results
Overall, 318,235 people aged 44 years and over were included in the analysis, with 67,728 (21·3%) identified as having multimorbidity (2+ conditions), while 20,123 (6·3%) were also classed as having complex multimorbidity (4+ conditions). The mean (SD) age of the people with multimorbidity was 72·8 (7·1) years; 31,439 (46·4%) were men, 13,955 (20·6%) were most deprived and 12,268 (18·1%) were least deprived. The prevalence of both multimorbidity and complex multimorbidity in the whole population was similar for both sexes and increased significantly with age (Fig 1) and with increasing socioeconomic deprivation. More women in younger age groups (44-59) have multimorbidity compared to men, whereas in individuals aged 60 and above, men have a higher prevalence of multimorbidity (Table 1).

Clustering of conditions among people with multimorbidity
All people with multimorbidity. In the total population with multimorbidity, conditions were grouped into five clusters (Table 2).
Cluster 2: Cancer Cluster. 10,766 (56%) of the 19,123 people with at least two solid tumours without metastasis, and metastatic cancer were women, with 2,945 (16%) and 3,748 (19%) from the most deprived and least deprived groups respectively (S4A Table).
Cluster 3: Obesity Cluster. Obesity, uncomplicated hypertension, chronic pulmonary disease, rheumatoid arthritis/collagen disorders, hypothyroidism, and uncomplicated diabetes formed cluster 3. 55,105 (81·4%) of the people with multimorbidity have at least two of the conditions in cluster 3.
Cluster 4: Renal Failure Cluster. The conditions in cluster 4 are peripheral vascular disorders, renal failure, fluid & electrolyte disorders, and deficiency anaemia.
Cluster 5: Heart Failure Cluster. The conditions that formed cluster 5 are pulmonary circulation disorders, valvular disease, congestive heart failure and cardiac arrhythmias. In all, the percentages of people with multimorbidity from the most deprived groups were higher than the people from the least deprived group except for cluster 2. The clusters of conditions identified for strata of sex, age and socio-economic deprivation are presented in Table 3, in S4, S5 and S6 Tables and in S2, S3 and S4 Figs.
Looking at stratification by sex and social deprivation, the identified clusters had a core set of conditions across strata. The core conditions included alcohol misuse, other neurological disorders and depression in the alcohol misuse cluster, and solid tumour without metastasis and metastatic cancer in the cancer cluster. However, some clusters for specific strata also have additional conditions within the clusters. For instance, the most deprived people had additional conditions such as drug abuse in the alcohol misuse cluster (see Table 2). There are similarities in the number of clusters formed among these conditions across sex and deprivation quintiles. When clusters were identified for the different age groups, the conditions in cluster 1 among the youngest (44-49 years) and those in their 50s were similar, as were those in cluster 1 for people in their 60s and 70s, but no such cluster existed in those aged 80 years or older. Cluster 2 in the youngest group (44-49 years) and cluster 3 in the 50s and 60s look similar, while there were additional conditions as the patients grew older (Table 3).
For the most deprived populations aged 80 years and over, drug abuse, alcohol abuse, psychosis, and depression formed a cluster which affected 1 in 5 people with multimorbidity in this subgroup, but these conditions were not prominent among the older least deprived population. This would suggest that those planning initiatives aimed at different populations of people with multimorbidity should be aware that underlying clusters of disease will be different. Alcohol and drug abuse formed part of a cluster (liver disease, psychosis, alcohol abuse, drug abuse, depression, chronic pulmonary disease and other neurological disorders) among 83% of the most deprived patients with multimorbidity aged 44-49 years. However, they contributed to a smaller cluster (psychosis, alcohol abuse, drug abuse, depression) among only two-fifths of their least deprived counterparts. About 26% of the patients with multimorbidity have at least two of the conditions in the alcohol misuse cluster, with a mean (standard deviation) age of 66.4 (12.5) years; of these, 51% were female, and 24% and 12% were from the most deprived and least deprived groups respectively (Table 4). Fifty-six percent of the people with the conditions in the cancer cluster were female, with 15% and 20% from the most deprived and least deprived groups respectively. About 8 of every 10 patients with multimorbidity have at least two of the conditions in the obesity cluster. The percentages of patients with multimorbidity from the most deprived groups were higher than those from the least deprived group across all the clusters except for the cancer cluster.
multimorbidity is well known [19,29], there is much less known about differences in clusters of conditions for these characteristics [19].
The identified clusters strongly correspond to current medical knowledge, demonstrating well-known associations between conditions such as alcohol abuse and depression. Hypertension and cardiac arrhythmias were the study population's most prevalent pair of conditions, which supports the known relationship between hypertension and heart diseases [30]. Our analysis shows association, not causality, but it may be possible to surmise the drivers of specific clusters as identified. Conditions most prevalent in our most deprived population groups include alcohol and drug misuse, depression and obesity, which are all known to be associated with social factors. Other identified clusters are likely to have more physiological drivers, e.g. hypertension through to heart diseases.
The choice of how to define multimorbidity is important in terms of conditions and risk factors. Obesity and hypertension can be considered as both conditions that require management and as risk factors contributing to the development of other health problems. Our findings suggest a high prevalence of obesity among individuals aged 44-49 years old with multimorbidity. This undoubtedly places a significant burden on both health and social care services, given the available evidence on how obesity can reduce life expectancy and healthy life expectancy [31]. The younger age group also had alcohol misuse as a key condition. This supports a recent report on alcohol-related harm with risk factors rooted foremost in socioeconomic determinants [32].

Study strengths and limitations
The study population was drawn from two Scottish Health Boards with comprehensive health records over a long period. The use of well-defined conditions and ICD-10 codesets to identify each condition allows other researchers to explore multimorbidity using the same methods. Using market basket analysis to cluster conditions rather than classifying patients into mutually exclusive groups meant patients could be present in more than one cluster depending on the conditions they had. Using the same approach across different strata meant comparisons were down to the underlying data rather than simply different populations using different methods.
The data used to identify conditions were hospital records from secondary care. These will underestimate the occurrence of conditions, as less severe cases might not have been captured. Some limitations of this work relate to the choice of the Elixhauser Index to identify underlying conditions. Some common conditions such as myocardial infarction (ICD-10 code I21) are not among the 31 conditions identified in the Elixhauser Index, and a few individual conditions may be a progression of a single condition, such as uncomplicated diabetes to diabetes with chronic complications. However, people with this progression of diseases also have other conditions, and the clustering will be unaffected as only conditions with more than 5% prevalence were clustered. Each condition has a mutually exclusive code set, meaning that each ICD-10 code is related to only one condition. We made an a priori decision to only study the 31 conditions listed and to treat them all as separate. Recent work from Ho et al. has suggested a more complete list of underlying individual conditions, which may change the identified clusters [4] but is unlikely to change the fact that clusters will vary in different age groups, gender or socioeconomic groups.
Similarly, there have been several different methods used to identify clusters within multimorbid populations, but our choice of market basket analysis as a methodological tool is unlikely to be the cause of differences when examining strata. However, the clusters generated by market basket analysis are based on empirical patterns in the data and only show associations between different conditions; they do not show any causal relationships between those conditions. We reported on clusters of multimorbidity in people alive on 31st December 2018. If we had included those who had died throughout the period, we may have seen some differences in the identified clusters. The measure of deprivation used in this study is allocated at a postcode level, but it is a small area approximation rather than a direct measure of individual deprivation.
The naming of the clusters was discussed with the research team, with either the most common condition or a representative term used, but it is still subjective labelling. Cluster 1, for instance, was named as alcohol misuse as this was the most common condition for people identified in the cluster, but it could also have been labelled as socioeconomic-driven conditions.

Recommendations
Identification of patients who are most vulnerable based on clustering of conditions across characteristics such as age, sex and level of deprivation should be used to inform public health strategies, including directing primary prevention and interventional clinical services to where they are most needed. There is a need for significant investment in preventative and public health measures and to take action on social determinants of health [33]. The clusters of conditions identified in this study may suggest that lifestyle interventions, support groups and mental health interventions in the most deprived areas would be a good strategy to focus on. If not, gaps in health inequalities and differences in multimorbidity prevalence observed may very well continue to widen.

Conclusions
This paper identified that different sub-population groups with multimorbidity need different interventions to prevent and/or manage multimorbidity. Condition clustering in the multimorbid population is mainly influenced by age and also by sex and area-level socioeconomic deprivation. A third of the youngest age group with multimorbidity have alcohol misuse contributing to their multimorbidity. Almost half of the oldest age group have hypertension and cardiac arrhythmia. When considering the clustering of conditions, it is important to consider the age of the people being studied as well as their sex and level of socio-economic deprivation.
The generated dendrograms are presented in S2-S4 Figs. R and Stata version 17 were used for the analysis.

Table 2. Multimorbidity clusters of the conditions among the whole population with multimorbidity, sex and deprivation subgroups.
*Only conditions with at least 5% prevalence within each specific population subgroup were clustered. +An individual can "belong" to more than one cluster.
https://doi.org/10.1371/journal.pone.0294666.t002

Table 3. Multimorbidity clusters of the conditions by age*,^,@.
*Only conditions with at least 5% prevalence within each specific population subgroup were clustered. ^Efforts were made to align clusters that were similar across different populations. @Blank cells exist for certain ages where the identified conditions are not present or clustered. +An individual can "belong" to more than one cluster.