Does Multimorbidity Influence the Occurrence Rates of Chronic Conditions? A Claims Data Based Comparison of Expected and Observed Prevalence Rates

Objective: Multimorbidity is a complex phenomenon with an almost endless number of possible disease combinations with unclear implications. One important aspect in analyzing the clustering of diseases is to distinguish between random coexistence and statistical dependency. We developed a model to account for random coexistence based on stochastic distribution. We analyzed if the number of diseases of the patients influences the occurrence rates of chronic conditions.
Methods: We analyzed claims data of 121,389 persons aged 65+ using a list of 46 chronic conditions. Expected prevalences were simulated by drawing without replacement from all observed diseases using observed overall prevalences as initial probability weights. To determine if a disease occurs more or less frequently than expected by chance we calculated observed-minus-expected deltas for each disease. We defined clinical relevance as |delta| ≥ 5.0%. 18 conditions were excluded because of a prevalence < 5.0%.
Results: We found that (1) two chronic conditions (e.g. hypertension) were more frequent than expected in patients with a low number of comorbidities; (2) four conditions (e.g. renal insufficiency) were more frequent in patients with many comorbidities; (3) six conditions (e.g. cancer) were less frequent with many comorbidities; and (4) 16 conditions had an average course of prevalences.
Conclusion: A growing extent of multimorbidity goes along with a rapid growth of prevalences. This is for the largest part merely a stochastic effect. If we account for this effect we find that only few diseases deviate from the expected prevalence curves. Causes for these deviations are discussed. Our approach also has methodological implications: Naive analyses of multimorbidity might easily be affected by bias, because the prevalence of all chronic conditions necessarily increases with a growing extent of multimorbidity. We should therefore always examine and discuss the stochastic interrelations between the chronic conditions we analyze.


Background
Research on multimorbidity is often guided by the assumption that multimorbidity is more than just the sum of single diseases [1]. But what then exactly is multimorbidity? In most studies multimorbidity means the presence of several chronic diseases in one person for a longer period of time [2]. It is highly prevalent in the elderly population and may result in decline of functional status, lower quality of life, higher mortality, increased health care utilization and therefore rising costs of care [3]. It is presumed that multimorbidity causes a different dimension of suffering. On the one hand the combination of diseases might lead to a higher illness burden than the single diseases; on the other hand chronic conditions might share symptoms and/or risk factors [1].
We should note that multimorbidity is a complex phenomenon with an almost endless number of possible disease combinations with unclear implications. Recent research has described and grouped these combinations by introducing triads of chronic conditions [4], or multimorbidity patterns resulting from cluster analysis [5] or factor analysis [6]. Despite these efforts the process and pathway of multimorbidity is still not known [7].
Multimorbidity may occur in case of random coexistence of diseases, merely statistically significant associations or causal interrelations between chronic conditions [8]. One important aspect in analyzing the clustering of diseases in individual patients is to distinguish between random coexistence and statistical dependency.
We presume that in a large number of cases clustering of chronic conditions is determined by chance alone. It is non-trivial to discriminate statistically between random co-occurrence and statistical association. For this reason we developed a model to account for random coexistence based on the stochastic distribution in our sample.
The aim of this study is to determine if multimorbidity influences the occurrence rates of chronic conditions. For this reason we analyzed if there is an association between the number of diseases of a chronically ill person and the risk of gaining selected chronic conditions. We define multimorbidity as having no effect on this process if the gain of new diseases is determined by chance alone, i.e. if a healthy person has the same chance of gaining, for example, diabetes mellitus type 2 as a person who already has 8 diseases.

The statistical problem
Statistical independence between chronic conditions can be described as an urn model. Each of the 46 diseases in our analyses is represented by a ball in an urn. The process of gaining diseases then corresponds to randomly drawing balls from this urn. For each additional disease a new ball is drawn. As the diseases all have a different prevalence each of the balls would also have a different probability of being drawn (e.g. because of a different weight). The balls are not returned back into the urn once they are drawn. For this reason the probability for drawing each ball changes in the next draw depending on the probability of the balls that were drawn before. If we want to know the probability of each of the diseases in the first draw, the second draw, the third draw etc. we have to sum up all the single probabilities.
An example: We can imagine a population with one to three diseases from a list of four. Chronic condition no. 1 (CC1) has a prevalence of 60%, CC2 25%, CC3 10% and CC4 has a prevalence of 5% among the patients with only one disease. As the chronic conditions are supposed to be statistically independent from each other, the initial probability of each disease (p) is the same as its prevalence in the first draw (P_1). For CC3 this is expressed by the following formula:
\[ P_1(\mathrm{CC3}) = p_3 = 0.10 \]
If we take a look at the prevalence in persons with two diseases (P_2) there are the following possibilities: 1) A person could gain CC3 in the first draw or 2) he or she could first gain CC1 or 3) CC2 or 4) CC4 and then CC3 in the second draw:
\[ P_2(\mathrm{CC3}) = p_3 + p_1 \cdot \frac{p_3}{1-p_1} + p_2 \cdot \frac{p_3}{1-p_2} + p_4 \cdot \frac{p_3}{1-p_4} \approx 0.29 \]
If we want to determine the prevalence in persons with three diseases (P_3) it gets a little more complicated. A person could gain CC3 as first or second disease, or 5) first get CC1 then CC2 or 6) first CC1 then CC4 or 7) first CC2 then CC1 or 8) first CC2 then CC4 or 9) first CC4 then CC1 or 10) first CC4 then CC2 and then CC3 in the third draw:
\[ P_3(\mathrm{CC3}) = P_2(\mathrm{CC3}) + \sum_{i \in \{1,2,4\}} \; \sum_{j \in \{1,2,4\},\, j \neq i} p_i \cdot \frac{p_j}{1-p_i} \cdot \frac{p_3}{1-p_i-p_j} \approx 0.71 \]
The probability of having one of the four diseases in each draw is shown in Figure 1. For CC3 the prevalence increases from 10% to 29% to 71% if we have complete statistical independence between prevalence and the total number of chronic conditions one person has. As a matter of course the prevalence of each disease would be 0% in a healthy subsample and 100% in a subsample of persons with four diseases from our list of four.
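The worked example above can be checked by enumerating every possible order of draws. The following Python sketch is only an illustration, not the code used in the study; the function names `sequence_probability` and `expected_prevalence` are hypothetical. It assumes that the probability of a ball in each draw is its weight divided by the total weight of the balls still in the urn.

```python
from itertools import permutations

# Prevalences among persons with exactly one disease (values from the example).
weights = {"CC1": 0.60, "CC2": 0.25, "CC3": 0.10, "CC4": 0.05}


def sequence_probability(order, weights):
    """Probability of drawing the diseases in exactly this order, without replacement."""
    prob = 1.0
    remaining = sum(weights.values())  # total weight still in the urn
    for disease in order:
        prob *= weights[disease] / remaining
        remaining -= weights[disease]
    return prob


def expected_prevalence(weights, n_draws):
    """Expected prevalence of each disease among persons with n_draws diseases."""
    prevalence = dict.fromkeys(weights, 0.0)
    for order in permutations(weights, n_draws):
        p = sequence_probability(order, weights)
        for disease in order:  # every disease in the sequence is present
            prevalence[disease] += p
    return prevalence


for k in (1, 2, 3):
    rounded = {d: round(100 * v) for d, v in expected_prevalence(weights, k).items()}
    print(f"{k} disease(s): {rounded}")
```

For CC3 the enumeration gives the 10%, 29% and 71% stated in the text. With 46 conditions and 12 draws the number of sequences becomes far too large for this approach, which is why a simulation is the practical alternative.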
The association of a chronic condition with multimorbidity can be expressed by relative risks for multimorbidity in which the prevalence in the non-multimorbid population is compared to the prevalence in the multimorbid population [4]. If we use our above example and define that 1,000 persons have one disease, 400 have two and 150 persons have three diseases, we can calculate these risk ratios for each disease. In our example we would define multimorbidity as having at least two chronic conditions, so that the non-multimorbid sample consists of all patients having exactly one chronic condition. The risk ratios for multimorbidity are shown in Table 1. Although all four diseases are statistically independent from multimorbidity they have different risk ratios ranging from 1.5 to 4.2. If we also had healthy persons and/or persons with all four diseases in our sample, the risk ratios would be even higher.
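The risk ratios in this example can be recomputed from the urn model. The sketch below is a hedged illustration: it assumes that the prevalence in the multimorbid group is the size-weighted mean of the expected prevalences in persons with two and three diseases, and the names `group_sizes` and `risk_ratios` are hypothetical.

```python
from itertools import permutations

weights = {"CC1": 0.60, "CC2": 0.25, "CC3": 0.10, "CC4": 0.05}
group_sizes = {1: 1000, 2: 400, 3: 150}  # persons with one, two, three diseases


def expected_prevalence(weights, n_draws):
    """Expected prevalence per disease after n_draws weighted draws without replacement."""
    prevalence = dict.fromkeys(weights, 0.0)
    for order in permutations(weights, n_draws):
        prob, remaining = 1.0, sum(weights.values())
        for disease in order:
            prob *= weights[disease] / remaining
            remaining -= weights[disease]
        for disease in order:
            prevalence[disease] += prob
    return prevalence


prev = {k: expected_prevalence(weights, k) for k in group_sizes}
n_multimorbid = group_sizes[2] + group_sizes[3]

risk_ratios = {}
for disease in weights:
    # Prevalence among all persons with at least two diseases.
    p_multi = (
        group_sizes[2] * prev[2][disease] + group_sizes[3] * prev[3][disease]
    ) / n_multimorbid
    # Compared with the prevalence among persons with exactly one disease.
    risk_ratios[disease] = p_multi / prev[1][disease]

for disease, rr in risk_ratios.items():
    print(disease, round(rr, 1))
```

Under these assumptions the most prevalent condition (CC1) has the lowest risk ratio and the rarest condition (CC4) the highest, which reproduces the range of 1.5 to 4.2 although no disease depends on multimorbidity.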

Methods
The analyses are based on the comparison of two data sets. The first data set ("observed data") consists of ambulatory data of the Gmünder ErsatzKasse, a German statutory health insurance company with 1.7 million insured persons (in 2008), which corresponds to 2.4% of the statutory insured population [9]. The dataset contains pseudonymous data from every insured member of this company. We used a sample of all persons aged 65 years and older who were permanently insured during the year 2006 and had at least one and not more than 12 chronic conditions. Healthy patients were excluded because by definition all diseases have a prevalence of 0% in healthy subjects and therefore they did not contribute any information to the analyses. Patients with more than 12 diseases were excluded because of a low sample size in these subsamples (< 1,000 patients) which might lead to biased results in diseases that have a rather low prevalence.
The second data set ("expected data") was generated in order to model the chronic conditions as statistically independent from multimorbidity. As we used 46 chronic conditions and 12 draws we would have had to calculate hundreds of extremely complex equations that each included many thousands of possibilities. As each of these equations would also be unique and would have had to be programmed separately, it was not possible to calculate the expected prevalences by this brute force method. Instead, a data set of 500,000 hypothetical persons was generated by simulation, which was based on our stochastic model. We used the prevalence data found in the observed data set as probability weights for the diseases. In our simulation we repeatedly produced sequences of diseases by drawing without replacement from all observed diseases. Each simulated patient gains a total of 12 diseases. The order of gaining the diseases is stored. For this reason the expected data set contains information about patients with one through twelve diseases. As we presume that chronic conditions can occur "earlier" or "later than expected" in the course of multimorbidity we used the observed overall prevalences, i.e. the mean prevalences in all persons aged 65+ in our data set (instead of the mean prevalences in persons aged 65+ that have only one disease) as initial probability weights.
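The simulation step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code used in the study: it reuses the four-disease toy example instead of the 46 conditions, draws 3 instead of 12 diseases, simulates 20,000 instead of 500,000 persons, and the names `simulate_sequences` and `prevalence_after` are hypothetical.

```python
import random


def simulate_sequences(weights, n_persons, n_draws, seed=0):
    """Simulate ordered disease sequences by weighted drawing without replacement."""
    rng = random.Random(seed)
    sequences = []
    for _person in range(n_persons):
        remaining = list(weights)
        order = []
        for _draw in range(n_draws):
            # The chance of each remaining disease is proportional to its weight.
            pick = rng.choices(remaining, weights=[weights[d] for d in remaining])[0]
            order.append(pick)
            remaining.remove(pick)
        sequences.append(order)
    return sequences


def prevalence_after(sequences, diseases, k):
    """Expected prevalence among persons with k diseases: first k draws of each sequence."""
    n = len(sequences)
    return {d: sum(d in seq[:k] for seq in sequences) / n for d in diseases}


# Toy weights from the four-disease example (an assumption for illustration).
weights = {"CC1": 0.60, "CC2": 0.25, "CC3": 0.10, "CC4": 0.05}
sequences = simulate_sequences(weights, n_persons=20_000, n_draws=3)
expected = {k: prevalence_after(sequences, weights, k) for k in (1, 2, 3)}

for k, prev in expected.items():
    print(k, {d: round(100 * p, 1) for d, p in prev.items()})
```

Because the order of the draws is stored, one simulated sequence yields the expected data for every disease count up to the number of draws. Rescaling the remaining weights happens implicitly, since `random.choices` normalizes the weights it receives.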
The analysis of morbidity was based on a list of 46 defined diagnosis groups of chronic conditions based on ICD-10 codes. The methods for compiling this list have been described elsewhere in detail [4]. In short, we used the most frequent conditions reported in GP surgeries [10], assessed them for chronicity using a recent expert report [11] and amended this list for all chronic conditions with a prevalence ≥ 1% in the age group ≥ 65 years in the data set of the Gmünder ErsatzKasse in 2006. ICD-10 codes were grouped together if diseases and syndromes had a close pathophysiological similarity or if ICD codes of related disorders were used ambiguously by coding physicians in clinical reality. Prevalence, gender-specific rank order and ICD-10 codes of the diagnosis groups have been published in another paper [6].
All problems under management by physicians within the statutory ambulatory care have to be coded in ICD-10 and forwarded to the health insurance companies as regulated by German law in §295(1) SGB V and §44(3) of the Federal Collective Agreement within the statutory health insurance system in Germany [12]. Each problem must be represented by one or more ICD-10 codes. We only included diagnoses that were covered by the list of 46 diagnosis groups and had been coded in at least three out of four quarters (three month periods) within the year 2006. This criterion was chosen in order to increase the validity of the diagnoses based on claims data by avoiding transitory or even accidental diagnoses.
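The three-out-of-four-quarters criterion can be illustrated with a short Python sketch. The record layout (person, diagnosis group, quarter) and the function name `validated_diagnoses` are assumptions made for this example; the claims shown are invented and are not study data.

```python
from collections import defaultdict


def validated_diagnoses(claims, min_quarters=3):
    """Keep diagnosis groups coded in at least min_quarters distinct quarters.

    claims: iterable of (person_id, diagnosis_group, quarter), quarter in 1..4.
    Returns {person_id: set of validated diagnosis groups}.
    """
    quarters = defaultdict(set)
    for person_id, group, quarter in claims:
        quarters[(person_id, group)].add(quarter)  # repeated codes in one quarter count once

    result = defaultdict(set)
    for (person_id, group), seen in quarters.items():
        if len(seen) >= min_quarters:
            result[person_id].add(group)
    return dict(result)


# Invented toy claims for two hypothetical persons.
claims = [
    ("A", "hypertension", 1), ("A", "hypertension", 2), ("A", "hypertension", 4),
    ("A", "gout", 3),
    ("B", "diabetes mellitus", 1), ("B", "diabetes mellitus", 1),
    ("B", "diabetes mellitus", 2),
]
kept = validated_diagnoses(claims)
print(kept)
```

In this toy input only the hypertension diagnosis of person A is retained; person B has no validated diagnosis and would therefore count as having no chronic condition from the list.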
Multimorbidity is often used as a dichotomous variable, i.e. the sample is divided around a cut-off point of 2 or 3 diseases. This definition results in a loss of information that can be avoided. For this reason we used multimorbidity as an ordinal variable by dividing the observed and the expected data set into 12 subsamples. Subsample 1 consists of all persons who have exactly one chronic condition. All persons with exactly two chronic conditions are in subsample 2 etc.
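Treating multimorbidity as an ordinal variable amounts to grouping persons by their disease count and computing prevalences within each group. The following Python sketch shows one way to do this; the function name `prevalence_by_subsample` and the toy persons are hypothetical.

```python
from collections import Counter, defaultdict


def prevalence_by_subsample(persons, max_count=12):
    """Prevalence of each disease within subsamples defined by the disease count.

    persons: iterable of sets of diagnosis groups, one set per person.
    Returns {disease_count: {disease: prevalence}} for counts 1..max_count.
    """
    sizes = Counter()
    cases = defaultdict(Counter)
    for diseases in persons:
        k = len(diseases)
        if 1 <= k <= max_count:  # healthy persons and counts above max_count are excluded
            sizes[k] += 1
            cases[k].update(diseases)
    return {k: {d: c / sizes[k] for d, c in cases[k].items()} for k in sizes}


# Invented toy data: five hypothetical persons, one of them without any condition.
persons = [
    {"hypertension"},
    {"gout"},
    {"hypertension", "gout"},
    {"hypertension", "diabetes mellitus"},
    set(),
]
prev = prevalence_by_subsample(persons)
print(prev)
```

Applied to both the observed and the expected data set, this yields one prevalence per disease and subsample, which is the input for the comparison of the two data sets.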
To determine if a disease occurs more (or less) frequently than expected by chance we compared our two data sets by calculating "observed minus expected deltas" for each disease in each subsample, e.g. we compared the observed prevalence of hypertension in patients with two diseases with the expected prevalence of hypertension in this patient group. We report the overall prevalences (over the subsamples 1 to 12), observed and expected data for the extremes (subsample 1 and subsample 12), the highest deltas for each disease and the subsample in which the delta is highest.
Because of the large sample size we did not test for statistical significance, but instead used a criterion for clinical relevance. We defined that a disease is associated to a clinically relevant extent with the total number of chronic conditions if the absolute value of the observed minus expected delta in at least one subsample is 5.0% or more. For all diseases with a clinically relevant delta we compared the observed and expected prevalences in all subsamples in a prevalence curve. Data preparation was done with SAS (Version 9.2). The simulation was calculated using R (version 2.12.0). Statistical analyses were made with Stata/MP (version 11.0) and figures were created using MS Excel 2003 SP 3.
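The relevance criterion can be written down in a few lines. The Python sketch below assumes prevalences given in percent per subsample; the function names `max_delta` and `clinically_relevant` are hypothetical and the numbers are invented for illustration, not taken from Table 2.

```python
def max_delta(observed, expected):
    """Return (subsample, delta) with the largest absolute observed-minus-expected delta."""
    deltas = {k: observed[k] - expected[k] for k in observed}
    k_max = max(deltas, key=lambda k: abs(deltas[k]))
    return k_max, deltas[k_max]


def clinically_relevant(observed, expected, threshold=5.0):
    """True if |delta| reaches the threshold (in percentage points) in any subsample."""
    _, delta = max_delta(observed, expected)
    return abs(delta) >= threshold


# Invented prevalences (%) of one hypothetical disease in subsamples 1 to 3.
observed = {1: 12.0, 2: 20.0, 3: 25.0}
expected = {1: 10.0, 2: 21.0, 3: 31.5}

subsample, delta = max_delta(observed, expected)
print(subsample, delta, clinically_relevant(observed, expected))
```

In this toy case the largest deviation is negative and occurs in the highest subsample, so the disease would be classified as less frequent than expected in persons with many comorbidities.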
The research reported in this article was conducted according to the principles expressed in the Declaration of Helsinki. The researchers did not have to obtain informed consent, because the research was based on insurance claims data and the data set was analyzed anonymously (as regulated by German law in §75 SGB X). The study was approved by the Ethics Committee of the Medical Association of Hamburg including the waiver of consent (approval no. PV3057).

Results
In total 121,389 persons were analyzed. Subsample 4 (which consists of persons with exactly 4 diseases from our list of 46) is the largest subsample in our data set with 18,075 patients (14.9%) and subsample 12 is the smallest subsample (with 1.2% of the total sample) and still includes 1,433 patients. The mean age of the total sample is 72.2 years. The lowest mean age can be found in subsample 1 (70.5 years) and the highest in subsample 12 (74.7 years). 56.4% of the total sample were male. Subsample 1 has the highest proportion of males (59.2%) and subsample 12 the lowest (52.2%).
Observed and expected prevalences and maximum deltas for 46 chronic conditions are shown in Table 2. The total sample includes all persons with at least 1 and not more than 12 chronic conditions. In this sample hypertension is the disease with the highest prevalence (63.1%) and hypotension (1.4%) has the lowest prevalence. There are six chronic conditions with a maximum delta ≥ +5.0% (hypertension, lipid metabolism disorders, chronic low back pain, atherosclerosis/PAOD, neuropathies and renal insufficiency) and also six with a maximum delta ≤ −5.0% (diabetes mellitus, thyroid dysfunction, severe vision reduction, cancers, prostatic hyperplasia and noninflammatory gynecological problems).
Curves of observed and expected prevalences for chronic conditions with a maximum delta ≥ +5.0% can be seen in Figures 2 and 3. Hypertension and lipid metabolism disorders have their maximum delta in the first six subsamples. The other four diseases have a higher than expected prevalence in subsamples 7 to 12 and (with the exception of chronic low back pain) are lower than expected in subsamples 1 to 6.
Curves of observed and expected prevalences for chronic conditions with a maximum delta ≤ −5.0% are shown in Figures 4 and 5. All six chronic conditions in these figures have a prevalence higher than (or equal to) the expected prevalence in the first four or five subsamples and a lower than expected prevalence in the other subsamples.

Discussion
Only twelve of 46 chronic conditions show a frequency curve that differs to a clinically relevant extent from the curve that would be expected by chance. This does not necessarily mean that the occurrence rates of the other conditions are statistically independent from the number of chronic conditions as we did not compare incidence rates and time-to-event rates in a monomorbid population with rates of multimorbid patients. Instead, this finding can be interpreted in a way that these conditions do not exceed (or fall below) the average dependency between all chronic conditions. We also have to keep in mind that 19 of the 46 diseases have a total prevalence below 5% so that they are unlikely to reach the criterion of clinical relevance in the subsamples. The fact that renal insufficiency belongs to this group and nevertheless exceeds the relevance criterion shows the strong dependency of this condition on the number of comorbidities.
If we keep these limitations in mind we can classify the 28 remaining conditions (including renal insufficiency) into four types of growth: (1) increased prevalence in persons with a low number of comorbidities (i.e. in patients with six or fewer chronic conditions), (2) increased prevalence in persons with a high number of comorbidities (i.e. in patients with more than six diseases), (3) decreased prevalence in persons with a high number of comorbidities, and (4) average course of prevalences.
Type (1) consists of hypertension and lipid metabolism disorders, which both are much more frequent than expected among persons with a relatively lower number of chronic conditions. These results do not surprise as both are known to be risk factors for a large number of chronic conditions like cardiovascular disease [13], [14], stroke [15], [16], and renal disorders [17], [18].
Type (2) includes atherosclerosis, neuropathies, renal insufficiency and chronic low back pain, which are all more frequent among persons with many diseases. This might be explained by the fact that most of these conditions are closely related to other chronic conditions, e.g. atherosclerosis can result from hypertension and hyperlipidemia [14]; and neuropathies and renal insufficiency are frequent complications of diabetes mellitus [19], [20].
Type (3) encompasses diabetes mellitus, cancers, thyroid dysfunction, severe vision reduction, noninflammatory gynecological problems and prostatic hyperplasia. All of these chronic conditions are less frequent than expected in persons with many comorbidities. There are three possible explanations. First, they can be a precursor or early stage of other diseases. This especially applies to diabetes mellitus, which is known to be related to a large number of complications [21]. Second, a condition may result in very high mortality rates, so that many patients die before they gain additional comorbidities. This may particularly be the case with cancer [22]. Third, it may be that the diseases are in fact less frequently (than average) related to other diseases.
Type (4) consists of 16 conditions that are related to an average extent to the other conditions. This is in many cases surprising as diseases like depression, chronic ischemic heart disease, stroke, gout and others have been shown to be dependent on a multitude of other chronic conditions [15], [23], [24]. This finding can be interpreted as an indicator for a high average interrelation with other diseases affecting most highly prevalent chronic conditions.
To our knowledge there have been no studies until now investigating the prevalence increase of chronic conditions with a growing extent of multimorbidity, but there has been a previous study of our research team [4] using risk ratios for multimorbidity. This study identified that renal insufficiency, atherosclerosis and neuropathies were among the ten conditions with the highest risk ratio for multimorbidity. These results could now be confirmed. The chronic conditions obesity, liver diseases, chronic cholecystitis/gallstones and hyperuricemia/gout, which were also among this top ten list, only showed an average diathesis for multimorbidity in our present study. Hypertension, cancers and severe vision reduction were among the four conditions with the lowest risk for multimorbidity in our previous study. We now also found that these diseases occur more often in persons with fewer chronic conditions.

Strengths and weaknesses
This study is the first to develop a stochastic model for comparing expected and observed prevalence rates in a multimorbid sample. The model fitted well as 16 of 28 diseases performed as expected. Because of the large sample size we refrained from testing for statistical significance. Instead we used a criterion for clinical relevance. In doing so we lost 18 of 46 diseases with a prevalence below 5%. These diseases were used for the stochastic model (and therefore for the simulation of the "expected" data set), but they could not be examined for comparison of expected and observed prevalences.
We were able to show that our model is less biased by the prevalence of the diseases than risk ratios for multimorbidity. Our comparisons can also discriminate between conditions that have a higher, a lower or an average association with the number of comorbidities of a patient. A problem of our approach lies in the fact that we cannot decide whether a disease is (absolutely) independent from other conditions.
Another limitation of our model lies in the fact that only bivariate comparisons were conducted. There are noticeable differences in age and gender between the subsamples and therefore these variables could confound our results. Conditions that are more frequent among women or very old patients might seem to have a higher association with the number of chronic conditions than they in fact have. As we have a cross-sectional study design our results could also be affected by selective survival, so that conditions with high mortality rates could seem to be associated with multimorbidity to a lesser extent than they are in reality.
A strength of our approach relates to the selection of diseases. We included all highly prevalent chronic conditions (≥ 1% in the age group 65+) into our diagnosis groups and used them for our stochastic model. Our analyses are therefore based on a comprehensive picture of chronic diseases in individual patients.
Consideration must also be given to the data quality. Various studies have shown that there are differences in the distribution of age and gender [25] and in the prevalence of diseases [26] between the German health insurance companies. For this reason we compared the morbidity data of the Gmünder ErsatzKasse with data from a prospective cohort study of 3,189 patients in Germany [27]. This study has been published elsewhere. In short, we found that there may be an underreporting of diagnoses in the claims data from the Gmünder ErsatzKasse, but there was an acceptable correspondence of the relative prevalence and the rank order of the individual diseases between claims data and data from the cohort study. Because of the differences between data sources, studies relying on a single data source generally have to be interpreted with caution [28].
Although accidental and transitory diagnoses were excluded, in some cases diagnoses may be imprecise, ambiguous or incomplete because they were not clinically verified by trained professionals. This is a general problem in insurance claims data, but in our view, the benefits of claims data outweigh their disadvantages: We are provided with a large unselected population, representing real-world conditions and including persons living in protected institutions/nursing homes as well as frail individuals and the oldest old, all frequently not included in survey and field studies. In choosing insurance claims data, we also avoided selection bias concerning service providers and as a matter of course there is no recall bias concerning diagnosis data.

Conclusions
The growth of multimorbidity goes along with a rapid growth of prevalences in all chronic conditions. While this finding may in itself be of importance for daily care of multimorbid patients, it is, for the largest part, merely a stochastic effect. If we account for this effect we find that multimorbidity still seems to influence the occurrence rates of many chronic conditions, but in two directions: some conditions had a higher than expected prevalence and others had a lower than expected prevalence in patients with many comorbidities.
Our results also have methodological implications: We were able to show that the distribution of prevalences is complex and far from normality. If we use a naive approach for analyzing multimorbidity (e.g. by simply dividing the population into subsamples based on the number of chronic conditions without accounting for the distribution of diseases) these analyses might be affected by bias, because the prevalence of all chronic conditions necessarily increases with a growing extent of multimorbidity. For example, if disease burden is measured by the number of diseases of the individual patients and rare diseases in the study are more likely to produce a certain outcome, the effect of the disease count on this outcome can be confounded by the effect of the individual diseases. We should therefore always examine and discuss the close stochastic interrelations between the chronic conditions we include in our analyses.