Multimorbidity in Australia: Comparing estimates derived using administrative data sources and survey data

Background: Estimating multimorbidity (the presence of two or more chronic conditions) using administrative data is becoming increasingly common. We investigated (1) the concordance of identification of chronic conditions and multimorbidity using self-report survey and administrative datasets; (2) characteristics of people with multimorbidity ascertained using different data sources; and (3) whether the same individuals are classified as multimorbid using different data sources.
Methods: Baseline survey data for 90 352 participants of the 45 and Up Study (a cohort study of residents of New South Wales, Australia, aged 45 years and over) were linked to pharmaceutical claims and hospital admission records for the preceding two years. Concordance of eight self-report chronic conditions (reference) with claims and hospital data was examined using sensitivity (Sn), positive predictive value (PPV), and kappa (κ). The characteristics of people classified as multimorbid were compared using logistic regression modelling.
Results: Agreement was highest for diabetes in both hospital and claims data (κ = 0.79, 0.78; Sn = 79%, 72%; PPV = 86%, 90%). The prevalence of multimorbidity was highest using self-report data (37.4%), followed by claims data (36.1%) and hospital data (19.3%). Combining all three datasets identified a total of 46 683 (52%) people with multimorbidity, with half of these identified using a single dataset only, and up to 20% identified on all three datasets. Characteristics of persons with and without multimorbidity were generally similar. However, the age gradient was more pronounced and people speaking a language other than English at home were more likely to be identified as multimorbid by administrative data.
Conclusions: Different individuals, with different combinations of conditions, are identified as multimorbid when different data sources are used. As such, caution should be applied when ascertaining morbidity from a single data source, as the agreement between self-report and administrative data is generally poor. Future multimorbidity research exploring specific disease combinations and clusters of diseases that commonly co-occur, rather than a simple disease count, is likely to provide more useful insights into the complex care needs of individuals with multiple chronic conditions.


Study population
People aged 45 years and over were included in the analysis if they: (a) completed the 45 and Up Study baseline questionnaire between 1 September 2007 and 2 March 2009; and (b) had a Pharmaceutical Benefits Scheme (PBS) record for any prescription medication within the 2 years preceding the questionnaire date (the longest lookback available). Only those with consistent PBS concession card holder status within the 2-year period were included. Information about hospitalisations for these participants was also obtained from the NSW Admitted Patient Data Collection (APDC), restricted to the same 2-year period as the PBS data. People who answered version 1 of the 45 and Up Study baseline questionnaire (n = 37 088) were excluded, as it was not possible to ascertain self-report of doctor-diagnosed depression for these participants. Holders of a Department of Veterans' Affairs health card (n = 6 299) were also excluded, as the PBS does not capture all the services provided to these individuals. A total of 90 352 people with consistent PBS concession card holder status were included in the analysis: 46 766 persons with claims data only (medication only) and 43 586 persons with both claims and hospitalisation records (medication + hospitalisation) (S1 Fig).

Morbidity measures
A total of eight chronic conditions (hypertension, cancer, heart disease, stroke, diabetes, asthma, depression and Parkinson's disease; hereafter referred to as 'morbidities') were selected for analysis, based on their availability in both self-report and administrative data.
Self-report morbidities were ascertained on the basis of responses to a single question "Has a doctor ever told you that you have (name of condition)?" in the baseline 45 and Up Study survey.
Morbidity in the hospital data was ascertained using ICD-10-AM codes in any of the 55 diagnosis fields (S1 Table). The initial list of eligible ICD-10 codes was obtained from the Charlson Index [34,35] and Elixhauser Index [36,37], and refined following advice from a clinical coder. If a condition was coded at least once in the 2-year lookback period, then a person was coded as having that condition in the hospital data.
Morbidity in the medication data was ascertained using ATC codes obtained from Rx-Risk-V [38,39], published reports [40], and research articles [41-47]. A person was coded as having conditions of interest if a specific ATC code was present in the medication data at least twice in the 2-year lookback period, as it was expected that chronic condition medications would be used regularly. Where published literature had different ATC codes, we chose the codes that had the highest positive predictive value (S1 Table).
A count of conditions in each of the three datasets (self-report, medication and hospital) was created by summing the total number of chronic conditions, ranging from 0 to 8, as well as the total when stroke was excluded. Multimorbidity was defined as having two or more chronic conditions, which is the most commonly used definition in the literature [48]. Complex multimorbidity was defined as having three or more chronic conditions affecting three or more body systems [49].
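The ascertainment rules above (at least one coded hospital diagnosis, at least two dispensings, and a count of two or more conditions for multimorbidity) can be sketched in a few lines of Python. This is a minimal illustration, not the SAS code used in the study: the function names are hypothetical, the example person is invented, and records are assumed to be already mapped from ICD-10-AM or ATC codes to condition names (the actual code lists are in S1 Table and are not reproduced here). Complex multimorbidity is not covered, as it also requires a mapping of conditions to body systems.

```python
from collections import Counter

# The eight conditions named in the text.
CONDITIONS = (
    "hypertension", "cancer", "heart disease", "stroke",
    "diabetes", "asthma", "depression", "Parkinson's disease",
)

def flag_conditions(records, min_occurrences):
    """Return the set of conditions seen at least `min_occurrences` times.

    `records` is an iterable of condition names, one per coded diagnosis
    or dispensing within the lookback period (assumed pre-mapped).
    """
    counts = Counter(records)
    return {c for c in CONDITIONS if counts[c] >= min_occurrences}

def condition_count(flags, exclude=()):
    """Number of flagged conditions, optionally excluding some (e.g. stroke)."""
    return len(set(flags) - set(exclude))

def is_multimorbid(flags, exclude=()):
    """Multimorbidity: two or more chronic conditions."""
    return condition_count(flags, exclude) >= 2

# Hypothetical person, for illustration only.
hospital_records = ["diabetes"]
medication_records = ["diabetes", "diabetes", "hypertension",
                      "hypertension", "asthma", "stroke", "stroke"]

hospital_flags = flag_conditions(hospital_records, min_occurrences=1)
medication_flags = flag_conditions(medication_records, min_occurrences=2)

print(sorted(hospital_flags), is_multimorbid(hospital_flags))
print(sorted(medication_flags),
      is_multimorbid(medication_flags, exclude=("stroke",)))
```

In this invented example the single asthma dispensing does not meet the two-dispensing threshold, so the person is multimorbid on medication data but not on hospital data.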

Statistical methods
Measures of agreement. Agreement between the three data sources was measured by estimating sensitivity (Sn), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and Cohen's kappa statistic (κ) using self-report morbidity measures as the reference. Sensitivity represents the percentage of those with a condition (according to self-report) who were correctly identified as having that condition in administrative data. Specificity represents the percentage of those without a self-report condition who did not have a condition in administrative data. PPV represents the percentage of those identified as having a condition of interest in the administrative data who actually had the condition, according to self-report. NPV represents the percentage of those identified as not having a condition of interest in the administrative data who did not have the condition according to self-report. The kappa statistic (κ) represents the proportion of agreement corrected for chance. Kappa values above 0.75 denote excellent agreement, 0.40 to 0.75 fair to good agreement and below 0.40 poor agreement [50].
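As an illustration of these definitions, the sketch below computes all five measures from a 2x2 table that cross-classifies administrative data against self-report (the reference). The function name and the cell counts are hypothetical and chosen only to make the arithmetic easy to follow; the sketch assumes no row or column of the table is empty.

```python
def agreement_measures(tp, fp, fn, tn):
    """Agreement of administrative data with self-report (reference).

    tp: condition in both sources
    fp: condition in administrative data only
    fn: condition in self-report only
    tn: condition in neither source
    """
    n = tp + fp + fn + tn
    sn = tp / (tp + fn)    # sensitivity
    sp = tn / (tn + fp)    # specificity
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "kappa": kappa}

# Hypothetical counts, for illustration only.
measures = agreement_measures(tp=40, fp=10, fn=20, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, observed agreement is 0.70 and chance-expected agreement is 0.50, so κ = 0.40, which sits at the lower bound of the fair-to-good band.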
Analysis. Logistic regression was used to model the odds of multimorbidity, within each dataset separately. All analyses were adjusted for age (categorised into four 10-year age groups and 85+) and sex, and adjusted odds ratios (aORs) and their corresponding 95% confidence intervals (CI) were calculated. A range of categorical variables were examined, including remoteness of residence, highest education attainment, Aboriginal or Torres Strait Islander origin, country of birth, language other than English spoken at home, household income and marital status. Information about these variables was obtained from the 45 and Up Study baseline questionnaire. All data management and analyses were conducted using SAS software, version 9.3 [51].
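For readers less familiar with odds ratios, the following sketch shows how an unadjusted odds ratio and its Wald 95% confidence interval are obtained from a 2x2 table. It is deliberately simpler than the age- and sex-adjusted logistic regression models fitted in SAS for this study and would not reproduce the adjusted estimates; the function name and counts are hypothetical.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed, multimorbid        b: exposed, not multimorbid
    c: unexposed, multimorbid      d: unexposed, not multimorbid
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_wald(a=30, b=70, c=20, d=80)
percent_higher_odds = (or_ - 1) * 100
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f}), "
      f"{percent_higher_odds:.0f}% higher odds")
```

The last line of the sketch reflects the usual reading of an odds ratio as a percentage change in odds: an OR of 1.06 corresponds to 6% higher odds, and an OR of 0.80 to 20% lower odds.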

Sample characteristics
The sample comprised 90 352 participants, who all had a PBS record within the 2 years prior to joining the 45 and Up Study. Forty-eight percent of participants also had a hospitalisation in the same timeframe. The mean age at survey completion was 70.2 years in the full sample, and 71.8 years among those with a hospital record. The median number of self-report conditions was 1, with hypertension being the most commonly reported. Other characteristics of the study population are presented in Table 1. Table 2 summarises agreement measures for self-report and administrative data for all eight chronic conditions and multimorbidity definitions. Excellent levels of agreement beyond chance were only found for diabetes, in both medication and hospital datasets. Fair to good agreement was found for hypertension, asthma, depression and Parkinson's disease in the medication data only. The agreement between self-report and hospital data was generally poor.

Agreement measures
Except for cancer, sensitivity values were found to be higher in medication data (range 51.5% to 72.4%) than the hospital data (range 6.1% to 78.6%) (Fig 1). However, hospital data exhibited higher levels of PPV across all conditions, with the majority of PPVs higher than 70%. The highest PPV was for cancer (89%) in hospital data, and diabetes (90%) in medication data.
Prevalence of individual chronic conditions varied by data source, with hypertension identified in nearly 50% of the sample. Stroke prevalence estimates were found to be four times greater using medication data than self-report data (22.5% vs 5.6%), so stroke was excluded from the count of conditions in the remaining analyses.

Prevalence of multimorbidity
The prevalence of multimorbidity in the study sample was highest using the self-report data (37.4% in the overall sample, 44.2% among those hospitalised), followed by medication data (36.1%) and hospital data (19.3%) (Table 2). The highest level of complex multimorbidity was found among hospitalised patients using the self-report multimorbidity definition (11%).
The prevalence of multimorbidity was higher in males, and increased with age, using all three data definitions (Fig 2). For those aged under 75 years, the highest prevalence was found using self-report data. For people aged over 75 years, the estimates, particularly in women, were higher using medication data. The proportion of persons with multimorbidity was consistently lower in hospital data compared to the other two datasets.
Associations between multimorbidity and key demographic variables were found to be consistent between datasets, with some differences in the magnitudes of these relationships. The odds of multimorbidity were higher in people who were male, older, of Aboriginal or Torres Strait Islander origin, widowed/divorced/separated, or lived in remote/very remote areas (Table 3). Males had higher odds of multimorbidity using hospital data than with medication data (OR = 1.49 versus OR = 1.07). The age gradient in multimorbidity was more pronounced using administrative data than self-report data (OR >2.5 versus OR = 1.83 for those aged 75-84). People speaking a language other than English at home had 6% higher odds of having multimorbidity (OR = 1.06, 95% CI 1.01-1.10) using medication data and 32% higher odds using hospital data (OR = 1.32, 95% CI 1.22-1.42), but 20% lower odds (OR = 0.80, 95% CI 0.76-0.84) of multimorbidity using self-report data.

Agreement in multimorbidity between datasets
A total of 46 683 (52%) people were found to have multimorbidity in any of the three datasets: 33 768 using self-report data, and an additional 12 915 using administrative data only. Of all multimorbid cases, half were identified using a single dataset only, and around one in ten (n = 5 333, 11%) were multimorbid on all three datasets (Fig 3A). When the analyses were restricted to hospitalised patients, the overlap in the datasets increased to 20% (Fig 3B). The agreement on multimorbidity between datasets was poor, with kappa between 0.27 and 0.39, increasing to 0.43 when both hospital and medication data were combined (Table 2). People identified as being multimorbid in only the self-report data had higher prevalence of cancer, depression, asthma and Parkinson's disease than those identified only in the administrative datasets. The most common combinations of morbidities in the self-report data were the two-way combinations of cancer and hypertension (n = 2 177) and hypertension and depression (n = 1 243), and the three-way combination of cancer, hypertension and heart disease (n = 376).
Administrative data, however, were more likely to identify hypertension and heart disease than self-report, with the heart disease and hypertension two-way combination being the most prevalent in both medication (n = 7 291) and hospital datasets (n = 323) (data not shown).
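The overlap figures reported in this section (identified in any dataset, in a single dataset only, and in all three) reduce to simple set operations on participant identifiers. The sketch below illustrates the calculation with invented identifiers; the function name is hypothetical and the output bears no relation to the study's results.

```python
def overlap_summary(self_report, medication, hospital):
    """Count people classified as multimorbid by one or more data sources.

    Each argument is an iterable of participant identifiers classified
    as multimorbid in that data source.
    """
    sources = [set(self_report), set(medication), set(hospital)]
    any_source = set().union(*sources)
    # Number of data sources in which each person is multimorbid.
    n_sources = {pid: sum(pid in s for s in sources) for pid in any_source}
    return {
        "any": len(any_source),
        "single_only": sum(1 for k in n_sources.values() if k == 1),
        "all_three": sum(1 for k in n_sources.values() if k == 3),
    }

# Hypothetical identifiers, for illustration only.
summary = overlap_summary(
    self_report={1, 2, 3, 4},
    medication={3, 4, 5},
    hospital={4, 6},
)
print(summary)
```

Dividing `single_only` and `all_three` by `any` gives the proportions of all multimorbid cases that are found in one dataset only or in all three.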

Discussion
This record linkage study of self-report, hospital admission and medication data compared their use for identifying individuals with multimorbidity, based on the most common chronic conditions in Australia. It showed that the ascertainment of multimorbidity varied between data sources, and that, even where the estimated prevalence of multimorbidity was similar for two data sets, the concordance in classification as multimorbid for individual patients was low.
We investigated the level of concordance of identification of eight chronic conditions between self-report and administrative data. We found that chronic conditions identified in hospital data had high PPVs and low sensitivities, indicating that although the hospital data do not identify all the people with a chronic condition, when such a condition is identified, it is generally accurate. Diagnoses may not always be recorded during inpatient episodes of care, and there is variation in the level of recording between hospitals [10,11]. In Australia, until recently, there was no mechanism to code diagnoses that do not contribute to hospital stay. Prior to 2015, only diagnoses affecting patient management in a particular episode of care were coded in administrative hospital data. In 2015, codes for temporary use in Australia were assigned to 29 chronic conditions that are present on admission, where the condition does not meet the criteria for coding [52]. We anticipate that this introduction of supplementary codes for chronic conditions will have a positive impact on the sensitivities calculated in future studies. For studies that do not have supplementary codes, it is advised to incorporate longer lookback periods in order to increase ascertainment of chronic conditions in hospital data [10,53]. We found that using medication data identifies more cases (higher sensitivity), but at the cost of lower PPV. The lowest PPVs in medication data were found for stroke (16%) and heart disease (35%), the definitions for both of which capture drugs with multiple indications for prescribing. Strong levels of agreement for diabetes, hypertension and Parkinson's disease are consistent with previous research [41,54-56], indicating that medication data can potentially be used for capturing these conditions.
Low sensitivity and agreement for cancer in our study is congruent with previous Australian studies [54,57], explained by the fact that chemotherapy drugs are only captured in the PBS data whilst patients are undergoing active treatment. Ascertainment of such cases can be increased by incorporating longer lookback periods. Higher sensitivities for diabetes, hypertension and depression found in our study, compared with a previous Australian study [57], could be attributable to a small sample size in that study, as well as our modified list of depression medications. Namely, we excluded tricyclic antidepressants, as they are commonly prescribed for insomnia and pain. This modification increased our PPV from 55% to 66%.
Selection of the most appropriate set of chronic conditions for other studies will depend on the study's purpose and the availability of data. Studies requiring accurate case ascertainment should use hospital data (noting that under-ascertainment is likely), or medication data for conditions for which medications are indicated only for that condition (e.g. diabetes) and where there is enough lookback time available. If a comprehensive profile of a patient's morbidity is needed, we suggest using a combination of data sources in order to increase sensitivity for identifying certain conditions. Caution should be applied when using hospital data for event-based conditions such as stroke, as these may have occurred outside of the time period of data capture, and would thus be under-reported. Identification of stroke patients using medications is also problematic, as the most commonly dispensed medication (Aspirin) is used for a variety of purposes. Furthermore, we recommend caution when interpreting the prevalence of disease or multimorbidity when using a single data source, in line with previously published work [26].
To the best of our knowledge, this is the first study to evaluate the differences in estimates of multimorbidity, using the same list of chronic conditions and the same individuals. Previous data linkage studies have evaluated differences in estimates of chronic disease prevalence within the same individuals [9,55,57-60], but did not formally compare case ascertainment of multimorbidity. Pache et al. [24] assessed the prevalence of multimorbidity using three definitions within the same sample, and found that one-third of participants diagnosed with multimorbidity were jointly diagnosed by all three definitions used. In our sample, this estimate was lower (11% to 20%), but this is explained by the smaller number of chronic conditions (8 vs 27), and the standardised list of chronic conditions used in our study, while Pache et al. used a different set of conditions in each of their three definitions. Van den Bussche et al. [26] used an identical list of chronic conditions in the same setting, albeit among different people, and found that the prevalence of individual chronic conditions was one-third lower in claims data than in primary care data.
The odds of multimorbidity in our study were found to be higher among males, those of older age and those speaking a language other than English at home. The age gradient was noticeable in both hospital and medication datasets, especially with older ages. However, the same gradient was not observed in the self-report data for those aged 85 and over, indicating a possible under-ascertainment of multimorbidity when relying on self-report data only for this age group. Males in our sample had between 7% (PBS data) and 49% (APDC data) higher odds of multimorbidity than females. This is in contrast to other Australian studies, which either found no difference [61] or higher prevalence among females [17], although there are differences between the study samples in each of the studies. Compared with the current study, the National Health Survey reported higher prevalence of the most common chronic conditions (hypertension, heart disease and diabetes) among males aged 45 and over [62]. People speaking a language other than English at home in our study were found to have increased odds of having multimorbidity in the administrative data but decreased odds in the survey data. These findings are novel, and have not been reported in the published literature, to the best of our knowledge. A possible explanation is that those speaking another language might have difficulties in understanding medical terminology, which translates to underreporting of conditions in the survey data.
The use of a large-scale cohort study linked with administrative data is a particular strength of our study. This allowed us to use a homogenous population and a common set of chronic conditions to explore ascertainment of multimorbidity using different data sources, which, to the best of our knowledge, has not been done before. Administrative data used in this study are available in most Australian states and territories, allowing replication of results.
Our research has implications for studies examining chronic conditions from a single data source and those examining multimorbidity. We have shown that agreement between self-report and administrative data sources is generally poor, except for a handful of conditions, implying that morbidity and multimorbidity prevalence estimates will vary depending on which data are used. Caution should be applied whenever a single data source is used, taking care to note different levels of capture of chronic disease between data sources. Self-report studies are subject to recall bias, hospitalisation data can only capture conditions for those admitted to hospital and only if they are coded during the stay, and medication data may overestimate certain conditions because drugs may have multiple indications. In the case of administrative data, extra care should be taken regarding the time period which is used to ascertain morbidity, with longer times needed to capture more conditions of interest. Choice of which data to use also depends on the purpose of the study. For example, if the aim of the study is to monitor 'active' chronic conditions, data linkage of multiple administrative data sources may be more useful than self-report of ever-diagnosis. Furthermore, our study's finding that different individuals, with different combinations of conditions, are identified as multimorbid depending on which datasets are used poses a challenge when interpreting results of studies examining outcomes of multimorbidity. Careful consideration of individual conditions (which may be under- or over-reported) is needed in order to provide meaningful recommendations for patients with complex care needs.
Although this research generated interesting results, it has some limitations. We based the analyses on a limited set of chronic conditions (arthritis and osteoporosis were notable omissions) available in all three data sources, as well as the available lookback period length. The prevalence of multimorbidity would have been different if a larger set of chronic conditions or a longer lookback period was used. However, all of the conditions used in the current study are National Health Priority Areas [63] as they represent the most common long-term conditions and most commonly managed conditions by GPs [2], significantly contributing to the burden of disease in the Australian community. They are also used in the majority of previously published research [64]. We have used the longest lookback period that the data allowed (2 years), which is longer than the 1-year lookback used in some studies [54,59].
In the absence of readily available linked primary health care clinical data in Australia, and due to different levels of capture of chronic diseases in administrative datasets, we have used self-report chronic conditions as the reference when examining the concordance between data sets. Although the use of self-report data for identification of chronic disease has been cautioned against by some [61], numerous other Australian studies use self-report data to ascertain multimorbidity [17-21]. Validation studies involving participants in the 45 and Up Study found excellent levels of agreement for self-reported diabetes [65], country of birth [66] and height and weight [67]. Our data suggest that self-report may be less reliable after the age of 85 and in people speaking a language other than English at home. The use of another data source as a reference could have produced different results.
The use of administrative data poses a different set of challenges. Identification of chronic conditions using APDC data is limited to people who have been admitted to hospital, and depends on a chronic condition being recorded even when it was not directly related to the hospital stay, so it is likely to identify only the most severe cases. Medication dispensing information is dependent on the capture of data in the PBS dataset. We were limited to use of PBS-subsidised prescription medicines, which do not include over-the-counter medicines and private prescriptions.

Conclusions
As administrative data become more widely used for research and evaluation, it is increasingly important to understand their strengths and limitations for ascertaining chronic disease and multimorbidity. This study showed that administrative data have high predictive value for identifying some chronic conditions, but that sensitivity is generally low. Further, it showed that different individuals, with different combinations of conditions, are identified as multimorbid when different data sources are used. Research that explores specific disease combinations and clusters of diseases that commonly co-occur, rather than simple disease counts, is likely to provide more useful insights into the complex care needs of individuals with multiple chronic conditions.