Age and cohort rise in diabetes prevalence among older Australian women: Case ascertainment using survey and healthcare administrative data

Background Because biological measures such as glycated haemoglobin are often unavailable or costly, diabetes case ascertainment and prevalence studies are usually conducted using surveys or routine health service use databases. However, each of these sources has limitations that may affect the quality of case ascertainment and prevalence estimation. This study aimed to ascertain diabetes cases and estimate prevalence among mid- and older-age women through simultaneous use of a longitudinal survey and multiple healthcare administrative data sources.
Methods Data were available for 12,432 and 13,714 women born in 1921–26 and 1946–51, respectively, from the Australian Longitudinal Study on Women’s Health (ALSWH). Diabetes was ascertained using the ALSWH survey, health service use, and cause of death data. Parsimonious multiple logistic regression analyses tested associations between sociodemographic and health variables and the presence of diabetes.
Results In both cohorts, two or more of the sources captured more than 80% of the women with diabetes. In the 1921–26 cohort, the point prevalence of diabetes increased from 8.4% at a mean age of 73 to 22.0% of surviving women at age 90; in the 1946–51 cohort, it increased from 2.6% at age 48 to 15.8% at age 68. In the 1921–26 cohort, women who were obese (OR: 3.56; 95% CI: 3.04–4.17) and women who were sedentary (OR: 1.18; 95% CI: 1.09–1.40) were more likely to have diabetes than those who had a normal weight and engaged in a moderate level of physical activity. In the 1946–51 cohort, the odds of diabetes were about three times higher (OR: 2.99; 95% CI: 2.54–3.52) for overweight women and about nine times higher (OR: 8.78; 95% CI: 7.46–10.33) for obese women than for those who had a normal weight.
Conclusions The simultaneous use of multiple data sources improved the validity of diabetes case ascertainment. Application of this methodology in future studies may have important benefits, including more precise estimation of disease burden, health service needs, and resource allocation. Diabetes prevalence increased with age, was much higher in the 1946–51 cohort than in the 1921–26 cohort at similar ages, and was significantly associated with physical inactivity and obesity. Interventions to promote physical activity and a healthy weight are needed to curb the rising prevalence of diabetes across successive generations.



Introduction
Diabetes is one of four major non-communicable chronic diseases identified and prioritized by the World Health Organization due to its escalating prevalence and its health and economic burden [1,2]. Globally, diabetes prevalence has almost doubled (from 4.7% in 1980 to 8.5% in 2014) and the number of people living with diabetes has quadrupled (from 108 million in 1980 to 422 million in 2014) in less than four decades [3]. In 2017, 425 million people aged 20-79 years (8.8%) were living with diabetes worldwide. This figure has since reached 463 million (9.3%) and is expected to grow to 700 million (9.9%) by 2045 [4,5]. Type 2 diabetes, which is the focus of this study, accounts for up to 95% of the diabetes burden both globally and in Australia [6,7]. In Australia, diabetes has been identified as a "national public health priority". Its prevalence is estimated to be between 5.1% and 7.4%, affecting more than 1.2 million adults [8]. Mainly due to common complications of the disease, patients with diabetes usually have a lower quality of life, incur higher health care costs (at both a patient and health system level), and are at increased risk of premature mortality [9-11]. Up to 68% of Australians with diabetes develop serious complications such as cardiovascular disease, chronic kidney disease, and diabetic foot disease [12]. Diabetic foot disease is responsible for 27,600 public hospital admissions, 4400 lower extremity amputations, and 1700 deaths, costing the Australian health care system A$1.6 billion every year [13]. Diabetes also contributes to 10% of total deaths in Australia [12]. These figures show that diabetes imposes a substantial health and economic burden. Therefore, reliable estimates of diabetes prevalence trends and identification of the major determinants of observed trends are of high public health significance and may assist with the prioritization and monitoring of prevention efforts and the optimal allocation of healthcare resources.
In estimating disease prevalence, the case ascertainment method matters because it affects the precision of the prevalence estimate. For diabetes, obtaining biological measures such as haemoglobin A1c or an oral glucose tolerance test (OGTT) for research purposes is expensive and usually not feasible. Therefore, researchers often use other case ascertainment techniques such as surveys and health care administrative data sources (e.g. records of medication and other health service use or hospital use) [14]. However, these data have a number of limitations which may contribute to unreliable case ascertainment and biased prevalence estimation [15-17].
There is a possibility for recall bias in surveys especially when the recall period is long [18]. Many patients with diabetes may not be using diabetes-specific health services or prescribed diabetes medications. For instance, about a third of people with type 2 diabetes in Australia [19] and Canada [20] reported being treated with diet and lifestyle modification alone. Underreporting of diabetes in hospital admission data has been as high as 60% [21]. Moreover, most health care administrative databases have limitations such as partial coverage of the population and types of health services as well as missing records and errors in recording [22]. Therefore, there is potential to miss or misclassify some cases if using any single data source for case ascertainment and prevalence estimation [23]. Consequently, a better approach to case ascertainment is to use more than one data source [24]. In this study, we aimed to ascertain diabetes cases and estimate prevalence using longitudinal surveys and health care administrative data sources among women born in 1921-26 and 1946-51.

Study design and data sources
This study involved data from the 1921-26 and 1946-51 cohorts of the Australian Longitudinal Study on Women's Health (ALSWH). These two cohorts were recruited in 1996 using the Medicare database as a sampling frame and were aged 70-75 and 45-50, respectively, at the time of recruitment. Since 1996, women in both cohorts have been surveyed on a three-yearly rolling schedule, and the 1921-26 cohort women have completed six-monthly surveys since 2011. The ALSWH survey data have been linked to multiple health care administrative databases including hospital admission, medication use, other health services use, cancer registry and death data. Detailed information on the ALSWH and linked data sources has previously been reported [25,26] and is available on the study website: www.alswh.org.au. For this study, four main data sources were used to identify diabetes cases: the ALSWH survey, Pharmaceutical Benefits Scheme (PBS), Medicare Benefits Schedule (MBS), and Admitted Patients' Data Collection datasets. The cause of death data from the National Death Index database were used as a fifth confirmatory source of data where necessary.

ALSWH survey dataset
From the ALSWH survey data, a woman was considered to have diabetes if she responded "yes" to the question "Have you ever been told by a doctor that you have: diabetes (high blood sugar)?" in the first survey; or "In the last three years, have you been diagnosed with or treated for: diabetes?" in any of the later surveys. This analysis included up to six surveys for 12,432 women in the 1921-26 cohort and up to eight surveys for 13,714 women in the 1946-51 cohort. In survey 2 and survey 3, women from the 1946-51 cohort were asked to differentiate between type 1 and type 2 diabetes. Most (>99%) of the women with diabetes in the 1946-51 cohort had type 2 diabetes. Therefore, this study mainly focused on type 2 diabetes, even though a few women with type 1 diabetes may have been included.

Medicare Benefits Schedule (MBS) dataset
The MBS is designed to ensure access to medical services for all Australian citizens, permanent residents, and visitors from countries that have a reciprocal health care agreement with Australia [27]. Diabetes cases were identified from the MBS using item numbers for diabetes-specific services included under three major headings: 1) Annual Diabetes Cycle of Care; 2) HbA1c and fructosamine testing for diabetes monitoring, and other diabetes services including eye examination and health education for patients with diabetes; and 3) group allied health services for people with type 2 diabetes.

Pharmaceutical Benefits Scheme (PBS) dataset
The PBS is an arrangement through which the Australian Government subsidizes the cost of most prescription medicines for eligible residents and visitors [28]. Cases were identified from the PBS dataset using the World Health Organization's Anatomical Therapeutic Chemical (ATC) classification codes. The medications indicating diabetes were: 1) insulin and insulin analogs (ATC code: A10A); and 2) blood glucose-lowering drugs (ATC code: A10B).
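As an illustration of this kind of code-based flagging, the following minimal Python sketch marks dispensing records whose ATC code falls under A10A or A10B. The record layout, the woman identifiers, and the function name are hypothetical and are not taken from the ALSWH linked data; the individual ATC codes in the sample records are examples only.

```python
# ATC groups named in the text: A10A (insulins and analogues)
# and A10B (blood glucose-lowering drugs).
DIABETES_ATC_PREFIXES = ("A10A", "A10B")


def is_diabetes_medication(atc_code: str) -> bool:
    """Return True if the ATC code belongs to group A10A or A10B."""
    return atc_code.strip().upper().startswith(DIABETES_ATC_PREFIXES)


# Hypothetical dispensing records: (woman_id, atc_code).
records = [
    (1, "A10BA02"),  # falls under A10B
    (1, "C09AA05"),  # not a diabetes medication
    (2, "A10AB01"),  # falls under A10A
    (3, "N02BE01"),  # not a diabetes medication
]

# Women with at least one diabetes-specific dispensing record.
flagged_ids = sorted({wid for wid, code in records if is_diabetes_medication(code)})
print(flagged_ids)
```

The same prefix-matching pattern could be applied to the ICD-10-AM codes used for the hospital admission data, under the same assumptions about record layout.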

Hospital admission dataset
The Admitted Patients' Data Collection (APDC), also called the hospital admission dataset, contains comprehensive data on hospital episodes. These include dates of admission and separation, and primary and secondary diagnoses based on the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) codes, procedures performed and inpatient stay costs [29]. Diabetes specific ICD-10-AM codes (E10, E11, E12, E13, and E14) were used to identify women with diabetes.

Cause of death dataset
The National Death Index is an Australian Government database that contains records of all deaths in Australia since 1980. The data include name, sex, date of birth, and death details including cause of death and associated ICD-9 or ICD-10 codes [30]. This dataset was used only as supplemental data to ascertain diabetes status for women who self-reported diabetes only once or who had only a single record of health service or medication use indicating the presence of diabetes.

Diabetes case definition
After compiling the diabetes indicators from each of the four data sources for the period 1996 to 2016, the following algorithm was used to assess agreement between the sources and to ascertain diabetes status robustly. A woman was considered to have diabetes if she met any of the following criteria: i) a diabetes indicator in the hospital dataset; ii) at least one diabetes indicator from any two or more of the four main data sources; iii) two or more different PBS records for diabetes-specific medications; iv) two or more different MBS records for diabetes-specific MBS services; v) only one record specific to diabetes in the PBS or MBS dataset, but diabetes recorded as a cause of death; or vi) diabetes reported in the ALSWH survey two or more times, or reported only once but with diabetes coded as a cause of death.
Women who reported diabetes only at one survey, or who had only one PBS/MBS diabetes indicator and who had no other indicator of diabetes, even in the cause of death dataset, were considered to have uncertain diabetes status. These women were excluded from the analysis.
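The case definition and the uncertain-status rule can be expressed as a small decision function. The sketch below is one reading of criteria i) to vi), not the authors' SAS implementation; the `Indicators` structure and its field names are hypothetical. One assumption goes beyond the text: a woman with no indicator in any of the four main sources is classified as not having diabetes even if diabetes appears as a cause of death, because the death data are described only as a supplemental source.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    """Counts of diabetes indicators per source for one woman (hypothetical layout)."""
    survey_reports: int = 0       # surveys at which diabetes was reported
    pbs_records: int = 0          # distinct diabetes-specific PBS records
    mbs_records: int = 0          # distinct diabetes-specific MBS records
    hospital_records: int = 0     # hospital episodes with a diabetes code
    diabetes_cause_of_death: bool = False


def classify(ind: Indicators) -> str:
    """Return 'diabetes', 'uncertain', or 'no_diabetes'."""
    main_sources = (ind.survey_reports, ind.pbs_records,
                    ind.mbs_records, ind.hospital_records)
    sources_with_indicator = sum(n > 0 for n in main_sources)

    # Assumption: death data alone do not define a case.
    if sources_with_indicator == 0:
        return "no_diabetes"

    if (
        ind.hospital_records > 0          # i) any hospital indicator
        or sources_with_indicator >= 2    # ii) two or more main sources
        or ind.pbs_records >= 2           # iii) two or more PBS records
        or ind.mbs_records >= 2           # iv) two or more MBS records
        or ind.survey_reports >= 2        # vi) reported at two or more surveys
        or ind.diabetes_cause_of_death    # v), vi) single indicator plus cause of death
    ):
        return "diabetes"

    # A single survey report or a single PBS/MBS record with no corroboration.
    return "uncertain"


print(classify(Indicators(survey_reports=1)))
print(classify(Indicators(survey_reports=1, mbs_records=1)))
print(classify(Indicators(pbs_records=1, diabetes_cause_of_death=True)))
```

By the time the cause-of-death check is reached, every multi-indicator route has already been tested, so it only fires for women with exactly one survey report or one PBS/MBS record.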

Explanatory variables
Sociodemographic variables measured at baseline (Survey 1) included the area of residence (major cities, inner regional and outer regional/remote/very remote), highest educational qualification (year 12 or below, trade certificate/diploma, and a university degree or above), marital status (partnered versus not partnered), body mass index (BMI) [underweight (BMI < 18.5), normal weight (18.5 ≤ BMI < 25), overweight (25 ≤ BMI < 30) and obese (BMI ≥ 30)], difficulty managing on income (easy, difficult), smoking status (never smoker, ex-smoker, and current smoker), and private hospital insurance (yes or no). The level of physical activity (nil/sedentary, low, medium, and high) was measured at Survey 2. Missing responses at Survey 1 were backfilled by responses from Survey 2 or Survey 3 where logical.
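The BMI grouping uses the standard cut-points of 18.5, 25, and 30, with each lower bound inclusive. A minimal Python sketch of that categorization follows; the function name is hypothetical.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value to the four categories used in the analysis."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"   # 18.5 <= BMI < 25
    if bmi < 30:
        return "overweight"      # 25 <= BMI < 30
    return "obese"               # BMI >= 30


for value in (17.0, 18.5, 24.9, 25.0, 30.0):
    print(value, bmi_category(value))
```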

Statistical analyses
Venn diagrams were used to present diabetes cases identified from different data source combinations. Chi-squared tests were used to compare baseline characteristics of the women with and without diabetes. The likelihood of having diabetes was modelled using multiple logistic regression, adjusting for sociodemographic, behavioral, and health variables. Backward selection at p<0.05 was used to retain the most significant factors in parsimonious models. All statistical analyses were performed in SAS 9.4 (SAS Institute, Cary, NC, USA).
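To show how an odds ratio and its 95% confidence interval are read, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. This is a simpler technique than the adjusted multiple logistic regression used in the study (which was run in SAS), and the counts are hypothetical, chosen only for illustration.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with diabetes      b: exposed without diabetes
    c: unexposed with diabetes    d: unexposed without diabetes
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_wald_ci(a=40, b=60, c=20, d=80)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1 corresponds to an association that is significant at roughly the 5% level, which is how the odds ratios reported in this paper are interpreted.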

Ethical considerations
The ALSWH has ongoing ethical clearance from the Human Research Ethics Committees of the Universities of Newcastle and Queensland (approval numbers H0760795 and 2004000224, respectively), and all participants signed informed consent on joining the study. Ethical approval for linkage of the ALSWH survey data is also covered by the Universities (approval numbers H20110371 and 2004000224, respectively) as well as the NSW Population and Health Services Research Ethics Committee and other equivalent Committees for Admitted Patients Collections. Linkage to the National Death Index database was approved by the Australian Institute of Health and Welfare Ethics Committee.

Case ascertainment
Diabetes cases identified from one or more of the data sources are presented in Fig 1. Of the 12432 and 13714 women in the 1921-26 and the 1946-51 cohorts, 890 and 1027, respectively, had uncertain diabetes status during the period 1 January 1996 to 31 December 2016. These women were excluded from the analysis (Fig 2). As shown in Fig 1 and S3 Table, the highest proportion of the diabetes cases was captured by the MBS data (79.6% and 86.5% in the 1921-26 and 1946-51 cohorts, respectively). The inclusion of the survey and PBS datasets, respectively, captured a further 12.2% and 3.8% (in the 1921-26 cohort) and 9.3% and 2.9% (in the 1946-51 cohort) of the diabetes cases.
1921-26 cohort. After exclusion of uncertain cases, 2667 of the remaining 11542 women (i.e. 23.1%) were identified as having diabetes. Out of these women, 1691 (63.4%) were identified by both the ALSWH survey and at least one of the administrative data sources (MBS, PBS, or hospital) (Fig 3).
1946-51 cohort. 2037 of the remaining 12687 women in this cohort (16.1%) had diabetes. Diabetes was identified through survey data and at least one of the administrative sources for 1297 (63.7%) of the cases (Fig 3). Table 1 presents a comparison of baseline characteristics for women who had and did not have diabetes (excluding women with uncertain diabetes). A significantly greater percentage of women with year 12 or below qualifications, those who were overweight or obese, not engaged in physical activity, and who had difficulty managing their income were identified with diabetes. Furthermore, a higher percentage of women without a partner (45.6% vs 42.8%) in the 1921-26 cohort; and those who were current smokers (22.0% vs 17.6%) and living in outer regional/remote/very remote areas (26.4% vs 25.1%) in the 1946-51 cohort had diabetes (p<0.0001).
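The percentages reported for the two cohorts can be re-derived from the counts given in the text. The short Python sketch below does this arithmetic; the dictionary layout and variable names are hypothetical, while the counts are those stated above.

```python
# Counts as reported in the text for each cohort.
cohorts = {
    "1921-26": {"recruited": 12432, "uncertain": 890,
                "diabetes": 2667, "survey_and_admin": 1691},
    "1946-51": {"recruited": 13714, "uncertain": 1027,
                "diabetes": 2037, "survey_and_admin": 1297},
}

summary = {}
for name, c in cohorts.items():
    analysed = c["recruited"] - c["uncertain"]            # women with certain status
    pct_diabetes = 100 * c["diabetes"] / analysed          # share identified with diabetes
    pct_both = 100 * c["survey_and_admin"] / c["diabetes"] # cases found by survey and admin data
    summary[name] = (analysed, round(pct_diabetes, 1), round(pct_both, 1))
    print(f"{name}: analysed={analysed}, diabetes={pct_diabetes:.1f}%, "
          f"survey+admin={pct_both:.1f}%")
```

The results match the reported figures: 11542 women and 23.1% with diabetes (63.4% identified by both survey and administrative data) in the 1921-26 cohort, and 12687 women and 16.1% with diabetes (63.7%) in the 1946-51 cohort.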

Prevalence
As shown in Fig 4, the point prevalence of diabetes among the 1921-26 cohort women increased from 8.4% when the mean age of the women was 73 and reached 22.0% of surviving women when they were aged 90. In the 1946-51 cohort, the prevalence of diabetes increased from 2.6% when the women were aged 48 and reached 15.8% when they were aged 68. The prevalence in the 1946-51 cohort at age 58 was already higher than the prevalence in the 1921-26 cohort at age 73 (10.5% vs 8.4%, respectively).

Predictors of having diabetes
As shown in Table 2, women who were overweight and obese at baseline had greater odds of having diabetes compared to those women with normal BMI, increasing nine-fold for women

Discussion
This study ascertained diabetes cases among Australian women born in 1921-26 and 1946-51 using longitudinal survey data and multiple health care administrative data sources including hospital admission, PBS, and MBS. We found a high level of concordance between survey and health care administrative data sources in identifying patients with diabetes. The study showed the added advantage of simultaneously using multiple data sources to improve the reliability and completeness of diabetes case ascertainment and the precision of prevalence estimation, even though some women with diabetes may have been falsely classified as uncertain. In the 1921-26 cohort women, diabetes prevalence increased from 8.4% at a mean age of 73 to 22.0% when they were aged 90. In the 1946-51 cohort women, there were relatively greater increments in diabetes prevalence at earlier ages. In both cohorts, being overweight, being obese, and being sedentary were significantly associated with an increased likelihood of having diabetes, with these factors having an impact at earlier ages in the 1946-51 cohort. These findings have important implications for the prevention and management of diabetes. Compared to other research that employed a single survey or administrative database [31,32], a more robust ascertainment of diabetes cases was achieved in this study through the simultaneous use of four main data sources. This has resulted in improved completeness and validity of the diabetes case ascertainment. For instance, if we had used only the MBS dataset, which captured the highest proportion of cases in both cohorts, we would have missed 20.4% (1921-26 cohort) and 13.5% (1946-51 cohort) of the cases. This would have underestimated the diabetes burden and could have misinformed health service planning and resource allocation for the disease.
The finding that approximately 20% of women with diabetes had diabetes indicators in all four data sources indicates improved validity of the case ascertainment. This proportion is high compared to a previous study that used similar data sources to ascertain dementia among women in the 1921-26 cohort, which found that only 2-3% of the women were identified by four of the data sources [24]. This suggests better concordance between these data sources in identifying diabetes cases than in identifying other diseases such as dementia. Other studies that showed higher agreement between data sources for diabetes than for other conditions (e.g. hypertension) suggest that this may be due to diabetes requiring ongoing self-management and regular health service use [33,34].
This study also highlights the value of longitudinal, population-based data collection as part of routine health care service provision to be used in research. The case ascertainment methodology developed in this study can be applied in other settings and disease conditions, resulting in enhanced reliability of case ascertainment and precision of prevalence estimation. Such precise disease prevalence estimates may have numerous practical benefits including correctly estimating disease burden, planning of needed health services, and resource allocation. This is particularly important for diseases where 'gold standard' methods of ascertainment are not feasible for widespread monitoring, but where disease indicators can be found in multiple health care administrative data sources. Our study confirmed cohort differences and identified increasing age-wise trends in diabetes prevalence previously hypothesized in cross-sectional studies [35-37]. The prevalence of diabetes continuously increased with age during the entire follow-up period among both cohorts of women. However, women in the 1946-51 cohort had prevalence levels that were similar to the prevalence observed among the 1921-26 cohort women when the former were aged 15-20 years younger. Even though adjustment for overweight/obesity narrowed the gap in the prevalence of diabetes in the two cohorts, the prevalence was still higher among the 1946-51 cohort women. Higher diabetes prevalence patterns at younger ages were also observed among the 1973-78 and 1989-95 cohorts of ALSWH [38]. This may mean that future cohorts of women are expected to have a higher prevalence of diabetes at much younger ages.
The odds of having diabetes were observed to be higher for women who were overweight or obese, those who had difficulty managing on their available income, those who were current smokers, and women who had a lower level of education. Among the 1921-26 cohort women, being overweight or obese and having a sedentary lifestyle were associated with increased odds of having diabetes. These findings are in line with previous studies [39,40]. However, it is worth noting that, especially for the 1946-51 cohort women, being obese had a highly pronounced (~9 times) association with having diabetes compared to having a normal weight. In the 1946-51 cohort, being a current smoker and reporting difficulty managing on income also increased the odds of diabetes, while having a university level of education was associated with a decrease in the odds of diabetes. The relatively higher diabetes prevalence and stronger association with obesity in this cohort of women mirror the fact that these women had both the highest baseline BMI and the highest weight gain over time compared to the other three cohorts of ALSWH [41].
These increasing trends in diabetes prevalence with age are in line with other studies [35,42,43]. For instance, a longitudinal study of diabetes prevalence in the US, based on survey data and measurement of glycated haemoglobin levels for undiagnosed diabetes, reported a 72% overall increase in prevalence between 1999 and 2016, with 40% of this increase attributed to change in BMI and population aging [44]. In these data, diabetes prevalence rose with age within each birth cohort, and also increased between the birth cohorts [45]. Similarly, a study from Mexico shows large cohort increases in the prevalence of diabetes, likely due to changes in nutrition and physical activity [46]. An earlier study in Australia found increasing trends in age-standardized prevalence of diabetes from 2002 (6.2%) to 2013 (7.9%), with strong cohort effects that mirror cohort patterns for obesity. However, the study did not discuss whether correction for differences in overweight/obesity prevalence changes the pattern of diabetes prevalence across cohorts [47]. Other explanations for the increasing diabetes prevalence over time, found here and in other studies, include improved diabetes diagnosis techniques, better disease management, longer patient survival, population aging, and/or a real increase in diabetes incidence.
The increasing prevalence of diabetes in younger cohorts sends an important message to healthcare practitioners and policymakers responsible for the planning of disease prevention and management interventions. There are multiple avenues to prevent diabetes in younger cohorts, particularly related to achievement and maintenance of normal weight and engagement in physical exercise. Such lifestyle modifications were associated with up to 58% reduction in the incidence of type 2 diabetes among people with impaired glucose tolerance, 40-50% of whom would normally progress to diabetes [48,49]. Similarly, pharmacological interventions were effective in the prevention of diabetes in people with impaired glucose tolerance [49,50].

Strengths and limitations
The use of multiple data sources to ascertain diabetes cases is a major strength of this analysis, and enhanced the completeness of case ascertainment. The selection of two large population-representative cohorts of women was another strength of the study. The prospective study design may also have reduced recall bias in the surveys. An important limitation of this study relates to the assumptions made to construct the diabetes case identification algorithm. For instance, if a woman had any diabetes indicator in her hospital admission records, she was considered as having diabetes without any further investigation. However, our approach should result in more valid and complete case ascertainment compared to other studies, which used a single method to identify people with diabetes. Another limitation of this study is the possibility that some women with diabetes were classified as false negatives or as having uncertain diabetes status due to the stringent case ascertainment algorithm we employed. However, the women who were classified as having uncertain diabetes status appeared to be dissimilar from both of the other groups, i.e. the women with and without diabetes. Despite these limitations, we believe this is one of the most reliable non-biomarker based methods of diabetes case ascertainment and prevalence estimation.

Conclusions
This study demonstrated the importance of using multiple data sources to improve the completeness of diabetes case ascertainment. The findings of this study imply that there is a large potential for health care administrative data sources to be used in diabetes and other chronic disease ascertainment and precise prevalence estimation. This may have a huge practical importance in improving the precision of disease burden and health care need estimation. The study found that diabetes prevalence has been continuously increasing with age in both cohorts of women during the period 1996 to 2016. However, higher prevalence rates were observed among the 1946-51 cohort women at a much earlier age compared to the 1921-26 cohort. Having diabetes was significantly associated with lifestyle factors such as physical activity and obesity. A stronger association was observed between being obese and having diabetes in the 1946-51 cohort. The findings imply that the prevalence of diabetes will continue to increase in future cohorts of women. This emphasizes the need for interventions aimed at diabetes prevention, early identification, and improved management. Importantly, addressing overweight and obesity management and increasing physical activity should be part of diabetes prevention efforts.