Development of Methodology for Disability-Adjusted Life Years (DALYs) Calculation Based on Real-Life Data

Background: Disability-Adjusted Life Years (DALYs) have the advantage that effects on total health, rather than on the incidence of or mortality from a specific disease, can be estimated. Our aim was to address several methodological points related to the computation of DALYs at an individual level in a follow-up study.
Methods: DALYs were computed for 33,507 men and women aged 20–70 years when participating in the EPIC-NL study in 1993–7. DALYs are the sum of the Years Lost due to Disability (YLD) and the Years of Life Lost (YLL) due to premature mortality. Premature mortality was defined as death before the estimated date of individual Life Expectancy (LE). Different methods to compute LE were compared, as well as the effect of different follow-up periods, using a two-part model estimating the effect of smoking status on health as an example.
Results: During a mean follow-up of 12.4 years, there were 69,245 DALYs due to years lived with a disease or premature death. Current smokers had lost 1.28 healthy years of their life (1.28 DALYs; 95% CI 1.10, 1.46) compared to never-smokers. The outcome varied depending on the method used for estimating LE, the completeness of disease and mortality ascertainment and, notably, the percentage of extinction (duration of follow-up) of the cohort.
Conclusion: We conclude that the use of DALYs in a cohort study is an appropriate way to assess total disease burden in relation to a determinant. The outcome is sensitive to the LE calculation method and the follow-up duration of the cohort.


Introduction
In recent decades, improved survival of patients with several chronic diseases has led to an increase in life expectancy. As a consequence, the prevalence of chronic diseases, such as cardiovascular diseases and cancer, has grown [1,2]. Therefore, assessing disease burden by either morbidity rates or mortality rates for individual diseases may result in different conclusions. Summary health measures, combining morbidity and mortality, may offer better insight into the true burden of chronic diseases.
The World Health Organization and the World Bank developed such a composite measure, called Disability-Adjusted Life Years (DALYs) [3,4]. The primary reason was to create a single estimate that aggregates the burden of disease at the population level in Global Burden of Disease studies [5,6]. Additionally, DALYs have been used to quantify the disease burden attributable to several lifestyle factors [7]. DALYs are also used in economic evaluations and risk-benefit assessments [5,8,9].
DALYs are the sum of the Years Lost due to Disability (YLD) and the Years of Life Lost (YLL) due to premature mortality [10]. The YLD in a population are calculated by the number of years persons live with a disability multiplied by a disability weight reflecting the severity of the disability. This weight varies between 0 (no burden) and 1 (mortality). The YLL are computed as the number of years that death occurs earlier than expected. The expected number of life years is often set equal to the statistical life expectancy at birth or to the remaining years that a person of a certain age may be expected to live on average. One DALY represents the loss of one year in full health.
The burden of risk factors such as smoking or alcohol is traditionally expressed in terms of relative risks or attributable risk fractions for a specific disease. The use of summary measures has the advantage that the association with total disease burden, instead of the burden of a specific disease, is estimated. Therefore, risk factors with small effects on several diseases might be more easily identified as important for public health. Summary measures will also help to estimate the overall effect of risk factors that may have opposing effects on different diseases.
So far, DALYs have mainly been calculated at the population level based on statistical population data on disease incidence and mortality, but have not been prospectively associated with risk factors. Real-life follow-up data enable us to use observed instead of modeled data with regard to risk factors, incidence and mortality event rates and confounding factors. This allows us to investigate relations of detailed risk factor information with overall disease burden. We introduce the method of DALY computation based on individual data from an ongoing follow-up study to estimate the association between risk factors and total disease burden using DALYs. Our aim is to address some methodological issues related to the computation of DALYs in real-life follow-up data from a left- and right-truncated cohort. We use the estimated number of DALYs for smoking as an example.

Methods
We calculated DALYs for each individual in the cohort based on the YLD and YLL. Important parameters for the calculation of YLD and YLL are: disability weight of a disease, time of onset (duration) of a disease, time of death and expected time of death. The details of the calculation method including the ascertainment of the disease endpoints, calculation of the life expectancy and the analysis in relation to smoking status are presented.
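The per-person calculation can be illustrated with a minimal sketch. The function names and the representation of dates as decimal calendar years are assumptions made for illustration only; this is not the SAS code used in the study. The sketch follows the assumptions stated in the Discussion: YLL is zero for participants alive at the end of follow-up, and disease years for survivors are counted until the expected date of death.

```python
from typing import Optional


def years_lost_to_disability(weight: float, onset: Optional[float], end: float) -> float:
    """YLD: years lived with the disease multiplied by its disability weight."""
    if onset is None:  # participant never developed a disease of interest
        return 0.0
    return max(0.0, end - onset) * weight


def years_of_life_lost(death: Optional[float], expected_death: float) -> float:
    """YLL: years between the observed and the expected date of death."""
    if death is None:  # alive at end of follow-up: YLL is set to zero
        return 0.0
    return max(0.0, expected_death - death)


def individual_dalys(weight: float, onset: Optional[float],
                     death: Optional[float], expected_death: float) -> float:
    """DALYs for one participant: YLD + YLL (all dates in decimal years)."""
    # Disease years run until death, or until the expected date of death
    # for participants who are still alive.
    end = death if death is not None else expected_death
    return (years_lost_to_disability(weight, onset, end)
            + years_of_life_lost(death, expected_death))


# Hypothetical participant: disease onset in 2000 (weight 0.3),
# death in 2005, expected death in 2015.
print(individual_dalys(0.3, 2000.0, 2005.0, 2015.0))
```

In this hypothetical case the participant accrues 5 years with disease at weight 0.3 plus 10 years of life lost.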
The DALYs were computed for 33,507 participants of the Dutch part of the European Prospective Investigation into Cancer and Nutrition (EPIC-NL) study [11]. All participants provided written informed consent before study inclusion. The study complies with the Declaration of Helsinki and was approved by the institutional board of the University Medical Center Utrecht (Prospect) and the Medical Ethical Committee of TNO Nutrition and Food Research (MORGEN). More details about the EPIC-NL cohort are available as supporting information (EPIC-NL description S1).
Participants were followed for mortality and morbidity through linkage with several registries. Information on vital status and the date of death was obtained through linkage with municipal registries. The cause of death was obtained from Statistics Netherlands. Information on cancer occurrence was obtained from the National Cancer Registry. Other disease occurrence (Coronary Heart Disease, Cerebrovascular Accident, Diabetes Mellitus, Chronic Obstructive Pulmonary Disease, Asthma, Parkinson's disease, Rheumatoid Arthritis, Osteoarthritis, and Inflammatory Bowel Disease) was obtained from the national hospital discharge diagnosis database from the Dutch National Medical Registry and self-report. Additional information is available as supporting information (EPIC-NL description S1). Follow-up was complete until 31 December 2007.

Years Lost due to Disability (YLD)
YLD are the number of years lived with a disability multiplied by a disability weight reflecting the severity of the disability. These weights were derived from the Dutch Disability Weight study and can range from 0 (no loss) to 1 (death) [10,12]. For some types of cancer that were registered within EPIC-NL, disability weights were not available from the Dutch Disability Weight study [12]. The disability weights for thyroid, brain and bone cancer were based on weights that were derived within the Australian burden of disease study [13]. Other cancer types were assigned the same disability weight as a type of cancer with assumed comparable severity, as judged by the authors, including a cancer epidemiologist and a quality of life expert (see Table S1).
Some EPIC-NL participants (n = 1,080) developed multiple diseases (comorbidity). No disability weights are available for years lived with multiple diseases. We used a multiplicative method to estimate weights for comorbid conditions which is in line with the method used in the Global Burden of Disease calculation [14,15].
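A common form of the multiplicative method combines the weights w_1, ..., w_n of co-occurring diseases as 1 - (1 - w_1)(1 - w_2)...(1 - w_n), so that the combined weight never exceeds 1. The sketch below assumes this form; the function name and the example weights are hypothetical and are not taken from the Dutch Disability Weight study.

```python
from math import prod
from typing import Iterable


def combined_disability_weight(weights: Iterable[float]) -> float:
    """Multiplicative combination: 1 minus the product of (1 - w_i)."""
    weights = list(weights)
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"disability weight out of range [0, 1]: {w}")
    # Each disease removes a fraction w_i of the health that remains
    # after the other diseases have been accounted for.
    return 1.0 - prod(1.0 - w for w in weights)


# Two hypothetical comorbid conditions with weights 0.2 and 0.5
print(round(combined_disability_weight([0.2, 0.5]), 3))
```

Under this assumption the combined weight is lower than the simple sum of the individual weights (0.6 rather than 0.7 in the example), which avoids weights above 1 for participants with many diseases.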

Years of Life Lost (YLL)
YLL are defined as the number of years by which death occurred earlier than the age at which the person was expected to die. The life expectancy of a person can be calculated in several ways. Basically, the estimated life expectancy depends on two factors: the attained age and the calendar year in which this age is attained. Younger people have longer life expectancies than the elderly, and life expectancy has also changed over calendar time, i.e. it is higher for later calendar years. Both age and reference calendar year may be assumed to be either constant (the same for each individual) or variable (depending on the individual). Combining age (constant or variable) and reference year (constant or variable) results in four combinations as a basis for the life expectancy calculation. We chose a logical constant and variable value for age and for reference year. We compared these four methods using gender-specific mortality rates from Statistics Netherlands [16]. Life expectancies by birth year were only available from 1950 onwards. Therefore, participants born before 1950 were assigned the same life expectancy as participants born in 1950 (method 1). Participants who are still alive beyond their estimated life expectancy will not obtain DALYs for the period they lived longer than their estimated life expectancy (methods 1 and 2).
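The four combinations can be sketched as follows, using the definitions given in the Discussion: method 1 uses age 0 and the birth year, method 2 a constant life expectancy, method 3 the age at death and the reference year 1995, and method 4 the age at death and the year of death. The life table layout, the function name, the constant `fixed_le` and all numerical values are hypothetical; sex-specific tables are omitted for brevity.

```python
from typing import Dict, Tuple

# Hypothetical life table: (age, calendar year) -> remaining life expectancy
LifeTable = Dict[Tuple[int, int], float]


def years_of_life_lost(method: int, birth_year: int, age_at_death: int,
                       year_of_death: int, life_table: LifeTable,
                       fixed_le: float = 80.0, reference_year: int = 1995,
                       first_table_year: int = 1950) -> float:
    """YLL for a deceased participant under one of the four LE methods."""
    if method == 1:
        # Age constant (0), year variable (birth year); births before the
        # first available table year are assigned that year's value.
        expected_age = life_table[(0, max(birth_year, first_table_year))]
    elif method == 2:
        # Age and year both constant: one fixed life expectancy (assumed value)
        expected_age = fixed_le
    elif method == 3:
        # Age variable (age at death), year constant (reference year)
        expected_age = age_at_death + life_table[(age_at_death, reference_year)]
    elif method == 4:
        # Age and year both variable (age at death, year of death)
        expected_age = age_at_death + life_table[(age_at_death, year_of_death)]
    else:
        raise ValueError("method must be 1, 2, 3 or 4")
    # Living beyond the expected age yields zero, not negative, YLL
    return max(0.0, expected_age - age_at_death)


toy_table = {(0, 1950): 71.0, (60, 1995): 21.0, (60, 2005): 23.0}
for m in (1, 2, 3, 4):
    print(m, years_of_life_lost(m, 1945, 60, 2005, toy_table))
```

With these toy values the same death at age 60 yields a different YLL under each method, which is the sensitivity that the comparison is meant to expose.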

Statistical Analysis
The use of DALYs as a continuous outcome of disease burden is illustrated for current and former smokers compared to never-smokers (reference) as defined at baseline. Due to the large number of healthy participants with 0 DALYs at the end of follow-up, the outcome was not normally distributed, i.e. there was a peak at 0 and a normal distribution in participants with DALYs > 0. Therefore, we used a two-part model [17] to estimate the relationship between smoking status and DALYs. A two-part model combines the probability of having any DALYs, estimated using logistic regression, with the number of DALYs per smoking category among participants with DALYs, estimated using linear regression. Confidence intervals were constructed with bootstrapping. The relationships were adjusted for age, sex, BMI, physical activity, education, ethanol and energy intake (all measured at recruitment).
We conducted additional analyses stratifying for sex and two age categories (age at cohort entrance: <50, ≥50). To evaluate the effect of extending follow-up time, we conducted additional analyses to calculate DALYs for different follow-up periods: until 2001 (average of 6.4 years of follow-up), 2003 (average of 8.4 years) and 2005 (average of 10.4 years). Furthermore, DALYs were computed for specific diseases to assess which disease causes the highest attributable disease burden. For these comparisons we chose to use method 4 to estimate the life expectancy, because we believe this method is the most realistic one in the sense that it measures health losses as they actually occur.
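The structure of a two-part estimate with a bootstrapped confidence interval can be sketched in a deliberately simplified, unadjusted form. Here the two regressions are replaced by a group proportion and a group mean, so the covariate adjustment used in the study is not reproduced; without covariates the two-part estimate reduces to the ordinary group mean. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple


def two_part_expectation(dalys: List[float]) -> float:
    """P(DALYs > 0) times the mean DALYs among those with DALYs > 0."""
    positive = [d for d in dalys if d > 0]
    if not positive:
        return 0.0
    p_any = len(positive) / len(dalys)   # part 1: probability of any DALYs
    return p_any * mean(positive)        # part 2: mean burden if any


def bootstrap_difference(exposed: List[float], reference: List[float],
                         n_boot: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval for the difference in expected DALYs."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement and recompute both parts
        e = rng.choices(exposed, k=len(exposed))
        r = rng.choices(reference, k=len(reference))
        diffs.append(two_part_expectation(e) - two_part_expectation(r))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy data: many zeros plus a positive part, as in the cohort
smokers = [0, 0, 0, 2.5, 4.0, 6.5, 0, 3.0]
never_smokers = [0, 0, 0, 0, 1.5, 0, 2.0, 0]
print(two_part_expectation(smokers) - two_part_expectation(never_smokers))
print(bootstrap_difference(smokers, never_smokers))
```

In an adjusted analysis, each part would be a regression model with the covariates listed above, and the bootstrap would refit both models in every resample.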
All statistical analyses were conducted using SAS 9.2 (SAS Institute, Cary, US).

Results
The mean follow-up of the 33,507 EPIC-NL participants was 12.4 years; 6,741 participants were identified with a non-fatal disease of interest and 1,504 died during this follow-up period. The total disease burden for the entire follow-up period was 69,245 DALYs; 40,861 (59%) healthy years were lost due to disability (YLD) and 28,384 (41%) years were lost due to premature death (YLL). [...] (table 4). Stratified analyses showed that the number of DALYs for smokers compared to never-smokers was highest in the participants aged ≥50 years. Male smokers lost more healthy years (1.50 DALYs) than female smokers (1.12 DALYs) compared to male and female never-smokers. Compared to never-smokers, current smokers lost most healthy years due to Coronary Heart Disease, cancer, Cerebrovascular Accident, and Chronic Obstructive Pulmonary Disease, which thus drive the association of smoking status with DALYs.

Discussion
This paper presents several methodological issues related to the computation of DALYs in an ongoing follow-up study. We show that the method of life expectancy calculation, i.e. which age and which calendar year it is based on, drives the DALY estimates. In addition, length of follow-up is another important factor in the calculation of the total disease burden, with longer follow-up (i.e. larger extinction of the cohort) leading to larger effect sizes. We observed that during a period of 12 years, in a population that was on average 49 years old, smokers lost 1.28 healthy years of their life (1.28 DALYs) compared to never-smokers.
We compared four different methods to calculate life expectancy. The methods resulted in different estimations for the absolute number of DALYs, but an association between smoking status and DALYs was clear for all methods.
Method 1, in which age is constant (0) and the reference calendar year is variable (birth year), has the disadvantage that people who die at the same age from the same disease and who are equally exposed to the same risk factor, but who are born in different years, are assigned a different number of years lost. Consequently, the effect assigned to the risk factor depends on the birth years of the participants in a cohort. For example, in a young cohort (born later) the impact of risk factors increases. With method 1, as with method 2, persons who live beyond their life expectancy obtain zero DALYs for the period after their life expectancy. The period someone lives beyond the life expectancy could also be seen as health gain (negative DALYs). Including negative DALYs in the analysis only slightly changes the results for the association with smoking status (data not shown). Method 2, in which age and reference calendar year are constant, has a constant but arbitrary life expectancy for all individuals, possibly stratified by sex. The advantage is that persons who die at the same age lose the same number of life years and studies can be compared based on the same endpoint. However, it is hard to imagine that researchers would choose an arbitrary life expectancy which has no plausible link to the participants in their cohort. Method 3, in which age is variable (age at death) and the reference calendar year is constant (1995), has the disadvantage that the reference year is arbitrary. Furthermore, the effect of a risk factor is underestimated. Consider two people born in the same year, one of whom dies early due to being exposed to a risk factor. The person dying later could lose almost as many life years (YLL) because relative life expectancies increase with age.
Method 4, in which both age (age at death) and reference year (year of death) are variable, has the disadvantages of both methods 1 and 3, but it is the most accurate with regard to the number of life years a person actually loses at death.
We propose to calculate the remaining life expectancy at the time of death or, when alive, at the end of follow-up (method 4), because it is the most realistic method in the sense that the years of life lost for an individual are the best estimate at the time of his or her death or end of follow-up. Moreover, the differences between the methods turned out to be small, so the choice of method seems to matter little. The number of DALYs for smokers is lower than several estimates previously reported [18,19]. Several issues should be taken into consideration. First, not all relevant diseases were incorporated in our analysis, and for other major diseases only severe cases resulting in hospitalization were included. In addition, the registered date of onset of disease (date of hospital discharge) is likely to be later than the true date of onset when the disease started to contribute to disease burden. However, we did include those diseases that are most strongly associated with smoking.
Another important point is that incidence data are not complete due to the truncated nature of the ongoing cohort study. For a complete view of any effect, the ideal cohort for these calculations would be a birth cohort followed until extinction. In such a study population, DALYs before baseline and after the end of follow-up are directly observed. We did not have access to an extinct birth cohort, so we had to make assumptions with regard to the YLD and YLL calculation of participants still alive at the end of follow-up. YLL for subjects still alive at the end of follow-up was set at zero, assuming that all subjects live until the estimated date of their life expectancy. If they were healthy at the end of follow-up, they are treated as remaining healthy until their expected age of death. However, these people will likely also develop a disease before dying, so the number of DALYs is underestimated. For those participants who developed a disease during follow-up (before December 2007), the YLD was calculated assuming they reach the estimated life expectancy, i.e. a date after 2007. Unfortunately, there were no life tables available for specific patient groups; therefore, we used the same life table for all participants. In reality, participants alive at the end of follow-up may die before reaching their expected time of death, and those with a disability are more likely to die earlier, but since these premature deaths are not yet observed they were not included in the calculation. Presumably, as demonstrated in our sensitivity analysis, longer follow-up will increase the number of DALYs. The longer the follow-up period, ideally until extinction of the cohort, the greater the accuracy of the calculated association between smoking status and DALYs.
The current reported loss for smokers ignores loss before study entry, refers to an observation period of only 12.4 years, underestimates DALYs due to hospital discharge dates instead of earlier incidence dates, and assumes for persons alive at end of follow-up, with or without disease, that they will live until estimated life expectancy.
We excluded participants with prevalent disease possibly related to smoking behavior at baseline. Altogether, in this relatively young and healthy cohort, most participants survived more than 12 years of follow-up. Consequently, smokers who were still healthy after 12.4 years have as many DALYs as never-smokers who were still healthy at the end of follow-up, whereas the smokers will probably develop more disease later on and die earlier after truncation.
In conclusion, we present a methodology to calculate DALYs from real-life data. This summary outcome measure can be used to assess the prospective relationship between a determinant such as smoking status and disease burden in a cohort study. DALYs have the advantage that risk factors with small effects on many diseases can be more easily identified and the overall effect of risk factors that may have opposing impacts on different diseases can be estimated. The outcome is sensitive to two factors: the assumption used for computing the life expectancy and the follow-up time, which determines the number of deaths and survivors at the end of this time. The longer the follow-up, the more closely the observed outcome approaches the ultimate outcome. We believe that the use of DALYs in a prospective cohort is an appropriate way to explore the association between different lifestyle determinants and total disease burden.

EPIC-NL description S1.