lillies: An R package for the estimation of excess Life Years Lost among patients with a given disease or condition

Life expectancy at a given age is a summary measure of mortality rates present in a population (estimated as the area under the survival curve), and represents the average number of years an individual at that age is expected to live if current age-specific mortality rates apply now and in the future. A complementary metric is the number of Life Years Lost, which is used to measure the reduction in life expectancy for a specific group of persons, for example those diagnosed with a specific disease or condition (e.g. smoking). However, calculating life expectancy among those with a specific disease is not straightforward for diseases that are not present at birth, and previous studies have assumed a fixed age at onset of the disease, e.g. 15 or 20 years. In this paper, we present the R package lillies (freely available through the Comprehensive R Archive Network; CRAN) and guide the reader through the implementation of a recently introduced method that overcomes these limitations when estimating excess Life Years Lost associated with a disease or condition. In addition, we show how to decompose the total number of Life Years Lost into specific causes of death through a competing risks model, and how to calculate confidence intervals for the estimates using non-parametric bootstrap. We describe how to use the method both when the researcher has access to individual-level data (e.g. electronic healthcare and mortality records) and when only aggregate-level data are available.
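The abstract describes life expectancy as the area under the survival curve. As a minimal sketch of that arithmetic, assuming a hypothetical step survival curve (of the kind a Kaplan-Meier estimator produces) for persons alive at age 90 and a reference age of 95 years, the restricted remaining life expectancy is the area under the curve up to 95. The Life Years Lost are then the years remaining up to 95 minus that area. The helper `area_under_step` and all survival values are illustrative assumptions, not part of lillies.

```r
# Restricted life expectancy and Life Years Lost from a step survival curve.
# Assumption: S(t) is constant between the listed ages, as for a
# Kaplan-Meier estimate; all values below are hypothetical.

# Area under a step function equal to s[i] on [times[i], times[i + 1]),
# with the last step running up to the reference age tau.
area_under_step <- function(times, s, tau) {
  widths <- diff(c(times, tau))
  sum(s * widths)
}

age_onset <- 90                       # hypothetical age at onset
tau <- 95                             # reference age
times <- c(90, 91, 92, 93, 94)        # ages where the curve may change
surv  <- c(1.0, 0.8, 0.6, 0.5, 0.4)   # hypothetical S(t | alive at age 90)

# Average number of years lived between age 90 and tau
life_exp <- area_under_step(times, surv, tau)

# Life Years Lost: years available up to tau minus years actually lived
lyl <- (tau - age_onset) - life_exp

cat("Restricted remaining life expectancy:", life_exp, "\n")
cat("Life Years Lost:", lyl, "\n")
```

With these values the area is 3.3 years, so 5 - 3.3 = 1.7 Life Years Lost between ages 90 and 95.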


S2a Appendix: How to assess whether small numbers could strongly influence the results
The total excess Life Years Lost measure is based on non-parametric survival curves, such as those obtained with the Kaplan-Meier estimator. If there is right censoring (as in most cases), only a small number of persons may remain at risk of dying after a certain age. For example, assume that 70% of the population is still alive at age 80 years, i.e. S(80) = 0.7, but only two persons are still at risk in the available data. If one of them dies, the survival curve immediately drops to 35%. It is therefore important to consider what constitutes an acceptable number of persons at risk, n, given that each of these persons carries a weight of 1/n on the survival curve. The number of persons at risk at each age can be inspected after estimating Life Years Lost:

LYL <- lyl_range(data = diseased, t0 = age_disease, t = age_death, status = cause_death, age_begin = 0, age_end = 94, tau = 95)

lyl_checkplot(LYL)
In this example, we see that the number of persons at risk increases rapidly and remains acceptable even at the oldest ages of follow-up. Small numbers at early ages are less problematic than small numbers at later ages, because persons at risk at early ages contribute to only a few of the age-specific estimates. For example, persons at risk at age 1 year are only included when estimating age-specific Life Years Lost at ages 0 and 1 (and these estimates receive a low weight, because the weights depend on the number of cases with onset at each age). By contrast, persons at risk at age 85 years are included in all age-specific analyses up to age 85 years.
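The two points above, the 1/n weight of each person at risk and the way entry at the age of onset shapes the risk set, can be reproduced with a few lines of base R. This is a sketch under stated assumptions: the six persons, their ages at onset (`t0`) and exit (`t`), the at-risk convention t0 <= age < t, and the threshold of 5 persons are all hypothetical, and the code does not reproduce what `lyl_checkplot` computes internally.

```r
# Kaplan-Meier update at a death time: S(t) = S(t-) * (1 - d / n),
# so each of the n persons at risk carries a weight of 1/n.
km_step <- function(s_before, n_at_risk, n_deaths) {
  s_before * (1 - n_deaths / n_at_risk)
}
s_small <- km_step(0.7, n_at_risk = 2, n_deaths = 1)    # drops to 0.35
s_large <- km_step(0.7, n_at_risk = 200, n_deaths = 1)  # barely moves

# Hypothetical individual-level data: entry at the age of onset (delayed
# entry), exit at the age of death or censoring.
t0 <- c(10, 25, 40, 60, 70, 82)
t  <- c(30, 50, 95, 85, 90, 95)

# Assumed convention: a person is at risk at age a if t0 <= a < t.
n_at_risk <- function(a, entry, exit) sum(entry <= a & exit > a)

ages <- 0:94
risk_set <- vapply(ages, n_at_risk, integer(1), entry = t0, exit = t)

# Hypothetical threshold for flagging ages where single deaths weigh heavily
min_acceptable <- 5
flagged <- ages[risk_set > 0 & risk_set < min_acceptable]

cat("S(80) after one death with 2 at risk:", s_small, "\n")
cat("S(80) after one death with 200 at risk:", s_large, "\n")
print(data.frame(age = c(1, 50, 85, 94),
                 at_risk = risk_set[ages %in% c(1, 50, 85, 94)]))
```

In this toy data only two persons remain at risk at age 94, so a single death there would halve the survival curve, which is the situation a check of the number at risk is meant to reveal.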
lillies: an R package for the estimation of excess Life Years Lost S2 Appendix Page 3

S2b Appendix: Determine the number of iterations for bootstrap confidence intervals
We can use non-parametric bootstrap to provide confidence intervals for the estimates. To assess whether 1,000 bootstrap iterations are sufficient to obtain a reliable confidence interval, the estimates can be plotted using the first 1, 2, …, 999, or all 1,000 iterations.

plot(LYL_ci, weights = diseased$age_disease)
Since the estimates vary little after approximately 400 iterations, it is reasonable to assume that the 95% confidence intervals based on 1,000 iterations are accurate. However, a larger number of iterations would provide even more reliable estimates, especially for Life Years Lost associated with unnatural deaths.
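The convergence check can be mimicked outside lillies to see what it involves. The base R sketch below draws 1,000 non-parametric bootstrap replicates of a simple statistic and recomputes the 95% percentile interval from the first k replicates, for k = 1, 2, ..., 1,000. The simulated data, the seed, and the use of a mean as a stand-in for a Life Years Lost estimate are assumptions for illustration; this is not how `lyl_ci` is implemented.

```r
set.seed(2020)

# Hypothetical data; the mean stands in for a Life Years Lost estimate.
x <- rexp(300, rate = 1 / 10)

# Non-parametric bootstrap: resample individuals with replacement.
n_iter <- 1000
boot_est <- replicate(n_iter, mean(sample(x, replace = TRUE)))

# 95% percentile interval based on the first k replicates, k = 1..n_iter
running_ci <- t(vapply(
  seq_len(n_iter),
  function(k) quantile(boot_est[seq_len(k)], c(0.025, 0.975), names = FALSE),
  numeric(2)
))
colnames(running_ci) <- c("lower", "upper")

# Largest distance between either bound (using 400 or more replicates)
# and the corresponding bound of the final interval
drift_after_400 <- max(abs(sweep(running_ci[400:n_iter, ], 2, running_ci[n_iter, ])))
cat("Largest change in either bound after 400 iterations:", drift_after_400, "\n")

# Plot both bounds against the number of iterations used
matplot(seq_len(n_iter), running_ci, type = "l", lty = 1,
        xlab = "Number of bootstrap iterations", ylab = "95% interval bound")
```

A flat curve for both bounds beyond some number of iterations is the visual signal described above; a small `drift_after_400` expresses the same idea as a single number.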

S2c Appendix: Interpretation of negative excess Life Years Lost
The Life Years Lost measure quantifies, for the group of patients with a disease, the number of years between the age at disease onset and a set reference age, for example 95 years, minus the average remaining life expectancy over that interval. For the general population, the measure quantifies the same quantity for persons alive at ages corresponding to the age-at-onset distribution of those with the disease. We denote the difference in Life Years Lost between the two groups, those with the disease and the general population, as excess Life Years Lost; it may be interpreted as the number of years that people with the disease lose in excess of those lost in the general population. Those with a disease may experience fewer Life Years Lost for some specific causes than the general population; for example, it has been shown that those with mental disorders experience fewer neoplasm-related Life Years Lost than the general population. [1][2][3] The number of excess Life Years Lost due to neoplasms among those with mental disorders is therefore negative.
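To make the sign concrete, the following base R sketch computes cause-specific Life Years Lost as the area under each cause's cumulative incidence curve between an onset age of 90 and the reference age of 95, for a disease group and for the general population, and takes their difference. Because 1 - S(t) equals the sum of the cause-specific cumulative incidences in a competing risks model, the cause-specific values add up to the total. The cumulative incidences are hypothetical, a single onset age is used instead of the age-at-onset distribution, and `area_under_step` is an illustrative helper rather than a lillies function.

```r
# Area under a step function equal to f[i] on [times[i], times[i + 1]), up to tau
area_under_step <- function(times, f, tau) sum(f * diff(c(times, tau)))

tau <- 95
times <- 90:94  # yearly steps after a (hypothetical) onset age of 90

# Hypothetical cumulative incidence of death by cause, for persons alive at 90
cif <- list(
  disease = list(
    neoplasms = c(0, 0.05, 0.08, 0.10, 0.12),
    other     = c(0, 0.20, 0.35, 0.45, 0.55)
  ),
  population = list(
    neoplasms = c(0, 0.08, 0.14, 0.18, 0.22),
    other     = c(0, 0.05, 0.10, 0.15, 0.20)
  )
)

# Cause-specific Life Years Lost: rows are causes, columns are groups
lyl_by_cause <- sapply(cif, function(group)
  sapply(group, area_under_step, times = times, tau = tau))

# Excess Life Years Lost by cause; a negative value means fewer years lost
# to that cause in the disease group than in the general population
excess <- lyl_by_cause[, "disease"] - lyl_by_cause[, "population"]

print(round(lyl_by_cause, 2))
print(round(excess, 2))
cat("Total excess Life Years Lost:", round(sum(excess), 2), "\n")
```

Here the disease group loses 0.35 years to neoplasms against 0.62 in the general population, an excess of -0.27 years, while the total excess across both causes is still positive (0.78 years).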