Disease-related income and economic productivity loss in New Zealand: A longitudinal analysis of linked individual-level data

Background
Reducing disease can maintain individual income and improve societal economic productivity. However, to our knowledge, estimates of income loss for multiple diseases simultaneously, with thorough adjustment for confounding, are lacking. We estimate individual-level income loss for 40 conditions simultaneously by phase of diagnosis, and the total income loss at the population level (a function of how common the disease is and the individual-level income loss if one has the disease).

Methods and findings
We used linked health and tax data for New Zealand as a high-income country case study, from 2006-2007 to 2015-2016, for 25- to 64-year-olds (22.5 million person-years). Fixed effects regression was used to estimate within-individual income loss by disease, and cause-deletion methods to estimate economic productivity loss at the population level. Income loss in the year of diagnosis was highest for dementia for both men (US$8,882; 95% CI $6,709 to $11,056) and women ($7,103; $5,499 to $8,707). Mental illness also had high income losses in the year of diagnosis (average of about $5,300 per year for males and $4,100 per year for females, across 4 subcategories: depression and anxiety; alcohol related; schizophrenia; and other). Similar patterns were evident for prevalent years of diagnosis. For the last year of life, cancers tended to have the highest income losses (e.g., colorectal cancer males: $17,786, 95% CI $15,555 to $20,018; females: $14,192, $12,357 to $16,026). The combined annual income loss from all diseases among 25- to 64-year-olds was US$2.72 billion or 4.3% of total income. Diseases contributing more than 4% of total disease-related income loss were mental illness (30.0%), cardiovascular disease (15.6%), musculoskeletal (13.7%), endocrine (8.9%), gastrointestinal (7.4%), neurological (6.5%), and cancer (4.5%). The limitations of this study include residual biases that may overestimate the effect of disease on income loss, such as unmeasured time-varying confounding (e.g., divorce leading to both depression and income loss) and reverse causation (e.g., income loss leading to depression). Conversely, there may also be offsetting underestimation biases, such as income loss in the prodromal phase before diagnosis that is misclassified to "healthy" person time.

Conclusions
In this longitudinal study, we found that income loss varies considerably by disease. Nevertheless, mental illness, cardiovascular, and musculoskeletal diseases stand out as likely major causes of economic productivity loss, suggesting that they should be prioritised in prevention programmes.



Author summary
Why was this study done?
• Quantifying income loss from incident or prevalent disease helps generate a fully rounded burden of disease on society.
• These income losses are often used to estimate productivity losses and conversely can be used to quantify productivity gains in future cost-effectiveness studies of treatments and prevention.
• However, existing income loss studies often cover just one disease at a time, which makes them noncomparable with estimates for other diseases and likely overestimates the income loss from each disease because comorbidity is not accounted for.
What did the researchers do and find?
• We used routine health data for an entire high-income country (New Zealand) linked to income-tax data to create a full-population panel study of disease and income status among 25- to 64-year-olds, year by year from 2006-2007 to 2015-2016.
• We then used an econometric method, fixed effects regression, to estimate the within-individual income loss in the year a person develops a disease, in the years they are prevalent with the disease, and in the last year of life if dying of that disease.
• Income loss in the year of diagnosis was highest for dementia and also high for mental illness. Similar patterns were evident for prevalent years of diagnosis. For the last year of life, cancers tended to have the highest income losses.
• The combined annual income loss from all diseases among 25- to 64-year-olds was US$2.72 billion or 4.3% of their total income. Diseases contributing more [...]

[...] integrated-data/apply-to-use-microdata-forresearch/. Or contact SNZ at: access2microdata@stats.govt.nz. Applicants to use the data will need ethics approval and a data integration approval from SNZ. At this point in time, only researchers based in NZ can access the data, through either data laboratories on site at SNZ or satellite laboratories in a number of NZ institutions (e.g., hosted by NZ universities).

Introduction
Health shocks adversely affect labour force participation [1], incomes, and productivity [2,3]. Estimates of income loss following diagnosis of disease may be used as ancillary estimates of the individual burden from poor health, and the aggregate societal economic productivity loss [4-6]. These income loss estimates can also inform policy on sickness benefit and health insurance and can be used in prioritisation of preventive interventions. As longevity increases and populations have older age structures, there is an increasing need for health interventions to be assessed not only on health sector impacts (i.e., health gains and health system expenditure), but also on their impact on wider society, including workforce productivity, given the need to support an increasingly aged population. There are large variations in contexts, data, and methods used to estimate disease-related income loss. Studies often focus on a single disease, limiting the ability to compare across different diseases. In addition, confounding by comorbidities (or other diseases) is often ignored in studies that estimate income loss from one disease in isolation (e.g., ischaemic heart disease [7], rheumatoid arthritis [8], cancer [9]), leading to likely overestimation of income loss for a given disease. Put another way, if such studies were undertaken separately for all diseases, the sum of income loss across these studies would (likely greatly) exceed the actual income loss from all diseases considered together. Even studies from Scandinavia, with population-wide disease registers linked to taxation or employment data, have focused on single diseases such as diabetes [10], breast cancer [11], and injury [12]. We are aware of only one study that has considered multiple diseases simultaneously in estimating productivity loss for a national cohort [4].
On the other hand, some studies only include income loss or productivity loss from deaths [13,14], tallying up all income loss had the person hypothetically lived to (say) 65 years of age (which also ignores competing morbidity and competing mortality risk).
To overcome the limitations in previous studies on disease-related income loss and provide income loss estimates that are comparable across diseases, we used population-wide panel data on disease and injury events linked to income data. We estimated income loss while the participant was alive or in the tax year of their death, for adults 25 to 64 years of age, adjusting for all diseases simultaneously to overcome confounding, which we believe is unique. There is also likely to be further confounding by socioeconomic position (SEP), in that SEP causes variations in both disease rates and income (loss); we aimed to estimate the income loss a person would have avoided had they counterfactually not developed the given disease. Our inference target was the average citizen or total population, not just those employed; accordingly, we estimated income loss averaged across all citizens. Specifically, our research objectives were to determine (1) individual-level disease-specific income loss estimates by phase of diagnosis; and (2) population-level estimates of total income loss by disease and the ranking of disease contributions to income loss in the total population.

Methods
We created a cohort of the entire New Zealand usually resident population 25 to 64 years of age during the observation window of 2006-2007 to 2015-2016, using linked administrative health and income/tax data.
The study was approved by Statistics New Zealand (SNZ) for undertaking in the SNZ Integrated Data Infrastructure (IDI) and separately approved by the University of Otago Ethics Committee. This study is reported as per the RECORD guideline (S1 Checklist).

Population
New Zealand is a high-income country with a population of 5 million people, a median age of 38 years, 16% of the population 65 years of age and older, and a life expectancy of 81.4 years. Most New Zealanders are of European extraction with sizeable populations of Māori (Indigenous population; 16.5%), Asian (15.1%), and Pacific peoples (8.1%; 2018 Census data). The GDP per capita in 2018 was about US$42,000, ranked about 30th among all countries. Over 80% of total health expenditure is government funded (through tax revenue). Regarding income protection, a separate accident insurance corporation exists that will cover 80% of one's wage or salary while incapacitated, but income protection in the event of sickness is patchy, comprising accrued sick leave from one's employer, income protection insurance from private insurance companies, and a relatively low publicly funded sickness benefit safety net.

Datasets
The health data comprised the following national datasets, all linkable with the National Health Index unique identifier: the National Minimum Data Set (NMDS) for all inpatient events since 1988; cancer registrations since 1995; retail pharmaceuticals since 2005; mental health event data since 2000 (Programme for the Integration of Mental Health Data (PRIMHD) and Mental Health Information National Collection (MHINC)); virtual diabetes register; and mortality data during the 2006-2007 to 2015-2016 observation window.
The income data were sourced from Inland Revenue Department (IRD) data collated for all New Zealand residents, from 2 sources: automatic filings from employers to IRD for wages and salary under the "pay as you earn" system; and self-employed income from annual returns submitted by residents.
Multiple government datasets (including the above health and income data) are available to users of the SNZ IDI (available to New Zealand-resident researchers by application to SNZ). All datasets are prelinked (before researchers have access) using a resident population spine that is maintained by SNZ (a detailed data profile and methods regarding the SNZ IDI are published elsewhere [15]). Our study cohort was taken from a SNZ resident population [16] and includes anyone with activity in the IRD, health, education, and Accident Compensation Corporation datasets within 12 months prior to the reference date (31 March of each year) and excludes individuals classified as having moved overseas (i.e., if the total length of time spent overseas was at least 10 of the 12 months spanning the reference date, 6 months either side of it). The population was modified to include individuals who died within 1 year prior to the reference date.

Variables
We included the following covariates and interaction terms in the models (by sex).
Diseases. We used the NZ Burden of Disease Study (NZBDS) condition groupings [17,18] to select 14 aggregated and 40 disaggregate-level diseases or conditions (see S1 Table for coding and S2 Table for categories). NZBDS categories with insufficient data or no variation over time (dental and congenital) were excluded. To determine if any of these diseases were prevalent before or were incident during the 2006-2007 to 2015-2016 observation window, a thorough case finding algorithm was applied consistent with that used for the NZBDS (S1 Table). In general terms, International Classification of Disease (ICD) codes for events and disease-specific drug combinations were developed, disease by disease; primary and secondary hospitalisation diagnoses were used in the look-back period (i.e., pre-2006-2007) to determine presence or absence of disease, but only primary diagnoses were used to determine incident disease in the 2006-2007 to 2015-2016 observation window. Once a person was diagnosed with a disease, they were assumed to have that disease for the rest of their life, with two exceptions: people with skin disorders, infections, internal injuries, poison injuries, and other injuries were assumed to have the disease only for that year; and people with cancer who survived beyond 5 years for lung, 8 years for colorectal, 10 years for "other" cancers, and 20 years for breast and prostate cancer were recoded as being free of that cancer (based on statistical cure times) [19]. Each disease was coded by phase as not present (reference category), diagnosed in that year, died in that year of that disease, and otherwise prevalent. Note, therefore, that the costs for the "diagnosed in that year" and "died in that year" categories are for people with an average of 6 months in that state (but for the diagnosis category still including the time and costs for events preceding the diagnosis date in the same tax year).
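The phase coding described above can be sketched as a small function. This is only an illustration under stated assumptions: the disease labels, the function name `disease_phase`, the use of a single integer per tax year, and the rule that "died" takes precedence over "diagnosed" when both fall in the same year are hypothetical choices, not the NZBDS case finding algorithm itself.

```python
from typing import Optional

# Statistical cure times in years, as quoted in the text. Diseases not listed
# here are treated as lifelong in this sketch.
CURE_YEARS = {"lung_cancer": 5, "colorectal_cancer": 8, "other_cancer": 10,
              "breast_cancer": 20, "prostate_cancer": 20}

# Conditions the text describes as lasting only for the year of the event.
ONE_YEAR_ONLY = {"skin_disorder", "infection", "internal_injury",
                 "poison_injury", "other_injury"}


def disease_phase(disease: str, year: int, diagnosis_year: Optional[int],
                  death_year: Optional[int] = None,
                  died_of_disease: bool = False) -> str:
    """Return the phase of one disease for one person-year.

    Phases: 'not_present' (reference), 'diagnosed', 'died', 'prevalent'.
    """
    if diagnosis_year is None or year < diagnosis_year:
        return "not_present"
    # Assumption: death from the disease overrides the diagnosis phase.
    if died_of_disease and death_year == year:
        return "died"
    if year == diagnosis_year:
        return "diagnosed"
    if disease in ONE_YEAR_ONLY:
        return "not_present"
    cure = CURE_YEARS.get(disease)
    # Assumption: "survival beyond N years" means more than N years elapsed.
    if cure is not None and year - diagnosis_year > cure:
        return "not_present"
    return "prevalent"
```

Each person-year then contributes one phase indicator per disease to the regression, with "not present" as the reference category.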
Income. Each eligible resident was assigned a total pretax income as collected by IRD for each tax year (1 April to 31 March), for the 10 years. In main analyses, self-employed income included sole trader income but excluded income from partnerships (if not paid as personal income), rental properties, company directorships, and shareholdings. It is important to note that if an individual received sick pay from their employer at the same level as their usual income, their income (apparent to us) did not change, meaning that we missed this component of income loss relevant to the underlying construct of productivity loss. All annual total income was inflation adjusted to the 2020 reference year using the consumer price index, then converted to 2020 US$ using the NZ-USA OECD purchasing power parity of 1.445.
Covariates. Age was treated as a 5-year categorical variable for main effects and grouped to 25- to 39-, 40- to 49-, 50- to 59-, and 60- to 64-year-old categories for interaction with diseases.
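As a worked illustration of the two-step conversion, the sketch below inflates a nominal NZ$ amount to 2020 NZ$ and then converts it to US$. The function name and the CPI index values are hypothetical, and the sketch assumes the purchasing power parity of 1.445 is expressed as NZ$ per US$, so the conversion is a division.

```python
PPP_NZD_PER_USD = 1.445  # NZ-USA OECD purchasing power parity quoted in the text


def to_usd_2020(income_nzd: float, cpi_year: float, cpi_2020: float) -> float:
    """Inflate nominal NZ$ income to 2020 NZ$, then convert to US$ at PPP."""
    income_nzd_2020 = income_nzd * (cpi_2020 / cpi_year)
    return income_nzd_2020 / PPP_NZD_PER_USD


# Hypothetical CPI index values, for illustration only.
example = to_usd_2020(50_000, cpi_year=900.0, cpi_2020=1050.0)
```

Applying the same reference year and the same parity to every person-year keeps incomes comparable across the 10 tax years.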
Each individual was assigned to one ethnic group in a prioritised manner (given individuals can nominate more than one self-identified ethnic group), in order of the following: Māori (indigenous population of New Zealand), Pacific peoples, Asian peoples, and the rest as Other (i.e., largely European).
To allow for time-varying confounding by changing SEP, and variations in income loss by SEP, we assigned each individual person-year to a quintile of deprivation using a validated small area deprivation measure called NZDep [20]. The NZDep measure is an index calculated by principal components analysis from 9 census variables at a small area (meshblock) level of about 100 people: proportion of income as benefit receipt; household income; housing tenure; sole parent family; unemployment; qualifications; household crowding; telephone access; and car access. We did not use a comorbidity index per se; rather, as all models were adjusted for all other diseases (at either the 14 or the 40 disease level), we consider this adjustment for confounding by other diseases as equivalent to adjusting for comorbidity.

Analyses
Our analytical plan for this study was to follow the analyses we previously conducted (and published in this journal) for disease-related health system expenditure [21]. We, however, deviated from this previous approach to use fixed effects (FE) regression analyses for the "main analyses," reporting ordinary least squares (OLS) regression in sensitivity analyses due to concerns about residual confounding (below).
Fixed effects regression modelling. We used an "excess" or "net" cost approach [21-25], with total income as the dependent variable in the regression analyses. FE regression uses within-individual variation, which removes time-invariant confounding [26]; standard errors were cluster adjusted. Conceptually, this excess costing approach examines how individuals' incomes vary year by year, corresponding to their disease status in each year. If the average difference in income in people's first year of diagnosis of stroke (compared to their own pre-stroke years) was $5,000 (adjusted for other changes over time such as other diseases and changing deprivation), then this is the income loss.
Observations were only excluded if they had missing geocode for assigning NZDep (0.96%). Observations were censored after the year of death and if not eligible to be in the usually resident population (e.g., it was possible for a person living overseas for a period to contribute person observations in early and late years but be censored for midyears). Due to extreme income outliers, we further excluded year observations with a total income that was negative and less than the 0.1th percentile or greater than the 99.9th percentile.
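The FE regression model is

$$Y_{it} = \beta_1 X_{it} + a_i + u_{it}$$

where $Y_{it}$ is the total income of individual $i$ in year $t$ and $X_{it}$ denotes the independent variables, including interactions between diseases and covariates. (While not a research question per se in this study, allowing for such interactions is important in the cause-deleted analyses below that estimate total income loss from disease across the population.) $\beta_1$ represents the coefficient for the independent variables (with the standard error used to generate 95% CI); $a_i$ is the unknown intercept for each individual; and $u_{it}$ is the idiosyncratic error term.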
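To illustrate why within-individual variation removes time-invariant confounding, here is a minimal sketch of the within (demeaning) estimator on simulated data. Everything in it is hypothetical: the simulated panel, the single disease indicator, the true effect of -5,000, and the function name `within_estimator`. It does not compute the cluster-adjusted standard errors used in the study.

```python
import numpy as np


def within_estimator(y, X, ids):
    """Fixed effects (within) estimate of beta in y_it = X_it beta + a_i + u_it."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    _, idx = np.unique(ids, return_inverse=True)
    counts = np.bincount(idx)
    # Subtracting each individual's own mean removes the intercept a_i.
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    X_dm = np.empty_like(X)
    for k in range(X.shape[1]):
        X_dm[:, k] = X[:, k] - (np.bincount(idx, weights=X[:, k]) / counts)[idx]
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta


# Simulated panel: 2,000 people observed for 10 years.
rng = np.random.default_rng(0)
n, T = 2000, 10
ids = np.repeat(np.arange(n), T)
year = np.tile(np.arange(T), n)
a = rng.normal(40_000, 10_000, size=n)            # individual intercepts a_i
# Lower-income people are more likely to develop disease (confounding via a_i).
gets_disease = rng.random(n) < np.where(a < 40_000, 0.6, 0.2)
onset = rng.integers(1, T, size=n)
disease = (gets_disease[ids] & (year >= onset[ids])).astype(float)
true_loss = -5_000.0
y = a[ids] + true_loss * disease + rng.normal(0, 2_000, size=n * T)

beta_fe = within_estimator(y, disease, ids)[0]    # close to the true loss
beta_pooled = np.polyfit(disease, y, 1)[0]        # confounded between-person slope
```

In this simulation the pooled slope is more negative than the true effect because people with lower baseline income are more likely to become cases, whereas the within estimate compares each person only with their own disease-free years.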
Cause-deleted analyses. We used a cause-deleted approach to estimate population-level income loss from disease. First, we predicted back onto every individual their expected income loss, using their disease covariates and the matching regression coefficients (including for interactions). Then, disease by disease, we set everyone's value for that disease to 0 (i.e., no disease) and predicted each individual's cause-deleted income. The difference in total predicted income between the as-observed and cause-deleted models was the cause-deleted income loss attributable to that disease.
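The cause-deletion step can be sketched as follows. This is a simplified illustration, not the study's implementation: it uses one coefficient per disease with no phase or interaction terms, and the indicator matrix, coefficients, and function name `cause_deleted_gain` are hypothetical.

```python
import numpy as np


def cause_deleted_gain(D, beta):
    """Total income gained, per disease, when that disease is set to 0 for everyone.

    D    : (person-years x diseases) 0/1 indicator matrix
    beta : per-disease income coefficients (negative values are income losses)
    """
    D = np.asarray(D, dtype=float)
    beta = np.asarray(beta, dtype=float)
    observed_total = (D @ beta).sum()          # as-observed predicted disease effect
    gains = np.empty(D.shape[1])
    for j in range(D.shape[1]):
        D_deleted = D.copy()
        D_deleted[:, j] = 0.0                  # delete disease j for everyone
        gains[j] = (D_deleted @ beta).sum() - observed_total
    return gains


# Hypothetical data: 4 person-years and 2 diseases.
D = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
beta = np.array([-5000.0, -1000.0])
gains = cause_deleted_gain(D, beta)
shares = gains / gains.sum()                   # each disease's share of total loss
```

Because the gain for each disease is prevalence multiplied by per capita loss, a common disease with a modest coefficient can contribute more to the total than a rare disease with a large one. With interaction terms in the model, the prediction step would also need to zero the interaction columns involving the deleted disease.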
Sensitivity analyses. While FE regression has the strong advantage of removing all time-invariant confounding, it also has a disadvantage of relying on within-individual differences in income before and after diagnosis. For some chronic diseases such as diabetes, mental illness, and respiratory disease, the year during which our case finding algorithms determine a person to be incident is somewhat arbitrary, as it assumes a discontinuity in health at that point, when indeed the person may have had slowly progressing impacts on their health leading up to (say) their first hospitalisation. This will mean a bias towards underestimating income loss. A between-person regression approach does not suffer from this bias but is prone to residual confounding by SEP. Therefore, the OLS regressions also restricted observations to only those people with no disease 2 years before the observation window and adjusted for prior average income (continuous variable) in the 2-year period 2003-2004 to 2004-2005.
The case finding algorithm for depression and anxiety used a reasonably stringent case definition: some contact with some publicly funded mental health services (e.g., hospitalisation, acute assessment). Much depression and anxiety is treated in primary care, often with antidepressants, and is not registered on the PRIMHD database. We, therefore, also used an extended case definition in sensitivity analyses that included receipt of mental health medication, although this will now overestimate income loss due to disease as, for example, antidepressants have treatment indications other than just depression. Conversely, due to variability in quality and duration of pharmaceutical records (especially in the look-back period), we ran analyses where pharmaceutical records were not used in case finding (for all diseases).

Results
There were 22.5 million person-year observations over the 10-year observation period, of which 49.5% were for a person with at least one disease or condition (Table 1). The total income was $606 billion (all cost values in US$, 2020), of which 45.4% was generated by people with at least one diagnosis or condition in the year. Person-years for people with a gastrointestinal condition were most common at 15.1% of all person-years (3.4 million (S2 Table) out of 22.5 million total person-years), followed by musculoskeletal diseases at 14.3%.

Objective 1: Individual-level disease-specific income loss estimates by phase of diagnosis
Figs 1 and 2 show the FE regression coefficients (inflation and purchasing power parity adjusted to US$) by sex for two of the disease phases: first year of diagnosis and prevalent disease. These per capita income losses represent the reference case person (age 50 to 54, Other/European ethnic group, living in the middle quintile of neighbourhoods ranked by deprivation). Income losses tended to be greater for males. Income loss in the year of diagnosis was highest for dementia for both men ($8,882; 95% CI $6,709 to $11,056) and women ($7,103; $5,499 to $8,707). Traumatic brain injury, mental illness, and lung cancer cases had the next largest income losses for males and females. One disease had a significant increase in income postdiagnosis, namely migraine for females ($446, 95% CI $265 to $627). Similar patterns were evident for prevalent years of diagnosis.
Income loss in the last year of life was (unsurprisingly) high for all diseases, given on average 6 months of income was lost if the person was employed at the time of death and greater [...]

Table 2 shows the cause-deleted income gains; cause-deleted results combined the prevalence of each condition with the per capita income loss (as above). The average annual total income for all 25- to 64-year-olds was $60.6 billion; if all diseases were deleted, we estimated the total income would have been $63.3 billion, a 4.3% or $2.72 billion increase in income ($121 per adult). Of this increase in income by deleting all diseases, the largest contributor was clearly mental illness at 30.0%, with nearly half of this from the depression and anxiety category (13.1%). For aggregated disease categories contributing more than 5% of this total disease-related income loss, the rank order after mental illness was as follows: cardiovascular disease (15.6%), musculoskeletal (13.7%), endocrine (8.9%), gastrointestinal (7.4%), and neurological (6.5%). Other notable findings included type 2 diabetes mellitus contributing 6.5% of all disease-related income loss and, conversely, cancer contributing only 4.5% and injury only 3.5%.

Fig 3 shows the cause-deleted income gains by age (sexes combined) by 14-level grouping. Unsurprisingly, the income gain increases with age. Mental illness stands out as the major cause of income loss at younger ages and continues to be a major contributor into older ages. Conversely, musculoskeletal and the vascular and blood category make increasingly large contributions at older ages.

[...] income are lesser in absolute magnitude (due to likely confounding of the unadjusted OLS results by SEP). For most diseases, there were income drops in the year prior to diagnosis (S2 Fig). These drops were 50% or more of the income drop observed in the first year of diagnosis, for both males and females, for mental health disorders, dementia, chronic obstructive pulmonary disease, type 2 diabetes, chronic liver disease, and sensory disorders, consistent with all these diseases having substantial prodromal periods. However, the possibility of some reverse causation for mental diseases (i.e., drop in income causing mental disorder) exists. Therefore, we undertook a crude test by lagging income forward and back 1 year in FE analyses. Consistent with the assumption that mental illness is mostly causing income loss, not vice versa, the association of mental illness to income in the next year was stronger than to income in the previous year.

Sensitivity analyses
Some disease classifications likely detected more severe cases, e.g., mental illness, which relied upon a presentation to and diagnosis at a publicly funded inpatient or outpatient service. Using mental illness as an example, if we added community dispensed pharmaceuticals (e.g., antidepressants) to the case classification algorithm, prevalent person-years of depression and anxiety increased from 3.6% to 14.4%, but commensurately the per person income loss more than halved (as less severe disease was being included, along with conditions other than anxiety and depression that share the same pharmaceutical treatments). At the aggregate level of all mental illness, in terms of cause-deleted analyses, the wider case definition using pharmaceuticals increased the percentage of all income loss from mental illness from 30.0% to 39.9%.
Conversely, excluding pharmaceuticals from all our base-case finding algorithms made little difference to FE regression results.

Discussion
Diseases cause substantial income loss. For a counterfactual scenario of no disease and using income loss from diagnosis to the year of death (but not after the year of death), our results suggest that the 25- to 64-year-old population's income would be 4.3% greater. For aggregated disease categories contributing more than 4% of this total disease-related income loss, the rank order was as follows: mental illness (30.0%), cardiovascular disease (15.6%), musculoskeletal (13.7%), endocrine (8.9%), gastrointestinal (7.4%), neurological (6.5%), and cancer (4.5%). Migraine was the only disease that resulted in a (modest) increase in income, for females only, but with 95% confidence intervals excluding the null. While it could be a chance finding, it is also not implausible: A diagnosis of migraine may lead to better treatment and more productivity; conversely, increased stress in the workplace due to longer hours or more responsibilities with career advancement (and, therefore, income) may trigger migraines. The finding of high per capita income loss for people with dementia in their first year of diagnosis and if prevalent was unexpected but also based on relatively few people (Figs 1 and 2); it may be that dementias before the age of 65 selectively impair the ability to work of people in high-income occupations. Unsurprisingly, there is a strong correlation between aggregate disease-related income loss and health loss from the same conditions measured in disability-adjusted life years and years of life lived with disability, as both are largely driven by the prevalence of the condition (S4 Fig). The pattern of income loss in the current study is similar to that observed by Kinge and colleagues in Norway [4] (see S2 Fig for a comparative breakdown), albeit the Kinge and colleagues study used more approximation methods than our actual linked data.
Regarding generalizability to other high-income countries, first, the duration and generosity of employer sick leave vary between countries; in our analysis, employer-funded sick leave was not able to be "seen" as it is simply part of salary or income. Therefore, the reader will need to be aware of this if their conceptualization of income loss (or productivity loss) includes employer sick pay. Second, the extent of assistance provided to people with health conditions to return to (paid) work varies by country. Third, the extent of disease-related income loss is likely to vary with the unemployment rate. If unemployment is high, then disease-related income loss is likely to be higher due to a greater pool of competitors for the same job (but, conversely and rather brutally, the productivity loss to society will be less as the sick person is more easily "replaced" in the workforce). Nevertheless, the general patterns we observe likely hold in other countries, and our study is a valuable template for future comparison studies.
There are 2 main approaches to estimate economic productivity losses: (a) human capital approach (HCA); and (b) friction cost approach (FCA) [5,6]. In the HCA, productivity losses are equated to losses of income. In the FCA, the illness and reduced work capacity of an individual is assumed to be replaced by another citizen after a certain time if the economy has structural unemployment or other means to replace workers (e.g., immigration). Our study uses an HCA among the living (i.e., loss of income compared to the participant's "healthy self") and a hybrid approach among decedents whereby income loss in the tax year of death is included, which on average includes the 6 months predeath and (complete income loss) 6 months postdeath. We do not include income loss in years after death, equivalent to an FCA of people being "replaceable" in the workforce.
We believe that our study is a substantial advance in methods and data over previous studies. First, by adjusting for many diseases in one model, we provide estimates of how diseases compare against one another and prevent overestimation of income loss due to the presence of comorbidities. Second, we use population-wide linked health and tax data, over 10 years, offering high power and avoiding selection biases that may arise with panel studies and attrition. Third, we have data on health conditions prior to the 10-year observation window to ascertain preexisting health status, which allows us to accurately estimate income loss by stage of disease.
There are limitations of our study. First, while the income/tax data is an objective income assessment, it is not perfect for the measure of productivity; employers will continue to pay employees who are sick during their sickness leave entitlement period, and we had no data on this. Accordingly, we underestimate "productivity loss" to some extent. It would be a useful extension to our study to impute this employer-funded sick pay using external survey datahowever, we doubt that it will change the relative income loss and rankings by disease. Second, our FE regression models remove all time-invariant confounding by design and measured time-varying confounders, but they are likely to underestimate income loss due to failure to "capture" any deterioration in income prior to diagnosis using our case finding criteria. Hence, we ran between-person OLS regression models, for a healthy cohort at the outset and adjusting for income before the observation window, as a sensitivity analysis. The truth probably lies somewhere between the FE estimates and the OLS estimates in the model adjusted for prior income. We favour the FE analyses because it removes time-invariant confounding by design. Also, in the between-person analyses, we noted patterns consistent with some residual confounding, for example, in people contracting breast cancer and prostate cancer (which tend to be higher SEP, the latter due to unequal uptake by SEP in prostate-specific antigen testing) have higher incomes after diagnosis-likely due to residual confounding by SEP. Conversely, there are likely competing residual biases in the FE regression towards overestimation. Residual time-varying confounding likely remains, for example, divorce as a cause of both income loss and depression. 
Also, reverse causation may inflate some estimates, for example, the mental illness estimates, if low income causes poor mental health. However, in sensitivity analyses, we found that the association of mental illness with income loss lagged 1 year was stronger than the association of income with mental illness lagged 1 year, suggesting that any reverse causation is weaker than the direction of causation we assumed.
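The contrast between the within-individual (FE) and between-person (OLS) estimates discussed above can be illustrated with a small simulation. The sketch below is not the model used in this study, which adjusted for many diseases and time-varying covariates simultaneously. It assumes a single disease, a single unmeasured time-invariant trait (standing in for something like SEP) that raises income and lowers disease risk, and hypothetical values for income, the true income loss, and the noise level. All function names are illustrative.

```python
import numpy as np


def within_estimator(y, x, ids):
    """Fixed effects slope: demean y and x within each individual, then OLS."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    _, inv = np.unique(ids, return_inverse=True)
    counts = np.bincount(inv)
    # Subtract each individual's own mean, which removes anything time-invariant.
    y_dm = y - (np.bincount(inv, weights=y) / counts)[inv]
    x_dm = x - (np.bincount(inv, weights=x) / counts)[inv]
    return float(x_dm @ y_dm / (x_dm @ x_dm))


def pooled_ols_slope(y, x):
    """Between- and within-person variation pooled; no individual effects."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate(n_people=2000, n_years=10, true_loss=-5000.0, seed=0):
    """Simulated panel; every numeric value here is a hypothetical assumption."""
    rng = np.random.default_rng(seed)
    trait = rng.normal(size=n_people)            # unmeasured, time-invariant
    ids = np.repeat(np.arange(n_people), n_years)
    year = np.tile(np.arange(n_years), n_people)
    # A higher trait value lowers the probability of ever developing the disease.
    develops = rng.random(n_people) < 1.0 / (1.0 + np.exp(trait))
    onset = rng.integers(1, n_years, size=n_people)
    diseased = (develops[ids] & (year >= onset[ids])).astype(float)
    income = (50_000.0 + 10_000.0 * trait[ids] + true_loss * diseased
              + rng.normal(scale=2_000.0, size=ids.size))
    return income, diseased, ids


if __name__ == "__main__":
    income, diseased, ids = simulate()
    print("FE (within) estimate:", round(within_estimator(income, diseased, ids)))
    print("Pooled OLS estimate: ", round(pooled_ols_slope(income, diseased)))
```

Under these assumptions the within estimator recovers a value close to the simulated loss, whereas the pooled estimate is pulled away from it by the unmeasured trait. The sketch does not reproduce the opposing bias noted above (income deterioration before diagnosis), which would require a pre-diagnosis decline to be built into the simulation.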
We find that absolute income losses by disease are generally higher for males than females, but relative income losses are similar, reflecting known inequities in pay between males and females. However, it must be noted that we used a population-wide approach, not an analysis restricted to those employed. Given that females have lower workforce participation, our estimated income loss among all females getting a disease will for this reason alone be less than among all males (with higher employment rates) getting the same disease. If results such as ours are to be used in prioritising health interventions based on their ability to improve individual incomes and aggregate productivity, we recommend careful attention to, and correction for, structural societal inequalities. A similar argument applies to diseases with varying rates by ethnicity and SEP; income loss by disease will vary in at least one of absolute or relative terms by SEP. Also, there may be subadditive or superadditive impacts on income loss from having 2, 3, or more conditions compared to the independent and separate (unconfounded) income loss for each condition as reported in this paper. These questions were beyond the scope of this study and will be pursued in follow-up publications.
There are important implications of this study. First, disease advocacy groups and researchers often invoke large estimates of the economic impact of their disease of choice, using studies with highly variable methods and not allowing for comorbidity. Our analysis provides a realistic and "confined within the total income envelope" estimate of income loss, conducted comparatively across diseases. Second, a major implication of our study is that preventing diseases that cause substantial income loss to individuals, and economic productivity loss to society at the aggregate level, justifies greater weighting in prioritising intervention programmes. For example, our findings suggest that preventing mental illness, musculoskeletal diseases, and cardiovascular disease might justify somewhat more weighting, if we also value the contribution of health interventions to economic outcomes. While the exact absolute value of income loss in our study is subject to our assumptions, the relative comparisons between diseases are robust, as all diseases were analysed together with similar assumptions. We argue that such comparability of estimates by disease, especially by phase of disease, at least within one country, opens a useful policy door to estimating the impact of interventions (e.g., salt reduction in bread that lowers stroke and ischaemic heart disease rates, obesity reduction programmes, or treatments) on income loss in addition to the usual health gain and health expenditure impacts. Such additional analyses should be a useful adjunct to prioritising health interventions, if a desirable additional impact of interventions in the health sector is improvements in economic productivity. However, such analyses must be handled carefully, given the potential equity implications (e.g., by sex, as above).
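To make the population-level reasoning concrete, the following sketch combines how common a condition is with its adjusted per-person income loss, and then estimates the income loss averted by an intervention that reduces prevalence. It is a simplified stand-in and not the cause-deletion method used in this study. The class and function names, and any input values a reader supplies, are illustrative assumptions rather than study results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiseaseInput:
    """Inputs for one condition (hypothetical structure, not from the study)."""
    name: str
    person_years: float          # person-years lived with the condition per year
    loss_per_person_year: float  # adjusted income loss per person-year


def population_loss(diseases):
    """Total annual income loss per condition: person-years x per-person loss."""
    return {d.name: d.person_years * d.loss_per_person_year for d in diseases}


def income_loss_shares(losses):
    """Each condition's share of the combined disease-related income loss."""
    total = sum(losses.values())
    return {name: loss / total for name, loss in losses.items()}


def loss_averted(disease, prevalence_reduction):
    """Income loss averted if an intervention cuts person-years with the
    condition by the given fraction (assumes per-person loss is unchanged)."""
    if not 0.0 <= prevalence_reduction <= 1.0:
        raise ValueError("prevalence_reduction must be between 0 and 1")
    return (disease.person_years * prevalence_reduction
            * disease.loss_per_person_year)
```

Because the per-person losses are estimated jointly across conditions, the shares sum to one, which is the sense in which the estimates stay within the total income envelope. A fuller analysis would apply phase-specific losses (year of diagnosis, prevalent years, last year of life) rather than a single per-person value.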
In summary, we used a unique database of repeated disease and tax income measures on an entire population. FE regression estimated the within-individual change in income when developing disease among 25- to 64-year-olds. Income loss among individuals developing disease was highest for dementia (noting this was dementia onset before the age of 65 years), followed by mental illness. From a total population perspective, combining the prevalence of disease with the income loss estimates per individual, the 3 largest causes of income loss were mental illness, cardiovascular disease, and musculoskeletal diseases. Our study is a major advance in that it includes all diseases simultaneously and quantifies within-individual income loss. We encourage other countries to conduct comparable analyses, and then to trial including such income loss estimates as additional considerations in intervention prioritisation with policymakers.

Supporting information
S1 Checklist. The RECORD statement: checklist of items, extended from the STROBE statement, which should be reported in observational studies using routinely collected health data. (DOCX)
or organisation, and the results in this file have been confidentialised to protect these groups from identification and to keep their data safe.
Careful consideration has been given to the privacy, security, and confidentiality issues associated with using administrative and survey data in the IDI. Further detail can be found in the Privacy impact assessment for the Integrated Data Infrastructure available from www.stats.govt.nz.
The results are based in part on tax data supplied by Inland Revenue to Statistics NZ under the Tax Administration Act 1994. This tax data must be used only for statistical purposes, and no individual information may be published or disclosed in any other form, or provided to Inland Revenue for administrative or regulatory purposes.
Any person who has had access to the unit record data has certified that they have been shown, have read, and have understood section 81 of the Tax Administration Act 1994, which relates to secrecy. Any discussion of data limitations or weaknesses is in the context of using the IDI for statistical purposes and is not related to the data's ability to support Inland Revenue's core operational requirements.