Health system costs for individual and comorbid noncommunicable diseases: An analysis of publicly funded health events from New Zealand

Background There is little systematic assessment of how total health expenditure is distributed across diseases and comorbidities. The objective of this study was to use statistical methods to disaggregate all publicly funded health expenditure by disease and comorbidities in order to answer three research questions: (1) What is health expenditure by disease phase for noncommunicable diseases (NCDs) in New Zealand? (2) Is the cost of having two NCDs more or less than that expected given the independent costs of each NCD? (3) How is total health spending disaggregated by NCDs across age and by sex? Methods and findings We used linked data for all adult New Zealanders for publicly funded events, including hospitalisation, outpatient, pharmaceutical, laboratory testing, and primary care from 1 July 2007 to 30 June 2014. These data include 18.9 million person-years and $26.4 billion in spending (US$ 2016). We used case definition algorithms to identify if a person had any of six NCDs (cancer, cardiovascular disease [CVD], diabetes, musculoskeletal, neurological, and a chronic lung/liver/kidney [LLK] disease). Indicator variables were used to identify the presence of any of the 15 possible comorbidity pairings of these six NCDs. Regression was used to estimate excess annual health expenditure per person. Cause deletion methods were used to estimate total population expenditure by disease. A majority (59%) of health expenditure was attributable to NCDs. Expenditure due to diseases was generally highest in the year of diagnosis and year of death. A person having two diseases simultaneously generally had greater health expenditure than the expected sum of having the diseases separately, for all 15 comorbidity pairs except the CVD-cancer pair. For example, a 60–64-year-old female with none of the six NCDs had $633 per annum expenditure. If she had both CVD and chronic LLK, additional expenditure for CVD separately was $6,443/$839/$9,225 for the first year of diagnosis/prevalent years/last year of life if dying of CVD; additional expenditure for chronic LLK separately was $6,443/$1,291/$9,051; and the additional comorbidity expenditure of having both CVD and LLK was $2,456 (95% confidence interval [CI] $2,238–$2,674). The pattern was similar for males (e.g., additional comorbidity expenditure for a 60–64-year-old male with CVD and chronic LLK was $2,498 [95% CI $2,264–$2,632]). In addition to this, the excess comorbidity costs for a person with two diseases was greater at younger ages, e.g., excess expenditure for 45–49-year-old males with CVD and chronic LLK was 10 times higher than for 75–79-year-old males and six times higher for females. At the population level, 23.8% of total health expenditure was attributable to higher costs of having one of the 15 comorbidity pairs over and above the six NCDs separately; of the remaining expenditure, CVD accounted for 18.7%, followed by musculoskeletal (16.2%), neurological (14.4%), cancer (14.1%), chronic LLK disease (7.4%), and diabetes (5.5%). Major limitations included incomplete linkage to all costed events (although these were largely non-NCD events) and missing private expenditure. Conclusions The costs of having two NCDs simultaneously is typically superadditive, and more so for younger adults. Neurological and musculoskeletal diseases contributed the largest health system costs, in accord with burden of disease studies finding that they contribute large morbidity. Just as burden of disease methodology has advanced the understanding of disease burden, there is a need to create disease-based costing studies that facilitate the disaggregation of health budgets at a national level.


Methods and findings
We used linked data for all adult New Zealanders for publicly funded events, including hospitalisation, outpatient, pharmaceutical, laboratory testing, and primary care from 1 July 2007 to 30 June 2014. These data include 18.9 million person-years and $26.4 billion in spending (US$ 2016). We used case definition algorithms to identify if a person had any of six NCDs (cancer, cardiovascular disease [CVD], diabetes, musculoskeletal, neurological, and a chronic lung/liver/kidney [LLK] disease). Indicator variables were used to identify the presence of any of the 15 possible comorbidity pairings of these six NCDs. Regression was used to estimate excess annual health expenditure per person. Cause deletion methods were used to estimate total population expenditure by disease. A majority (59%) of health expenditure was attributable to NCDs. Expenditure due to diseases was generally highest in the year of diagnosis and year of death. A person having two diseases simultaneously generally had greater health expenditure than the expected sum of having the diseases separately, for all 15 comorbidity pairs except the CVD-cancer pair. For example, a 60-64-year-old female with none of the six NCDs had $633 per annum expenditure. If she had both CVD PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002716 January 8, 2019 1 / 21 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Introduction
The Global Burden of Disease (GBD) has achieved the vision of a coherent and comparable set of epidemiological estimates of disease-specific incidence, prevalence, morbidity, and mortality across all countries and since 1990 [1,2]. There are many reasons associated with planning and comparative analysis that suggest it would also be worthwhile to pursue a similar vision of national health system expenditure, disaggregating spending by disease in a manner comparable across countries. The Organization of Economically Developed Countries has promulgated such a vision under the rubric of national health system accounts that disaggregates total health expenditure by disease [3], and within the United States, there have been proposals for the creation of such accounts [4]. A comparable set of disease costs across multiple countries would facilitate comparisons both over time within countries and between countries. Such accounts would provide measures of disease burden that would generate a consistent set of costs to use in cost of illness (COI) and cost-effectiveness studies. Achieving such a vision, though, is challenging. Countries vary in the structuring of their health systems, and there is a need for standardisation of methods across countries. Using the same disease classification systems and standardising, or at least accommodating differing boundaries of total health expenditure, is critical [5].
A key limitation of disease-specific COI studies that focus on a single or small set of disease (s) is their tendency to overestimate costs due to attribution of comorbidities. If separate COI studies are naively summed, overestimation of total expenditure is likely [4]. COI studies including most/all diseases in a given country, constraining the total cost to the actual total envelope of health expenditure, are protected against this double-counting problem. Whilst such multiple or all-diseases COI studies are uncommon, the US has a reasonably long history of such studies since the 1960s (e.g., [6] through to more recent studies [7,8]), and examples are emerging in other countries (e.g., [5,9,10]). These studies are usually hybrids of top-down disaggregation and bottom-up estimation using a range of individual-level data sets. However, we are not aware of any studies in which individuals are uniquely identified across multiple data sets (e.g. inpatient, outpatient, pharmaceuticals, mortality, cancer registrations), allowing identification of disease diagnosis, progression, and death (and how costs are aligned to disease progression) and the identification of comorbidities within the same individual, linked to all event and cost data.   [4,11] outline three approaches to attributing costs to diseases: (1) encounter based, (2) episode based, and (3) person based. Encounter based and episode based both require each encounter or episode to be coded to a primary disease, with associated or comorbid diseases as secondary codes. This coding is routinely undertaken for hospitalisation in many countries but is unlikely to be available for primary care encounters and pharmaceuticals (as many drugs are used for more than one disease). 'Most cost of illness studies take an encounter-based approach, assigning claims to disease buckets based upon their coded diagnoses. Comorbidities are a major problem here; attributing each spending item for a patient who is both hypertensive and diabetic is not easy' (p. S10 [4]). Dieleman et al. (2017) have proposed an extension to the encounter-based approach, merged with an attributable risk approach, to determine base costs for each disease and outflows and inflows from comorbid conditions [12]. However, this approach still requires primary and secondary diagnoses for each event. In the person-based approach, all of a person's total expenditure is regressed on disease indicators and covariates: a model with no disease comorbidity interaction terms will estimate the independent and unconfounded costs of each modelled disease; a model with additional covariates for comorbidity interactions will estimate the greater (or lesser) costs of having two or more diseases simultaneously, over and above that expected for having each disease separately.
In this study, we capitalise on national-level data tracking all publicly funded noncommunicable disease (NCD) health events of uniquely identified individuals, over multiple years, in New Zealand to estimate costs by disease. New Zealand is a high-income country with a burden of NCDs similar to other high-income countries [2], although with less than average healthcare spending per capita than other high-income countries (US$3,648 in purchasing power parity-adjusted US dollars for 2015, compared with a population-weighted average of $5,551 across all high-income countries) [13]. We extend previous work in two ways. First, we attribute health expenditure (1) to mutually exclusive disease categories and (2) split into the single-disease costs (i.e., the cost of disease as though comorbidities did not exist), and the disease comorbidity interactions to capture the impact of people having more than one disease simultaneously.
The second extension is the estimation of costs by phase of disease progression. This includes estimating the costs in year of diagnosis and year of death if dying of that disease. These estimates are useful for subsequent simulation modelling of disease prevention and treatment in cost-effectiveness research, as the differential timing of costs can be better modelled. Our advances over previous national-level and multiple-disease COI studies may not be replicable in many other countries due to data limitations, but we aim to generate a greater depth of understanding to assist not only policy within New Zealand but also the generation of approaches for multicountry COI studies.
We address three specific research questions: 1. What is health expenditure by disease phase for NCDs in New Zealand?
2. Is the cost of having two NCDs more or less than that expected given the independent costs of each NCD?
3. How is total health spending disaggregated by NCDs across age and by sex? Data cleaning undertaken by the authors included ensuring same sex, date of birth, and date of death; when an individual had multiple records and there was a disagreement, we used data from the most authoritative source (e.g., mortality record if date of death) or most commonly given (e.g., female sex if on all but one record for a given individual).

Disease groupings and case definition
For this study, we used two distinct disease groupings. The first set, the aggregated disease set, included the first five of eight disease groupings in the 1990-2013 New Zealand Burden of Disease Study (NZBDS) [15], namely cardiovascular disease (CVD) and diabetes (which we divided into CVD and diabetes separately), cancer, chronic lung/liver/kidney (LLK) disease, neuropsychiatric conditions, and musculoskeletal conditions. The coding and capture of mental events was not yet reliable in the study window, so we excluded psychiatric conditions and had a neurological-only category. The three NZBDS groupings we did not model were as follows: other NCDs; maternal, neonatal, nutritional deficiency, and infectious disorders; and injury. The second set of diseases used in this study were 13 disaggregated diseases, including lung, breast, prostate, colorectal, and other cancers; ischaemic heart disease (IHD), stroke, and other CVD; and chronic lung, chronic liver, and chronic kidney disease.
To determine if any of these diseases were prevalent before or were incident during the 1 July 2007 to 30 June 2014 costing window, a thorough case-finding algorithm was applied, consistent with that used for the NZBDS (S1 Table). International Classification of Disease (ICD) codes (primary and other) for events and disease-specific drug and laboratory testing combinations were developed, disease by disease. For cancers, survival after diagnosis by 5 years for lung, 8 years for colorectal, 10 years for 'other' cancers, and 20 years for breast and prostate resulted in that person being recoded as free of that cancer, based on statistical cure times [16]. For all other disease, no remission was allowed (i.e., diagnosis was for the remainder of his or her life). Each disease was coded by phase as not present (reference category), diagnosed in that financial year, died in that financial year of that disease, and otherwise prevalent. Note, therefore, the costs for the first two categories are for people with an average of 6 months in that state (but, for the diagnosis category, still including the time and costs for events preceding the diagnosis date in the same financial year).

Modelling
We did not attempt to classify what healthcare spending was or was not related to each specific disease. Rather, we used the established 'excess' or 'net' cost approach [17][18][19][20], using a statistical approach based on regression models, with total health system cost for each individual in each financial year as the dependent variable and demographics, dummy variables for calendar year, and disease and disease phase indicators as the independent variables. The coefficients for the disease indicators are therefore estimates of the excess cost of having that disease phase and are independent of other diseases (and comorbidities) in the model. The intercept term is the health system cost not attributed to the diseases in the model (e.g., due to injuries, preventive care, mental illness, maternity, etc.).
To model nonadditive costs associated with comorbidity, we interacted the disease indicators in order to identify when an individual had both diseases. It was not possible to include all possible combinations of disease interactions in regression models due to the large number of interaction terms. Thus, we retained for regression modelling the disease phase indicators for each disease as 'main effects' but only included all 15 comorbidity pair interactions formed by six diseases (i.e., 6!/([6 − 2]! × 2!) = 15). For example, the cancer-CVD pair was coded '1' for a personyear observation for which the person had a diagnosis of both diseases, regardless of phase of diagnosis (i.e., prevalent cancer with prevalent CVD was coded the same as first year of diagnosis of cancer with prevalent CVD). In regression modelling, the disease phase indicators and disease comorbidity pairs were all interacted with age and age squared, first centring age at the midpoint of the 60-64-year-old age group, 62.5 years, and dividing by 10, so the age interaction terms are interpretable in per 10-year units. Regression equations are detailed in S1 Appendix.
To prepare the data for regression modelling, we aggregated them into unique strata formed by cross-classifying 5-year age groups by financial year, by the four levels of each disease (no diagnosis, diagnosed in that financial year, dying of that disease in that financial year, otherwise prevalent with that disease). For the disaggregated 13 disease classifications, there were 151,913 unique strata of these cross-classified categorical variables with at least one male observation, and 139,717 unique strata for females. Between-person ordinary least squares (OLS) regressions were run on these aggregated data sets, with weights equal to the number of observations in each stratum. The advantages of this approach included (1) aggregation reducing the skewness of data, (2) OLS coefficients were directly interpretable as excess costs, (3) the total predicted expenditure (when applying coefficients back onto observations and summing regression-predicted costs across all individuals) always gave the exact observed total expenditure in the data sets, and (4) model-run time was considerably reduced compared with analyses on unit-level data. We also trialled gamma regression on unit-level data, which had better residual plots than OLS but overpredicted disease and total costs, causing us to prefer the above OLS on aggregate data approach. Both because zero costs were so rare (0.09% of all person-year observations) and previous work has found that for estimating averages, a one-part model performs (nearly) as well as a two-part regression [21], we did not employ a two-step process of first estimating individuals with zero expenditure.
Four sets of models were run: disease progression models (models with all the disease-specific phases) for either the aggregated (n = 6) or disaggregated (n = 13) disease groupings, but without comorbidity interactions; and both these models with the additional 15 disease comorbidity pairs. Note that we retain the same 15 disease comorbidity pairs in the disaggregated 13-disease model, as it was impractical to model and interpret the 78 possible comorbidity pairs formed by 13 diseases.
As part of model fitting, we wanted to ensure that the intercept terms in the model and regression-predicted costs by age using the models agreed to the annual expenditure in the data for individuals with none of the disease modelled. There was good correspondence of these predictions and those observed in the data for models including the disease comorbidity pairs; however, predictions for models not including the comorbidity pairs underestimated the costs among the nondiseased at younger ages (both OLS and gamma regressions) due to strong disease-disease-age interactions. Therefore, for models without comorbidity pair interactions, we rescaled all regression coefficients. First, we used the unscaled regression equations to predict costs among those with any disease and compared this with their actual total cost in the data set. We then used age-specific scalars for the disease coefficients in the model to return the correct total cost of people with diseases. For example, if the sum of OLS regression predicted that costs of 50-54-year-olds with at least one disease was $200 million, but the actual observed total cost for these same people was $180 million, then we rescaled all disease coefficients by 180/200 = 0.9 for this age group. Second, the reciprocal procedure was performed for people without disease. Notably, no such scaling was needed for the better-fitted models with the comorbidity pairs and their interactions, with age included.
Given age interactions, we present absolute estimates of disease expenditure for 60-64-year-olds as the main results, with young (45-49 years) and old (75-79 years) estimates presented additionally in S4 Table and S5 Table. Actual regression coefficients are presented in S3 Table, allowing interested readers to calculate excess costs for alternative groupings.
We present cause-deleted estimates, the estimates of spending without the presence of comorbidities, by setting disease coefficients in the predictive equation to zero.
We undertook several robustness checks. First, for simpler models (six-disease, no disease comorbidity pairs), an OLS general estimating equation (GEE) regression model on individual-level data was undertaken to allow for intraclass correlation due to up to 7 years of observation for each individual. Second, for simpler models we also ran (non-GEE) OLS regressions on unit-level data. Third, all the above models use between-person comparisons, which may be residually confounded by unobserved individual characteristics. Therefore, we also ran fixed effects models that capitalise on within-individual changes in disease status to estimate disease costs, removing potential time-invariant confounding [22].

Results
There were 18.9 million person-year observations for those 25 years and older between 1 July 2007 and 30 June 2014. A total of 7.1 million (37.7%) of these observations included at least one NCD ( Table 1). There was a total allocated expenditure across the 7 years of US$26.4 billion, of which $20.3 billion (77.0%) occurred among observations with at least one disease. Regarding source, half or $13.1 billion arose from inpatient care and about a fifth from each of outpatients and community pharmaceuticals. Table 2 shows the observations and total expenditure by disease phase. People with neurological conditions, musculoskeletal conditions, CVD, and diabetes had the greatest number of observations at 16.1%, 15.7%, 11.3%, and 7.5%, respectively, mostly as prevalent observations (i.e., neither diagnosed nor dying in that observation year). People with CVD had the highest total expenditure (43.1%), although this includes spending on additional diseases. Table 3 shows the modelled health system expenditure per person-year for 60-64-year-olds, attributed to each disease phase, for both the aggregated and disaggregated-disease models, by sex, without and with disease comorbidity pairs. The first row of the table results ('No NCD') is the estimated expenditure for a 60-64-year-old with no NCDs; the next panel is the main effects by disease phase for each of the 6 or 13 diseases; and the last panel is the expenditure per year for the 15 comorbidity pairings, over and above that attributed to the diseases independently.

Excess disease costs per person
When considering the aggregated-disease model for females, without comorbidity pairs (FA model in Table 3), a 60-64-year-old without any of the six diseases was estimated to cost the health system US$626 per year. If she had a cancer diagnosed in that year, her costs were predicted to increase by $9,747 over and above the $626 base cost, by $9,815 if dying of cancer in that year, or by $1,716 if living with a prevalent cancer. If she also had prevalent CVD, her costs were predicted to increase a further $1,439.
Fig 1 shows these independent excess costs for both male and female 60-64-year-olds, for the disaggregated-disease models (i.e., estimates for the MD and FD model columns of Table 1). Excess costs in the year of diagnosis and year of death from that disease were higher than prevalent costs. However, patterns differed by disease. For example, cancer and CVD Excess costs for the disease comorbidity pairs were positive, with 95% confidence interval (CI) excluding the null for all pairs across all models, except for cancer with each of CVD, diabetes, and LLK disease (bottom panel of Table 3). That is, the costs of having diseases jointly was usually greater than the estimated summed costs for having the diseases independently  Scaling required for models without disease-disease interactions is described in Materials and methods. (shown in the rest of Table 3). For example, using the male disaggregated-disease model (model MDC in Table 3), a 60-64-year-old with both CVD and LLK disease was predicted to cost $2,050 (95% CI $1,837-$2,263) more than that predicted based on the independent excess costs due to CVD and LLK independently. Only 1 of the 60 comorbidity pairs (15 possible pairs across four models) had a negative cost with the 95% CI excluding the null, namely the cancer-CVD pair in the male disaggregated-disease model (−$484, 95% CI −$784 to −$240). Whilst this in isolation may be a chance finding given the 60 pairs tested, this same cancer-CVD pair had a negative cost for the male aggregated-disease model and female aggregated and disaggregated models-but with 95% CI including the null.
There were significant interactions of age and age squared with most separate diseases and disease comorbidity interactions (S3 Table). Because of these age interactions, excess costs vary by age (S4 Table and S5 Table give estimates for 45-49-and 75-79-year-olds, respectively). Two patterns were evident for independent disease costs: excess costs in the last year of life were less with increasing age, and musculoskeletal and neurological excess disease costs per person in the first year of diagnosis and prevalent cases were greater for 75-79-year-olds than 45-49-year-olds. Regarding the disease comorbidity pairs, there was a strong pattern of comorbidity costs being greater for 45-49-year-olds than 75-79-year-olds-often considerably so (Fig 2). For example, excess expenditure for 45-49-year-old males with CVD and chronic LLK was 10 times higher than for 75-79-year-olds, and six times higher for females. High excess costs for comorbidity are notable for cancer and neurological, and CVD and LLK (especially at younger ages).

Disease costs at population level
Total expenditure by disease costs is a function of the above per-person costs and how common the disease phases actually were ( Table 2). Across the 7 years in the study, using the models without comorbidity interactions, the greatest excess expenditures were for neurological (22.3%), CVD (21.2%), and musculoskeletal (20.8%) disease ( Table 4). Fig 3 shows the attribution of this disease expenditure by sex and age. A greater area under the curve for females is attributed to nonmodelled diseases at younger ages, consistent with maternity care not being  Table 3. CVD, cardiovascular disease; DM, diabetes mellitus; LLK, lung/liver/kidney; MS, musculoskeletal; Neuro, neurological.
https://doi.org/10.1371/journal.pmed.1002716.g002 Table 4. Population-level total costs and cause-deleted cost savings in absolute dollars (US$ millions) and as a percentage of cost savings from deleting all diseases. included as an explicit category in the modelling. Noteworthy patterns include that neurological and musculoskeletal costs were substantial at all ages, with neurological particularly notable at younger ages for females; diabetes costs and female breast cancer costs were skewed more to younger ages; and other CVD and IHD costs were considerably greater than stroke costs at all ages. Using the models with the 15 comorbidity pairs, the independent disease costs reduced by 23.8% due to costs now being attributed to one of the 15 possible comorbidity pairs (Table 4). Of this 23.8% comorbidity expenditure, the five comorbidity pairs including a neurological condition contributed 60.7%, and the five pairs including musculoskeletal contributed 41.4%. The comorbidity pair with the greatest contribution was neurological plus musculoskeletal (18.1%), meaning the 9 out of 15 comorbidity pairs including either/both neurological and musculoskeletal conditions explained 84.0% of additional health expenditure due to comorbidities at the population level (i.e., 60.7% + 41.4% − 18.1%; see S1 Fig for a breakdown of percentage comorbidity expenditure for each of the 15 comorbidity pairs).

Total health expenditure
Of the 76.2% of expenditure not captured by comorbidity pairs, CVD accounted for 18.7%, followed by musculoskeletal (16.2%), neurological (14.4%), cancer (14.1%), chronic LLK disease (7.4%), and diabetes (5.5%; Table 4). The expenditure attributed to these six diseases independently, when including comorbidity pairs relative to the model excluding comorbidity pairs, varied by disease, ranging from 0.60 for diabetes (i.e., 40% of apparent independent expenditure for diabetes can actually be attributed to superadditive costs arising from diabetes coexisting with other diagnoses) to 0.92 for cancer (i.e., only 8% of apparent independent expenditure on cancer can be attributed to comorbidity additive effects; last column of Table 4).

Sensitivity analyses
The coefficients across the preferred aggregated data OLS models and the unit-level OLS and GEE unit-level OLS regressions were identical (S6 Table). The standard errors were smallest for unit-level OLS but more similar for aggregate OLS and GEE unit level. However, given the size of the data sets and the usually sizeable health expenditures for diseases, most disease excess costs are statistically significant, regardless. Fixed effects analyses that use within-person variation (i.e., differences in expenditure between years, with and without disease, or with varying disease phase) were in reasonable agreement with our preferred OLS regressions on aggregate data (S2 Fig), given conceptual differences between the approaches and data limitations (see S1 Appendix).

Main findings and interpretation
This study estimates health system expenditure for many NCDs, by disease phase, for New Zealand. We capitalised on linked health data with unique identifiers for the entire population that track medical costs over multiple years. We used a regression approach to estimate the excess cost to the health system from having a given disease. At the population level, we found that 59% of all allocated health system expenditure was attributable to the included NCDs. Attributing costs to separate disease groupings and not yet allowing for comorbidities, neurological diseases accounted for 22.3% of NCD expenditure, followed by CVD (21.2%), musculoskeletal (20.8%), cancer (15.1%), chronic LLK disease (11.5%), and diabetes (9.0%). We were surprised by large costs due to neurological and musculoskeletal diseases; therefore, we plotted in Other studies have found a correlation of health expenditure with years of life with disability (YLDs) [9,10], although the correlation is even stronger with disability-adjusted life years. The New Zealand neurological and musculoskeletal morbidity burdens stand out even more for YLDs (Fig 4) than for costs (Fig 3), offsetting possible concerns that our costings for neurological and musculoskeletal disease might be too high. How does this distribution of costs, ignoring comorbidities for now, compare with previous studies? There is a body of work in the US using methods that attributed events to diseasesnot an excess costing approach using regression. For example, Dieleman and colleagues (2016) attributed 16 years of health expenditure using multiple data sources in the US and found that CVD contributed 11% of all health expenditure (12.5% in our New Zealand study, using the total health expenditure as the denominator in Table 4), cancers 5.5% (New Zealand = 8.9%), COPD 2.6% (4.9%), neurological 4.8% (13.2%), musculoskeletal 8.7% (12.3%), and diabetes 4.8% (5.3%) [8]. In another comparable study using a GBD grouping of diseases for Switzerland, Wieser and colleagues (2018) found that cancer contributed 6.0%, CVD 15.6%, diabetes 1.5%, neurological (including dementia) 3.7%, and musculoskeletal 13.4% [9]. Musculoskeletal expenditure in New Zealand, at 12.3%, is intermediary between these US and Swiss percentage estimates. Of note, in GBD 2016, New Zealand also had 1.31 times higher morbidity burden for back pain and 1.84 times higher 'other' musculoskeletal than expected based on its level of sociodemographic development [23]. This leaves the neurological category as the notable 'stand-out' high cost for New Zealand, perhaps due to both the range of disease and conditions included (e.g., epilepsy, dementia, migraine) and the inclusive capture of people with at least one neurological diagnosis (e.g., through the range of data sets and classifications; S1 Table).
Where our New Zealand study steps ahead of similar general COI studies is in further disaggregating disease costs by phase and comorbidities, an advance that is possible due to all health data sets being linked at the individual level. Disease costs were generally higher in the year of diagnosis and the year of death if dying of that disease (Fig 1), consistent with our previous work on multiple cancers [24]. Costs in the year of death if dying from chronic kidney disease, musculoskeletal, or diabetes were particularly prominent, ranging from $13,000 to $30,000 (Fig 1). Dying-related costs from kidney disease, if also undergoing (failed) transplant or renal dialysis, are high. Dying of diabetes per se, whilst uncommon compared with dying of diseases elevated in risk by diabetes, is also expectedly expensive.
Regarding comorbidities, we find that the majority of comorbidity pairs increase costs more than that expected based on having the same diseases separately (Table 3, Fig 1), and this superadditive comorbidity cost was far more prominent at younger ages (Fig 2). Whilst we have no direct evidence, it seems plausible that young people with comorbidities might be treated more aggressively than old people with comorbidities or perhaps that young people with comorbidities have more severe disease than their younger counterparts with just one disease. Further research to understand this this age variation in comorbidity expenditure is warranted.
We estimated that 23.8% of all NCD health expenditure is attributable to these additional increments of comorbidity, over and above the independent costs of having diseases separately. Most previous COI studies analysing the impact of comorbidities have been from the perspective of one disease only, and its increased costs due to comorbid conditions (e.g., [25][26][27]). An exception is Dieleman and colleagues (2017), who used encounter-level data with multiple ICD codes recorded for each event, to quantify 'inflows' and 'outflows' between all comorbidities, to reallocate health spending mutually exclusively [12]. This is a different approach to what we took using a person-based excess costing approach to directly examine disease-disease interactions based on diagnoses stated on any event linked for the same person, and quantify the independent contributions of separate diseases and comorbidity pairs to total health expenditure. We found that the majority of the 15 comorbidity pairs increased expenditure over and above having the diseases separately. There was, however, evidence that having cancer and CVD simultaneously reduced health expenditure compared to that expected from having the two diseases separately. It is possible that a diagnosis of cancer means that treatments for other diseases are down-prioritised, but this is speculation on our part.

Strengths and limitations
There are limitations with our study. First, cost weights assigned to events may not have been the exact price to hospitals or other fund managers. Second, whilst New Zealand has highquality linked heath data, it does not capture all costs. Private expenditure is estimated at 18% of all health expenditure; future research could attempt to include this but will require additional data and assumptions about how this funding is distributed compared with public funding. For example, the great majority of NCDs are treated in public hospitals, but breast cancer treatment and hip/knee replacement surgery are more commonly provided in the private sector. There are also missing costs due to support care services (e.g., rest home care of people with dementia). We were unable to capture actual primary care usage by individuals on national databases; rather, we just imputed an 'expected' cost based on the bulk funding formula, and we will thus have underestimated disease costs somewhat, as people with the diseases we captured will (presumably) be higher users of primary care. We also do not include prevention costs. Nevertheless, we capture the majority of costs, and certainly enough costs to confidently speak to patterns of cost by sex, age, disease, disease phase, and comorbidity pairs. Future research should aim to improve these data, however.
Third, whilst we see the excess costing approach using regression modelling as a strength overall, and it avoids the complexities of attributing each individual medical event to a disease, it has limitations. People with diseases may be higher users of health services, regardless of their disease-a form of time-invariant confounding. In fixed effects analyses exploiting changes in costs within individuals (see S1 Appendix), we found broadly similar costs, suggesting that time-invariant confounding is not too problematic. There may also be time-varying confounding, whereby people with a disease increase health services use for other reasons, including increasing surveillance and preventive activities when engaged with the health system. Fourth, addition of further comorbidity combinations (e.g., with mental illness) would likely increase the proportion of all NCD health expenditure attributable to the superadditive effect of comorbidities.

Potential implications for researchers, planners, and policy makers
So what for researchers and policy makers? There is a surprising lack of disease-attributed costing studies involving multiple diseases at once. Governments and health systems managers and funders can improve planning and prioritisation, knowing where the money goes. Also, cost-effectiveness studies usually need costs by disease to model cost offsets from preventing disease (presentations) and, conversely, the health systems costs from living longer in evaluations using an unrelated disease costing approach [28,29]. Our analytical framework generates these costs by disease phase for such cost-effectiveness modelling (e.g., [30]). We believe the costing methods used in this paper can be applied elsewhere. For example, the OECD has as a goal that national health expenditure be disaggregated by disease; this will require standardisation as best as possible of data and methods across countries, but given data variations across countries, exact standardisation is unlikely. We therefore propose that an approach similar to that used for estimating epidemiological parameters in the GBD may be useful [2,31]. Here, one would build up from COI studies in multiple countries (e.g., [5]) and use country-level parameters such as GPD, health system configuration, and such to build a model predicting disease costs within an envelope of total health expenditure given by a system of national health accounts. Our paper and methods may help in that the regression models built within a given (or few) country predict the relative expenditure by disease-which could be merged with the population demography, disease epidemiology, and total health expenditure of another country to at least provide an initial costing.
Finally, any national costing study will be dependent on the data available and methods used. That said, we suspect that the general patterns of these results (i.e., of comorbidity impacts being superadditive) for New Zealand are likely to be generalisable to other highincome countries. The methods we applied in this paper could be applied elsewhere to test for such generalisability-or not-of comorbid costs.  Table. Comparison of coefficients (s.e. in parentheses; NZ$ 2011) for three modelling approaches, for six-disease model without comorbidity interactions: OLS regression on aggregated data (default method, as shown in Table 3 Table 3) and within-person (OLS fixed effects) approaches. Coefficients are in NZ$ 2011. OLS, ordinary least squares. (TIF)