Incidence and Variation of Discrepancies in Recording Chronic Conditions in Australian Hospital Administrative Data

Diagnostic data routinely collected for hospital admitted patients and used for case-mix adjustment in care provider comparisons and reimbursement are prone to biases. We aim to measure discrepancies, variations and associated factors in recorded chronic morbidities for hospital admitted patients in New South Wales (NSW), Australia. Of all admissions between July 2010 and June 2014 in all NSW public and private acute hospitals, admissions with over 24 hours stay and one or more of the chronic conditions of diabetes, smoking, hepatitis, HIV, and hypertension were included. The incidence of a non-recorded chronic condition in an admission occurring after the first admission with a recorded chronic condition (index admission) was considered as a discrepancy. Poisson models were employed to (i) derive adjusted discrepancy incidence rates (IR) and rate ratios (IRR) accounting for patient, admission, comorbidity and hospital characteristics and (ii) quantify variation in rates among hospitals. The discrepancy incidence rate was highest for hypertension (51% of 262,664 admissions), followed by hepatitis (37% of 12,107), smoking (33% of 548,965), HIV (27% of 1500) and diabetes (19% of 228,687). Adjusted rates for all conditions declined over the four-year period; with the sharpest drop of over 80% for diabetes (47.7% in 2010 vs. 7.3% in 2014), and 20% to 55% for the other conditions. Discrepancies were more common in private hospitals and smaller public hospitals. Inter-hospital differences were responsible for 1% (HIV) to 9.4% (smoking) of variation in adjusted discrepancy incidences, with an increasing trend for diabetes and HIV. Chronic conditions are recorded inconsistently in hospital administrative datasets, and hospitals contribute to the discrepancies. 
Adjusting for these patterns and stratifying in risk adjustment, together with longitudinal accumulation of clinical data at the patient level, refinement of clinical coding systems and standardisation of comorbidity recording across hospitals, would enhance the accuracy of datasets and the validity of case-mix adjustment.


Introduction
Routinely collected data for hospital admitted patients are increasingly used for clinical and epidemiological research, health resource distribution, funding strategies and quality improvement purposes. Demographic and diagnostic information captured in administrative hospital data collections is employed for case-mix or risk adjustment in order to account for differences in patient characteristics and provide fair comparisons and reimbursements [1][2][3][4]. According to certain coding rules and data standards this information is recorded by clinical coders based on patients' medical information documented during admission [5]. Despite advancements in diagnoses classifications, coding training and standardisation of clinical documentation and coding practices that improved accuracy and reliability of comorbidity information [3,6], discrepancies in recorded comorbidities at coder, hospital [7][8][9] and regional levels [10,11] have been reported in Australia and elsewhere. Relating case mix to funding strategies introduced a systematic bias of reporting more comorbidities, known as "upcoding", for greater gains in several national health systems [12]. Such biases can change the relationship between patient profile and outcome across hospitals and would potentially lead to inaccurate or unfair provider comparisons and allocation of incentives [2,4,13-16].
Different sources of information, employed by studies to verify consistency in hospital datasets, resulted in varying levels of agreement. Higher agreements were reported where hospital data were compared against clinical charts as opposed to self-reported data [7,17-19]. A recent study reported almost a fifth of the variation in discrepancies in coding common comorbidities in Australian hospitals was attributable to hospital characteristics [7]. Individual hospitals contributed to the observed differences along hospital structural characteristics such as size and location [7,8,20].
Despite the important findings of previous Australian studies, none has examined the internal consistency of hospital datasets through longitudinal investigation of patient-specific morbidity information. Such a design allows a population-based investigation and reflects discrepancies within a homogeneous setting governed by a single set of documentation and clinical coding standards. Furthermore, investigation of the temporal behaviour of discrepancies and their variations can provide additional insight into the consequences of systematic changes in clinical coding practices such as changes in documentation, coding rules and standards, infrastructure and staffing [8,21,22].
This study aimed to measure non-recorded morbidity incidents in administrative hospital datasets and the contribution of patient, admission, morbidity and hospital related factors, as well as to examine inter-hospital variation in the observed incidents. We used record linked data for all admitted patients between July 2010 and June 2014 in all acute hospitals across New South Wales (NSW), Australia. Discrepancies in the four chronic conditions of diabetes, hepatitis, HIV and hypertension as well as smoking status were investigated. These five are among the most frequently captured conditions in risk-adjustment models [23][24][25]. Their effect on care and treatment makes their recording required or more likely [5].

Data source and study population
NSW, the largest health jurisdiction in Australia, has over seven million residents and approximately 500 healthcare facilities with up to three million admissions per annum. We used records from the record linked Admitted Patient Data Collection (APDC) database for the 2010-2013 financial years (2010-2013 FY), comprising all NSW hospital separations from 1st July 2010 to 30th June 2014. Each separation (episode of care) record includes information on patient demographics, morbidities and procedures, hospital characteristics, and separations (discharges, transfers and deaths) from all public and private healthcare facilities in NSW. Record linked APDC includes a unique patient identifier that enables the identification and linkage of patient-specific admissions [26]. Each record is assigned up to 55 codes for morbidities (principal diagnosis and comorbidities) based on the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) Seventh Edition [27]. Linked APDC records were obtained from the NSW Admitted Patient, Emergency Department and Deaths Register, which was established under the public health and diseases registers provisions of the NSW Public Health Act 2010 and is maintained by the NSW Ministry of Health. Record linkage was carried out by the Centre for Health Record Linkage (CHeReL) [26]. The data were accessed remotely through the Secure Analytics for Population Health Research and Intelligence (SAPHaRI) system made available by the Centre for Epidemiology and Evidence, NSW Ministry of Health [28]. De-identified patient records were provided and accessed via SAPHaRI and used for analysis. The study was approved by the Western Sydney Local Health District (WSLHD) ethics committee and the Centre for Epidemiology and Evidence, NSW Ministry of Health as the data provider.
Of all admissions at all NSW healthcare facilities within our study period (11,278,591 admissions for 3,761,932 patients), we included admissions of those patients who had at least two admissions with hospital length of stays of at least 24 hours in any NSW acute public or private hospital with at least one recorded chronic condition. This study examined 1,545,294 (13.7%) admissions for 385,268 (10.2%) patients. Admissions at community facilities, multipurpose, non-acute or sub-acute centres, psychiatric and rehabilitation facilities, nursing home and hospices, and children's hospitals were excluded.

Discrepancy identification and covariates
Based on ICD-10-AM, five conditions, diabetes (E10-E14), chronic hepatitis (hepatitis: B18.0-B18.2, B94.2 and Z86.18), chronic HIV (B20-B22, B23.8, and B24), hypertension (I10-I15), and smoking (F17.1, F17.2, Z86.43, and Z72.0), were identified within recorded morbidities for each admission. For each patient, the earliest and the latest admissions with the recorded chronic condition (first and last index admissions respectively) were identified for each chronic condition. A discrepancy incidence in clinical coding was defined as any admission with a non-recorded chronic condition occurring: a) between the first and the last indices; or b) within three months of the last index admission (follow-up period) but not occurring after 31st March 2014 (buffer period). For patients with only one admission with a recorded condition, only one index admission existed and therefore the second criterion was applied. All admissions occurring after the first index admission including the last index admission and those that met the follow-up and buffer periods criteria were included in the denominator.
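The definition above can be illustrated with a minimal sketch for one patient and one condition. The function name, the tuple layout and the 91-day reading of "three months" are assumptions made for illustration only; they are not taken from the study's SAS or R code.

```python
from datetime import date, timedelta

# Assumptions for illustration: "three months" is taken as 91 days,
# and the buffer cut-off is the 31st March 2014 date given in the text.
FOLLOW_UP = timedelta(days=91)
BUFFER_END = date(2014, 3, 31)


def count_discrepancies(admissions):
    """Count denominator admissions and discrepancies for one patient.

    admissions: list of (admission_date, condition_recorded) tuples
    for a single chronic condition (hypothetical layout).
    Returns (denominator, discrepancies).
    """
    admissions = sorted(admissions)
    index_dates = [d for d, recorded in admissions if recorded]
    if not index_dates:
        return 0, 0  # no index admission, patient not eligible
    first_index, last_index = index_dates[0], index_dates[-1]

    denominator = discrepancies = 0
    for adm_date, recorded in admissions:
        if adm_date <= first_index:
            continue  # only admissions after the first index count
        if adm_date <= last_index:
            eligible = True  # criterion (a), includes the last index
        else:
            # criterion (b): follow-up window, subject to the buffer
            eligible = (adm_date <= last_index + FOLLOW_UP
                        and adm_date <= BUFFER_END)
        if eligible:
            denominator += 1
            if not recorded:
                discrepancies += 1
    return denominator, discrepancies


example = [
    (date(2011, 1, 10), True),   # first index admission
    (date(2011, 6, 1), False),   # between indices: discrepancy
    (date(2012, 2, 1), True),    # last index admission
    (date(2012, 3, 15), False),  # within follow-up: discrepancy
    (date(2012, 9, 1), False),   # beyond follow-up: excluded
]
print(count_discrepancies(example))
```

Summing the two counts over all patients and dividing discrepancies by the denominator would give a crude discrepancy incidence rate of the kind reported in the results.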
This restricted prospective approach in the identification of a non-recorded chronic condition was employed to avoid any overestimation caused by counting admissions prior to diagnosis or after possible cure. The limited follow-up period of three months allowed inclusion of any possibly true discrepancy incidents occurring after the last index admission while minimising inclusion of any admission following a false-positive admission (the patient had no chronic condition but the condition was recorded). Furthermore, a buffer of three months at the end of the study period diminished the effect of censoring among follow-up admissions. An extensive sensitivity analysis using different follow-up periods in the absence or presence of a buffer was conducted and the results are outlined in S1 Table.
For all admissions, four sets of covariates (patient, admission, morbidity and hospital related) were considered. Patient demographic variables included age, gender and socio-economic status. We utilised a Statistical Local Area level disadvantage index of Socio-Economic Indices for Areas (SEIFA) with the lower values indicating more disadvantage [29]. SEIFA scores were categorised into four classes (1st quartile: most disadvantaged to 4th quartile: least disadvantaged areas). Admission covariates included admission type (surgical, medical, and other), admission source (emergency, planned, and other), and length of stay (1-2, 3-5, 5-10, and over 10 days). Morbidity related factors were the number of recorded morbidities categorised by quartiles, presence of any other chronic conditions (yes, no), and discrepancy in recording the other four chronic conditions (yes, no). Hospital characteristics included hospital type (public vs. private), location (metropolitan vs. rural), and peer groups for public hospitals.
Public hospital peer groups comprised "A1": principal referral group, usually teaching hospitals; "B": major metropolitan and non-metropolitan; "C1": district group 1; and "C2": district group 2. Hospital peer groups contained similar sized hospitals, ranging from those treating more than 25,000 acute case-mix weighted separations per annum in principal referral groups through to treating between 2,000 and 5,000 acute case-mix weighted separations in district group [30].

Statistical analysis
We employed Poisson linear models to evaluate adjusted discrepancy incidence rates (IR) and rate ratios (IRR) for the five chronic conditions separately after including patient, admission, morbidity and hospital-related characteristics. Separate models for public hospital admissions were constructed to derive estimates for the public hospital peer group effect. Morbidity characteristics were also entered into the models one at a time because of multicollinearity. To investigate the temporal behaviour of the discrepancy incidents, financial years were also entered as indicator variables, with 2010 as the reference year, into the models for all admissions as well as into separate models for public and private hospital admissions. Adjusted trends were estimated by multiplying the incidence rate ratios obtained from the Poisson model by the crude risk at the reference year. The difference between public and private hospital trends was also assessed using an interaction term between the hospital type and year variables in the full model.
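The adjusted-trend calculation described above is a simple rescaling, sketched below. The function names are hypothetical and the rate ratios in the first example are invented purely for illustration; only the 47.7% and 7.3% diabetes figures in the second example come from the text.

```python
def adjusted_trend(crude_rate_reference, irr_by_year):
    """Adjusted rate per year = year-specific IRR x crude rate at the reference year."""
    return {year: crude_rate_reference * irr for year, irr in irr_by_year.items()}


def relative_drop(rates, start_year, end_year):
    """Proportional decline in the rate between two years."""
    return 1 - rates[end_year] / rates[start_year]


# Hypothetical IRRs (reference year 2010 has IRR = 1 by construction).
hypothetical_irr = {2010: 1.0, 2011: 0.9, 2012: 0.5, 2013: 0.25}
trend = adjusted_trend(0.40, hypothetical_irr)
print({year: round(rate, 3) for year, rate in trend.items()})

# Adjusted diabetes rates quoted in the text: 47.7% (2010) and 7.3% (2013).
diabetes = {2010: 0.477, 2013: 0.073}
print(round(relative_drop(diabetes, 2010, 2013), 3))
```

With the quoted diabetes rates the relative drop evaluates to about 0.847, in line with the reported decline of close to 85%.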
Inter-hospital variation among public hospitals was evaluated within a multilevel framework, using a Poisson mixed model with a random intercept component at hospital level for each condition. A series of models, from a null model to the most comprehensive model with all covariates, was constructed to assess the contribution of hospital-related factors to the observed variation in discrepancy incidents following adjustment. To express the inter-hospital variation, we employed the variance partition coefficient (VPC) for Poisson multilevel modelling using the exact formulae developed by Stryhn et al. [31]. The VPC at hospital level indicates the influence of the hospitals on discrepancy incidents that cannot be explained by the model parameters. Because the VPC in Poisson modelling is conditional on covariate values, the median and interquartile range of the VPC calculated for all existing covariate values were reported. Furthermore, the proportional change in the inter-hospital variance estimates ($\sigma_h^2$) of the different models was calculated. This indicates the proportion of total inter-hospital variation that is explained by case-mix factors. To translate inter-hospital variation into risk differences, we used the median incidence rate ratio (MIRR) statistic, which is the median of the rate ratios of pair-wise comparisons of admissions with identical characteristics taken from randomly chosen hospitals and is calculated as $\exp(0.95 \times \sqrt{\sigma_h^2})$, an extension of the measure developed by Merlo et al. [32,33]. To assess the effect of hospital size, random intercept estimates were stratified by hospital peer group and associated statistics were derived. To quantify the trend of inter-hospital variation over the study period, the Poisson mixed models were extended by the inclusion of the year variable as a categorical random slope.
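The three summary measures in this paragraph can be sketched numerically as follows. The MIRR follows the expression given in the text. The VPC function is derived here from the log-normal moments of a random-intercept Poisson model and is assumed, not confirmed, to correspond to the exact formula cited from Stryhn et al. [31]. All function names and input values are hypothetical.

```python
import math
import statistics


def mirr(var_h):
    """Median incidence rate ratio: exp(0.95 * sqrt(inter-hospital variance))."""
    return math.exp(0.95 * math.sqrt(var_h))


def proportional_change_in_variance(var_reference, var_adjusted):
    """Share of the reference model's inter-hospital variance explained by added covariates."""
    return (var_reference - var_adjusted) / var_reference


def poisson_vpc(linear_predictor, var_h):
    """VPC for a random-intercept Poisson model at a given fixed-effect linear predictor.

    Assumes a hospital effect u ~ N(0, var_h), so the conditional mean
    exp(linear_predictor + u) is log-normal.
    """
    # Variance of the conditional mean across hospitals (level 2).
    between = math.exp(2 * linear_predictor + var_h) * (math.exp(var_h) - 1)
    # Expected Poisson variance within hospitals (level 1).
    within = math.exp(linear_predictor + var_h / 2)
    return between / (between + within)


# Hypothetical variance and linear predictors (log rates) for illustration.
var_h = 0.09
predictors = [-2.0, -1.5, -1.0, -0.7]
vpcs = [poisson_vpc(lp, var_h) for lp in predictors]

print(round(mirr(var_h), 2))
print(round(statistics.median(vpcs), 4))
print(round(proportional_change_in_variance(0.20, 0.09), 2))
```

Because the VPC depends on the linear predictor, it is evaluated over a set of covariate patterns and summarised by its median, mirroring the reporting approach described above. A variance of 0.09 gives a MIRR of about 1.33, the magnitude reported for smoking and hepatitis.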
We also used pair-wise Pearson correlation to assess the association of hospital recording performances across the five chronic conditions, based on the hospital-specific random intercepts. Data preparation was conducted in SAS Enterprise Guide V.6.1 [34] through SAPHaRI [28], and analyses were performed in R V.3.1.2 [35].
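As a small illustration of the pair-wise comparison, the sketch below computes a Pearson correlation between two sets of hospital-specific random intercepts. The intercept values are invented for illustration and the function name is hypothetical; the study itself performed this step in R.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical random intercepts for five hospitals and two conditions.
diabetes_intercepts = [-0.30, -0.10, 0.00, 0.15, 0.25]
smoking_intercepts = [-0.20, -0.15, 0.05, 0.10, 0.20]
print(round(pearson(diabetes_intercepts, smoking_intercepts), 3))
```

A positive coefficient would indicate that hospitals deviating from the average for one condition tend to deviate in the same direction for the other.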

Discrepancy incidence rate and risk factors
Of 228,687 inspected admissions following 76,666 patients with a diabetes related first index admission, 43,008 subsequent admissions had no recorded diabetes code, resulting in a discrepancy incidence rate of 18.8%. Discrepancy incidents were more frequent for the four other chronic conditions: 26.7% (of 1,500 admissions) for HIV, 33.2% (of 548,965 admissions) for smoking, 36.6% (of 12,107 admissions) for hepatitis and the highest rate of 51% (of 262,664 admissions) for hypertension (Table 1).
Discrepancy incidents were lower among females for most of the chronic conditions, with the largest gender difference observed for hypertension (21%). For older patients, hepatitis was more accurately recorded, while diabetes and smoking conditions were less likely to be documented than for younger patients. Patients' socio-economic status had either no or an inconsistent effect on coding completeness. Patients who underwent surgery during hospitalisation were up to 36% more likely to have their chronic conditions recorded compared to the medical patients. The effect of admission source was inconsistent across chronic conditions; emergency admitted patients had a lower discrepancy incidence rate for diabetes, but a higher rate for smoking compared to planned admissions. A similar inconsistent pattern was evident for the effect of length of stay on completeness of recording chronic conditions (Table 1).
The incidence of non-recorded chronic conditions was significantly higher in private hospitals across all conditions. For the most common conditions of diabetes, smoking and hypertension, the excess likelihood of discrepancy in recording morbidities within private hospitals ranged between 15% and 22%. Significantly higher inconsistency rates of 60% and 201% in clinical records were found for patients with hepatitis and HIV respectively. Rural hospitals tended to have up to an 8% lower discrepancy incidence rate in recording the top three most common chronic conditions; no difference was observed between metropolitan and rural hospitals in recording HIV and hepatitis. Among public hospitals, smaller hospitals had higher discrepancy incidence rates, mainly for the top three most common conditions, compared to large principal referral hospitals. The highest gaps of at least 90% and the lowest gaps of at most 20% were found in recording diabetes and hypertension respectively (Table 2).
A higher number of recorded comorbidities at hospital admission decreased the likelihood of non-recorded chronic conditions by at least 40%. Each chronic condition (except HIV) was more likely (by at least 13%) to be recorded if the patient had any other recorded chronic conditions. The likelihood of not recording a chronic condition increased by at least 49% when there was also a discrepancy in recording any of the other four conditions (Table 2).

Trend analysis
As depicted in Fig 1, discrepancy incidents for all examined chronic conditions declined over the four-year period (2010-2013 FY). The sharpest drop of close to 85% was observed for diabetes (adjusted rates of 47.7% in 2010 vs. 7.3% in 2013). For hepatitis the adjusted rate increased by 6% in 2012, reaching 58%, then markedly dropped by over 60% in 2013, with a total drop of 56% over the study period. Incidence rates for smoking and hypertension notably decreased by 35% and 20% respectively, but rates were unchanged for HIV with a non-significant drop of 18%. The discrepancy incidence rate in public hospitals remained lower than the rate in the private hospitals over the study period for all the chronic conditions. Up to 4% larger drops were observed for diabetes and hypertension rates in public versus private hospitals; whereas the drop in the discrepancy rate for smoking was 10% larger in private compared to public hospitals. The observed differences in trends for hepatitis and HIV discrepancy rates were not significant.
Table 1. Admissions, discrepancies and associated incidence rates and rate ratios, stratified by patient and admission characteristics.

Inter-hospital variation
Adjustment for patient and admission characteristics explained much of the observed inter-hospital variations in discrepancy incidents, as seen by large drops in the VPC from the model with no adjustment to models with patient and admission factors across all chronic conditions (Table 3). However, a noticeable proportion of between 0.9% (for HIV) and 9.4% (for smoking) of all variations was still attributable to hospital and associated factors. Hospital characteristics (rurality and peer groups) partly explained the inter-hospital variation, leaving up to 7% of unexplained variation that was associated with individual hospital characteristics (unseen factors). Overall, the presence of smoking or hepatitis in a patient admitted to a hospital with high discrepancy rates was up to 33% (MIRR = 1.33, adjusted for patient and admission characteristics) more likely not to be recorded than had the admission been to a hospital with lower discrepancy rates. A smaller gap of close to 20% was observed for diabetes and hypertension, followed by 14% for HIV, the condition most robust to hospital characteristics (Table 3). According to proportional variance reductions, case-mix factors explained between 22% (the lowest for hypertension) and 61% (the highest for HIV) of inter-hospital variations. Most of this was explained by hospital and admission factors as opposed to patient demographics, as their inclusion largely decreased the estimated variances across all chronic conditions (Table 3). The extent to which inter-hospital variations and likelihood gaps in recording chronic conditions were influenced by hospital size varied across chronic conditions (Fig 2). Large principal referral hospitals (A1) tended to have smaller variations in recording diabetes, but higher variations for HIV compared to the smaller district hospitals (C1 and C2).
The observed variations in recording diabetes translated to a 9% gap among principal referral hospitals compared to a 17% gap in the group with the smallest district hospitals (C2); the relevant numbers for HIV were 24% and 9% respectively. No consistent pattern or considerable difference was observed among hospital groups in recording other chronic conditions.
The inter-hospital variations in discrepancy incidence rates varied over time for all chronic conditions. In particular, there were greater differences in the coding of diabetes in the second half of the study period compared to the first half (Fig 3). The gap of at most 25% for diabetes in the first period increased to over 65%. An increase from 10% to 34% was also evident for the coding of hepatitis. The trends in the variation of the other three conditions noticeably decreased in 2011 and subsequently either remained stable or began to increase over the next three years.
Hospitals with lower discrepancy rates in recording diabetes tended to also have lower discrepancy rates in the recording of smoking since a significant correlation between deviations from the average (estimated hospital-specific intercept) in the two conditions was observed across 80 hospitals. A similar pattern among hospitals was also observed for the recording of hepatitis and smoking as well as for hypertension and HIV; see S2 Table.
Inter-hospital variance and associated median incidence rate ratio of discrepancy incidents for the five chronic conditions among 80 NSW public hospitals stratified by hospital peer groups.

Key findings
This large population-based study using NSW Ministry of Health hospital admissions linked datasets over a four-year period identified the non-recorded incidence rates of five chronic conditions as varying between 19% (for diabetes) and 51% (for hypertension). Except for HIV, the adjusted discrepancy incidence rates for all examined chronic conditions declined considerably, ranging from 20% for hypertension to 80% for diabetes over the four-year period to July 2014. Admission records from private hospitals and smaller public hospitals had higher discrepancy incidents compared to their counterparts. Variability among public hospitals was responsible for 1% to 9% of variation in adjusted discrepancy incidence rates for the five chronic conditions, leading to between 14% and 33% discrepancy rate differences. Seven per cent of the variation remained unexplained after adjusting for hospital characteristics. The inter-hospital variation changed over time, with the increase most noticeable for diabetes. Hospital size had an inconsistent effect on inter-hospital discrepancy differences across the conditions.

Discrepancy incidence rates and trends
Completeness in recording chronic morbidities and agreement among different sources of morbidity data have been investigated in Australia and elsewhere. The identified 19% incompleteness in the coding of diabetes in the NSW hospital administrative data was in the range of the previously reported rates of at most 13% [17-19,36] and 26% [7] when clinical charts and self-reported information, respectively, were used as the reference. The large drop in discrepancy incidence rates for diabetes from over 47% in the first half of our study period to 10% or less in the last two years coincided with the change in rules governing the coding of diabetes as a comorbidity in hospital data. In general, according to the Australian ICD standard for documenting additional diagnoses in clinical charts, only those conditions affecting the patient's care management or treatment within that admission are required to be coded in hospital administrative datasets [5]. Therefore, diagnoses that relate to an earlier admission, and which have no effect on the current admission, are not required to be coded. The cause and effect relationship requirement for coding purposes between diabetes and the patient's care, which was applied during the 2010 to 2012 period, was lifted in July 2012 [27]. Such changes reportedly influenced diabetes prevalence estimates based on administrative data [22,37] and the occurrence of discrepancies as demonstrated in this study. Our findings reflected the influence of the change in standards that led to reduced subjectivity associated with coding at the coder level. In particular, they revealed the potential improvement in recording (documented) diabetes by coders versus the lack of documentation of diabetes in clinical charts by clinicians [4].
A lower discrepancy rate of 19% in coding smoking status was observed in the UK administrative datasets [19], compared to the 33% identified in this study and the 41% reported recently from NSW APDC datasets [7]. Inclusion of tobacco related service use in the UK study could have contributed to lower inconsistency, while identification of ex-smokers and tobacco related injuries in our study compared to the recent Australian research may have resulted in better completeness rates identified in this study.
The observed 51% discrepancy rate for hypertension was almost double that seen when clinical charts were the reference [17,19], but was lower than the rate of 69% obtained using patients' self-report [7]. Compared to other reports, we applied the narrowest ICD codes in case identification, disregarding cases with renal, brain or pregnancy complications caused by hypertension which could have resulted in different completeness rates.
For the rare conditions of hepatitis and HIV, our study benefited from a large state-wide cohort, providing more reliable discrepancy rates (of 37% and 27%) compared to other reports (zero to 33%) limited by small sample size [17,18,38,39]. We found noticeably high inconsistencies in coding morbidities that are either life-threatening or can cause severe complications, as is the case for HIV, which is listed among the most important risk factors of mortality in risk adjustment methods [23,24].
In addition to changes in coding standards and varying case identification methods noted above, systematic changes that affect coding practices as well as the method of verification to identify non-recorded comorbidities in hospital data may also have contributed to the differences in reported discrepancies. The observed decreasing trends for all conditions, particularly within public hospitals, can be associated with the introduction of activity based funding in 2011 in Australia [21,40] as it previously resulted in increases in the recording of secondary diagnoses and procedures [12,41] in Europe. Responses to shortfalls in staffing and training of clinical coders prior to our study period [42] could have contributed to the temporal reduction in discrepancies as observed elsewhere [8,39].
The current findings indicate higher discrepancy rates compared to studies conducted using clinical chart review, regarded as the gold standard [17,19,38], to ascertain the presence of chronic conditions but lower rates than studies using primary care provider or patient survey information [7,18]. Having higher rates compared to clinical chart-based studies could be due to inclusion of non-recorded conditions as true non-documented conditions in our rates. Despite the potential to report false positive rates (falsely recorded conditions) in studies measuring agreement between hospital data and external references, comparison of hospital data with clinical charts focuses only on discrepancies in the coding of conditions that were documented. Using other references such as survey based information may still overlook non-recorded conditions, and not resolve problems of high subjectivity and variation due to lack of unique governing standards in documentation. The internal references developed and applied in this study enabled the capture of all non-recorded conditions, regardless of whether they were documented, within one environment governed by a unique set of rules. Although the effect of temporal data accumulation was not determined, using a prospective design enabled us to directly estimate the amount of discrepancy which can be eliminated through data accumulation over time. The demonstrated increase in accuracy of hospital data through temporal data accumulation [7,36] also supports the utilisation of internal references within this setting. The very low false positive rates of less than 2% in administrative datasets [7,18] for most of the conditions investigated lend further credence to the reliability of our internal references (index admission being true positive), made possible with data linkage, and lend credibility to our sole focus on non-recorded comorbidities.
We echoed other research findings of higher discrepancies among private hospitals compared to public hospitals [7,8,17,18]. The role of clinical coding in funding public hospitals could result in improved accuracy in public hospital datasets [12,41]. Our finding that rural hospitals tended to have more accurate recorded conditions was consistent with US results [8,20] but contradicted previous Australian findings [17]. However, the significantly higher discrepancy rates in smaller versus larger public hospitals were consistent with previous Australian findings [7,17]. A tendency to record more comorbidity at larger hospitals, reflecting the presentation of severely ill patients with multiple conditions was positively associated with better accuracy in administrative datasets [7,13].

Inter-hospital variation
Variation in performance [43,44], quality and safety [45,46] and service usage [47] indicators among acute care providers in NSW and elsewhere has been identified. Taking into account patient and admission differences, notable inter-hospital variability in discrepancy incidence rates was evident among 80 NSW public hospitals. A third of our adjusted inter-hospital variation (0.9% to 9.4%) was explained by hospital size and location. Our results were comparable to Lujic et al. [7] who reported a slightly higher variation (2% to 13%) among similar hospitals. Differences in the modelling scheme, measurement and adjustments would have contributed to the results.
Larger variation in recording hypertension and smoking than diabetes was consistent with previous findings [7]. The contribution of case mix adjustment in explaining inter-hospital variation differed across chronic conditions with the highest for HIV and hepatitis and the lowest for hypertension. No comparative data exist for examining the variability in recording hepatitis and HIV conditions. These findings highlighted the potential biases, caused by discrepancies in coding, for care provider comparisons and funding based on risk adjustment methods, in particular those using hypertension, as has been addressed [7] and evaluated [13] elsewhere.
Discrepancy rates as well as inter-hospital variation varied over time and were affected by hospital size. Despite the observed drop in the discrepancy rate for diabetes, a significant increase in the related inter-hospital variation over the second half of the study period was evident, perhaps reflecting differences in the method and speed of adoption of modified coding rules for diabetes [27]. The timing and level of adaptation to new standards among hospitals can introduce larger variation at least in the short term. The introduction of activity-based funding might also have contributed to the overall increasing trend of variation observed from 2011 for the other conditions [40].
At the patient level, discrepancy rates for each condition were inversely associated with the number of recorded conditions, particularly when the recorded comorbidities included one of our five chronic conditions. At the hospital level, hospitals with good coding practice for one condition tended to do well with others. These findings reemphasise the importance of individual hospital responsibilities and characteristics. Engagement of coders in diverse roles, higher staffing and lower throughput, training and professional development, and interaction with clinicians are among the effective organisational factors aimed at enhancing clinical data quality [8,9]. Enhancement and standardisation of training and rotation of coders between hospitals have also reduced variation at coder level [39].

Implications
Our study raises several important policy implications. Firstly, despite advancement in adjustment methods to ensure fair comparison and funding strategies, the significant non-random inconsistencies in the administrative dataset are likely to lead to distorted conclusions. Minimising discrepancies, or at least controlling for hospital-level factors through modelling or stratification, will facilitate optimal decision making. Secondly, in the absence of routine utilisation of clinical chart review, the use of temporal accumulation of morbidity information within administrative datasets to measure discrepancy and construct informed risk adjustment is feasible, as demonstrated by this study. Thirdly, defining quality characteristics for administrative data and routinely monitoring the quality indicators over time would allow better understanding of the effectiveness of system changes, such as documentation and recording standards, and highlight areas for improvement and subsequent actions [48,49]. Lastly, systematic knowledge enhancement and engagement among hospital administrators, clinicians, coders and researchers within the health service domain, for recording quality improvement and reimbursement purposes, should be formalised.

Strengths and limitations
This study benefited from its design, using a large population-based dataset to access all admissions in all acute hospitals within the most populous health jurisdiction in Australia to explore, for the first time, trends in coding discrepancy rates. It also benefited from data linkage at patient level and a prospective longitudinal design. The design enabled the exclusion of any pre-diagnostic admissions, to eliminate the risk of any overestimation in discrepancy rates, and, combined with a restricted set of criteria and follow-up period, minimised any false positives due to error or post treatment. The proposed design developed and employed internal references based on routinely collected data that could readily be used for real-time monitoring of clinical coding practices and improvement through longitudinal data accumulation and dynamic indexing.
We may have under-reported the total discrepancies in the absence of an external reference for measuring false positive rates. However, clinical chart review on randomly sampled cases, although useful, is limited to the extent that it relies on comprehensive documentation of all comorbidities. Variation analysis was limited to public hospitals as determined by data availability; the analyses of hospital-specific admissions from private hospitals could provide additional insight. Despite the essential role of time in our design, the effect size of data accumulation as well as time between multiple admissions was not quantified and is therefore an area for further research. Models incorporating coder-related characteristics including staffing, rotation, experience and training, coding parameters such as rules governing the documentation and coding of a condition, and unseen admission factors including principal diagnoses may better explain differences. Distinguishing the contribution to discrepancy of incomplete documentation of morbidity in the clinical chart from that of incomplete coding of documented conditions in hospital administrative datasets would be very informative for targeted actions. Conducting a controlled trial or comparing patients' records across transfers could provide valuable insight into documentation versus coding contributions to discrepancies. Inclusion of changes in the rules governing recording practices in the modelling might also provide more evidence on the effect of system-wide changes and further highlight potential areas for improvement.
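The indexing logic described above can be illustrated with a minimal sketch. It is not the code used in this study: the function name, the tuple layout and the example records are hypothetical, and it deliberately omits the length-of-stay criterion, follow-up periods and buffers applied in the actual analysis. For each patient, the first admission with the condition recorded is treated as the index admission, earlier (pre-diagnostic) admissions are excluded, and each later admission without the condition recorded is counted as a discrepancy.

```python
from collections import defaultdict


def discrepancy_rate(admissions):
    """Crude discrepancy incidence for one chronic condition.

    `admissions` is an iterable of (patient_id, admission_date, recorded)
    tuples, where admission_date is an ISO 'YYYY-MM-DD' string and
    `recorded` is True when the condition was coded in that admission.
    This layout is an assumption made for illustration only.

    Returns (discrepancies, follow_up_admissions, rate); rate is None
    when no admission follows an index admission.
    """
    by_patient = defaultdict(list)
    for patient_id, date, recorded in admissions:
        by_patient[patient_id].append((date, recorded))

    discrepancies = 0
    follow_ups = 0
    for history in by_patient.values():
        # ISO date strings sort chronologically.
        history.sort(key=lambda record: record[0])
        indexed = False
        for _, recorded in history:
            if not indexed:
                # Admissions before the index admission are excluded;
                # the index admission itself is not a follow-up.
                if recorded:
                    indexed = True
                continue
            follow_ups += 1
            if not recorded:
                discrepancies += 1

    rate = discrepancies / follow_ups if follow_ups else None
    return discrepancies, follow_ups, rate


# Hypothetical linked records for three patients.
example = [
    ("A", "2010-08-01", False),  # pre-diagnostic, excluded
    ("A", "2011-01-10", True),   # index admission
    ("A", "2011-06-05", False),  # discrepancy
    ("A", "2012-02-20", True),   # follow-up, condition recorded
    ("B", "2010-09-15", True),   # index admission
    ("B", "2013-03-01", False),  # discrepancy
    ("C", "2012-05-05", False),  # condition never recorded, excluded
]

print(discrepancy_rate(example))
```

Because the index is derived from the accumulated record itself, the same calculation can be rerun as new admissions are linked, which is the sense in which the internal reference supports ongoing monitoring. The adjusted rates reported in this study additionally require the Poisson models described in the methods.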

Conclusion
Chronic conditions are recorded inconsistently in hospital administrative datasets, and hospitals, individually as well as grouped by characteristics, contribute to the observed incidence and variation in discrepancies. Consequently, case-mix adjustments for provider comparison and funding purposes could be biased because of coding incompleteness and associated discrepancy patterns across hospitals. While examination of non-recording patterns associated with hospital characteristics through modelling or stratification for risk-adjustment purposes could potentially minimise bias, longitudinal accumulation of clinical information at patient level through data linkage combined with refinement of clinical coding systems and standardisation of documentation across hospitals would enhance accuracy of routinely collected datasets and the related validity of case-mix adjustment.

Supporting Information
S1 Table. Pair-wise correlation of hospital performance in recording chronic conditions. (DOCX)
S2 Table. Discrepancy incidences using different settings. Number of admissions included in discrepancy incidence rate calculation for each of the five chronic conditions using different follow-up periods and buffers. (DOCX)