Can Italian Healthcare Administrative Databases Be Used to Compare Regions with Respect to Compliance with Standards of Care for Chronic Diseases?

Background: Italy has a population of 60 million and a universal coverage single-payer healthcare system, which mandates collection of healthcare administrative data in a uniform fashion throughout the country. On the other hand, organization of the health system takes place at the regional level, and local initiatives generate natural experiments. This is happening in particular in primary care, due to the need to face the growing burden of chronic diseases. Health services research can compare and evaluate local initiatives on the basis of the common healthcare administrative data. However, the reliability of such data in this context needs to be assessed, especially when comparing different regions of the country. In this paper we investigated the validity of healthcare administrative databases to compute indicators of compliance with standards of care for diabetes, ischaemic heart disease (IHD) and heart failure (HF).
Methods: We compared indicators estimated from healthcare administrative data collected by Local Health Authorities in five Italian regions with corresponding estimates from clinical data collected by General Practitioners (GPs). Four indicators of diagnostic follow-up (two for diabetes, one for IHD and one for HF) and four indicators of appropriate therapy (two each for IHD and HF) were considered.
Results: Agreement between the two data sources was very good, except for indicators of laboratory diagnostic follow-up in one region and for the indicator of bioimaging diagnostic follow-up in all regions, where measurement with administrative data underestimated quality.
Conclusion: According to evidence presented in this study, estimating compliance with standards of care for diabetes, ischaemic heart disease and heart failure from healthcare databases is likely to produce reliable results, even though completeness of data on diagnostic procedures should be assessed first. Performing studies comparing regions using such indicators as outcomes is a promising development with potential to improve quality governance in the Italian healthcare system.


Introduction
Primary care is specifically suitable to face the growing chronic disease epidemic in a sustainable way [1,2]. Therefore it is the object of novel attention and of innovative policies [3] which specifically need health services research for timely effectiveness evaluation [4,5,6,7].
Many observational studies have been performed to evaluate the impact of innovative policies in primary care, for instance alternative rewarding policies for General Practitioners (GPs) in Ontario [8] or incentives for the introduction of Electronic Health Records in the United States [9,10]. Such studies use administrative data to obtain evidence on the impact of policies in an inexpensive, timely and reproducible fashion [11]. Indicators measuring compliance with standards for management of chronic diseases were used as outcomes in those studies, similar to the clinical indicators of the Quality and Outcome Framework of the UK National Health System [12], such as regular prescription of recommended therapies and regular diagnostic follow-up. However, concerns have been raised that such indicators estimated on the basis of administrative databases might not reflect the actual compliance with standards in the population bearing the disease, as methods to identify patients from administrative data, rather than clinical information, might lead to biased samples. Studies addressing this issue have obtained contradictory findings [13,14].
As a result of those concerns, comparison of quality of primary care between regions or countries is generally performed by means of hospitalization rates for the so-called ambulatory care sensitive conditions [15], which are readily obtained from administrative databases but do not require identification of cohorts of patients with a specific condition. However, the relationship between quality of primary care and avoidable hospitalization is complex, and population-based trends can be confounded by socioeconomic factors [16], by prevalence of morbidity or by general hospitalization habits [17].
In Italy, the VALORE Project was the first national-level study which evaluated a national policy in primary care by using administrative healthcare data for calculation of indicators of compliance [18]. This paper presents the validation study on the reliability of administrative databases in estimating such indicators.

Ethics
No identifiable human data were used for this study. The dataset used in the study is not openly available. According to the Italian law on data confidentiality (decree 196/2003), permission to use individual-level data, albeit non-identifiable, must be granted by the institutions which bear the responsibility of the custody of the data. Permission to use data extracted from administrative databases for the VALORE project was granted to Agenzia regionale di Sanità della Toscana by ULSS 16 Padova (Veneto region), ASP 7 Ragusa (Sicily region), Assessorato Politiche per la Salute Emilia Romagna (Emilia Romagna region), Zona Territoriale Senigallia (Marche region), which are responsible for the custody of the data of the corresponding populations. Agenzia regionale di sanità della Toscana (Tuscany region) is enabled by a regional law (40/2005) to use Tuscan data for research purposes. Approval for use of encrypted and aggregated data from the HSD was also obtained from the Italian College of General Practitioners.

Setting
Italy has a tax-based, universal coverage national health system organised in three levels: national; regional (21 regions); and local (on average 10 Local Health Authorities per region). Healthcare is managed for every inhabitant by the Local Health Authority where she has her regular address [19]. Coordination of primary care within a Local Health Authority is performed at a smaller geographical level called Health District [18]. Every Italian inhabitant is entitled to choose a GP, although parents might opt for a specialist paediatrician instead for their children, up to the age of 15. Therefore, each inhabitant from the age of 16 onward is specifically associated with a GP. GPs are the ''gatekeepers'' of the system, meaning that only upon GP prescription can specialist encounters be obtained free of charge. Dispensing of drugs or administration of diagnostic procedures can be obtained free of charge upon prescription of either a GP or a specialist physician employed by the healthcare system [19].
The five regions which contributed data to the VALORE validation study were: Veneto (A, Northern Italy), Emilia Romagna (B, Northern Italy), Tuscany (C, Central Italy), Marche (D, Central Italy) and Sicily (E, Southern Italy).

Study design
The VALORE project had selected several indicators to measure compliance with standards of care for diabetes, IHD and HF. In each region from the pool of regional GPs two convenience samples of groups of GPs were extracted and included in the validation study. In each regional pair, GPs of one sample had indicators computed from administrative databases, GPs of the other from their own clinical databases. Measurements of indicators were compared within and between regions.
The true values of an indicator across all the GPs in a region are an unobservable distribution. The rationale of this study design is based on the assumption that if measurements performed with two different methods in two different samples of GPs provide similar results, the likelihood that they both measure the true distribution is higher than the likelihood that they systematically make the same mistakes across different regions.

Data collection: sample of GPs with administrative-based measures
Since the early 1990s the Italian national government has mandated the collection of healthcare administrative data across the whole country. The healthcare activities which are mandated to be reported to the government have progressively expanded, from inpatient care [20] to drug dispensings and diagnostic tests [21]. Moreover, an inhabitant registry is maintained by each Local Health Authority, where the GP chosen by the inhabitant is recorded, as well as other information, such as gender, birth date, date of entry in the territory of the Local Health Authority and date of exit from the territory [21]. However, outpatient diagnoses are not yet recorded in health administrative databases. Therefore cohorts of patients with chronic diseases must be selected by means of disease-specific longitudinal algorithms involving hospital discharge diagnoses, drug and/or other healthcare services utilization.
In each region, a convenience sample of Health Districts was chosen. All the GPs serving in those Health Districts were identified from the inhabitant registries of the corresponding Local Health Authorities and included in the sample. The healthcare administrative data of the whole population who chose a GP in this sample was loaded in the VALORE database. Patients aged 16-95 with diabetes, IHD and/or HF at the index date 1/1/2009 were detected by means of ad hoc algorithms based on past healthcare received. More details on this process are described elsewhere [22]. Indicators were computed during a one-year follow-up by linking the cohorts to the administrative databases of drug dispensings and diagnostic tests.
GPs were excluded from the samples if they had fewer than 300 persons registered or fewer than 4 patients with the disease, as indicators computed on small numbers were considered not to be robust.
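The exclusion rule for GPs can be illustrated with a minimal sketch. It is written in Python for readability and is not the code used in the study; the record layout and the names `GpSummary` and `is_eligible` are hypothetical, and only the two thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds stated in the text: GPs below either one are excluded.
MIN_REGISTERED = 300
MIN_DISEASE_PATIENTS = 4


@dataclass
class GpSummary:
    """Hypothetical per-GP summary for one disease-specific study."""
    gp_id: str
    registered: int        # persons registered with the GP
    disease_patients: int  # patients with the disease under study


def is_eligible(gp: GpSummary) -> bool:
    """Keep a GP only if both counts reach the minimum thresholds."""
    return (gp.registered >= MIN_REGISTERED
            and gp.disease_patients >= MIN_DISEASE_PATIENTS)


# Illustrative values only, not study data.
gps = [
    GpSummary("gp1", 1200, 60),
    GpSummary("gp2", 250, 10),   # too few registered persons
    GpSummary("gp3", 900, 3),    # too few patients with the disease
]
eligible = [gp.gp_id for gp in gps if is_eligible(gp)]
```

Because the second threshold depends on the disease, the same GP may be eligible for one disease-specific study and excluded from another.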
The samples of GPs with clinical-based measures were drawn from the Health Search CSD Longitudinal Patient Database (HSD), a longitudinal observational database that is representative of the general Italian population. HSD was established in 1998 by the Italian College of General Practitioners and, at the time when the study was conducted, it contained data from more than 800 GPs throughout Italy, covering a total population of around 1.2 million patients [23]. The GPs participating in HSD all use the same information software, in which they record demographic information, visits and referrals, diagnoses, drug and diagnostic test prescriptions and clinical information of their patients. They are accepted as participants in HSD if their records are arguably complete, i.e. the prevalence of the principal diseases measured from their records is comparable with the expected prevalence in the general population. For this study, data from the 190 GPs practicing in the five regions of the VALORE project were used. The study population comprised patients aged 16-95 who had been registered with the GP for at least two years and were alive on 1st January 2009. Patients with diabetes, IHD and/or HF at the index date 1/1/2009 were detected by means of algorithms based on recorded diagnosis, which are described in detail elsewhere [22]. Indicators were computed from the prescribed drugs or diagnostic tests.

Indicators
The indicators that were included in the study are shown in Table 1, and are classified as therapy indicators (for IHD and HF only), laboratory diagnostic tests, and bio-imaging diagnostic tests (HF only). All the indicators were based on clinical guidelines for the management of the disease that recommended regular therapy and yearly testing, respectively. The standard for a therapy recommendation was considered to be met when at least two dispensings (in VALORE) or prescriptions (in HSD) were recorded in 2009, at least 180 days apart. The standard for a diagnostic recommendation (laboratory or bioimaging) was considered to be met when at least one test was performed (in VALORE) or prescribed (in HSD) during 2009.
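The two compliance rules can be sketched as follows. This is an illustration in Python under stated assumptions, not the study's code: the function names are hypothetical, and the therapy rule is read here as "some pair of 2009 events lies at least 180 days apart", which is equivalent to comparing the first and last event of the year.

```python
from datetime import date


def therapy_compliant(event_dates, year=2009, min_gap_days=180):
    """True if at least two dispensings/prescriptions fall in `year`
    and the earliest and latest of them are >= min_gap_days apart."""
    in_year = sorted(d for d in event_dates if d.year == year)
    if len(in_year) < 2:
        return False
    return (in_year[-1] - in_year[0]).days >= min_gap_days


def diagnostic_compliant(test_dates, year=2009):
    """True if at least one test was performed/prescribed in `year`."""
    return any(d.year == year for d in test_dates)


# Illustrative dates only, not study data.
regular = [date(2009, 1, 10), date(2009, 8, 1)]    # 203 days apart
clustered = [date(2009, 1, 10), date(2009, 3, 1)]  # 50 days apart
```

With these inputs, `therapy_compliant(regular)` is true and `therapy_compliant(clustered)` is false; events outside 2009 are ignored by both functions.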

Statistical analysis
In each sample the number of GPs, the number, age and gender distribution of patients aged 16+, and the average number of patients per GP were computed, both for the general population and for the population with each of the diseases. Differences in the variables within each regional pair of samples were tested with either a two-tailed test for the difference in means or a Chi-square test.
For each GP, indicators were computed as the percentage of patients who were compliant with the recommended standards of care. The distributions of the indicators in each regional pair were represented in a box-plot. To test whether each pair of measurements was drawn from the same distribution, the nonparametric Wilcoxon-Mann-Whitney two-sample test (also known as the Wilcoxon rank-sum test) was performed in each region and for each indicator. In a sensitivity analysis, the test was repeated for achievement of standards in patients aged 45-74 years.
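The per-GP indicator and the two-sample comparison can be sketched as follows. This is a minimal Python illustration, not the Stata procedure used in the study: the function names are hypothetical, and the p-value relies on the normal approximation with tie correction, which is an assumption of this sketch and may differ from an exact test on small samples.

```python
import math


def gp_indicator(compliant_flags):
    """Percentage of a GP's patients who met the standard."""
    return 100.0 * sum(compliant_flags) / len(compliant_flags)


def rank_sum_test(x, y):
    """Wilcoxon-Mann-Whitney test via normal approximation.

    Returns (U statistic for x, z score, two-sided p-value).
    Tied values receive midranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, z, p


# Illustrative per-GP percentages for one regional pair (not study data).
admin_sample = [62.0, 70.5, 58.3, 66.7, 73.1]
clinical_sample = [64.2, 69.0, 60.1, 71.4]
u, z, p = rank_sum_test(admin_sample, clinical_sample)
```

A large p-value gives no evidence that the two samples of GPs come from different distributions, which is the sense in which agreement between data sources is assessed in each region and for each indicator.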
Data management and data analysis were performed with Stata 10.1.

Results
Of the 1671 GPs serving in the Health Districts participating in the VALORE study, 1501 (89.8%) had enough registered patients and entered the study. Few GPs were discarded from the disease-specific studies because they had fewer than four patients; the maximum was 7% of GPs in region A in the HF study. All the 190 GPs in the HSD sample entered the study.
The description of the study population is shown in Table 2. Every HSD sample contained fewer GPs than the corresponding VALORE sample. The GPs in the HSD sample had a larger registered population on average in all five regions (range in HSD: 1238-1431, range in VALORE: 925-1223). The average number of patients per GP was also higher in HSD GPs for diabetes (range in HSD: 92.0-107.5, range in VALORE: 55.9-81.6) and IHD (range in HSD: 50.8-78.6, range in VALORE: 40.0-61.9), but for HF the numbers were similar in the two groups (range in HSD: 13.7-22.2, range in VALORE: 12.8-20.0). Age distribution was different within all pairs in all the populations, and the VALORE samples were older except in region B. Women were slightly more represented in the VALORE populations, except again in region B and in region E. This difference in gender did not show up in diabetic patients and was not consistent across regions in IHD and HF patients.
Figure 1 shows the box-plots of the pairs of distributions of the crude values of each indicator. A qualitative examination of the box-plots showed that the distributions were very similar within the pairs. Notable exceptions were the laboratory measurements in region E and the bio-imaging test in all regions, where VALORE showed lower values in all cases. The geographical trends, represented by orderings of the median values of the distributions, were similar between regions when measured in either data source, but less so in the case of the bio-imaging test.
Table 3 shows the results of the Wilcoxon-Mann-Whitney tests. Among therapy indicators the test found no differences in the distributions, with the exception of the samples in region C and, for HF only, region A, where the VALORE samples had higher values. The test confirmed that the distributions for all the laboratory diagnostic indicators of region E were different. For diabetes, the test detected slightly different distributions in three regions for either of the indicators, and for the IHD indicator regions B and C had different distributions. The test confirmed that the only indicator of bio-imaging testing resulted in incoherent

Discussion
Even though in Italy the data items to be collected in health administrative databases are mandated by the central government, and the resulting central databases are therefore formally homogeneous, data collection takes place locally. Italy is characterized by long-standing regional differences in general and in healthcare in particular [24]. Therefore it is possible that inaccurate local data collection processes hamper data quality and completeness, and in particular the quality of personal identifiers that allow for record-linkage. Moreover, outpatient diagnoses are not among the data items collected, therefore identification of cohorts of patients with chronic diseases must rely on algorithms linking inpatient diagnoses with drug and other healthcare services utilization. Inhomogeneous quality of personal identifiers and completeness of recordings might lead to inhomogeneous accuracy in defining cohorts of patients and in identifying healthcare services that they access. This in turn might result in non-comparable measures of compliance with standards of care for chronic diseases.
This study addressed this concern by comparing such measures with measures obtained from a different data source, in five Italian regions. The database which was chosen as a comparator collects clinical data from GPs, and is therefore complementary to the healthcare administrative data.
The results show that administrative databases provide reliable estimates at the regional level. Indeed, the four therapy indicators had the same distribution within the pairs of regional samples in the large majority of cases. The same was observed for the three laboratory diagnostic indicators except in one region, where the distributions were systematically different. The only bio-imaging indicator had different distributions within pairs. Geographical trends between regions were consistent across the two data sources. This provides evidence that the two data sources both estimate the same population distribution, thus supporting the use of indicators computed on health administrative databases for comparisons between regions.
It was not possible to obtain measurements from the two data sources on the same samples of GPs. This was partly due to the fact that the identity of the GPs belonging to the database HSD is confidential. Moreover, data linkage at individual or even GP level between different data sources had legal implications in terms of privacy regulations and the procedures needed to obtain permissions for such data collection [25] could not be managed in the context of the VALORE Project.
Therefore, observed differences in the distributions might be attributable to a combination of the following main effects: (a) due to non-random selection of the two samples, the GPs in the two samples were qualitatively different, with actually different performance; (b) due to the different selection processes conducted in the two types of data sources, the cohorts of patients of the two samples were qualitatively different subpopulations of the actual patients, which actually received different care; (c) difference in measurement, with HSD likely biased; (d) difference in measurement, with VALORE likely biased. In the following paragraphs we provide plausible explanations to disentangle effect (d), which is the object of this study, from (a), (b) and (c). It is a limitation of this study (see Limitations subsection) that some of the hypotheses we generated could not be tested. For cause (b), the main reference is the study by Gini et al., which found evidence that diabetic patients without therapy are less prevalent in the VALORE sample, and patients with heart failure are younger in the GP sample [22]. For therapy indicators some differences are observed for HF in regions A and C. This is most likely due to reason (c), that is, patients included in the cohorts of HSD samples are different from patients included in the cohorts of the VALORE samples: indeed, age distribution of patients is different within the pairs, with the older cohort in VALORE being more likely to be assisted at home or in residential facilities, where GPs are likely not to record their activity completely [22,23]. To test this assumption, analysis was restricted to patients aged 45-74, and indeed differences disappeared in region C in one indicator and in region A in both.
For laboratory testing indicators, region E seems to underestimate consistently the actual values of the indicators, across the three diseases. This could be due to incomplete collection of administrative data from laboratories, or to higher use of out-of-pocket services: indeed, the most recent National Health Survey found that in region E (Sicily) the propensity to use diagnostic services that are not reimbursed by the Health System is higher than in the other regions participating in our study [26]. In the other regions differences do not show a consistent pattern, except perhaps in region C, where however (a) rather than (d) could be the cause, that is, GPs in the HSD sample and GPs in the VALORE sample in region C actually have different quality of care. Indeed, in region C therapy indicators differ slightly between samples as well.
The bio-imaging indicator is probably underestimated by healthcare administrative databases: this might be due to out-of-pocket payment for this test, or to the fact that bio-imaging occurring during hospital admissions was not recorded by VALORE.
The overall similarity in measurements that was observed in this study generates in turn two observations. First, the standards of care in the sample of GPs participating in the HSD database seem to be representative of the distribution of the whole population of GPs. This was surprising, as GPs in HSD are selected because of completeness in their recordings, and good recording habits were expected to be associated with better standards of care. The second observation is that specialist physicians who assist chronic patients are likely to involve GPs in regular prescription of therapies and diagnostic tests: indeed, if GPs were unaware of such prescriptions in the share of patients who are visited by a specialist, their clinical data would detect lower standards.
Our study was performed in samples drawn from regions belonging to three macro-areas of the country. Only a study performed in all regions could rule out the possibility that major issues show up in other areas; however, the evidence we observed points in the direction of greater confidence. On the other hand, we do not claim that our results support reliability of similar measurements for any chronic disease. Indeed, this is determined by how reliable the algorithm for identifying the case is, and it was shown that this depends specifically on the disease, as frequency of hospital use, specificity of drug indication and pattern of healthcare may vary [22].
In summary, the evidence we provide is promising enough to support comparison of regions with respect to indicators of compliance with standards of care for diabetes, IHD, and HF. Moreover, it supports the reliability of empirical studies, such as the VALORE study [18], using such indicators to evaluate the impact of organizational innovation in primary care.

Limitations
In this study indicators were computed at the population level for convenience samples of GPs instead of being compared directly at the patient level for the same GPs. Similarity between samples could be due to a random combination of contrasting effects rather than being attributable to the factors that we discussed. Although this is unlikely to have happened consistently in five regions, only an individual-level validation study could address this concern. Italy, like several other countries, has national legislation that permits exemption from the requirement for patient consent for projects in the public interest [25], but this pathway was too complex to be pursued in the context of the VALORE project.
It was not possible to test some of the hypotheses we generated to explain observed differences. A study involving more regions and different points in time could provide counterfactuals to test our hypotheses.

Conclusion
According to the evidence presented in this study, estimating compliance with standards of care for diabetes, ischaemic heart disease and heart failure from healthcare databases is likely to produce reliable results, even though completeness of data on diagnostic procedures should be assessed first. Performing studies comparing regions using such indicators as outcomes is a promising development with the potential to improve quality governance in the Italian healthcare system.