Epidemiology of Idiopathic Pulmonary Fibrosis in Northern Italy

Background
Idiopathic pulmonary fibrosis (IPF) is the most common and severe form of idiopathic interstitial pneumonia. Despite its clinical relevance, few studies have examined the epidemiology of IPF and temporal variation in disease incidence and prevalence. The aim of the study was to investigate the prevalence, incidence and trends of IPF in Lombardy, a region with nearly 10 million inhabitants, during 2005–2010.
Methods
To identify IPF patients, we used the healthcare administrative databases of the Lombardy Healthcare System and adopted three algorithms: generic, broad and narrow case definition (GCD, BCD, NCD). IPF cases were identified according to diagnoses reported in inpatient and outpatient claims that occurred during 2000–2010. We estimated age- and sex-adjusted annual prevalence and incidence rates from 2005 to 2010, thus allowing for a 5-year washout period.
Results
The mean annual incidence rate was estimated at 2.3 and 5.3 per 100,000 person-years using NCD and GCD, respectively. IPF incidence was higher among males and increased with age. The trend remained stable over the years. The estimated annual prevalence rate was 35.5, 22.4, and 12.6 per 100,000 person-years using GCD, BCD and NCD, respectively, and increased with age. Moreover, we observed a positive trend over the years. Using BCD and NCD, prevalence was higher among males.
Conclusions
The results of this study, which is one of the largest population-based surveys ever conducted according to strict criteria, indicated that the prevalence of IPF increased across the years while incidence remained stable, thus suggesting that survival with IPF has improved.


Introduction
Idiopathic pulmonary fibrosis (IPF) is the most common and severe form of idiopathic interstitial pneumonia [1]. It is a progressive disease with a variable clinical course and is associated with an extremely poor prognosis [2][3][4]. IPF is characterized by a short survival from diagnosis (3-5 years) and a high fatality rate [5]. Respiratory failure is the most common cause of death; however, other causes of death, such as heart failure, bronchogenic carcinoma, ischemic heart disease, infection and pulmonary embolism, are frequently concurrent [6,7]. The disease pathogenesis is not fully understood; however, abnormal wound healing processes and molecular alterations involved in aging and inflammation-related processes appear to be involved [8,9].
One of the major challenges in epidemiologic studies of such a rare disease has been the difficulty of recruiting a sufficient number of patients [10]. Few large-scale studies have specifically focused on epidemiological investigations [9,[11][12][13][14][15][16][17]], and the true incidence and prevalence of IPF are not well established. Difficulties encountered in establishing the epidemiology of IPF may also be due to the lack of a uniform definition of IPF in older studies and to differences in diagnostic criteria, study population and design [18,19]. Thus, the incidence and prevalence of IPF vary across studies. Recent studies indicated that the incidence and mortality of IPF appear to be on the rise [12,18,20,21]; however, data on incidence trends are not always in agreement [22].
Epidemiology of IPF in Italy has been poorly investigated. In two studies, the prevalence of IPF has been evaluated analysing data collected in the Italian Registry of Diffuse Infiltrative Pulmonary Diseases, a multicentre prospective registry established in 1998 [23,24]. However, only the percentage of IPF diagnosis among patients affected by interstitial lung disease (ILD) was evaluated. Moreover, this registry was created using data obtained from those centres that voluntarily accepted to adhere to the project and, therefore, from those physicians who spontaneously provided information on number and clinical characteristics of patients with ILDs. It is likely that the use of this registry might have resulted in an underestimation of IPF cases.
In a recent study carried out in the Lazio region (which has about 6 million inhabitants), the epidemiology of IPF was investigated using hospital admissions and mortality databases of the regional health system [13]. The annual incidence and prevalence in this large Central-Southern Italian region were estimated at 7.5 and 25.6 per 100,000 person-years, respectively; however, Italian trends in incidence and prevalence of IPF are unknown.
The aim of this study was to evaluate the epidemiology of IPF in Lombardy, the most populous region of Italy, between 2005 and 2010, by analysing the health care administrative databases.

Materials and Methods
To investigate the epidemiological characteristics of IPF in Lombardy, a region with nearly 10 million inhabitants (accounting for 16.3% of the Italian population), we performed an observational retrospective analysis of healthcare administrative databases of the Health Regional System.
Since the Italian Health Care System provides universal coverage, each region is responsible for collecting and organizing data on health care services provided to patients; moreover, each region is charged with storing all such information in databases for administrative purposes. In order to facilitate the use of these databases for scientific purposes, since 2000 all data in Lombardy have been loaded into a new data warehouse, named DENALI. DENALI is equipped with a probabilistic record linkage that improves the matching of anonymized records from different datasets belonging to the same individual [25][26][27].
For each resident in Lombardy, DENALI contains the following data: demographic characteristics and vital status; hospitalization data, including up to six discharge diagnoses encoded according to the International Classification of Diseases, ninth Revision, Clinical Modification, 2002 edition (ICD-9-CM) and up to six procedures; description of each outpatient visit, with the indication of diagnoses (encoded according to ICD-9-CM) or procedures performed during that visit.
For our analysis, we first applied a case definition that we called "generic case definition" (GCD) and that identified as IPF cases all individuals with at least one hospitalization with diagnosis of IPF or at least one outpatient visit with diagnosis of IPF (ICD-9-CM code 516.3) during the period from 1st January 2000 to 31st December 2010. For the identification of cases as incident or prevalent through claim analysis, the date of the first event (index event) of each patient was used as a proxy for the timing of disease onset. In addition to the GCD, we used the two case definitions suggested by Raghu et al. [28], namely the "broad case definition" (BCD) and the "narrow case definition" (NCD), with some adjustments in order to adapt them to our context. In detail, for each subject satisfying the GCD we extracted from DENALI all claims that occurred before or after the index event. We then applied the BCD: we defined as IPF cases those patients who satisfied the GCD and had no claims (inpatient or outpatient) with a diagnosis code for any other type of ILD (Table 1) on or after the date of the last claim with an IPF diagnosis. Finally, we applied the NCD: we defined as IPF cases those patients who satisfied the BCD and had one or more claims with a procedure code for surgical lung biopsy, transbronchial lung biopsy or computed tomography of the thorax (Table 1), on or before the date of the last claim with a diagnosis code for IPF.
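The nested logic of the three case definitions can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the `Claim` record, the `classify` function and the `other_ild_codes` argument are hypothetical names, and the list of other-ILD diagnosis codes must be supplied by the caller (the actual list is in Table 1 and is not reproduced here). The IPF diagnosis code and the three procedure codes are those stated in the text.

```python
from dataclasses import dataclass
from datetime import date

IPF_CODE = "516.3"                           # ICD-9-CM diagnosis code for IPF (from the text)
NCD_PROCEDURES = frozenset({"33.28", "33.27", "87.41"})  # biopsies and chest CT (from the text)


@dataclass(frozen=True)
class Claim:
    """Hypothetical minimal claim record (inpatient or outpatient)."""
    day: date
    diagnoses: frozenset = frozenset()   # ICD-9-CM diagnosis codes
    procedures: frozenset = frozenset()  # ICD-9-CM procedure codes


def classify(claims, other_ild_codes):
    """Return the strictest definition met: 'NCD', 'BCD', 'GCD', or None."""
    other = frozenset(other_ild_codes)
    ipf_dates = [c.day for c in claims if IPF_CODE in c.diagnoses]
    if not ipf_dates:
        return None                      # GCD not met: no claim with an IPF diagnosis
    last_ipf = max(ipf_dates)
    # BCD fails if any other-ILD diagnosis appears on or after the last IPF claim.
    if any(c.day >= last_ipf and c.diagnoses & other for c in claims):
        return "GCD"
    # NCD additionally requires a biopsy or chest CT on or before the last IPF claim.
    if any(c.day <= last_ipf and c.procedures & NCD_PROCEDURES for c in claims):
        return "NCD"
    return "BCD"


# Usage with a placeholder other-ILD code ("XXX.X" is not a real code).
patient = [
    Claim(date(2006, 2, 1), procedures=frozenset({"87.41"})),
    Claim(date(2006, 3, 1), diagnoses=frozenset({IPF_CODE})),
]
print(classify(patient, {"XXX.X"}))
```

Because each definition is a subset of the previous one, a patient labelled NCD also counts as a BCD and GCD case when rates are computed per definition.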
Table 1 (excerpt). Narrow case definition: at least one surgical lung biopsy (ICD-9-CM code 33.28), transbronchial lung biopsy (ICD-9-CM code 33.27) or computed tomography of the thorax (ICD-9-CM code 87.41) performed during a hospitalization or outpatient visit, on or before the date of the last IPF diagnosis (ICD-9-CM code 516.3).
Since DENALI does not include information about health services provided prior to 2000, and the median survival of patients is 3-5 years after the onset of IPF [1,6], the use of DENALI data from 2000 to 2004 might result in an underestimation of the number of prevalent cases and, at the same time, an overestimation of the number of incident cases until 2004. To ensure that the index event is a good proxy for the onset of IPF, information about health system accesses should be documented for at least five years before the onset itself. For these reasons, we only analysed the period 2005-2010, and incident IPF patients were identified among patients who had been covered by the Lombardy health care system for at least 5 years before the index event (washout period).
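The washout rule can be illustrated with a short Python sketch. It assumes, hypothetically, that each patient record carries the index event date and a `coverage_start` date marking the beginning of observable coverage by the regional health system (no earlier than 1st January 2000, when DENALI begins); these field names are not taken from the study.

```python
from datetime import date

STUDY_START = date(2005, 1, 1)   # analysis window stated in the text
STUDY_END = date(2010, 12, 31)
WASHOUT_YEARS = 5


def years_before(d, years):
    """Return the date `years` calendar years before `d` (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def is_incident(index_event, coverage_start):
    """True if the first IPF claim falls in 2005-2010 and is preceded by
    at least five years of observable coverage (the washout period)."""
    in_window = STUDY_START <= index_event <= STUDY_END
    covered = coverage_start <= years_before(index_event, WASHOUT_YEARS)
    return in_window and covered


# Covered since 2000, first IPF claim in March 2006: counted as incident.
print(is_incident(date(2006, 3, 1), date(2000, 1, 1)))
# Covered only since 2004: the washout is too short to call the case incident.
print(is_incident(date(2007, 1, 1), date(2004, 1, 1)))
```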
Using the three case definitions (GCD, BCD, NCD), we assessed the overall prevalence and incidence rates of IPF as well as prevalence and incidence rates stratified by gender and age (<55, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, and ≥85 years). Finally, we evaluated temporal trends by computing yearly prevalence and incidence rates standardized by age and gender, using the population living in Lombardy on January 1st 2010 as reference [29].
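Standardization by age and gender against a reference population is commonly done by direct standardization, which is assumed here since the text does not name the method explicitly. The sketch below weights each stratum-specific rate by that stratum's share of the reference population; the function name and all numbers in the usage example are made up for illustration.

```python
def directly_standardized_rate(cases, person_years, ref_population, per=100_000):
    """Directly standardized rate per `per` person-years.

    All three arguments are dicts keyed by stratum, e.g. (gender, age class).
    Each stratum-specific rate is weighted by the stratum's share of the
    reference population.
    """
    total_ref = sum(ref_population.values())
    rate = 0.0
    for stratum, ref_n in ref_population.items():
        stratum_rate = cases[stratum] / person_years[stratum]
        rate += stratum_rate * (ref_n / total_ref)
    return rate * per


# Toy example with two strata and invented counts.
cases = {"younger": 10, "older": 30}
person_years = {"younger": 100_000, "older": 50_000}
reference = {"younger": 3, "older": 1}   # only the proportions matter
print(directly_standardized_rate(cases, person_years, reference))
```

Using the same reference population for every calendar year removes the effect of population ageing from the year-to-year comparison.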
We assumed a Poisson distribution of the rates and computed 95% confidence intervals (95% CI) based on the Normal approximation [30]. In order to test for age-related and temporal linear trends, for each case definition we computed the count of expected incident (or prevalent) cases in the reference population stratified by gender, age class and calendar year, and then we used these counts as the dependent variable in a Poisson regression model with the following structure:
$$\log(Y_i) = \log(N_i) + \beta_0 + \beta_1 I_{\mathrm{Males},i} + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{Year}_i$$
where Y_i is the expected count in the i-th stratum; N_i is the person-years at risk in the i-th stratum; I_Males,i is an indicator variable that takes the value 1 when the gender is male in the i-th stratum; Age_i is the age class of the i-th stratum and Year_i is the calendar year of the i-th stratum. A significant Wald test for β_2 and β_3 indicated, respectively, an age-related and a temporal linear trend in the dependent variable. For all statistical tests, a pre-specified two-sided α of 0.05 was regarded as significant.
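As an illustration of the Normal-approximation interval for a Poisson rate, the sketch below takes the standard error of the rate as the square root of the case count divided by the person-years. The function name is hypothetical, and the person-years in the usage example are not reported in the text: they are back-calculated from the reported GCD incidence (2,951 cases, 5.25 per 100,000 person-years) purely as an assumption for the example.

```python
import math


def poisson_rate_ci(cases, person_years, per=100_000, z=1.96):
    """Rate and Normal-approximation 95% CI, assuming Poisson-distributed counts.

    The variance of a Poisson count equals its mean, so the standard error
    of the rate is sqrt(cases) / person_years.
    """
    rate = cases / person_years
    se = math.sqrt(cases) / person_years
    return per * rate, per * (rate - z * se), per * (rate + z * se)


# GCD incidence: 2,951 new cases; person-years back-calculated (assumption).
cases = 2951
person_years = cases / 5.25e-5
rate, lower, upper = poisson_rate_ci(cases, person_years)
print(f"{rate:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Under this assumption the interval agrees, to two decimals, with the 5.06-5.44 reported for the GCD incidence rate.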
The analyses were performed using SAS software, version 9.2 (SAS Institute, Cary, NC, USA) and R, version 3.1.1 (R Project for Statistical Computing, http://www.R-project.org).

Study population
Using the DENALI data warehouse, we found that a total of 11,558 hospital admissions with a diagnosis of IPF and 5,117 outpatient visits with an IPF diagnosis occurred over the years 2000-2010. Limiting the analysis to 2005-2010, the number of subjects with IPF identified through the hospital discharge diagnoses alone was 4,872 using the GCD, 3,255 using the BCD, and 2,094 using the NCD; however, since we also searched for IPF diagnoses in outpatient visits, the sample sizes increased, and a total of 5,441 (+11.7%), 3,573 (+9.8%), and 2,097 (+0.1%) IPF patients were identified using the GCD, BCD and NCD, respectively (data not shown).

Prevalence
The number of IPF cases identified as prevalent in Lombardy during 2005-2010 is reported in Table 2: the results are presented according to the three IPF case definitions, and stratified by gender and age group. Based on the GCD, BCD and NCD, the number of prevalent cases of IPF was 5,441 (53.8% of whom were male), 3,573 (male: 57.2%) and 2,097 (male: 56.9%), respectively. About 70% of patients were aged 65 and older, regardless of the case definition.
Finally, the average annual prevalence rate (per 100,000 person-years) as estimated according to the NCD was 12.55 (95% CI: 12.26-12.84). As observed using the BCD, prevalence was higher in males (13.23; 95% CI: 12.82-13.65) than in females (11.84; 95% CI: 11.43-12.25); moreover, the age-stratified analyses showed a pattern that was similar to that obtained using the GCD and BCD.
The trends observed in age classes were confirmed by the Poisson regression models (Table 4) for all case definitions: the estimated β for age class was always significantly positive.
The analysis of the temporal trend revealed that the annual standardized prevalence rates increased from 2005 to 2008, and seemed to stabilize thereafter, regardless of the case definition (Fig 1). However, the stabilization was more evident when the analysis was performed using the GCD. The Poisson model reported in Table 4 detected an overall increasing trend with calendar year.

Incidence
From 2005 to 2010, a total of 2,951, 2,093 and 1,309 new cases of IPF were registered in Lombardy using the GCD, BCD and NCD, respectively. Results are presented in Table 2. About 60% of new cases occurred in males, regardless of the case definition. The majority of incident cases (63% of the total) occurred in subjects aged 65 to 84 years. The age distribution of incident cases did not significantly differ among the three case definitions. Using the GCD, the estimated annual incidence rate (per 100,000 person-years) was 5.25 (95% CI: 5.06-5.44), and was significantly higher among males (6.18; 95% CI: 5.88-6.48) than females (4.37; 95% CI: 4.13-4.61) (Table 5).
As with the prevalence rate, the incidence rate increased with increasing age, with the lowest value in the youngest age group: the rate (per 100,000 person-years) rose from 0.92 among people aged less than 55 years (95% CI: 0.82-1.01) to 25.59 among people aged 80-84 years. Using the BCD, the overall annual incidence rate (per 100,000 person-years) was 3.74 (95% CI: 3.58-3.90). The rate was significantly higher among males (4.63; 95% CI: 4.37-4.89). The age-stratified analysis showed a pattern that was similar to that observed using the GCD, although the point estimates were lower: the lowest incidence rate, estimated for people aged less than 55 years, was 0.62 (95% CI: 0.54-0.69), and the highest rate (which was estimated for people aged 80 to 84 years) was 19.40 (95% CI: 17.26-21.55).
For all case definitions, Poisson regression models (Table 6) confirmed the significance of the trends observed for age classes: the estimated β for age class was always significantly positive.
The analysis of the temporal trend during the period 2005-2010 revealed that, using the GCD, the estimated annual incidence rates slightly decreased, with a more pronounced decline over the last two years (Fig 2). However, the analysis of the temporal trend using the BCD and NCD revealed that the annual incidence rates were stable during the study period, with the sole exception of the year 2009, when a sudden decrease was observed. The overall Poisson regression model detected a significant negative trend for all case definitions (Table 6). However, this trend was not confirmed for the BCD and NCD when the model excluded 2009 data.

Discussion
In our study we used healthcare administrative databases to evaluate the epidemiology of IPF in Lombardy, which, with nearly 10 million inhabitants in 2013, is the most populous region of Italy. As highlighted in a recent review [19], the analysis of health care databases is one of the most common methodologies used to identify cases of IPF. This approach provides information from a large population without the expenditure required by the creation of a national registry; moreover, it is critical for accruing a sufficient sample size for epidemiological studies of rare diseases such as IPF. In our study, the estimated mean annual incidence rate of IPF varied between 2.3 and 5.3 per 100,000 person-years, and the estimated prevalence rate varied between 12.6 and 35.5 per 100,000 person-years, depending on the case definition used to identify IPF patients.
The comparison between our results and findings from other studies is hampered by the variety of IPF case definitions and by differences in study population, time period of analysis, and geographic locations. In the US, two different research groups studied the incidence and prevalence of IPF by analyzing medical claims databases [28,31]; however, the broad and narrow definitions that were used in those studies included slightly different criteria. Raghu et al. [28] estimated the incidence rate to be 6.8 and 16.3 per 100,000 person-years in the year 2000 using their narrow and broad definition, respectively; in the study of Fernandez-Perez et al. [31], which focused on the epidemiology of IPF during the period 1997-2005, the estimated incidence rate was 8.8 per 100,000 person-years based on their narrow definition, and was 17.4 per 100,000 person-years based on their broad definition. The overall prevalence of IPF estimated by Raghu et al. [28] was 14.0 and 42.7 per 100,000 person-years using their narrow and broad definition, respectively, while the prevalence estimated by Fernandez-Perez et al. [31] was 27.9 and 63.0 per 100,000 person-years by their narrow and broad definition criteria, respectively. It should be noted that only residents aged 50 years and older were evaluated in the Fernandez-Perez et al. study [31].
In Japan, only one study has been conducted to investigate the epidemiology of IPF [17], and the results suggested that the estimated prevalence and incidence in 2008 were much lower (10.0 and 2.2 per 100,000 person-years, respectively) than those observed in the US population. In general, estimates of IPF prevalence and incidence rates appear to be lower in Europe than those reported in the US population [18,20,22]. In particular, studies that investigated IPF epidemiology during the first years of the 21st century (and are therefore comparable to our study) found that the estimated mean annual incidence rate (per 100,000 person-years) ranged from 0.9 in Greece [32] to 7.4 in the UK [33]; the prevalence rate was only evaluated in Greece, and was 3.4 [32]. Recently, the study of Agabiti et al. [13] focused on the epidemiology of IPF in the adult population of a region of Central-Southern Italy using hospital admission records: findings indicated that the estimated annual incidence rate over the period 2005-2009 varied between 7.5 and 9.3 (per 100,000 person-years), and that the estimated IPF prevalence in 2009 varied between 25.6 and 31.6 (per 100,000 person-years), depending on the case definition. These results show a higher incidence and a lower prevalence rate than ours. These differences might be due to different demographic characteristics of the analysed populations, or to environmental factors (for example, Lazio and Lombardy differ in climate as well as in the concentration and sources of environmental pollution), or they might even be due to differences in patient management, as each region of Italy has its own Healthcare System that is to some extent autonomous. However, we cannot rule out that they might also originate from differences in study design: first, our estimates refer to the whole population, while those of Agabiti et al. relate to people aged 18 or older; second, the criteria used in the study of Agabiti et al. were similar to the GCD criteria used in our study, but did not include outpatient claims; finally, different washout periods were used to identify IPF cases as incident or prevalent.
Our findings suggested that prevalence and incidence of IPF in Lombardy might be similar to those estimated in other European countries, and thus lower compared to those observed in the United States; however, the use of different IPF case definitions and differences in subject selection criteria might invalidate a direct comparison of findings among countries [18,19].
Our results are consistent with several previous surveys finding that the incidence and prevalence of IPF are higher among men, and increase with increasing age [11,17,28,31,[33][34][35]]. Interestingly, we observed that both the prevalence and incidence of IPF significantly decreased in the oldest age group (aged 85 years and older). A similar pattern of incidence and prevalence rates was observed in the study conducted in the UK by Gribbin et al. [35], and might be related to clinical complexities inherent in elderly patients. In our study, we not only assessed the prevalence and incidence of IPF but also provided information about temporal trends. To our knowledge, only four studies have focused on this aspect [11,31,33,35]. Two of these studies investigated temporal trends before 2005 [31,35]. The other two studies indicated that incidence rates have remained stable in the UK [33] and the US [11] since 2005 and that IPF prevalence in the US has increased annually in recent years [10]. Results from our study are in line with such findings: the analysis of the temporal trend revealed that the annual incidence rates estimated using the BCD and NCD were stable in Lombardy during the study period, while IPF prevalence (per 100,000 person-years) increased from 19.0 in 2005 to 24.7 in 2010 using the BCD, and from 10.2 in 2005 to 13.9 in 2010 using the NCD (Fig 1). This increase might be attributable to a gain in survival of patients developing IPF in recent years as a result of advances in the diagnosis and management of the disease and, in particular, of comorbidities [11]. However, it should be noted that we observed a significant drop in the incidence rate during 2009. Such an anomaly might be the consequence of a change in legislation establishing administrative and management protocols in Lombardy: indeed, in this period it was established that some health care services that were previously provided in day hospitals would be provided in outpatient settings.
The adaptation to the new legislation might have resulted in a temporary reduction in the number of hospitalizations in 2009, which, however, ended before the following year. Such adaptation problems may also account for the results on prevalence trend: the increase in prevalence observed from 2005 to 2007 flattened between 2008 and 2009; afterwards, prevalence rate continued to increase. In order to assess the influence of the drop on the overall mean annual rates, we computed rates excluding the year 2009 and we obtained slightly higher incidence rates: 5 Tables 3 and 5 are not significant because the confidence intervals always overlap. The same conclusion applies to age or gender stratified rates (results not shown). Therefore, we can conclude that including 2009 in the analysis does not lead to bias in the estimated rates.
Our study was carried out using health care administrative databases and, therefore, it has some limitations, as already suggested by Raghu et al. [11,28]. First, patients with IPF were identified based on their access to health care services: it was impossible to determine the exact timing of the onset of IPF and, therefore, the date of the first encounter with an inpatient or outpatient healthcare facility coinciding with the first recorded diagnostic claim of IPF was used as a proxy for the timing of disease onset. Second, the accuracy of the reported diagnoses in the hospital admissions databases was unknown, as we could not integrate in the study a clinical review of each medical chart or of a representative sample of them. In the study of Agabiti et al. [13], it was found that a number of reviewed hospital charts carrying the ICD-9-CM code 515 (post inflammatory pulmonary fibrosis), code 516.8 (other specified alveolar and parietoalveolar pneumonopathies) and code 516.9 (unspecified alveolar and parietoalveolar pneumonopathies) were eventually redefined as "IPF confident" cases. Therefore, some extent of misdiagnosis or misreporting of IPF cases might occur, thus resulting in an underestimation of IPF cases [13]. Moreover, the sensitivity of the ICD-9-CM code 516.3 in identifying IPF cases might be low [13], as this code includes some extremely rare conditions such as the alveolar capillary block and Hamman-Rich syndrome [11]; furthermore, some other ILDs might be wrongly coded as 516.3 without appropriate health examinations. We partially overcame this latter limitation by identifying patients using the broad and narrow case definitions, which consider patients' medical history before and after the presumed IPF onset and, therefore, exclude those patients for whom such types of miscoding might have occurred.
Nevertheless, as we observed that patients who were likely to be misclassified as having IPF received a diagnosis of ILD other than IPF within two years after the presumed onset, a certain extent of misclassification might have persisted in 2009 and 2010.
From a methodological point of view, we based our case definitions on the most rigorous available criteria that were suitable for the analysis of administrative data [11,28]; in addition, we included outpatient claims. Compared to the study of Agabiti et al. [13], this approach had the advantage of allowing identification of patients who had never been admitted to hospital because of IPF (and who accounted for about 10% of IPF cases, as observed using the generic and broad case definitions). Furthermore, the identification of IPF cases as prevalent or incident was based on criteria that were much stricter than those used in the above-mentioned studies [11,28]: indeed, we used a longer washout period (5 years) in order to minimize the probability of misclassifying prevalent IPF cases as incident.
Despite the above-mentioned limitations, we believe that health care administrative databases are useful tools to investigate the epidemiology of a rare disease like IPF. This approach allowed us to select a population sample that was larger than those evaluated in most of the published studies. Indeed, this was one of the largest samples ever considered, together with studies conducted in the UK [33,35]: to date, the largest sample in the US was composed of roughly 3.7 million inhabitants [11,28], which corresponds to about 37% of the population selected for our study. In addition, our study investigated for the first time trends in IPF incidence and prevalence in an Italian region.
Moreover, thanks to the universal health care coverage and to the DENALI data warehouse, which traces a complete medical history of each resident in Lombardy by merging data of different datasets belonging to the same individual, we could investigate the prevalence and incidence of IPF in an unselected population, without restrictions related to age [11,13,31], adherence to some health plan (e.g. Medicare) [11,28], or voluntary recruitment [17,32].
In conclusion, our results on IPF prevalence and incidence are in line with those reported in other epidemiological studies conducted in Italy and Europe, and incidence and prevalence trends are in agreement with those reported in European and American studies. The convergence of findings from different epidemiological studies of IPF (some of which were also performed using administrative databases) is a very important achievement, considering all the challenges that are faced in investigating the epidemiology of rare diseases (e.g. diagnostic issues and difficulties of study design). Future studies are warranted to validate the accuracy of the diagnostic codes used to identify IPF cases in large databases, as well as to investigate different target populations living in areas with different geographical, social and environmental characteristics.