Spirometry, questionnaire and electronic medical record based COPD in a population survey: Comparing prevalence, level of agreement and associations with potential risk factors

Background COPD-diagnosis is confirmed by post-bronchodilator (BD) spirometry. However, epidemiological studies often rely on pre-BD spirometry, self-reports, or medical records. This population-based study aims to determine COPD-prevalence and the level of agreement based on four different operational definitions, and to compare associations between COPD-definitions and risk factors. Methods COPD-prevalence in 1,793 adults from the general Dutch population (aged 18–70 years) was assessed based on self-reported data, Electronic Medical Records (EMR), and post-BD spirometry using two cut-offs: FEV1/FVC below the lower limit of normal (LLN) and the GOLD fixed cut-off (FEV1/FVC <0.70). Using spirometry as the reference, sensitivity was calculated for self-reported and EMR-based COPD. Associations between COPD and known risk factors were assessed with logistic regression. Data were collected as part of the cross-sectional VGO study (Livestock Farming and Neighboring Residents' Health Study). Results The highest prevalence was found based on spirometry (GOLD: 10.9%, LLN: 5.9%), followed by self-report (4.6%) and EMR (2.9%). Self-reported or EMR-based COPD identified less than 30% of all spirometry-based COPD cases. The direction of association between known risk factors and COPD was similar across the four definitions; however, magnitude and significance varied. In particular, indicators of allergy were more strongly associated with self-reported COPD than with the other definitions. Conclusions COPD-prevalence varied depending on the definition used. A substantial number of subjects with spirometry-based COPD cannot be identified with questionnaires or medical records, which can cause underestimation of COPD-prevalence. The influence of the different COPD-definitions on associations with known risk factors was limited.


Introduction
Chronic obstructive pulmonary disease (COPD) is a leading cause of mortality and morbidity worldwide, and its burden is expected to increase in the coming decades [1]. Epidemiological studies estimating COPD prevalence show remarkable variation due to differences in measurement methodology [2]. In a meta-analysis quantifying the global prevalence of COPD, Halbert et al. found that objective definitions based on spirometry tended to produce higher prevalence estimates than patient-reported diagnosis and physician diagnosis (9.2% versus 4.9% versus 5.2%, respectively) [2]. This likely reflects under-diagnosis of the disease [3]. COPD based on post-bronchodilator (BD) spirometry is therefore preferred, and commonly used, in epidemiological studies. Objective measurements are also preferred because they are not influenced by symptom perception, recall bias or access to health care [4]. However, the advantage of self-reports or medical records is their relatively low cost, which allows large sample sizes and "big data" analyses.
Studies comparing COPD-prevalence based on different data sources in the same population also found that the definition used to assess COPD greatly influences prevalence estimates [5–10]. A study by de Marco et al. showed that the effect of risk factors for the development of COPD, such as gender, age and Body Mass Index (BMI), may also depend on the definition used [11]. However, most of these studies were conducted in patient populations [7,9,10], and the few studies that compared COPD-definitions in the general population had only pre-BD spirometry results available [5,6,11]. To our knowledge, this is the first population-based study that compares post-BD spirometry-based COPD with COPD-prevalence based on other data sources.
For spirometry-based COPD, the recommended cut-off for the ratio of the Forced Expiratory Volume in 1 second (FEV1) to the Forced Vital Capacity (FVC) is the lower limit of normal (LLN) based on the Global Lung Function Initiative 2012 (GLI) reference equations, which take into account sex, age and height [12,13]. Another commonly used cut-off for COPD is a post-BD FEV1/FVC ratio <0.70 (Global Initiative for Chronic Obstructive Lung Disease (GOLD)) [1]. This GOLD-definition has been criticized because the FEV1/FVC ratio generally decreases with age, which results in over-diagnosis in the elderly and under-diagnosis in younger people [14,15].
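To make the contrast between the two cut-offs concrete, the minimal sketch below computes an LLN from LMS (lambda-mu-sigma) parameters, the form in which the GLI-2012 equations express their predictions; the LLN is the value at z = -1.645, the 5th percentile. The L, M and S values used here are hypothetical round numbers, not GLI-2012 coefficients (real values come from the published GLI equations and depend on sex, age, height and ethnic group), and the helper names are our own.

```python
import math

Z_LLN = -1.645      # z-score of the 5th percentile, the usual LLN definition
GOLD_CUTOFF = 0.70  # fixed GOLD cut-off for post-BD FEV1/FVC


def lms_value(L: float, M: float, S: float, z: float) -> float:
    """Value at a given z-score under the LMS (Box-Cox) model."""
    if abs(L) < 1e-12:  # limiting case L -> 0
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def lms_z_score(value: float, L: float, M: float, S: float) -> float:
    """Inverse of lms_value: the z-score of an observed value."""
    if abs(L) < 1e-12:
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


def lower_limit_of_normal(L: float, M: float, S: float) -> float:
    return lms_value(L, M, S, Z_LLN)


def classify_ratio(ratio: float, L: float, M: float, S: float) -> dict:
    """Apply both spirometric COPD definitions to one FEV1/FVC ratio."""
    return {
        "lln": ratio < lower_limit_of_normal(L, M, S),
        "gold": ratio < GOLD_CUTOFF,
    }


# Hypothetical LMS parameters (NOT GLI-2012 values): the predicted ratio M
# falls with age, so the LLN falls too, while the GOLD cut-off stays at 0.70.
YOUNGER = dict(L=1.0, M=0.84, S=0.07)  # LLN = 0.84 * (1 - 1.645 * 0.07), about 0.743
OLDER = dict(L=1.0, M=0.74, S=0.08)    # LLN = 0.74 * (1 - 1.645 * 0.08), about 0.643

if __name__ == "__main__":
    # Younger adult, ratio 0.72: below the LLN but above 0.70.
    print(classify_ratio(0.72, **YOUNGER))  # {'lln': True, 'gold': False}
    # Older adult, ratio 0.68: below 0.70 but above the LLN.
    print(classify_ratio(0.68, **OLDER))    # {'lln': False, 'gold': True}
```

The first case mirrors the under-diagnosis of younger people by the fixed cut-off, and the second the over-diagnosis of the elderly.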
A comparison of different definitions for determining COPD-prevalence will give more insight into the possible effects of using various COPD-definitions on prevalence estimates and their associations with potential risk factors.
The objectives of this study are: 1) to compare COPD-prevalence and the level of agreement based on four different operational definitions: self-reported COPD, COPD based on general practitioners' (GP) Electronic Medical Records (EMR) and COPD based on post-BD spirometry: LLN and GOLD-definition, 2) to compare associations between COPD (four operational definitions) and potential risk factors and severity measures and 3) to analyze COPD-prevalence based on pre-BD spirometry and to assess whether associations with potential risk factors are different from COPD based on post-BD spirometry.

Study population
Data for the present study were derived from the cross-sectional VGO study (Dutch acronym for Livestock Farming and Neighboring Residents' Health), which aims to investigate the health of residents living in the vicinity of livestock farms. In 2012, a questionnaire survey was conducted among 14,163 adults from the general population (aged 18–70 years) in the south of the Netherlands. Recruitment and inclusion criteria have been described previously by Borlée et al. [16]. Questionnaire participants who consented to further contact for a follow-up study and who were not working or living on a farm were eligible for a medical survey (n = 8,714). Based on their home addresses, twelve temporary research centers were established. Between March 2014 and February 2015, all participants living within approximately 10 km of a temporary research center (n = 7,180) were invited to the nearest center for a medical examination, resulting in 2,494 participants (response 34.7%). The medical examination consisted of a second, more extended questionnaire, height and weight measurements, a lung function measurement (pre- and post-BD spirometry), collection of serum, EDTA-blood, nasal and buccal cells, and a nasal swab. In addition, participants collected fecal samples at home and sent them to the laboratory by mail. In this study, we analyzed subjects with pre- and post-BD measurements of sufficient quality (quality C or better), good-quality EMR, and non-missing self-reported COPD (n = 1,793) (see Fig 1 for a flow chart of the study population).

Ethical aspects
The VGO study protocol was approved by the Medical Ethical Committee of the University Medical Centre Utrecht (protocol number 13/533). All 2,494 subjects signed informed consent. Participants' privacy was ensured by keeping medical information and address records separate at all times through a Trusted Third Party.

Data sources and COPD-definitions
Self-reported data. Self-reported COPD was defined as a positive answer to the question: 'Have you ever been told by a doctor that you had chronic obstructive pulmonary disease or emphysema?' Questionnaire data on respiratory health were collected with the first questionnaire in November 2012, as described previously [16]. This was a two-page questionnaire with respiratory health questions adopted from the European Community Respiratory Health Survey-III (ECRHS-III) screening questionnaire [17].
Electronic Medical Records. EMR-based COPD was defined as ICPC-code R91 (chronic bronchitis) or R95 (emphysema/COPD) recorded in 2010–2012. EMR data were available through the GPs, who all participated in the Netherlands Institute for Health Services Research (NIVEL) Primary Care Database [18]. General practices had to meet the following EMR quality criteria: 1) record diagnostic information in the patients' EMR using the International Classification of Primary Care ICPC (4), 2) assign ICPC-codes to at least 70% of the morbidity records, and 3) record morbidity data for at least 46 weeks per year. In addition, patients had to be registered with the GP for at least three-quarters of the year 2012. All subjects included in the data analysis gave written permission to link their study data to their EMR.
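The EMR-based definition and its data-quality filters can be sketched as simple predicates. The record layout used here (a list of (ICPC code, year) pairs, registration measured in months) is a hypothetical simplification for illustration only; it is not the structure of the NIVEL Primary Care Database.

```python
from dataclasses import dataclass, field

COPD_ICPC_CODES = {"R91", "R95"}  # chronic bronchitis; emphysema/COPD
EMR_YEARS = range(2010, 2013)     # 2010-2012 inclusive


@dataclass
class Practice:  # hypothetical summary of one general practice
    uses_icpc: bool        # diagnoses recorded with ICPC codes
    coded_fraction: float  # share of morbidity records with an ICPC code
    weeks_recorded: int    # weeks per year with morbidity data


@dataclass
class PatientRecord:  # hypothetical, simplified patient record
    months_registered_2012: float
    diagnoses: list = field(default_factory=list)  # (icpc_code, year) pairs


def practice_meets_criteria(p: Practice) -> bool:
    return p.uses_icpc and p.coded_fraction >= 0.70 and p.weeks_recorded >= 46


def patient_meets_criteria(r: PatientRecord) -> bool:
    # Registered for at least three-quarters of 2012, i.e. 9 of 12 months.
    return r.months_registered_2012 >= 9


def emr_copd(r: PatientRecord) -> bool:
    return any(code in COPD_ICPC_CODES and year in EMR_YEARS
               for code, year in r.diagnoses)


if __name__ == "__main__":
    practice = Practice(uses_icpc=True, coded_fraction=0.82, weeks_recorded=50)
    patient = PatientRecord(months_registered_2012=12, diagnoses=[("R95", 2011)])
    eligible = practice_meets_criteria(practice) and patient_meets_criteria(patient)
    print(eligible, emr_copd(patient))  # True True
```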
Spirometry. Two COPD-definitions were based on spirometry data: 1) a post-BD FEV1/FVC below the LLN, and 2) a post-BD FEV1/FVC <0.70 (GOLD). The LLN was calculated with GLI reference values based on age, gender and height [13]. Pre- and post-BD spirometry was conducted according to European Respiratory Society (ERS) guidelines and the European Community Respiratory Health Survey III (ECRHS-III) protocol [19]. Participants stopped using inhalers and oral lung medication 4 and 8 hours prior to the lung function test, respectively. An EasyOne Spirometer (NDD Medical Technologies, Inc.) was used, which measures flow and volume by ultrasound transit time. After the pre-BD test, four puffs of a short-acting beta-agonist (salbutamol, 100 μg per puff) were administered using a standard spacer. The post-BD measurement was performed at least 15 minutes after the last puff. To increase the quality of the spirometry data, we attempted to obtain four acceptable spirograms per subject, both pre- and post-BD. The quality of all lung function curves was manually reviewed in the NDD software by a specialist. When the best curves chosen by the NDD software were not the best curves according to predefined ATS/ERS and NDD criteria, the three best curves were selected or ranked manually [20]. In total, 97.8% of the participants who conducted a lung function test had a pre- and/or post-BD measurement with a quality of C or higher (quality C: at least two reproducible curves, i.e. reproducibility within 200 ml) (N = 2,322/2,375; see Fig 1).
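A minimal sketch of how one subject's spirometry could be screened and classified. Three assumptions are labelled here: quality grades are letters A to F with A best (so "C or better" means A, B or C); the LLN for the ratio is supplied by the caller from the GLI-2012 reference equations, which are not reproduced here; and the GOLD grade grouping (GOLD 1 vs. GOLD 2-4) follows the S2 Table legend. Applying the same classification to pre-BD values would give the pre-BD definitions used in the sensitivity analysis.

```python
ACCEPTABLE_GRADES = frozenset("ABC")  # assumed A-F grading; "C or better"
GOLD_CUTOFF = 0.70


def usable_session(pre_grade: str, post_grade: str) -> bool:
    """Both the pre- and the post-BD test must reach grade C or better."""
    return (pre_grade.strip().upper() in ACCEPTABLE_GRADES
            and post_grade.strip().upper() in ACCEPTABLE_GRADES)


def spirometric_copd(fev1_l: float, fvc_l: float, lln_ratio: float) -> dict:
    """Both definitions for one measurement; lln_ratio comes from GLI-2012."""
    ratio = fev1_l / fvc_l
    return {"ratio": ratio,
            "copd_lln": ratio < lln_ratio,
            "copd_gold": ratio < GOLD_CUTOFF}


def gold_grade_group(ratio: float, fev1_pct_predicted: float):
    """GOLD 1 vs. GOLD 2-4 among subjects with FEV1/FVC < 0.70, else None."""
    if ratio >= GOLD_CUTOFF:
        return None
    return "GOLD 1" if fev1_pct_predicted >= 80 else "GOLD 2-4"


if __name__ == "__main__":
    # Hypothetical post-BD values for one subject.
    if usable_session("B", "C"):
        result = spirometric_copd(fev1_l=2.5, fvc_l=3.7, lln_ratio=0.64)
        print(result["copd_lln"], result["copd_gold"])  # False True
        print(gold_grade_group(result["ratio"], fev1_pct_predicted=72))  # GOLD 2-4
```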

Potential risk factors and severity measures of COPD
Patient characteristics and severity measures of COPD were collected with the second, more extended questionnaire, which subjects completed before the medical examination. Among other things, the questionnaire comprised items on symptoms and diseases, smoking habits, education and profession. Body Mass Index (BMI, kg/m²) was assessed during the medical examination. Atopy was defined as the presence of specific serum IgE antibodies to one or more common allergens and/or a total IgE higher than 100 IU/ml. Specific IgE to common allergens (house dust mite, grass, cat and dog) and total IgE levels were determined in serum with enzyme immunoassays, as described before [21]. To gain more insight into asthma-COPD overlap, we included self-reported current asthma as a potential risk factor. Self-reported current asthma was defined as self-reported ever asthma AND either one or more asthma-like symptoms (wheezing/whistling in the chest, chest tightness, shortness of breath at rest/following strenuous activity/at night-time, or asthma attacks) or use of inhaled or oral medication for breathing problems in the last year (as described before by de Marco et al. [22]). Three severity measures for COPD were computed for all participants: GOLD grades [1], self-reported health status, and the Clinical COPD Questionnaire (CCQ)-score [23]. The CCQ was developed to identify activity limitations and emotional dysfunction in COPD patients.
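The two composite definitions above (atopy and self-reported current asthma) reduce to boolean rules, sketched below. Assumptions: whether a specific IgE result counts as positive is decided upstream (the positivity threshold is not stated here), so it enters as a boolean per allergen; the total IgE rule follows the "higher than 100 IU/ml" wording; and the symptom labels are hypothetical short names for the questionnaire items.

```python
ALLERGENS = ("house dust mite", "grass", "cat", "dog")
TOTAL_IGE_CUTOFF_IU_ML = 100.0

# Hypothetical labels for the asthma-like symptom items.
ASTHMA_LIKE_SYMPTOMS = frozenset({
    "wheezing", "chest tightness", "breathless at rest",
    "breathless after strenuous activity", "breathless at night",
    "asthma attack",
})


def atopic(specific_ige_positive: dict, total_ige_iu_ml: float) -> bool:
    """Sensitized to one or more common allergens and/or total IgE > 100 IU/ml."""
    sensitized = any(specific_ige_positive.get(a, False) for a in ALLERGENS)
    return sensitized or total_ige_iu_ml > TOTAL_IGE_CUTOFF_IU_ML


def current_asthma(ever_asthma: bool, reported_symptoms,
                   breathing_medication_last_year: bool) -> bool:
    """Ever asthma AND (an asthma-like symptom OR breathing medication last year)."""
    has_symptom = bool(ASTHMA_LIKE_SYMPTOMS & set(reported_symptoms))
    return ever_asthma and (has_symptom or breathing_medication_last_year)
```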

Statistical analysis
We conducted a detailed non-response analysis in order to detect possible selection bias. Characteristics of different population subsets were compared (see Fig 1 and Table 1). The likelihood of agreeing to follow-up or of participating was modeled for different characteristics with logistic regression, adjusted for age, gender and smoking habits. To study the effect of potential selection bias, we compared the association between self-reported COPD and risk factors among different population subsets (see S1 Table).
Agreement between COPD status based on the three different data sources was determined by calculating Cohen's kappa. Using post-BD spirometry as the reference standard for COPD, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for self-reported and EMR-based COPD.
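The agreement statistics above all derive from a 2×2 table of an index definition (self-report or EMR) against a spirometry-based reference. The sketch below shows the standard formulas; the counts in the usage example are hypothetical and are not the values behind Table 3.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # COPD by both the index definition and spirometry
    fp: int  # COPD by the index definition only
    fn: int  # COPD by spirometry only
    tn: int  # no COPD by either

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def cohens_kappa(self) -> float:
        n = self.n
        observed = (self.tp + self.tn) / n
        # Agreement expected by chance, from the marginal totals.
        expected = ((self.tp + self.fp) * (self.tp + self.fn)
                    + (self.fn + self.tn) * (self.fp + self.tn)) / n ** 2
        return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    t = TwoByTwo(tp=40, fp=40, fn=155, tn=1558)  # hypothetical counts
    print(f"Se={t.sensitivity():.2f} Sp={t.specificity():.2f} "
          f"PPV={t.ppv():.2f} NPV={t.npv():.2f} kappa={t.cohens_kappa():.2f}")
```

Kappa corrects raw agreement for chance, which matters here because most subjects are non-cases under every definition, so raw agreement is high even when the case groups barely overlap.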
The association between each potential risk factor or severity measure and COPD was assessed by multiple logistic regression analysis. All analyses were adjusted for age (as a continuous variable), gender and smoking habits. To include both the qualitative effect of smoking status and the quantitative effect of smoking exposure, we included ever smoking and pack-years of smoking together as confounders [24]. Sensitivity analyses were conducted: 1) with COPD based on pre-BD measurements; 2) on subjects aged ≥40 years, since a COPD diagnosis is more reliable in older patients [25,26]; and 3) after excluding subjects with self-reported current asthma.
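The smoking adjustment can be sketched as two covariates: an ever-smoking indicator and pack-years mean-centered among ex- and current smokers (the centering is described in the table legends). Setting the centered value to 0 for never smokers is our assumption, as is the Wald-type 95% CI used to turn a fitted log-odds coefficient into an OR; the logistic model itself would be fitted with any standard regression routine.

```python
import math
from statistics import mean


def smoking_covariates(status: list, pack_years: list):
    """Return (ever_smoker, centred_pack_years, centre).

    status: "never", "former" or "current" for each subject.
    Pack-years are mean-centred over ex- and current smokers only; never
    smokers get 0 (an assumption), so the ever-smoking coefficient refers
    to a smoker with average pack-years.
    """
    smoker_py = [py for s, py in zip(status, pack_years) if s != "never"]
    centre = mean(smoker_py)
    ever = [0 if s == "never" else 1 for s in status]
    centred = [0.0 if s == "never" else py - centre
               for s, py in zip(status, pack_years)]
    return ever, centred, centre


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Convert a log-odds coefficient and its SE to OR with a Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    ever, centred, centre = smoking_covariates(
        ["never", "former", "current", "current"], [0, 10, 20, 30])
    print(ever, centred, centre)  # [0, 1, 1, 1] [0.0, -10, 0, 10] 20
```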

Non-response analysis
Subjects who agreed to be contacted for a follow-up study were slightly older (mean age 51.1 vs. 49.8 years), were more often former smokers (38.8% vs. 31.4%) and more often had asthma (both self-reported and EMR-based) than subjects who did not agree (Table 1). Subjects who participated in the medical examination were older (mean age 54.7 vs. 49.1 years), more often female (54.6% vs. 52.2%) and more often former smokers (44.6% vs. 35.7%) than subjects who were invited but did not participate. Selection bias did not seem to affect associations between potential risk factors and self-reported COPD in different population subsets (S1 Table).

Fig 2. Comparison of COPD prevalence based on four different definitions, presented as number of cases and as % of all identified cases.
Legend Fig 2: In total, 243 COPD cases were ascertained by at least one definition. In total, 1,793 subjects who had a pre- and post-BD lung function measurement of sufficient quality (C or better), Electronic Medical Records (EMR) of good quality, and no missing data on self-reported COPD (see Fig 1) were included. Self-report = self-reported data based on the ECRHS-III screening questionnaire, EMR = Electronic Medical Records, spirometry LLN = post-bronchodilator FEV1/FVC below the FEV1/FVC lower limit of normal, spirometry GOLD = post-bronchodilator FEV1/FVC < 0.70.
doi:10.1371/journal.pone.0171494.g002

Table 3. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of COPD based on self-reported data and on Electronic Medical Records, compared with COPD based on the spirometry LLN- and GOLD-definitions. Agreement between the three different data sources was determined with Cohen's kappa.

As expected, since LLN-based cases form a near-complete subset of GOLD-based cases, the proportion of subjects with self-reported or EMR-based COPD confirmed by spirometry (PPV) was higher when compared with the GOLD-definition (PPV self-report: 0.50, PPV EMR: 0.71) than with the LLN-definition (PPV self-report: 0.38, PPV EMR: 0.52).

Associations between COPD-definitions and potential risk factors and severity measures
Overall, the direction of associations was consistent across all four COPD-definitions (Table 4). A low BMI (<20 vs. 20–25 kg/m²) and pack years of smoking were significant risk factors for each COPD-definition, with comparable magnitude. However, the magnitude and significance of other associations varied between the definitions. In particular, the associations of age and gender with COPD varied according to the definition used. Age was significantly positively associated with COPD, except when the LLN-definition was used. The negative association with female gender was only statistically significant when the GOLD-definition was used, whereas the EMR-based definition showed a non-significant positive association. The positive association between self-reported allergy and COPD was only significant for self-reported and EMR-based COPD. For indicators of objectively measured allergy, we found strong positive associations between self-reported COPD and all three definitions of IgE sensitization (≥1 positive specific IgE, total IgE ≥100 IU/ml, and a combination of both). EMR-based COPD and LLN-based COPD were only associated with total IgE ≥100 IU/ml. Current asthma was positively associated with all four definitions; nonetheless, a substantially stronger association was observed with self-reported COPD. Indicators of COPD severity were positively associated with COPD regardless of the definition used, but the associations were stronger for self-reported and EMR-based COPD.
Sensitivity analyses of the 1,626 subjects aged ≥40 years showed a small increase in COPD-prevalence based on all four definitions (self-report: n = 81 (5.0%), EMR: n = 52 (3.2%), LLN: n = 103 (5.7%), GOLD: n = 196 (12.1%)) (S2 Table). The associations between COPD and potential risk factors did not change. Analyses excluding subjects with current asthma showed a lower prevalence of self-reported COPD (n = 52 (3.0%) vs. n = 82 (4.6%)), whereas prevalence based on the other definitions did not change markedly (S3 Table). A stronger association was observed between self-reported COPD and both age and a low BMI. The associations of self-reported COPD with self-reported allergy and with indicators of objectively measured allergy became weaker.

Pre-versus post bronchodilator spirometry
COPD-prevalence increased when using pre-BD measurements (LLN pre: 9.1% vs. post: 5.9%, GOLD pre: 16.4% vs. post: 10.9%) (see Table 5). In general, similar associations with risk factors were identified using pre- or post-BD spirometry, although associations were stronger and more often significant when COPD was based on post-BD measurements.

Discussion
In a general population sample of adults aged 20–72 years from the Netherlands, we found that COPD-prevalence varied depending on the definition used (2.9–10.9%). The overlap between COPD cases identified by the four operational definitions was low. Self-reported or EMR-based COPD identified less than 30% of all subjects with spirometry-based COPD, but specificity was high. Despite the variation in prevalence estimates, low overlap and low sensitivity, the direction of associations between potential risk factors and all four operational definitions of COPD was broadly similar, although the magnitude and statistical significance of the associations varied between the definitions. The combination of a relatively low prevalence and a high specificity of self-reported and EMR-based COPD, with both LLN and GOLD as reference, explains why the associations with risk factors changed only slightly across COPD-definitions [28]. A high specificity results in a relatively low number of 'false positive' cases among the identified COPD cases. COPD-prevalence was substantially higher based on pre- instead of post-BD measurements. We found similar associations with risk factors when using pre- or post-BD spirometry, but the associations were stronger and more often significant when COPD was based on post-BD measurements.
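The role of specificity can be illustrated with the standard formula for outcome misclassification: the share of subjects classified as cases is Se·p + (1 − Sp)·(1 − p), where p is the true prevalence. The sketch assumes non-differential misclassification (the same sensitivity and specificity in exposed and unexposed subjects) and uses hypothetical prevalences; it is an illustration of the general mechanism, not a re-analysis of this study.

```python
def observed_prevalence(true_prev: float, se: float, sp: float) -> float:
    """Share classified as COPD: detected true cases plus false positives."""
    return se * true_prev + (1.0 - sp) * (1.0 - true_prev)


def observed_prevalence_ratio(p_exposed: float, p_unexposed: float,
                              se: float, sp: float) -> float:
    """Prevalence ratio seen when both groups are misclassified alike."""
    return (observed_prevalence(p_exposed, se, sp)
            / observed_prevalence(p_unexposed, se, sp))


if __name__ == "__main__":
    # Hypothetical true prevalences: 15% exposed vs. 10% unexposed (PR = 1.5).
    for se, sp in [(0.25, 1.00), (0.25, 0.99), (0.25, 0.95)]:
        pr = observed_prevalence_ratio(0.15, 0.10, se, sp)
        print(f"Se={se:.2f} Sp={sp:.2f} -> observed PR {pr:.2f}")
```

With perfect specificity the prevalence ratio stays at 1.50 even though only a quarter of cases are detected; lowering specificity to 0.99 or 0.95 pulls it toward the null (about 1.35 and 1.14), because false positives from the large non-case group then make up much of the observed cases.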
Other studies have also found that self-reported or EMR data underestimate COPD prevalence in the general population [2,3,5,7,10]. Pulmonary specialists may argue that COPD based on spirometry alone is an overestimation, since a clinical COPD diagnosis also requires other indicators, such as respiratory symptoms, a family history of COPD, or a history of exposure to risk factors [1]. We emphasize that this study aims to assess COPD for epidemiological purposes and not for clinical case finding. Therefore, COPD based on lung function criteria alone is justifiable. On the other hand, there are also arguments that spirometry underestimates COPD prevalence, since the likelihood of producing a reproducible spirometric measurement decreases with disease severity. We excluded non-reproducible tests and may therefore have selectively excluded a higher proportion of subjects with airflow obstruction [2]. Furthermore, COPD is a slowly progressive disease whose symptoms worsen gradually over time [1]. People adapt to these slowly developing respiratory problems, may be unaware of their condition and may not visit a GP. This could explain the low sensitivity of self-reported or EMR-based COPD. In addition, whether COPD is self-reported or recorded in the EMR will depend on disease severity, which is strongly associated with care-seeking behavior [29]. The CCQ-score and self-reported health, both indicators of disease severity, were more strongly associated with self-reported and EMR-based COPD than with spirometry-based COPD. Because lung function declines faster in the earlier stages of the disease, early diagnosis may help slow disease progression through physical activity and avoidance of smoke and other noxious agents. In addition, pharmacological intervention may control symptoms and improve quality of life [30,31].

Table 5. Associations between spirometry-based COPD and potential risk factors and severity measures. COPD is defined on pre- and post-BD measurements using the LLN-definition and the GOLD-definition. OR and 95% CI were adjusted for age, gender, ever smoking and pack years (number of pack years was mean-centered for ex- and current smokers). Bold type indicates statistical significance (p <0.05). Spirometry: pre- and post-bronchodilator lung function measurement. Used definitions for COPD based on different databases are presented in Table 2. *Mean pack years are calculated for ex-smokers and current smokers. † Clinical COPD Questionnaire (CCQ)-score [23]. ‡ Less than good self-reported health: bad/moderate/reasonable; reference category: good/excellent self-reported health.
doi:10.1371/journal.pone.0171494.t005
Comparing four different operational definitions for COPD in a population-based study
A follow-up study by de Marco et al. [11] in young adults (ECRHS-II study, n = 4,636, aged 20–44 years at inclusion) studied risk factors for new-onset COPD and compared associations between risk factors and several pre-BD spirometric COPD-definitions. The associations of gender, age and being underweight with COPD incidence lost their statistical significance when LLN-based rather than GOLD-based COPD was used. We found similar associations with age, gender and underweight, and these associations were also stronger with pre-BD GOLD-based COPD than with LLN-based COPD. However, being underweight was more strongly associated with LLN-based COPD than with GOLD-based COPD when post-BD measurements were used.
In our study, most associations between risk factors and the different COPD-definitions had a similar magnitude and overlapping confidence intervals, except for the associations with allergy indicators. We found strong positive associations between self-reported COPD and indicators of allergy. Allergy is associated with asthma, and the association between COPD and current asthma was more prominent for self-reported COPD than for the other COPD-definitions. The associations between self-reported COPD and allergy indicators became weaker when subjects with current asthma were excluded, which indicates that some misclassification in self-reported COPD may be present due to overlap with asthma. Apart from allergy indicators, this study indicates that for epidemiological studies aiming to evaluate risk factors for COPD, the influence of the definition used seems to be limited. However, we focused only on risk factors that are known to be associated with COPD, and we can only speculate that the influence of different COPD-definitions on associations with unknown risk factors is also limited.
Our population-based study is unique in the simultaneous use of three different data sources to assess COPD: post-BD spirometry, GP registrations, and ECRHS questionnaire items. We applied stringent quality standards to both spirometry and EMR data. In most population-based epidemiological studies based on spirometry, only pre-BD lung function measurements are available. It is not surprising that prevalence estimates were higher when COPD was based on pre-BD spirometry. By using post-BD spirometry we studied fixed airway obstruction, which reduces the number of false positives due to overlap with asthma [32,33]. Nevertheless, the influence of using pre-BD instead of post-BD definitions on associations with potential risk factors, including current asthma, was limited. As expected, associations were somewhat stronger and more often significant when COPD was based on post-BD measurements, since fewer false positives reduce measurement error and consequently strengthen risk factor associations. To the best of our knowledge, this has not been studied before in a population-based study. Another strength of our study was the extensive non-response analysis from the source population up to the current study population. We previously compared characteristics of non-responders and responders to the questionnaire survey (source population) [16]. This study continued the non-response analysis by comparing characteristics of responders and non-responders at different stages of data collection. Participants in the medical examination were older, more often female and more often former smokers than subjects who were invited but did not participate. Both in the previous analysis [16] and in the present study, we were able to demonstrate that selection bias did not affect the associations under study, i.e. the associations between self-reported COPD and potential risk factors (see S1 Table).
The three data sources were not collected at the same time, which is a limitation of our study. Questionnaire data on COPD were collected in November 2012, EMR data from 2010–2012 were used, and spirometry was conducted between March 2014 and February 2015. However, it is unlikely that the lack of overlap is largely explained by COPD developing during this relatively short period. Previous studies that collected self-reported and spirometry data simultaneously also found a large degree of non-overlap [5,6].
General population studies are often conducted in urban populations. Our study population lived in a rural area outside the larger cities, and farmers were excluded. The prevalence of GP-diagnosed COPD in the study area did not differ from that in other Dutch rural areas without livestock farming (42.6 vs. 47.1 per 1,000 patients aged >40 years, averaged over 2007–2013) [34], and we have no reason to expect that agreement between different COPD-definitions would differ in other areas.

Conclusions
The operational definition used for COPD greatly influences prevalence estimates. Self-reported or EMR-based COPD identified less than 30% of all subjects with COPD based on persistent airflow limitation, which implies that a substantial number of subjects with COPD cannot be identified by questionnaires or medical records. However, the effect of the different COPD-definitions on associations with potential risk factors was limited, except for indicators of allergy, which were more strongly associated with self-reported COPD than with the other definitions. In addition, using pre-BD instead of post-BD spirometry resulted in higher prevalence estimates but had a minimal effect on associations with potential risk factors. Researchers using these operational definitions to classify individuals by COPD status need to be aware of the impact of such choices. The results of this study may inform population-based epidemiological studies that aim to evaluate potential risk factors for COPD.
Supporting information
S1 Table. Association between self-reported COPD and several characteristics compared between different population subsets to study potential selection bias. Legend Table S1. OR and 95% CI were adjusted for age, gender, and ever smoking. Bold type indicates statistical significance (p <0.05). Self-reported COPD was defined as a positive answer to the question: 'Have you ever been told by a doctor that you had chronic obstructive pulmonary disease or emphysema?'. The sub-populations are represented in Fig 1 (main article). (DOCX)
S2 Table. Associations between risk factors and severity measures with four different definitions of COPD; only subjects older than 39 years of age are included. Legend Table S2. Data are presented as mean ±SD or %, unless otherwise stated. OR and 95% CI were adjusted for age, gender, ever smoking and pack years (number of pack years was mean-centered for ex- and current smokers). Bold type indicates statistical significance (p <0.05). Self-reported: self-reported data based on the ECRHS-III screening questionnaire, EMR: Electronic Medical Records, spirometry: post-bronchodilator lung function measurement. Used definitions for COPD based on different databases are presented in Table 2. * Mean pack years are calculated for ex-smokers and current smokers. † GOLD 1: FEV1/FVC <0.70 and FEV1 ≥80% predicted; GOLD 2-4: FEV1/FVC <0.70 and FEV1 <80% predicted. ‡ Clinical COPD Questionnaire (CCQ)-score (van der Molen et al. 'Development, validity and responsiveness of the Clinical COPD Questionnaire.' Health Qual Life Outcomes 2003;1:13.) NA: Not available, as very few (LLN) or no (GOLD) subjects with spirometry-based COPD had FEV1/FVC > 0.70. § Less than good self-reported health: bad/moderate/reasonable; reference category: good/excellent self-reported health. (DOCX)
S3 Table. Associations between risk factors and severity measures with four different definitions of COPD; subjects with current asthma are excluded. Legend S3 Table. Data are presented as mean ±SD or %, unless otherwise stated. OR and 95% CI were adjusted for age, gender, ever smoking and pack years (number of pack years was mean-centered for ex- and current smokers). Bold type indicates statistical significance (p <0.05). Self-reported: self-reported data based on the ECRHS-III screening questionnaire, EMR: Electronic Medical Records, spirometry: post-bronchodilator lung function measurement. Used definitions for COPD based on different databases are presented in Table 2.