Instability in the COPD Diagnosis upon Repeat Testing Varies with the Definition of COPD

Background A low FEV1/FVC from post-bronchodilator spirometry is required to diagnose COPD. Both the FEV1 and the FVC can vary over time; therefore, individuals can be given a diagnosis of mild COPD at one visit but have normal spirometry at the next appointment, even without an intervention. Methods We analyzed two population-based surveys of adults in which spirometry was repeated in the same individuals 5-9 years after their baseline examination. We determined the factors associated with a change in the spirometry interpretation from one exam to the next, using different criteria commonly used to diagnose COPD. Results The rate of an inconsistent diagnosis of mild COPD was 11.7% using FEV1/FVC <0.70, 5.9% using FEV1/FEV6 below the lower limit of the normal range (LLN), and 4.1% using the GOLD stage 2-4 criterion. The most important factor associated with diagnostic inconsistency was the closeness of the ratio to the LLN during the first examination. Inconsistency decreased with a lower FEV1. Conclusions Using FEV1/FEV6 <LLN or GOLD stage 2-4 as the criterion for airflow obstruction reduces inconsistencies in the diagnosis of mild COPD. Further improvement could be obtained by defining a borderline zone around the LLN (e.g. plus or minus 0.6 SD), or by repeating the test in patients with borderline results.


Introduction
Population-based prevalence of poorly reversible airflow obstruction (COPD) has recently been estimated for a variety of countries in Latin America [1] and other continents [2] with standardized methods including post-bronchodilator spirometry. In those and other surveys, it has been clear that the choice among the several criteria used to define airflow obstruction produces important differences in COPD prevalence, which is lower with criteria based on FEV1/FVC or FEV1/FEV6 below the 5th percentile (lower limit of normal, LLN) than with the traditional Global Initiative for Chronic Obstructive Lung Disease (GOLD) definition (FEV1/FVC <0.7). Most of the emphasis has been placed on cross-sectional exams and on the impact of COPD criteria on population prevalence, but consistency of diagnosis is also very important for individuals, because false positives label them as having a potentially severe disease and false negatives could lead to denial of useful treatments. Additional complications derive from inconsistency of diagnosis across serial spirometric tests: known and expected variations in spirometric results on repeated testing may change the diagnosis over time [3], adding to systematic changes due to aging and to worsening or improvement of airflow obstruction. In a population-based survey, COPD prevalence based on FEV1/FEV6 <LLN showed more consistency than criteria based on FEV1/FVC, because the former fixes the times at which both the numerator and the denominator of the ratio are measured, whereas FEV1/FVC fixes only the time for the numerator, so that variations in expiratory time change the prevalence of airflow obstruction [4].
We hypothesized that there would be instability in the designation of abnormal spirometric results depending on the COPD definition used. Therefore, we conducted this study to determine the frequency of change in the diagnosis in the same subjects studied in two population-based surveys, as well as the correlates of these inconsistencies.

Methods
The detailed methods of the PLATINO baseline [5] and follow-up studies [6] have been published previously. Spirometry was undertaken in individuals who did not present any exclusion criteria (99% of the sample) using an ultrasonic spirometer (EasyOne; ndd Medical Technologies, Zurich, Switzerland). Spirometry was performed before (pre-BD) and 15 minutes after the administration of 200 μg of salbutamol (post-BD) according to the American Thoracic Society (ATS) criteria of acceptability and reproducibility [7]. The quality control exercises showed that >90% of the tests fulfilled ATS quality criteria [5].
Follow-up studies were conducted in Montevideo, Santiago, and Sao Paulo 5, 6, and 9 years after the baseline surveys, respectively [6]. Only individuals with valid spirometric data at baseline were eligible for a follow-up exam. Individuals were visited at their homes using the contact information they had provided during the baseline exam. For the purposes of the present study, only individuals with valid post-bronchodilator spirometric tests during both surveys were analyzed.
Data analyses included a description of the sample characteristics, the calculation of the intraclass correlation coefficient (ICC Rho) between the spirometric parameters (as continuous variables) in the two tests, and the concordance in the diagnosis of COPD (as dichotomous data, using the ICC Rho and the Kappa coefficient) with the following criteria:
a) FEV1/FVC <LLN [8][9][10], with the LLN defined as the lower 5th percentile for predicted post-BD FEV1/FVC based on equations derived from the baseline study in a subset of healthy, never-smoking subjects [11];
b) a ratio of the post-BD FEV1 over FVC <0.70, according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) [12,13];
c) FEV1/FEV6 <LLN, with the LLN defined as the lower 5th percentile for predicted post-BD FEV1/FEV6 based on PLATINO reference equations [11];
d) GOLD stages 2-4, defined as FEV1/FVC <0.7 & FEV1 <80% predicted, which has been used increasingly to add specificity, and similar indices based on the LLN, e.g. FEV1/FVC & FEV1 <LLN, and FEV1/FEV6 & FEV1 <LLN.
The probability of an inconsistent spirometric diagnosis during the second examination (one test with COPD and the other with no COPD) versus consistent results (both with the same diagnosis) was estimated from logistic regression models. The independent variables were the deviation of the measured FEV1/FVC and FEV1/FEV6 from the LLN (in standard deviations [SD] from predicted) and, in addition, several confounders: forced expiratory time, age, gender, current smoking, cumulative smoking in pack-years, the presence of respiratory symptoms, and a previous physician diagnosis of respiratory diseases obtained from a questionnaire. Weight and height were measured, and body mass index (BMI) was categorized into two groups: normal and overweight/obesity.
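The criteria listed above can be written as a small classification routine. The following is a minimal sketch rather than the study's analysis code: it assumes that the LLN equals the predicted value minus 1.645 residual SD (the 5th percentile under a normal distribution), which may differ from how the PLATINO reference equations derive the 5th percentile. The `PostBD` container, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass

# z-score of the 5th percentile of a standard normal distribution.
# Assumption: LLN = predicted - 1.645 * SD; reference equations may
# define the 5th percentile differently.
Z_LLN = -1.645


@dataclass
class PostBD:
    """Hypothetical container for one post-bronchodilator test."""
    fev1: float            # measured FEV1 (L)
    fvc: float             # measured FVC (L)
    fev6: float            # measured FEV6 (L)
    fev1_pred: float       # predicted FEV1 (L)
    fev1_sd: float         # residual SD of the FEV1 equation (L)
    fev1_fvc_pred: float   # predicted FEV1/FVC
    fev1_fvc_sd: float     # residual SD of the FEV1/FVC equation
    fev1_fev6_pred: float  # predicted FEV1/FEV6
    fev1_fev6_sd: float    # residual SD of the FEV1/FEV6 equation


def below_lln(measured: float, predicted: float, sd: float) -> bool:
    """True when the measurement lies below the lower limit of normal."""
    return (measured - predicted) / sd < Z_LLN


def classify(t: PostBD) -> dict:
    """Apply each airflow-obstruction criterion to a single test."""
    ratio_fvc = t.fev1 / t.fvc
    ratio_fev6 = t.fev1 / t.fev6
    fvc_lln = below_lln(ratio_fvc, t.fev1_fvc_pred, t.fev1_fvc_sd)
    fev6_lln = below_lln(ratio_fev6, t.fev1_fev6_pred, t.fev1_fev6_sd)
    fev1_low = below_lln(t.fev1, t.fev1_pred, t.fev1_sd)
    gold = ratio_fvc < 0.70
    return {
        "GOLD": gold,
        "GOLD 2-4": gold and t.fev1 < 0.80 * t.fev1_pred,
        "FEV1/FVC<LLN": fvc_lln,
        "FEV1/FEV6<LLN": fev6_lln,
        "FEV1/FVC & FEV1<LLN": fvc_lln and fev1_low,
        "FEV1/FEV6 & FEV1<LLN": fev6_lln and fev1_low,
    }


# Made-up values for illustration only.
example = PostBD(fev1=2.4, fvc=3.5, fev6=3.4, fev1_pred=2.8, fev1_sd=0.4,
                 fev1_fvc_pred=0.78, fev1_fvc_sd=0.06,
                 fev1_fev6_pred=0.80, fev1_fev6_sd=0.055)
result = classify(example)
```

With these made-up values the same test is labeled obstructed by the fixed 0.70 ratio and by FEV1/FEV6 <LLN, but not by FEV1/FVC <LLN or by any index that also requires a low FEV1, which shows how the label depends on the definition chosen.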
Although in the present work we do not analyze adverse outcomes in detail [14], we investigated whether indices giving fewer inconsistencies on repeated testing also better predicted the risk of death and the risk of two or more exacerbations in the previous year. In the presence of airflow obstruction (for the different definitions), we estimated the risk of death during follow-up from proportional hazards models (Cox regression in all individuals, including those who died and did not perform the second spirometry) and the risk of two or more exacerbations in the previous year with a logistic regression model taking into account age and gender. All of the analyses were adjusted for study site (country) and were stratified by gender.

Results
Table 1 describes the main baseline characteristics of the study participants. Follow-up exams were conducted for 885 adults in Montevideo, 1,173 in Santiago, and 963 in Sao Paulo. Information was obtained for 758 (85.6%), 993 (84.7%) and 748 (77.7%) subjects, respectively, of whom 2,026 had a good quality post-BD spirometry test during both examinations (68.8% of those with post-bronchodilator testing in the first evaluation and 75.6% of those with post-bronchodilator testing in the first evaluation surviving or lost at the time of the second evaluation) (see Table 2). Follow-up rates for each independent variable category were around 80% [5,6]. Compared with the first examination, individuals with a follow-up exam were less often current smokers and had slightly lower lung function. Mean forced expiratory time (FET) decreased by about 1 second but was >6 seconds on average, and the rate of valid tests decreased from 90% to 83%. The coefficient of variability of FEV1/FEV6 was lower than that of FEV1/FVC, both before and after BD in both examinations. Fig. 1 depicts a scatter plot of the FEV1/FVC and Fig. 2 of the FEV1/FEV6, both expressed as Z-scores of the PLATINO predicted values.
Dispersion is narrower with FEV1/FEV6 (intraclass correlation coefficient Rho 0.81, 95%CI 0.78-0.84) than with FEV1/FVC (ICC Rho 0.77, 95%CI 0.74-0.79). Reliability of the diagnosis (FEV1/FVC or FEV1/FEV6 as a dichotomous variable), measured similarly by the ICC Rho, was numerically much lower than for the original continuous variables: 0.57 (95%CI 0.54-0.60) for the GOLD criterion, 0.64 (95%CI 0.61-0.66) for GOLD stages 2-4, 0.64 (95%CI 0.61-0.66) for FEV1/FEV6 <LLN, and 0.57 (95%CI 0.61-0.66) for FEV1/FVC <LLN (S1 Table). Inconsistent diagnoses occurred with all three criteria but were more common with the GOLD criterion. Tables 2 and 3 describe the individuals switching spirometric diagnoses between the two examinations according to the airflow obstruction criteria, as a proportion of those with and without airflow obstruction (Table 2) or as a proportion of the total population (Table 3). Population inconsistencies were highest for the GOLD criterion (11.7%), intermediate for FEV1/FVC <LLN (6.5%), and lowest for the FEV1/FEV6 <LLN criterion (5.9%). Prevalence of airflow obstruction and inconsistency in diagnosis were further reduced using indices requiring a low FEV1 in addition to a low ratio: GOLD 2-4 (4.1%), FEV1/FVC & FEV1 <LLN (2.2%) and FEV1/FEV6 & FEV1 <LLN (Table 3). Table 3 also shows that, in general, the indicators giving the lowest inconsistency in longitudinal diagnosis also provided the lowest population prevalence, the highest hazard ratio for mortality in Cox regression models and the highest risk of having two or more exacerbations in the previous year, adjusting for age and gender. In addition, in logistic regression models (adjusted for age and gender), individuals with airflow obstruction in both evaluations had an increased risk of exacerbations compared to those with inconsistent airflow obstruction, regardless of the criterion of airflow obstruction.
For example, those with FEV1/FVC <LLN in both evaluations had an OR of 2.99 (95%CI 1.2-7.3) for having two or more exacerbations in the previous year, compared to an OR of 1.6 (0.6-4.0) in those with inconsistent obstruction. Similarly, individuals with FEV1/FEV6 <LLN in both evaluations had an OR of 3.6 (1.6-7.9) for two or more exacerbations, vs. 1.8 (0.7-4.5) in those obstructed in only one evaluation. However, the association disappeared if FEV1 expressed as a percentage of predicted was included in the models. All individuals with FEV1 <50% predicted had a consistent COPD diagnosis. If inconsistency were expressed as a percentage of the airflow obstruction prevalence at the first evaluation (Table 2, and column 6/column 2 from Table 3), differences among indices would be minor. In addition, although the prevalence of airflow obstruction may be similar in both evaluations, the "abnormal" individuals are not necessarily the same: some of them had a "new diagnosis" at the second evaluation and some "normalized" (see Table 2). For example, of all individuals with a COPD diagnosis at the first examination (see Table 2), 30% would have the diagnosis reverted at the second evaluation if using the GOLD criterion, 26% if using GOLD stages 2-4, and 33% if using
A new diagnosis of COPD was observed in 7.3% (GOLD), 3.2% (FEV1/FVC <LLN) and 4.4% (FEV1/FEV6 <LLN) of the studied population, whereas a reversed diagnosis of COPD was observed in 4.4% (GOLD), 3.3% (FEV1/FVC <LLN), and 1.8% (FEV1/FEV6 <LLN). Variations in both the numerator and the denominator of FEV1/FVC or FEV1/FEV6 could explain the variation in the ratios and therefore in the diagnosis of airflow obstruction. The majority of individuals with a new airflow obstruction diagnosis had a drop in FEV1 in the second evaluation (68-82%, depending on the criterion) and a minority had an increase in FVC or FEV6 (18-34%).
In the individuals whose airflow obstruction normalized in the second evaluation, this was mostly due (70-94%) to a decrease in FVC or FEV6 (the denominator), whereas an increase in FEV1 was observed less frequently (17-48%). A decrease in FEV1 was associated with an increase in BMI, continuous smoking, and use of bronchodilators or corticosteroids, whereas an increase in FVC or FEV6 was associated with a decrease in BMI or lack of use of bronchodilators or corticosteroids, but the predicting variables explained less than 3% of the variation of the change in FVC, FEV6 or FEV1. Therefore, >97% of the changes in FEV1, FVC or FEV6 were unaccounted for by the collected variables. Incident airflow obstruction was associated consistently with age and continuous smoking regardless of the definition, but elimination of airflow obstruction was also associated with increased age. Again, the variability of the change in spirometric measurements explained by the models was <3%.
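The quantities reported above (new diagnoses, reversed diagnoses, overall inconsistency, and the Kappa coefficient named in the Methods) can all be derived from the paired yes/no diagnoses of each individual. The sketch below illustrates that bookkeeping under simple assumptions; the function name and the ten-person toy data are hypothetical and are not taken from the study dataset.

```python
def paired_agreement(first, second):
    """Summarise agreement between two dichotomous diagnoses per person.

    `first` and `second` are equal-length sequences of booleans
    (True = airflow obstruction) for the same individuals.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    both = sum(1 for a, b in zip(first, second) if a and b)
    new = sum(1 for a, b in zip(first, second) if not a and b)
    reversed_ = sum(1 for a, b in zip(first, second) if a and not b)
    neither = n - both - new - reversed_

    observed = (both + neither) / n       # raw agreement
    p1 = (both + reversed_) / n           # prevalence at first exam
    p2 = (both + new) / n                 # prevalence at second exam
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return {
        "new": new / n,                   # obstruction only at second exam
        "reversed": reversed_ / n,        # obstruction only at first exam
        "inconsistent": (new + reversed_) / n,
        "kappa": kappa,
    }


# Toy data: 2 consistently obstructed, 1 reversed, 1 new, 6 consistently normal.
first_exam = [True, True, True, False, False, False, False, False, False, False]
second_exam = [True, True, False, True, False, False, False, False, False, False]
summary = paired_agreement(first_exam, second_exam)
```

In this toy example the prevalence is 30% at both examinations, yet one in five individuals changes label, which mirrors the point that a similar prevalence does not imply that the same individuals are abnormal.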
If a borderline "buffer zone" is constructed 0.6 SD above and below the LLN of FEV1/FEV6, it would include 68% of the measurements during the first examination and 63% of the measurements during the second examination. None of the individuals would have changed from no COPD in the first examination to COPD in the second examination (performed 5-9 years later), and only four individuals would have changed from COPD in the first exam to no COPD during the second exam. Therefore, nearly all of the changes in diagnosis would fall within the borderline zone. In addition, in this large borderline zone, only 7/1383 individuals (0.5%) had an FEV1 <80% predicted (GOLD stage 2) and none had an FEV1 <50% predicted (GOLD stages 3 or 4). The obstructed category was 16% of the total and included all GOLD stage 3-4 individuals, whereas the normal category was also 16% of the total. The borderline zone could be constructed
In the logistic regression model (Table 4), inconsistency was associated very powerfully with the absolute distance of FEV1/FVC or FEV1/FEV6 (expressed as Z-scores) from the LLN. That is, the closer the measured FEV1/FVC or FEV1/FEV6 to the LLN, the higher the probability of an inconsistent diagnosis during the second examination. For example, the odds ratio (OR) for an inconsistent diagnosis was 12 (95%CI 8-18) for FEV1/FVC <LLN, 28 (95%CI 18-45) for FEV1/FEV6 <LLN and 7 (95%CI 5-9) for the GOLD criterion if the first result was <0.6 SD (above or below) from the LLN, compared to the remaining participants. Also, the lower the FEV1, the lower the inconsistency (OR 0.65-0.68 per SD from normality, see Table 5 and Fig. 3).
Table notes: Percentages on the left part of columns are from the total studied population, whereas those on the right are strata (overlapping) from the above categories. 95%CI = 95% confidence interval; Inconsistent = different result between the two examinations; New GOLD = fulfilling FEV1/FVC <0.70 only during the second exam (similar for the other airflow obstruction criteria); Reversed GOLD = met GOLD criterion during the first examination, but not during the second examination; LLN = lower limit of normal. doi:10.1371/journal.pone.0121832.t005
A longer forced expiratory time was also associated with inconsistent diagnosis (OR 1.06-1.22), as was a history of wheezing or asthma in some models, but usually with borderline statistical significance. Other variables analyzed were age, continuous smoking, BMI, gender, clinical diagnosis of COPD, use of respiratory medications and the duration of the expiratory maneuvers, but all together they explained about 7-17% of the variability of inconsistent diagnosis, and the statistical significance disappeared (except for forced expiratory time) once we included in the models the proximity of the spirometry measurements to the threshold for airflow obstruction and FEV1. The closeness of FEV1/FVC or FEV1/FEV6 to the LLN was the main predictor of switching diagnosis between the two evaluations, whether it was a new airflow obstruction in the second evaluation or an airflow obstruction found in the first evaluation that disappeared in the second (see S2 Table with information for FEV1/FEV6).
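The buffer zone of 0.6 SD around the LLN described earlier in this section can be expressed as a three-way classification of a ratio's Z-score. The sketch below is one possible reading, not the authors' implementation: it assumes that the LLN sits at a Z-score of -1.645 and that the zone width is measured in the same SD units as the Z-score. The function names and constants are hypothetical.

```python
LLN_Z = -1.645    # assumed Z-score of the lower limit of normal (5th percentile)
BUFFER_SD = 0.6   # half-width of the borderline zone, in SD units


def three_zone(z: float) -> str:
    """Classify a ratio Z-score as obstructed, borderline or normal."""
    if z < LLN_Z - BUFFER_SD:
        return "obstructed"
    if z > LLN_Z + BUFFER_SD:
        return "normal"
    return "borderline"


def is_extreme_switch(z_first: float, z_second: float) -> bool:
    """True only when the label jumps across the whole borderline zone."""
    labels = {three_zone(z_first), three_zone(z_second)}
    return labels == {"obstructed", "normal"}


# A small shift around the LLN stays inside the borderline zone ...
small_shift = is_extreme_switch(-1.7, -1.5)
# ... whereas a large shift crosses from obstructed to normal.
large_shift = is_extreme_switch(-2.5, -1.0)
```

Under a strict cut-off, the first pair of results (-1.7, then -1.5) would count as a reversed diagnosis. With the buffer zone, both tests are reported as borderline and only the second, much larger shift counts as a change between disease and health.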

Discussion
We have described the rates and correlates of inconsistent interpretations of airway obstruction in adults during population-based follow-up examinations according to several criteria of airflow obstruction. There were three important findings. Firstly, inconsistency of the COPD diagnosis was observed with all criteria because they are based on sharp cut-off points (the LLN or an FEV1/FVC <0.7 for the GOLD criterion, plus FEV1 <80% of predicted for GOLD stage 2-4, or FEV1 <LLN): that is, we arbitrarily dichotomize a continuous variable (FEV1/FVC or FEV1/FEV6 for COPD, or FEV1 for GOLD stages). Using the GOLD criterion, an individual with an FEV1/FVC of 0.69 would be diagnosed with COPD, and if the ratio were 0.7 during a follow-up exam, the subject would no longer be diagnosed as having COPD. The intraclass correlation coefficient, a measure of concordance, was numerically lower for the COPD diagnosis than for FEV1/FVC or FEV1/FEV6 as continuous variables. The same problem exists with strict cut-off values based on the LLN. Although some inconsistency is expected, the recommended diagnosis of COPD assumes a single spirometry and therefore does not consider longitudinal inconsistency.
Secondly, airflow obstruction criteria leading to lower population prevalence also produce less inconsistency on repeated testing: FEV1/FVC showed less concordance between the two tests than FEV1/FEV6, whereas the consistency of the COPD diagnosis was highest for FEV1/FEV6 <LLN and for definitions of airflow obstruction requiring a low FEV1 (GOLD stage 2-4).
Thirdly, the closer the FEV1/FVC, FEV1/FEV6 and FEV1 were to the thresholds or cut points, the higher the probability of a change in diagnosis on repeated testing (Table 5), regardless of whether the switch represented a new airflow obstruction or a "reversed" airflow obstruction (S2 Table).
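The first and third findings can be illustrated with a toy model of what dichotomizing a noisy continuous measurement implies. The sketch below assumes that each test equals a stable true ratio plus independent Gaussian measurement noise, which ignores real changes over the 5-9 years between examinations. The noise SD of 0.03 and the function names are hypothetical and are not estimated from the study.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def p_inconsistent(true_ratio: float, threshold: float, noise_sd: float) -> float:
    """Probability that two independent noisy tests fall on opposite
    sides of the threshold, given a stable true ratio."""
    p_below = normal_cdf((threshold - true_ratio) / noise_sd)
    return 2.0 * p_below * (1.0 - p_below)


THRESHOLD = 0.70   # fixed GOLD cut-off for FEV1/FVC
NOISE_SD = 0.03    # hypothetical test-to-test SD of the ratio

at_threshold = p_inconsistent(0.70, THRESHOLD, NOISE_SD)
near_threshold = p_inconsistent(0.69, THRESHOLD, NOISE_SD)
far_from_threshold = p_inconsistent(0.60, THRESHOLD, NOISE_SD)
```

Under these assumptions, a person whose true ratio sits exactly on the cut-off has a 50% chance of receiving discordant labels on two tests, and the probability falls steeply as the true value moves away from the threshold. This is qualitatively the pattern reported in Table 5.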
Previous studies have shown that the within-test reproducibility of FEV1/FEV6 based on 6-second maneuvers was better than that of the traditional FEV1/FVC [9]. We previously reported that using FEV1/FEV6 <LLN provided more consistent estimates of COPD prevalence in the same populations than criteria based on FEV1/FVC (GOLD or <LLN). Therefore, we expected that at the individual level, using the 6-second maneuvers would also provide a consistent diagnostic performance from our baseline examination to a follow-up examination; however, the inconsistency rate was not reduced to zero. It may be reduced by adding an interpretation category of "borderline" COPD, in other words, by creating a buffer zone immediately around the LLN, as suggested previously by the National Lung Health Education Program (NLHEP) in the United States [15]. This zone would reduce or avoid changes in diagnosis between extreme groups, i.e. from COPD to normal or from normal to COPD, instead forcing the majority of changes to occur from and to the borderline zone. This strategy would increase confidence in the spirometric diagnosis and would be relatively easy to implement compared to the sounder approach of keeping FEV1/FVC, FEV1/FEV6 and FEV1 as continuous variables and estimating risks of disease or adverse outcomes, or the likelihood of response to treatments.
Clinicians usually possess information on individual patients that is not available to epidemiologists and can therefore use this information to estimate the pre-test probability of disease. They can then consider the magnitude of the abnormality in the spirometry test results in order to estimate the post-test probability of COPD. They can also order follow-up tests that will further increase or decrease the probability of COPD, such as a diffusing capacity (DLCO) test, an exhaled nitric oxide test, induced sputum for eosinophil counts, or a lung computed tomography scan (looking for emphysema). Clinicians can also perform an asthma therapy trial to determine whether the patient responds, which would render asthma much more likely than COPD. Clinical practice guidelines for hypertension recommend that, prior to prescribing blood pressure medication, 3-6 repeated measurements of blood pressure be performed over weeks to months [16]. Perhaps a similar conservative approach should be taken with spirometry for COPD, especially if the results fall within the borderline zone. For these borderline cases, a conservative approach seems adequate because a false COPD label entails costs and potential adverse effects for the patient, including unwarranted bronchodilator prescription or further expensive tests. On the other hand, missing a case of mild COPD has little consequence, as all smokers should be advised to quit smoking regardless of spirometric results, and the majority of drug treatments are meant for moderate or severe COPD.
The main strengths of this study include the population-based sampling, the high quality of the spirometry tests [15], the measurement of post-BD spirometry during both examinations and the relatively high rates of follow-up after 5-9 years. Limitations of our study include the lack of annual spirometry testing, which could have determined whether or not study participants were "rapid decliners", consistent with COPD progression [15]. Shifts from airway obstruction to normal spirometry during the 5-9 year period could have occurred due to factors that we did not measure, such as respiratory infections or exposure to irritants during the first examination (but not during the second examination), smoking cessation, or treatment for asthma, COPD, or heart failure; or reductions in exposures causing bronchospasm at the time of the first examination. However, an inconsistent airflow obstruction diagnosis, with any definition, was associated significantly with the proximity to the limit of normality and with the expiratory time of the tests, and was therefore likely determined mostly by intrinsic properties of the tests (reliability) rather than by true pathobiological processes. Finally, in the current work we did not analyze adverse outcomes in detail, but thresholds giving a lower prevalence of airflow obstruction also give fewer inconsistencies and are associated with a greater risk of death and of exacerbations.
In conclusion, current criteria for airflow obstruction, even those based on age- and gender-adjusted FEV1/FVC or FEV1/FEV6 <LLN and those requiring a low FEV1 in addition, produce inconsistencies over time that can cause confusing changes in the COPD diagnosis, more likely if test results are close to the threshold defining airflow obstruction. Diagnosing airflow obstruction using 6-second spirometry (FEV1/FEV6 <LLN), especially if FEV1 is also low, reduces these inconsistencies compared with criteria based on FEV1/FVC <LLN or the GOLD criterion. Perhaps the inclusion in the diagnostic scheme of a "gray" or "borderline" zone around the LLN would minimize changes from disease to health or vice versa. Finally, clinicians should use information about the patient to weigh the consequences of a false-positive versus a false-negative disease label before making clinical decisions.

Supporting Information
S1 Dataset. Dataset in Excel (XLS) format. (XLS)
S1 Table. Reliability by the intraclass correlation coefficient (ICC) and 95%CI between two spirometry tests (at baseline and follow-up) in the same individuals. (DOCX)
S2 Table. Determinants of new and reversed airflow obstruction diagnosis by a logistic regression model using as criteria FEV1/FEV6 <LLN. (DOCX)