Two Distinct Chronic Obstructive Pulmonary Disease (COPD) Phenotypes Are Associated with High Risk of Mortality

Rationale In COPD patients, mortality risk is influenced by age, severity of respiratory disease, and comorbidities. With an unbiased statistical approach we sought to identify clusters of COPD patients and to examine their mortality risk. Methods Stable COPD subjects (n = 527) were classified using hierarchical cluster analysis of clinical, functional and imaging data. The relevance of this classification was validated using prospective follow-up of mortality. Results The most relevant patient classification was that based on three clusters (phenotypes). Phenotype 1 included subjects at very low risk of mortality, who had mild respiratory disease and low rates of comorbidities. Phenotypes 2 and 3 were at high risk of mortality. Phenotype 2 included younger subjects with severe airflow limitation, emphysema and hyperinflation, low body mass index, and low rates of cardiovascular comorbidities. Phenotype 3 included older subjects with less severe respiratory disease, but higher rates of obesity and cardiovascular comorbidities. Mortality was associated with the severity of airflow limitation in Phenotype 2 but not in Phenotype 3 subjects, and subjects in Phenotype 2 died at a younger age. Conclusions We identified three COPD phenotypes, including two phenotypes with high risk of mortality. Subjects within these phenotypes may require different therapeutic interventions to improve their outcome.


Introduction
Chronic obstructive pulmonary disease has long been categorized using the FEV1-based GOLD classification [1]. However, marked heterogeneity exists within each GOLD stage in terms of symptoms, exacerbations, quality of life and exercise capacity [2]. Mortality risk is also heterogeneous within each GOLD stage, because FEV1 is not the only determinant of mortality in COPD patients [3]. Other factors independently associated with survival include age, dyspnoea, health status, hyperinflation, gas exchange abnormalities, exacerbation frequency, exercise capacity, pulmonary hemodynamics, and nutritional status [4].
Recently, interest has emerged in the identification of clinical COPD phenotypes [5], defined as "a single or combination of disease attributes that describe differences between individuals with COPD as they relate to clinically meaningful outcomes" [6]. Cluster analysis has emerged as a useful tool to identify subgroups of patients with airway diseases [7,8,9,10], including subgroups of patients with COPD [11,12].
In the present study, we performed a cluster analysis using multiple variables (including lung function, imaging, and comorbidities) obtained in a large cohort of COPD subjects recruited in stable condition. The clinical relevance of these clusters of subjects was validated using survival data obtained during longitudinal follow-up. Our aim was to examine whether clusters of COPD patients identified with an unsupervised approach differed in mortality.

Patients
Clinical, functional and imaging data obtained in COPD patients [1] at inclusion in the study (cross-sectional data) were analyzed using unsupervised analysis. Validation of the clinical relevance of the resulting clusters of patients was achieved using survival data obtained during prospective follow-up. To ensure sufficient patient heterogeneity, subjects recruited in two separate cohorts were studied. The first cohort was composed of 506 subjects recruited at the LEUVEN university hospital COPD outpatient clinic. The second cohort was composed of 378 subjects recruited in the neighbourhood of LEUVEN as part of the Dutch-Belgian randomized lung cancer screening trial (NELSON study) [13]. Inclusion criteria in this latter cohort were a smoking history ≥15 pack-years and age >50 years, and only 154 of these subjects had a diagnosis of COPD (according to a post-bronchodilator FEV1/FVC <0.70) [1]. Further, eleven patients were excluded from the LEUVEN clinic cohort due to an FEV1/FVC ratio ≥0.70. Thus, our COPD population was composed of 649 subjects (495 from the LEUVEN clinic and 154 from the NELSON study). The COPD subjects included in this cluster analysis were required to have complete information for 7 selected continuous variables (see below), leading to the exclusion of 122 COPD subjects (121 from the LEUVEN clinic) due to missing data. The final study population included in the cluster analysis contained 527 COPD subjects (LEUVEN clinic, n = 374; NELSON study, n = 153) [13]. A flow chart describing patient selection is provided in Figure 1. A description of the characteristics of COPD patients recruited in the LEUVEN clinic and in the NELSON study, and of the excluded COPD subjects, is provided in Table S2. All studies were approved by the Ethics Committee at the University Hospitals of Leuven (Leuven, Belgium) and all participants provided written informed consent.
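The selection arithmetic in this paragraph can be re-derived in a few lines. The following is a minimal illustrative sketch in Python, not the authors' code (their analyses were run in SAS); it uses only counts stated in the text, and all variable names are assumptions introduced for illustration.

```python
# Patient-flow bookkeeping using only the counts reported in the text.
# Variable names are illustrative; this is not the authors' code.

leuven_recruited = 506       # LEUVEN outpatient clinic cohort
leuven_no_obstruction = 11   # excluded: FEV1/FVC ratio >= 0.70
nelson_with_copd = 154       # NELSON subjects (of 378) meeting the COPD criterion

leuven_copd = leuven_recruited - leuven_no_obstruction
copd_total = leuven_copd + nelson_with_copd

missing_total = 122          # incomplete data for the 7 continuous variables
missing_leuven = 121
missing_nelson = missing_total - missing_leuven

final_leuven = leuven_copd - missing_leuven
final_nelson = nelson_with_copd - missing_nelson
final_total = final_leuven + final_nelson

print(leuven_copd, copd_total)                  # 495 649
print(final_leuven, final_nelson, final_total)  # 374 153 527
```

The derived totals agree with the reported COPD population (649) and final study population (527).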

Data Collection
Data were obtained at the time of inclusion in the studies. Demographic characteristics, post-bronchodilator pulmonary function assessment, CT scan of the chest, and questionnaires on dyspnoea (mMRC) and quality of life (CCQ) [14] were collected. In patients recruited at the LEUVEN clinic, data on comorbidities were obtained from medical records at the time of inclusion. Comorbidities of subjects enrolled via the NELSON study were obtained by detailed interview and review of concomitant medications at the time of inclusion. In case of doubt, general practitioners were contacted for verification. Data on the following COPD-related comorbidities were collected: ischemic heart disease, stroke, peripheral arterial disease, diabetes, osteoporosis, skeletal muscle weakness (quadriceps force <80% predicted) and anaemia (haemoglobin <11 g/dl on last venous blood sample). Patients recruited in the NELSON study had no data available for peripheral arterial disease and muscle weakness.
The complete protocol used for CT imaging, which was based on National Emphysema Treatment Trial criteria [15], was described in a previous report [16]. Emphysema was semiquantitatively assessed by a visual scoring system. Four categories were generated, yielding a four-level alveolar destruction scale (no emphysema, mild emphysema affecting <20%, moderate emphysema affecting 20-50%, and severe emphysema affecting >50% of the lung) [16]. Thickening of the bronchial walls was scored on a semiquantitative three-level scale, and the presence or absence of bronchiectasis was assessed [16].
All pulmonary function data were obtained with standardized equipment (Jaeger) according to ATS/ERS consensus guidelines. Spirometric values were post-bronchodilator values. Diffusing capacity was determined by single breath carbon monoxide gas transfer method (DLCO) and corrected for alveolar ventilation (Kco) but not for haemoglobin. All data were obtained as absolute values and expressed as percent predicted of reference values [17].

Plan for Cluster Analysis
Our strategy was to combine both continuous and categorical data in a single cluster analysis aimed at the identification of COPD phenotypes. Based on the result of a first analysis of this database (Table S1), we made a selection of continuous variables to be included in the cluster analysis. This analysis resulted in the selection of 7 continuous variables (see below). Because some continuous variables were correlated with each other (Table S3), we eliminated correlations between variables by performing a principal component analysis (PCA). For categorical variables, all variables were used in the analysis, but these variables were submitted to multiple correspondence analysis (MCA) to transform them into independent mathematical axes. This latter procedure allowed the significant axes identified in the MCA and the significant components identified by PCA to be used in a single cluster analysis (Ward's procedure). A description of these procedures is presented in Text S1.

Processing of Continuous and Categorical Variables
Seven continuous variables were selected for their relevance to COPD natural history: age, body mass index (BMI), FEV1 (% predicted), mMRC scale, CCQ total score, thoracic gas volume (TGV, % predicted) and DLCO (% predicted). Subjects with complete data for these 7 variables were submitted to PCA. The first two axes identified in the PCA had eigenvalues >1 and were kept for cluster analysis (Tables S4 and S5).
All categorical variables available were submitted to MCA. The variables included in these analyses were comorbidities and data obtained from CT analysis, including emphysema, bronchial thickening and bronchiectasis. MCA identified 17 axes, of which 3 were excluded because they were correlated mostly with missing information on comorbidities (Tables S6 and S7). Thus, we were able to exclude these 3 axes without losing significant information, and only 14 axes were kept for cluster analysis.
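The eigenvalue rule used to retain PCA axes (keep components with eigenvalue >1) can be illustrated as follows. This is a minimal sketch in Python, not the authors' SAS procedure. It assumes the PCA is run on a correlation matrix, the 3 x 3 matrix below contains hypothetical values (not study data), and the Jacobi rotation routine is simply one convenient way to obtain eigenvalues of a symmetric matrix without external libraries.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop when the squared off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], with |theta| <= pi/4.
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q (A <- A J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q (A <- J^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix for three variables (illustration only).
correlation = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]

eigenvalues = jacobi_eigenvalues(correlation)
# Retention rule described in the text: keep axes with eigenvalue > 1.
retained = [ev for ev in eigenvalues if ev > 1.0]
print([round(ev, 3) for ev in eigenvalues], len(retained))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks an axis that carries more variance than any single standardized variable; in this toy example only one axis passes the rule, whereas the study retained two.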

Vital Status and Survival Analyses
Vital status was assessed as of January 1st, 2010. For patients followed at the University hospital, mortality data were obtained from medical files. When no data on mortality were retrieved, the general practitioners (GPs) caring for the patient were contacted to check survival. For subjects from the NELSON study, survival was checked by direct telephone contact with GPs. Subjects who were lost to follow-up (n = 8) were not included in the survival analysis because no information was available on their vital status. Additionally, the exact date of death was unavailable in 8 subjects who died during follow-up. Thus, the survival analyses were performed in 511/527 (97%) subjects.
Survival analyses were performed on all-cause mortality using Kaplan-Meier and log-rank tests with Tukey-Kramer adjustments for multiple comparisons. Because age was markedly different among phenotypes, we further studied mortality risk using a Cox model adjusted for age.
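As an illustration of the Kaplan-Meier product-limit estimator named above, the following minimal Python sketch computes a survival curve from follow-up times and event indicators. It is not the authors' SAS code, the function name is an assumption, and the four follow-up records are hypothetical values chosen only to show the calculation.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times:  follow-up durations
    events: 1 if death was observed at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical follow-up (years); the second subject is censored.
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
print(curve)  # [(1.0, 0.75), (3.0, 0.375), (4.0, 0.0)]
```

Censored subjects contribute to the risk set up to their last follow-up time without counting as deaths. Subjects with no known vital status or no exact date of death cannot be placed on this time axis, which is consistent with their exclusion from the survival analyses described above.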

Statistics
Data are presented as median [interquartile range (IQR)] or %. A P value <0.05 was considered statistically significant. Analyses were performed using the SAS 9.2 statistical software (Cary, North Carolina, USA).

Characterization of COPD Patients Based on GOLD Classification
Characteristics of the 527 COPD patients according to the spirometric GOLD classification are presented in Table 1. Subjects recruited in the NELSON study were mostly in spirometric GOLD stage I and II (65% and 31%, respectively), whereas subjects recruited in the LEUVEN clinic were mostly in spirometric GOLD stage II, III and IV (33%, 38%, and 24%, respectively) (also see Table S2). Increasing GOLD stage was associated with increased dyspnoea, decreased HRQoL (higher CCQ total score), a lower BMI, higher lung hyperinflation and decreased lung diffusing capacity. Extent of emphysema, bronchial thickening and bronchiectasis were also associated with more severe airflow limitation. Muscle weakness and osteoporosis increased with GOLD stages, whereas diabetes and cardiovascular comorbidities appeared relatively unrelated to the degree of airflow limitation.

Identification of COPD Phenotypes using Cluster Analysis and Mortality Rates
We performed a Ward's cluster analysis based on the significant mathematical axes identified by PCA and MCA for continuous and categorical variables, respectively. Classification of the 527 COPD patients resulted in a dendrogram showing the progressive joining of the clustering process (Figure 2). Based on visual assessment of the dendrogram, data could be optimally grouped into 3 or 5 clusters, each cluster corresponding to a potential phenotype. To decide on the number of phenotypes, we examined mortality rates among clusters. When grouping the data into 3 clusters, there was a clear difference in mortality rates among clusters (Table 2 and Figure 2). Grouping the data into 5 clusters did not improve the ability to predict mortality: it only divided clusters 1 and 3 into two new clusters each, and mortality was comparable within each pair of newly formed clusters (Figure 2).
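Ward's procedure can be sketched as follows: at each step, the two clusters whose fusion gives the smallest increase in within-cluster sum of squares are merged. This is a minimal, unoptimized Python illustration, not the authors' SAS implementation. The function name and the nine two-dimensional points are hypothetical stand-ins for subjects' coordinates on the retained PCA and MCA axes.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    Returns the clusters as sorted lists of point indices.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a["members"]), len(b["members"])
        d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
        return na * nb / (na + nb) * d2

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        centroid = [
            (na * x + nb * y) / (na + nb)
            for x, y in zip(a["centroid"], b["centroid"])
        ]
        merged = {"members": a["members"] + b["members"], "centroid": centroid}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return sorted(sorted(c["members"]) for c in clusters)

# Hypothetical axis scores forming three well-separated groups.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
    (10.0, 0.2), (10.3, 0.0), (9.8, 0.1),
]
groups = ward_clusters(points, 3)
print(groups)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Recording the merge cost at each step would give the heights of a dendrogram such as the one in Figure 2; stopping at 3 or 5 clusters corresponds to cutting that tree at different levels.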

Characterization of COPD Phenotypes
Characteristics of subjects grouped into 3 clusters (phenotypes) are presented in Table 2.
Phenotype 1 (n = 219 subjects) corresponded to subjects with a median [IQR] age of 62 [58-68] yrs., mild to moderate airflow limitation, absent or mild emphysema, absent or mild dyspnoea, normal nutritional status and limited comorbidities. Two thirds of these subjects were recruited in the NELSON study, whereas one third were recruited in the LEUVEN clinic. Of note, 95% of the NELSON subjects clustered in this phenotype. Only 1 of 219 subjects (0.5%) died in this phenotype.
Phenotype 2 (n = 99 subjects) corresponded to subjects with a median [IQR] age of 61 [57-66] yrs., severe airflow limitation, marked emphysema and hyperinflation, low BMI, severe dyspnoea, and impaired HRQoL. One third of these subjects were women, and osteoporosis and muscle weakness were highly prevalent, whereas diabetes and cardiovascular comorbidities were less prevalent. Two subjects were lost to follow-up and mortality rates were very high, with 20/97 (20.6%) deaths.
Phenotype 3 (n = 209 subjects) mostly corresponded to male subjects with a median [IQR] age of 72 [65-77] yrs. and moderate to severe airflow limitation. These subjects had less severe emphysema than subjects in Phenotype 2, but a higher prevalence of bronchial thickening. They were often obese and had high rates of diabetes and cardiovascular comorbidities. Six subjects were lost to follow-up and mortality rates were also high, with 29/203 (14.3%) deaths.
When comparing Phenotypes 2 and 3, in which subjects were at high risk of mortality, the pattern of mortality was different. In Phenotype 2, 75% of subjects who died were in GOLD stage IV and 25% were in GOLD stage III, indicating that the mortality pattern followed the severity of airflow obstruction. By contrast, in Phenotype 3, mortality was distributed across all GOLD stages (Figure 3).
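The crude mortality proportions quoted for each phenotype follow directly from the reported counts, as the short Python sketch below shows. The dictionary layout is illustrative, and the assumption that no Phenotype 1 subject was lost to follow-up is inferred from the 1/219 denominator and from the total of 8 losses being fully accounted for by Phenotypes 2 and 3.

```python
# Counts taken from the text; "lost" for Phenotype 1 is inferred (see lead-in).
phenotypes = {
    1: {"n": 219, "lost": 0, "deaths": 1},
    2: {"n": 99, "lost": 2, "deaths": 20},
    3: {"n": 209, "lost": 6, "deaths": 29},
}

# Crude mortality: deaths divided by subjects with known vital status.
rates = {}
for label, p in phenotypes.items():
    followed = p["n"] - p["lost"]
    rates[label] = round(100 * p["deaths"] / followed, 1)

total_subjects = sum(p["n"] for p in phenotypes.values())
total_lost = sum(p["lost"] for p in phenotypes.values())
total_deaths = sum(p["deaths"] for p in phenotypes.values())

print(rates)                                     # {1: 0.5, 2: 20.6, 3: 14.3}
print(total_subjects, total_lost, total_deaths)  # 527 8 50
```

These are crude proportions over the whole follow-up period and do not account for differences in follow-up time or age between phenotypes.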

Survival Pattern According to Phenotypes
Kaplan-Meier analysis of mortality between the 3 phenotypes is presented in Figure 4. Subjects in Phenotypes 2 and 3 were at higher risk of mortality than subjects in Phenotype 1 (each comparison, P<0.0001; log-rank test), but no significant difference was observed between Phenotypes 2 and 3. Because age at inclusion was markedly different between these latter phenotypes (median age, 61 yrs. vs. 72 yrs.), we hypothesized that subjects in Phenotype 2 had died earlier in life than subjects in Phenotype 3. Median [IQR] age at death was 64.5 [60.4-68.9] yrs. in Phenotype 2 (n = 16) and 75.9 [70.8-77.8] yrs. in Phenotype 3 (n = 25). To take this difference into account, we performed Cox model analyses of mortality using phenotypes and age as covariates (Table 3). After adjustment for age, subjects in Phenotype 2 had a 3-fold increase in mortality compared with subjects in Phenotype 3.

Discussion
In this large population of COPD subjects with a wide range of airflow limitation, we identified three COPD phenotypes, including one phenotype at low risk of mortality and two distinct phenotypes (Phenotypes 2 and 3) at high risk of mortality. Phenotype 2 included younger patients with severe respiratory disease, low BMI and low rates of cardiovascular comorbidities. Phenotype 3 included older patients with less severe airflow limitation, but who were often obese and had higher rates of cardiovascular comorbidities and diabetes. These findings suggest that different strategies for improving outcome should be proposed to these two groups of COPD patients.
We have identified clusters of COPD subjects that were associated with different mortality rates and patterns, and therefore qualify as phenotypes [6]. In a French cohort of COPD subjects, investigators identified four clusters of subjects, including two clusters of subjects at high risk of predicted mortality [11]. In the present study, the two phenotypes that were at high risk of actual mortality were very similar to those identified in the French study [11]. Because all subjects had extensive characterization, including complete lung function assessment and CT scan, our current data further improve the description of these phenotypes. Garcia-Aymerich et al. also performed a cluster analysis of 342 Spanish subjects hospitalized for the first time because of COPD exacerbation [12]. The authors described 3 phenotypes, including "a severe respiratory phenotype" and "a systemic COPD phenotype" [12], which were at high risk of serious events (hospitalization and/or mortality). The "systemic COPD phenotype" was characterized by a high prevalence of obesity and cardiovascular disease [12], corresponding to our Phenotype 3. However, the "severe respiratory phenotype" differed from our Phenotype 2 in that patients were not younger and had no malnutrition [12]. This difference may be related to the recruitment of a specific population of subjects at the time of first hospitalization for COPD exacerbation. Women represented only 6-8% of subjects in the study by Garcia-Aymerich et al., whereas they represented up to one third of subjects in our Phenotype 2. Interestingly, recent data suggested that female gender is a risk factor for early onset COPD and more severe disease at a young age [18]. These phenotypic differences underline the need for external validation of identified phenotypes across multiple populations.
Some limitations have to be taken into account when interpreting our results. Although repeated and severe exacerbations are important predictors of mortality [19], we had no data on exacerbations. Our study was based on the assessment of COPD patients coming to an outpatient clinic and smokers recruited for a study on lung cancer screening. Although these patients had a wide range of disease severity, they may not represent the COPD population at large and different results may be observed when studying different populations of patients. COPD subjects recruited as part of the NELSON study [13] were submitted to systematic screening and may not be representative of symptomatic subjects receiving a diagnosis of COPD. The inclusion of these subjects allowed for studying COPD subjects with a wide range of disease severity because the NELSON subjects were mostly in GOLD stage I and II, whereas the LEUVEN subjects were mostly in GOLD stage II, III and IV. Interestingly, 95% of the NELSON subjects and 19% of the LEUVEN subjects clustered in Phenotype 1, in which mortality was almost absent. Thus, our methodology was able to identify subjects at low risk of mortality among subjects with previously diagnosed and with previously undiagnosed COPD. In this real-life COPD population, 8/527 (1.5%) subjects were lost to follow-up and the exact date of death was unavailable in 8/50 (16%) subjects who died during follow-up. Because survival analyses were performed in 511/527 (97%) subjects, missing data were unlikely to significantly affect our results. Our survival analyses were based on all-cause mortality and specific causes of mortality could not be determined, which prevented us from determining whether causes of death differed between phenotypes. Phenotype 2 subjects who died during follow-up were mostly in GOLD stage IV, whereas Phenotype 3 subjects who died were distributed across all GOLD stages (Figure 3), suggesting that airflow obstruction was not the main determinant of mortality in Phenotype 3.
Although it is likely that subjects in Phenotype 2 had higher rates of lung function decline, further studies specifically assessing lung function decline with longitudinal spirometric data will be required to confirm this hypothesis. Assessment of comorbidities was based on physician-diagnosed comorbidities and not on a systematic diagnostic workup. Because high rates of underdiagnosed cardiovascular comorbidities have been previously reported in COPD patients [20], we cannot exclude that such comorbidities contributed to death in some patients without any diagnosed concomitant disease. Finally, our methodology led to the exclusion of 122 COPD patients who had missing data for mMRC and CCQ scores (see Methods; Table S8). Although we were unable to include these patients in our analysis, the findings in Table S8 further reinforce our conclusions, as the excluded patients could correspond to subjects in Phenotype 2.
Although some therapies (e.g., smoking cessation, pulmonary rehabilitation, bronchodilators) may be beneficial in all COPD subjects, the differential characteristics of subjects in Phenotype 2 and Phenotype 3 suggest that different strategies may be developed for improving outcome, eventually resulting in better survival. Subjects in Phenotype 2 may preferentially benefit from lung transplantation as they are younger and have few comorbidities. Early detection of subjects in Phenotype 2 would allow for early intervention, with the goal of developing disease-modifying therapy. Thus, future treatments targeting airway and parenchymal disease progression (e.g., growth factor receptor antagonists, protease inhibitors) may be of particular interest in these subjects with severe and early onset respiratory disease. We also speculate that interventions (e.g., aspirin, statins, beta-blockers) shown to reduce mortality in subjects with cardiovascular diseases may show optimal survival benefit in older subjects with cardiovascular comorbidities (Phenotype 3).
In summary, this study identified two very different phenotypes of subjects at high risk of mortality: younger subjects with severe respiratory disease and emphysema, and older subjects with less severe respiratory disease and marked cardiovascular and metabolic comorbidities. Pathophysiological studies should take these phenotypes into account to determine whether they relate to specific mechanisms and/or are differentially associated with specific genotypic signatures or biomarkers. Further, the potential therapeutic implications of these phenotypes can now be examined in prospective trials. Future studies should also focus on establishing simple algorithms based on the most discriminant factors for assigning patients to specific phenotypes. Such algorithms will have to be tested in validation cohorts before they can be used in clinical practice.

Supporting Information
Text S1 Additional information on statistical analyses. (DOC)