The COPD multi-dimensional phenotype: A new classification from the STORICO Italian observational study

Background This paper aims to (i) develop an innovative classification of COPD, the multi-dimensional phenotype, based on a multidimensional assessment; (ii) describe the identified multi-dimensional phenotypes. Methods An exploratory factor analysis was run to identify the main classificatory variables, followed by a cluster analysis based on these variables, to classify the 514 COPD-diagnosed patients enrolled in the STORICO study (trial registration number: NCT03105999) into multi-dimensional phenotypes. Results The circadian rhythm of symptoms and health-related quality of life, but neither comorbidity nor respiratory function, qualified as primary classificatory variables. Five multidimensional phenotypes were identified: the MILD COPD, characterized by no night-time symptoms and the best health status in terms of quality of life, quality of sleep, and levels of depression and anxiety; the MILD EMPHYSEMATOUS, with prevalent dyspnea in the early-morning and day-time; the SEVERE BRONCHITIC, with nocturnal and diurnal cough and phlegm; the SEVERE EMPHYSEMATOUS, with nocturnal and diurnal dyspnea; and the SEVERE MIXED COPD, distinguished by a higher frequency of symptoms over the 24 hours, the worst quality of life and of sleep, and the highest levels of depression and anxiety. Conclusions Our results showed that properly collected respiratory symptoms play a primary role in classifying COPD patients. The longitudinal observation will disclose the discriminative and prognostic potential of the proposed multidimensional phenotype. Trial registration Trial registration number: NCT03105999, date of registration: 10th April 2017.


Introduction
Chronic Obstructive Pulmonary Disease (COPD) is an umbrella definition encompassing a variety of clinical and pathophysiological conditions. Selected clinical phenotypes, namely the bronchitic, the emphysematous and the asthma-like, have been repeatedly recognized, but the disease pattern does not always conform unequivocally to one of them [1]. Indeed, the clinical presentation frequently resembles a mixture of the classical phenotypes. Furthermore, comorbidity and age itself largely shape the clinical and health status of COPD patients. Indeed, a senescence-associated secretory phenotype has been identified and might affect the clinical expression of COPD in the elderly [2]. Unfortunately, large pharmacological randomized clinical trials (RCTs) could not shed light on phenotypic variability owing to their stringent selection criteria. Further complicating this issue is the lack of attention to the circadian rhythm of respiratory symptoms in RCTs. On the other hand, several observational studies point to circadian symptom variability as a correlate of disease severity [3], a distinctive clinical trait [4] or a correlate of derangement of selected health status dimensions [5]. Furthermore, nocturnal symptoms are reported by the majority of COPD patients [6] and increased dramatically with age in a COPD population over 65 [7]. However, circadian rhythm variability in COPD has been explored only marginally and partially, in selected RCTs, with the sole intent of providing 24-hour pharmacological coverage and, thus, symptom control. Thus, how the circadian rhythm of symptoms affects the clinical patterns and therapeutic needs of COPD has not been the object of interest. Paradoxically, the growing interest in the comorbidity of COPD and its role as a determinant of health status seems to have overshadowed research on the respiratory dimension of COPD.
The STORICO (STudio Osservazionale sulla caratteRizzazione dei sIntomi delle 24 ore nei pazienti con broncopneumopatia cronica ostruttiva, Observational study on characterization of 24-h symptoms in patients with COPD) study assessed the association between the circadian rhythm of symptoms and several measures of health status [8]. Thus, it represents the ideal framework to verify whether an in-depth characterization of symptoms may allow the definition of clinical variants of COPD with distinctive classificatory and discriminatory properties. The present paper therefore aims to:
i. develop an innovative classification of COPD based on a multidimensional assessment (named multi-dimensional phenotype) which takes into consideration primarily the circadian rhythm of symptoms, but also demographic characteristics, health-related quality of life, respiratory function and comorbidities.
ii. describe the patients belonging to the so-identified multi-dimensional phenotypes with respect to body mass index, respiratory parameters, previous exacerbations, level of physical activity, quality of sleep, level of depression and anxiety and ongoing COPD pharmacological therapies.
The final objective of such an attempt is to investigate the discriminatory properties and potential clinical implications of this alternative classificatory method.

Study subjects
The STORICO study (trial registration number: NCT03105999) enrolled subjects aged ≥50, current/ex-smokers, with a diagnosis of stable COPD according to the GOLD 2014 criteria. The study was approved by the ethical committee of the coordinating center (Fondazione Toscana G. Monasterio Pisa, Italy) and was conducted in accordance with the Declaration of Helsinki and the Good Clinical Practices guidelines for observational studies, complying with all requirements of local regulations. Patients provided written, informed consent before study participation. The subjects included, the study design and the methodology of the STORICO study are extensively described elsewhere [8].
Patients with available information about early-morning, day- and night-time COPD symptoms and clinical phenotype at enrollment were considered evaluable for analyses at baseline; among these, the ones with available variables for the factor and cluster analyses (detailed below in the Methods paragraph) were analyzed in this paper.

Study design
STORICO is an Italian multicentre observational cohort study, currently ongoing, conducted in 40 pneumology centers. The study lasted from February 2016 (first subject first visit) to June 2018 (last subject last visit); three visits were planned (baseline, 6- and 12-month follow-up).

Methods
At enrollment patients completed the Night-time, Morning and Day-time Symptoms of COPD questionnaire [5] (hereafter named "symptoms questionnaire") covering the frequency and severity of COPD symptoms (breathlessness, coughing, bringing up phlegm or mucus, chest tightness, chest congestion and wheezing) during each part of the day (night-time, early-morning and day-time). Linguistic validation of the questionnaire in Italian was performed by the authors to ensure accurate translation and a clear understanding of the questionnaire itself [5].
Health Related Quality of Life was evaluated by the St. George's Respiratory Questionnaire (SGRQ), through its Symptoms, Activity and Impacts component scores [9][10], each ranging between 0 (no impairment) and 100 (highest impairment), with lower scores corresponding to better health.
Anxiety and depression levels were evaluated through the Hospital Anxiety and Depression Scale (HADS) [11]. A total score (ranging 0-42) and anxiety and depression subscale scores (each ranging 0-21) were computed, with higher scores indicating more distress.
The impact on sleep due to respiratory disease was assessed with the total score of COPD and Asthma Sleep Impact Scale (CASIS) [12], ranging 0-100, with higher scores indicating greater sleep impairment.
Physical activity was assessed by means of the categorical score (low, medium, high physical activity) of the International Physical Activity Questionnaire (IPAQ) [13] on patients aged 15-69 years. Spirometry was performed according to the recommendations of the American Thoracic Society (ATS) and the European Respiratory Society (ERS) and lung function measurements were done with patients either standing or sitting with the nose clipped after at least 10 minutes rest [14].
Presence of relevant comorbidities according to clinical judgment, body mass index (BMI), spirometry functional assessment and occurrence of exacerbations in the 5 years before baseline were also recorded, as were ongoing pharmacological therapies for COPD (long-acting beta-agonists (LABA), long-acting muscarinic antagonists (LAMA), inhaled corticosteroids (ICS) and any combination, Others).

Analysis
In order to identify the multi-dimensional phenotypes (m-phenotypes), a multi-step approach was followed.
STEP 1-Exploratory factor analysis. An exploratory factor analysis (EFA) was performed to find independent latent constructs (factors), not directly measurable and influencing responses on observed variables. EFA is a variable reduction technique which does not impose any preconceived structure on the outcome [15] and the observed variables included in the model are a linear combination of the underlying factors.
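As a notational sketch only (the symbols are introduced here for illustration and are not taken from the study), the statement that each observed variable is a linear combination of the underlying factors corresponds to the usual common factor model:

```latex
x_j = \sum_{k=1}^{m} \lambda_{jk} f_k + \varepsilon_j ,
\qquad j = 1, \dots, p
```

Here $x_j$ denotes the $j$-th standardized observed variable, $f_k$ the $k$-th latent factor, $\lambda_{jk}$ the loading of variable $j$ on factor $k$, and $\varepsilon_j$ the part of $x_j$ not explained by the $m$ common factors.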
The following items of the symptoms questionnaire were included in the factor analysis: presence/absence of feeling short of breath or breathless (items 18a, 26a, 7a), cough (items 18b, 26b, 7b), bringing up phlegm or mucus (items 18c, 26c, 7c), each symptom evaluated during the 24-hours (i.e. in the early-morning, day-time and night-time). The presence of the symptom was coded as 1 and the absence as 0.
The FEV1% of predicted, the SGRQ (symptoms, activity, impacts) scores, demographic variables (age as a continuous variable and gender coded as 1 for males and 2 for females) and the presence/absence of relevant comorbidities (cardiac ischemic disease, arterial hypertension, heart failure, atrial fibrillation, diabetes, osteoporosis, depression, kidney insufficiency; presence coded as 1 and absence as 0) were also included in the model. Orthogonal VARIMAX rotation was applied; factors with an eigenvalue > 1.0 [16][17] were retained, as were individual variables with loadings higher than 0.5 on the retained factors [18].
STEP 2-Definition of classificatory variables. STEP 1 led to the identification of n factors. The variables included in each factor were then evaluated and, whenever possible, combined into a new classificatory variable (one classificatory variable was defined for each factor).
STEP 3-Cluster analysis. A cluster analysis was performed with the classificatory variables defined at STEP 2 as input. Average linkage was chosen as the clustering method, and an average distance between clusters equal to 0.70 was taken as the reference to cut the dendrogram. The quality of the obtained clusters was evaluated by means of the semipartial R-squared (i.e. the loss of homogeneity due to combining two clusters to form a new cluster) and the R-square (i.e. the proportion of variance accounted for by the clusters).
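The clustering step can be illustrated with a minimal Python sketch of average-linkage agglomeration stopped at a distance threshold. It is not the SAS procedure used in the study: the function name, the toy patients and the use of plain (non-normalized) Euclidean distance are assumptions made for illustration, so the 0.70 cut is not directly comparable with the one applied to the real data.

```python
from itertools import combinations
from math import dist


def average_linkage(points, cut):
    """Agglomerative clustering with average linkage, stopped at `cut`.

    Starts with one cluster per point and repeatedly merges the two
    clusters with the smallest average pairwise Euclidean distance,
    until that smallest distance exceeds `cut`.
    """
    clusters = [[p] for p in points]

    def avg_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cut:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical patients: (cough/phlegm level, breathlessness level),
# each coded 0, 1 or 2 for illustration only.
patients = [(0, 0), (0, 0), (0, 1), (0, 1), (2, 0),
            (2, 0), (2, 2), (2, 2), (0, 2)]
clusters = average_linkage(patients, cut=0.70)
```

With these toy data only patients sharing the same pair of levels end up together, because every pair of distinct level combinations is at least one unit apart.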
Once identified, the different m-phenotypes were described with respect to BMI, spirometric parameters, number of previous exacerbations, quality of sleep, level of physical activity, depression and anxiety and therapies for COPD at enrollment by means of median, 25th and 75th percentiles for quantitative variables and absolute/relative frequencies for categorical ones within each class of m-phenotype. Moreover, although SGRQ scores were included in the EFA, the SGRQ scores of the m-phenotypes were compared to provide a comprehensive description of health status.
Analysis of variance and the Kruskal-Wallis test by ranks (on means and medians, respectively) and the Fisher exact test (for categorical variables) were used to compare variables across the five m-phenotypes. When these tests were statistically significant, the Mann-Whitney test on medians and the Chi-square or Fisher exact tests were then applied to compare variables between specific pairs of m-phenotypes. Alpha (with Bonferroni correction) was set to 0.0004 considering the total number of performed tests.
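The Bonferroni correction divides the family-wise significance level by the number of tests performed. The sketch below illustrates the arithmetic; the text reports neither the family-wise level nor the number of tests, so the values 0.05 and 125 are assumptions chosen only because they reproduce the reported threshold of 0.0004.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value, family_alpha, n_tests):
    """True when a single test passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)


# Assumption: a family-wise alpha of 0.05 spread over 125 tests, which
# would give the 0.0004 threshold; neither number is stated in the text.
threshold = bonferroni_alpha(0.05, 125)
```

Under these assumed values, a pairwise comparison with p < .0001 would be declared significant, whereas one with p = 0.001 would not.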
Statistical analysis was performed using SAS v9.4 and Enterprise Guide v7.1.

Evaluable patients
Among the 683 COPD patients enrolled in the STORICO study, 606 (88.7%) subjects (age 71.4 ± 8.2 years, 75.1% males) were deemed evaluable for the analysis at baseline; 92 subjects were then excluded because they had missing information on the variables analyzed in the factor and cluster analyses. Violations causing exclusion are shown in Fig 1. Thus, 514 patients (mean ± SD age: 71.4 ± 8.0 years) were deemed evaluable for the analyses described here.

Identification of m-phenotypes
STEP 1-Exploratory factor analysis. As a result of the factor analysis, two factors accounting for 82% of the total variability were retained. In Table 1, the factor loadings of individual variables are shown. Factor 1 covers 58.6% of variability and its components are bringing up phlegm or mucus and cough at any time of the day. Factor 2, explaining 23.2% of variability, comprises breathlessness at any time of the day and the SGRQ scores. A third factor explained a small amount of variability (9.5%) and no variable had a factor loading >0.5 on it; so, it was not retained (output of the factor analysis is reported in S1 Table).
STEP 2-Definition of classificatory variables. The symptom variables retained in the two factors (Table 1) were combined into two new three-level classificatory variables, namely "cough and/or bringing up phlegm or mucus" and "breathlessness". Their levels had increasing severity and were defined based on the frequency of occurrence of the symptoms: (i) never during the 24 hours, (ii) in the early-morning and/or day-time, but never at night and (iii) at night and in the early-morning and/or day-time. The distribution of patients according to the classificatory variables is shown in Table 2.
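The three severity levels defined above can be expressed as a small coding rule. The following Python sketch is only illustrative: the function names and the data layout are hypothetical, "cough and/or phlegm" is assumed to be present in a part of the day when either item is present, and a night-only pattern, which the definition in the text does not cover, is returned as `None`.

```python
def symptom_level(night, early_morning, day_time):
    """Map 0/1 symptom indicators for the three day parts to a severity level.

    1 = never during the 24 hours
    2 = early-morning and/or day-time, but never at night
    3 = at night and in the early-morning and/or day-time
    None = night only, a pattern the definition in the text does not cover
    """
    daytime_any = bool(early_morning or day_time)
    if not night and not daytime_any:
        return 1
    if not night:
        return 2
    if daytime_any:
        return 3
    return None


PARTS = ("night", "early_morning", "day_time")


def classify(patient):
    """Build the two classificatory variables from 0/1 item responses.

    Assumption: "cough and/or phlegm" counts as present in a day part
    when either item is present in that part.
    """
    cough_phlegm = symptom_level(
        *(patient["cough"][part] or patient["phlegm"][part] for part in PARTS)
    )
    breathlessness = symptom_level(
        *(patient["breathless"][part] for part in PARTS)
    )
    return cough_phlegm, breathlessness


# Hypothetical patient: cough/phlegm at night and during the day,
# breathlessness during the day only.
example = {
    "cough": {"night": 1, "early_morning": 1, "day_time": 0},
    "phlegm": {"night": 0, "early_morning": 1, "day_time": 1},
    "breathless": {"night": 0, "early_morning": 0, "day_time": 1},
}
levels = classify(example)
```

For this hypothetical patient the rule yields level (iii) for "cough and/or bringing up phlegm or mucus" and level (ii) for "breathlessness".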
As SGRQ scores were retained in the second factor, they were also described within the levels of the "breathlessness" classificatory variable (Table 2). The (symptoms, activity, impacts) scores significantly increased with the level of severity of "breathlessness" (p-value < .0001 for scores in all pairwise comparisons among "breathlessness" categories).
STEP 3-Cluster analysis. The cluster analysis run on the previously mentioned classificatory variables led to five well-defined clusters (Semipartial R-squared = 0.0260, R-square = 0.828; dendrogram is shown in S2 Table and S2 Fig). These clusters are hereafter called m-phenotypes and are described below.
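The two quality indexes reported for the clusters can be illustrated as follows. This Python sketch computes the R-square as one minus the ratio of within-cluster to total sum of squares, and the semipartial R-squared as the R-square lost when two clusters are merged; the function names and the toy data are hypothetical, and the sketch is not meant to reproduce the statistical software output.

```python
def sum_of_squares(points):
    """Sum of squared Euclidean distances of the points from their centroid."""
    n = len(points)
    centroid = [sum(coord) / n for coord in zip(*points)]
    return sum(
        (value - center) ** 2
        for point in points
        for value, center in zip(point, centroid)
    )


def r_square(clusters):
    """Proportion of the total variance accounted for by the clusters."""
    all_points = [p for cluster in clusters for p in cluster]
    total = sum_of_squares(all_points)
    within = sum(sum_of_squares(cluster) for cluster in clusters)
    return 1 - within / total


def semipartial_r_square(clusters, i, j):
    """Loss of R-square caused by merging clusters i and j."""
    merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
    merged.append(clusters[i] + clusters[j])
    return r_square(clusters) - r_square(merged)


# Hypothetical two-cluster solution on two classificatory variables.
toy_clusters = [[(0, 0), (0, 1)], [(2, 2), (2, 1)]]
toy_r2 = r_square(toy_clusters)
toy_loss = semipartial_r_square(toy_clusters, 0, 1)
```

In this toy case merging the two clusters leaves a single cluster, so the whole R-square is lost; a small semipartial R-squared, as reported for the five-cluster solution, indicates that the last merge cost little homogeneity.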

M-phenotypes and circadian rhythm of symptoms
As shown in Fig 2 the m-phenotypes were characterized with regard to the circadian rhythm of symptoms as follows:

M-phenotypes vs BMI, respiratory parameters and number of exacerbations
In Table 3

M-phenotypes vs quality of life, quality of sleep, level of physical activity, anxiety and depression
Quality of life, as expressed by SGRQ scores, was better in MILD COPD than in patients of the other m-phenotypes (p-values < .0001 for the tests of MILD COPD vs each of the other m-phenotypes) (Table 4). The SEVERE MIXED COPD m-phenotype had a higher SGRQ symptoms score than the other m-phenotypes (p-values < .0001) and higher SGRQ activity and impacts scores than MILD EMPHYSEMATOUS or SEVERE BRONCHITIC patients (p-values < .0001). The patients in the MILD COPD m-phenotype had better quality of sleep than patients of the other m-phenotypes (p-values < .0001) (Table 4). A progressive worsening of quality of sleep was evident from MILD COPD to MILD EMPHYSEMATOUS and, then, SEVERE BRONCHITIC, SEVERE EMPHYSEMATOUS and SEVERE MIXED COPD.
The patients with MILD COPD had lower HADS total, anxiety and depression scores than MILD EMPHYSEMATOUS or SEVERE MIXED COPD patients (p-values < .0001) and had lower total and depression scores than SEVERE BRONCHITIC patients (p-values < .0001) (Table 4). Differences in HADS scores between patients with SEVERE MIXED COPD and MILD EMPHYSEMATOUS or SEVERE BRONCHITIC patients were also significant (p-values < .0001).
Low physical activity, as expressed by the IPAQ score, largely prevailed in the patients with SEVERE MIXED COPD and SEVERE EMPHYSEMATOUS m-phenotypes. However, differences between m-phenotypes did not reach significance (Table 4).

M-phenotypes vs pharmacological therapies for COPD
Pharmacological therapy did not differ significantly between m-phenotypes (Table 5).

Discussion
We found that the circadian rhythm of the three main respiratory symptoms and an index of disease-related health status could classify the vast majority of patients, whereas comorbidity, FEV1, age and gender added little to the classificatory power of the model. Indeed, based on these components of the latent factors, it was possible to identify five multidimensional phenotypes with distinctive patterns of disease (Fig 2): the MILD COPD, the MILD EMPHYSEMATOUS, the SEVERE BRONCHITIC, the SEVERE EMPHYSEMATOUS and the SEVERE MIXED COPD.
Taken together, these findings point out that the "core business" of COPD, the clinical one, can distinguish patients with different disease profiles. The fact that two factors, extracted through a parsimonious model, could explain 82% of intrinsic variability testifies to the strength of the classificatory procedure and to the overall quality of the clustering based mainly on clinical grounds.
Consistent with the classical distinction of bronchitic and emphysematous phenotypes, dyspnea and productive cough contributed to define two well distinguished dimensions of the disease. Accordingly, attempts at optimally caring for the individual COPD patient should have individually tailored goals: relieving bronchial obstruction might not be the main objective of care in patients burdened mainly or exclusively with productive cough, whereas relieving dyspnea could be the core therapeutic goal in another group of patients, and the timing of symptoms further adds to the decision making [19]. This seemingly obvious finding has important practical implications. Indeed, FEV1 or the six-minute walk distance should not be considered universal effect measures in randomized clinical trials. While these indexes have great merit as overall indexes of disease severity, they might not capture the core expression of COPD and, thus, the main expected benefit of therapy in an important proportion of patients.

Table 3. Multidimensional phenotype vs BMI, respiratory parameters and previous COPD exacerbations at enrollment.
m-phenotype             BMI, n    BMI, median (25th-75th percentile)
MILD COPD               188       26.6 (24.1-28.7)
MILD EMPHYSEMATOUS      92        26.8 (24.2-29.8)
SEVERE BRONCHITIC       132       26.9 (24.3-30.1)
SEVERE EMPHYSEMATOUS    18        26.9 (23.7-30.1)
SEVERE MIXED COPD       79        26.5 (23.5-30.1)
The findings of the current study strongly advocate for the design of multi-outcome or patient-tailored RCTs in the perspective of precision medicine. Pursuing the patient-centered objective of care is also expected to increase adherence to therapy. Indeed, the surprisingly high rate of non-adherence to inhaled medication, including inappropriate inhaler use by COPD patients [20], as well as the usually high rate of enrollees lost in RCTs due to withdrawn consent [21][22], might reflect many patients' perception that the care does not meet their own needs.
The circadian rhythm of symptoms played a major classificatory role. Indeed, MILD COPD and MILD EMPHYSEMATOUS patients never or very infrequently experience night-time symptoms, which, instead, are always present in the SEVERE EMPHYSEMATOUS, SEVERE BRONCHITIC and SEVERE MIXED COPD m-phenotypes. Several lines of evidence support the role of the circadian rhythm of symptoms in defining the clinical spectrum of COPD. Indeed, bronchial obstruction is strictly related to nocturnal symptoms [3] and wheezing has been reported to be the most common of them [23][24]. Interestingly, night-time symptoms qualified as the main correlate of depression in the Assess study [5]. However, physicians significantly underestimate the impact of COPD on the patient's ability to get up in the morning and on sleep [6]. Unfortunately, attention has been paid to nocturnal symptoms only in the last few years, and mainly to empirically modulate pharmacological therapy rather than to rigorously understand COPD heterogeneity.
Selected innovative models have recently been proposed to classify COPD patients. They variably rely upon comorbidities and complications of COPD [25][26] or on a comprehensive respiratory function assessment [27]. However, they do not rate symptoms according to a standardized procedure. This likely explains why symptoms played a major classificatory role in our model and not in these models. Thus, two different philosophies, respectively centered on the non-respiratory and the respiratory dimension of COPD, underlie the previous models and the present one. Only the longitudinal observation will allow comparison of the discriminative and prognostic properties of these models. On the other hand, classificatory models including analytes such as cytokines or genetic markers are intended to define idiotypes and not phenotypes and, thus, are not comparable with our purely clinical and, hence, phenotypic model [28]. The Spanish clinical classification of COPD patients also differs from ours because it simply aims at assessing the perceived prevalence of predefined clinical phenotypes [29]. Given that patients were on regular topical therapy for COPD, the important prevalence of night-time symptoms and severely impaired health status testifies to the poor quality of the overall pharmacological approach. Analogously, in a European survey from primary care centers, two thirds of COPD patients complained of dyspnea despite current treatment [30]; an even higher percentage of subjects suffered from chronic cough, and this was similar across all COPD severity stages. On the other hand, the important clinical differences among clusters suggest that many patients are expected to gain the greatest benefit from non-pharmacological measures contrasting phlegm and mucus. Indeed, being recognized as a multidimensional and heterogeneous condition, COPD should be the object of personalized care.
The fact that the SGRQ contributed to categorize COPD patients further confirms that health status is something different from symptoms and, thus, may not be merely expressed by them. Interestingly, dyspnea associates with the SGRQ in latent factor 2, as if dyspnea were the main correlate of perceived health status. This partly reflects the structure of the SGRQ, which largely relates the perceived status to physical activities, which are obviously limited by dyspnea. Furthermore, this finding stresses the need to include an index of health status among the instruments rating COPD severity and the effects of therapeutic interventions.
Limitations of the study deserve consideration. First, we chose the classificatory variables with the aim of limiting recall bias. This is the reason why we selected the Night-time, Morning and Day-time Symptoms of COPD questionnaire, with a recall period of a week. Indeed, we based the classification on variables reflecting the patient's condition at the time of enrollment in order to prevent any classificatory bias linked to the uncertain definition of variables requiring an accurate recall, such as the frequency of exacerbations. Second, the definition of a clinical variant or subtype of COPD should be founded on the clinical pattern in the absence of any therapy. However, it is almost impossible to recruit patients with severe COPD free from pharmacological treatment, and those untreated likely represent a poorly representative sample. Third, the observed clinical variants of COPD should not yet be considered true phenotypes. Indeed, phenotypes are intrinsically stable, and only the follow-up will assess cluster stability. Thus, the full spectrum of classificatory properties of our model will emerge only at the end of the follow-up phase of the study, which is currently ongoing. Finally, our patients had a mean age of 71.4 years; thus, the present results might not fully apply to the growing fraction of COPD patients classified as old (age over 75 years), who pose distinctive diagnostic and therapeutic problems [31].

Conclusions
Despite these limitations, our study emphasizes the clinical meaning of the circadian rhythm of COPD symptoms. It suggests that a classificatory approach based on the respiratory clinical dimension of COPD succeeds in identifying classes of patients with distinctive features. In an era characterized by a growing interest in extrapulmonary features of COPD, which obviously are worthy of consideration, our study redirects the attention of physicians to the elementary truth that COPD is primarily a respiratory disease and that an accurate clinical rating of the patient's status is the first task required of any physician caring for a COPD patient.

Supporting information
S1 Table. Factor analysis output (eigenvalues of the reduced correlation matrix). This table shows a part of the output of factor analysis (eigenvalues of the reduced correlation matrix). (DOC)
S2 Table. Cluster analysis output (cluster history). This table shows a part of the output of cluster analysis (cluster history). (DOC)
S1 Fig. Factor analysis output (scatter plot of eigenvalues). This figure shows a part of the output of factor analysis (scatter plot of eigenvalues). (DOC)
S2 Fig. Cluster analysis output (dendrogram). This figure shows a part of the output of cluster analysis (dendrogram). (DOC)