A machine learning approach to predict extreme inactivity in COPD patients using non-activity-related clinical data

Facilitating the identification of extreme inactivity (EI) has the potential to improve morbidity and mortality in COPD patients. Apart from patients with obvious EI, the identification of such behavior during a real-life consultation is unreliable. We therefore describe a machine learning algorithm to screen for EI, as actimetry measurements are difficult to implement. Complete datasets for 1409 COPD patients were obtained from COLIBRI-COPD, a database of clinicopathological data submitted by French pulmonologists. Patient- and pulmonologist-reported estimates of PA quantity (daily walking time) and intensity (domestic, recreational, or fitness-directed) were first used to assign patients to one of four PA groups (extremely inactive [EI], overtly active [OA], intermediate [INT], inconclusive [INC]). The algorithm was developed by (i) using data from 80% of patients in the EI and OA groups to identify ‘phenotype signatures’ of non-PA-related clinical variables most closely associated with EI or OA; (ii) testing its predictive validity using data from the remaining 20% of EI and OA patients; and (iii) applying the algorithm to identify EI patients in the INT and INC groups. The algorithm’s overall error for predicting EI status among EI and OA patients was 13.7%, with an area under the receiver operating characteristic curve of 0.84 (95% confidence interval: 0.75–0.92). Of the 577 patients in the INT/INC groups, 306 (53%) were reclassified as EI by the algorithm. Patient- and physician-reported estimation may underestimate EI in a large proportion of COPD patients. This algorithm may assist physicians in identifying patients in urgent need of interventions to promote PA.


Introduction
Patients with chronic obstructive pulmonary disease (COPD) are known to be substantially less physically active than age- and sex-matched healthy subjects [1]. Several studies have shown that low physical activity (PA) levels are associated with poor prognosis in COPD patients [2,3], yet pulmonary rehabilitation programs that incorporate endurance and strength training have shown significant benefit in this patient population [4]. Thus, accurate identification of the true PA status is a crucial factor in ensuring that the least active patients, who would be expected to derive the greatest benefit from PA, can be encouraged to become more active and/or referred to a rehabilitation program.
Several methods have been devised to assess and quantify PA levels in patients with various respiratory diseases. In particular, accelerometers can be worn over several days to analyze the full range of different activities and their distribution over time. Data from such devices have generally correlated well with assessments of daily metabolic expenditure, as measured using the doubly labeled water method, and accelerometers are also sufficiently sensitive to detect low levels of PA in COPD patients [4]. These quantitative studies have estimated that approximately 26%-30% of COPD patients are physically inactive and exhibit sedentary behavior, both of which are independently associated with an increased risk of morbidity and mortality [3,5,6]. However, accelerometry requires considerable cost, time, and effort commitments on the part of the patient and physician, and it is generally considered impractical for routine clinical use. At the same time, clinical interviews and patient questionnaires alone cannot accurately determine the patient's true PA level [7]. To improve this situation, the PROactive consortium proposed that a combination of questionnaires and accelerometric measurements be used to assess the behavior of COPD patients [8,9]. Nevertheless, this approach does not eliminate the drawbacks of accelerometry, and therefore does not resolve the primary clinical concern, which is to accurately and objectively detect extreme inactivity (referred to hereafter as EI) in patients whose PA status initially presents as unclear or equivocal [10,11]. Although such patients may be identified during consultation with experienced practitioners, it is likely that a significant percentage of EI patients fall under the radar of clinical vigilance, which most often focuses on respiratory function. 
Given the proven benefit of pulmonary exercise programs in COPD patients, we therefore sought to develop a predictive algorithm that can reliably detect EI patients, who might most benefit from interventions such as pulmonary rehabilitation programs.
We hypothesized that certain physiological and clinical variables may be more frequently observed (through cause or effect) among patients at the extreme ends of the PA spectrum (i.e., EI and overtly active [OA] patients), and that such 'phenotype signatures' composed of non-PA-related variables could be used to develop the predictive algorithm.

Patients and data collection
This was a retrospective analysis of data submitted to the COLIBRI-COPD database [12,13], which has been authorized by the French national commission on personal data privacy (Commission Nationale de l'Informatique et des Libertés, CNIL, #2013-526). The requirement for written consent was waived in this observational study in accordance with French law. Patients provided oral informed consent to their physician. At the time of the analysis, data were available from 5035 initial consultations for COPD patients (Fig 1). We selected 1409 patients with comprehensive information on 22 specific variables (see Table 2) in the areas of anthropometry, smoking habits, resting pulmonary function, comorbidities, exacerbations during the preceding year, Global Initiative for COPD (GOLD) ABCD classification, and self-reported questionnaires: the modified Medical Research Council dyspnea scale (mMRC) [14] and Disability related to COPD Tool (DIRECT), both of which assess dyspnea [15,16]; the COPD Assessment Test (CAT), which assesses quality of life [17]; and the Hospital Anxiety and Depression Scale, which separately assesses anxiety and depression [18,19].

Construct of the predictive machine learning
We first categorized a cohort of COPD patients into one of four activity levels based on the patient's own estimates of their PA (daily walking time) and the physician's estimates of the patient's PA intensity level (domestic, recreational, and fitness-directed). We then tested existing machine learning processes already in use for predicting disease outcomes using routine clinical data [20,21], and trained the model to identify an EI signature using clinicopathological data from a subset (80%) of patients in the EI and OA categories. After training, we tested the algorithm's predictive validity on the remaining 20% of patients in the EI and OA categories, and then evaluated its ability to detect EI patients in the intermediate (INT) or inconclusively determined (INC) PA categories.

Definition of PA categories
Assignment of patients to PA categories was based on physician estimates of the predominant intensity level of the patient's daily PA: domestic (D, in-home activities), recreational (R, mostly outside the home), or active (A, devoted to maintaining physical fitness), and on patient estimates of the average daily walking time outside the home (including weekends): <10 min, 10-30 min, 30-60 min, and >60 min. Based on these criteria, we constructed a 3 × 4 table to identify four main PA categories: (i) least active (EI, n = 172); (ii) most active (OA, n = 660); (iii) intermediate activity level (INT, n = 410), which had three subcategories (a, b, and c); and (iv) incompatible (INC, n = 167), which had four subcategories (a, b, c, and d) and consisted of patients whose self-reported and physician-reported activities were considered conflicting (Table 1).

Predictive statistical methods
The predictive machine learning method was developed in five steps. (i) We first verified that the EI variable and its variability correlated well with a set of continuous and categorical variables. Then, we performed an explanatory canonical discriminant analysis of mixed data followed by a scree plot to select the statistically significant canonical variables to be used in more elaborate individual predictive models. After this step, a reduced rank display (S1 Fig) showed that two canonical discriminant projections accounted for 98.6% of the variation between categories, of which 95.8% concerned EI and OA, while the projection of INT and INC on the two canonical directions was very slight. (ii) Based on this, we opted to develop an algorithm focused on individual prediction of the two most extreme categories: EI (n = 172) versus OA (n = 660). The predictive model was developed using an ensemble regression and classification algorithm [22] with a version for balancing error in unbalanced data (weighted random forest, WRF). To account for random effects, such as the physician identity or study center, we also combined the random forest methodology with generalized linear mixed models using the binary mixed model (BiMM) forest algorithm [23]. (iii) The 832 patients in the EI and OA groups were randomly split: data from 666 patients (80%) were used to develop the model and data from the remaining 166 patients (20%) to assess its accuracy (i.e., predictive error). (iv) In the next step, we addressed the imbalance in our final prediction using a recent hyper-ensemble of SMOTE undersampled random forests (HyperSMURF) method, which is based on resampling techniques and a hyper-ensemble approach (S2 Fig). (v) Finally, once validated, the algorithm was applied to patients in the combined INT and INC subcategories.
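Step (iii) and the class imbalance addressed in steps (ii) and (iv) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the fixed seed, and the inverse-frequency weighting formula are assumptions chosen for illustration, and the weights actually used by the WRF are not stated in the text. Only the group sizes (172 EI, 660 OA) and the 80/20 split come from the paper.

```python
import random


def split_train_test(ids, train_fraction=0.8, seed=0):
    """Shuffle patient ids and split them into a training and a test set."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def inverse_frequency_weights(counts):
    """Weight each class by n_total / (n_classes * n_class).

    This is a common heuristic for imbalanced data, used here as an
    assumption; the paper does not report its weighting scheme.
    """
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {label: n_total / (n_classes * n) for label, n in counts.items()}


# Group sizes reported in the paper.
counts = {"EI": 172, "OA": 660}
patient_ids = list(range(sum(counts.values())))  # hypothetical ids 0..831

train_ids, test_ids = split_train_test(patient_ids)
weights = inverse_frequency_weights(counts)

print(len(train_ids), len(test_ids))
print({label: round(w, 2) for label, w in weights.items()})
```

Under this heuristic the minority EI class receives a weight almost four times that of the OA class, which is one simple way a forest can be discouraged from always predicting the majority category.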
Descriptive results are presented as mean ± standard deviation. The performance of the algorithm for predicting EI and OA is expressed as overall error, weighted accuracy, true negative value, true positive value, and sensitivity. Additional performance measurements included area under the precision and recall curve (AUPRC) and area under the receiver operating characteristic curve (AUROC). Table 1 shows the distribution of the 1409 patients into four categories and 12 subcategories according to the combination of patient and physician estimates. The reference category EI (n = 172) was composed of patients with the lowest duration and intensity PA level (subcategory D and <10 min walking/day), whereas the OA category (n = 660) included the most active patients (subcategory R or A and >30 min walking/day). Patients who spent short times (≤30 min) in daily activities were referred to as the INT group (n = 410) and were subcategorized as a, b, or c, depending on the physicians' estimate of the activity intensity (Table 1). Finally, patients whose self- and physician-reported subcategories were incompatible were referred to as the INC group (n = 167) and were further assigned to a, b, c, or d groups based on the time and intensity. The seven categories encompassed by INC a-d and INT a-c together account for about 40% of the total cohort, highlighting the need for a tool to more accurately assess daily PA.
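The EI and OA definitions given in the text can be written as a small lookup over the 3 × 4 grid. The sketch below is illustrative only: the function name and string codes are hypothetical, and because Table 1 is not reproduced here, the seven remaining cells are returned as a single "INT/INC" label rather than guessed individually.

```python
INTENSITIES = ("D", "R", "A")                  # domestic, recreational, active
WALKING = ("<10", "10-30", "30-60", ">60")     # daily walking time, min


def pa_category(intensity, walking):
    """Return 'EI', 'OA', or 'INT/INC' for one cell of the 3 x 4 table."""
    if intensity not in INTENSITIES or walking not in WALKING:
        raise ValueError("unknown intensity or walking-time code")
    # EI: domestic activity only and less than 10 min walking per day.
    if intensity == "D" and walking == "<10":
        return "EI"
    # OA: recreational or active intensity and more than 30 min per day.
    if intensity in ("R", "A") and walking in ("30-60", ">60"):
        return "OA"
    # The split of the remaining cells into INT a-c and INC a-d
    # requires Table 1 and is deliberately not reconstructed here.
    return "INT/INC"


cells = {(i, w): pa_category(i, w) for i in INTENSITIES for w in WALKING}
for label in ("EI", "OA", "INT/INC"):
    print(label, list(cells.values()).count(label))
```

The cell counts (one EI, four OA, seven INT/INC) agree with the seven INT and INC subcategories described in the text.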

Descriptive results
After validation and predictive validity testing (see next section), we applied the algorithm to patients in the full cohort as well as the INC and INT categories and determined the number of patients who were identified by the algorithm as having the EI phenotype. Fig 2 shows a comparison of selected anthropometric and behavioral characteristics (continuous variables) stratified by our PA categories or the GOLD ABCD 2017 categories. Of note, the symptom-related variables (mMRC, DIRECT, and CAT scores) logically discriminate between patients according to the GOLD ABCD classification, but they overlap the PA categories, indicating that these questionnaires individually have a poor ability to predict PA level. As shown in Fig 3, this possibility was confirmed by the large overlap between not only continuous variables (DIRECT score, CAT score, age, body mass index) but also categorical variables (age, sex, exacerbation, and GOLD ABCD) for patients in the EI, INT, INC, and OA categories, consistent with their poor individual ability to predict EI status. Table 3 shows the analysis of the predictive algorithm performance using several classifier methods. The BiMM and WRF results did not differ significantly, suggesting that the prediction was independent of the physician who collected the data and the practice setting. This assertion was further checked by performing a panel data analysis on the clustered data and testing the hypothesis of presence of random effects. This analysis yielded a p value of 0.0069, thus supporting a fixed effects model (i.e., a random forest prediction without random effects). Overall, the AUPRC indicates that the HyperSMURF algorithm achieved significantly better sensitivity than WRF or BiMM for predicting EI, with little deterioration in the sensitivity of the OA classification. As an example, Fig 3 shows

PLOS ONE
Predict inactivity using non-activity-related data
can be seen, only the higher scores (mMRC ≥3, CAT >30, DIRECT >23) are associated with a probability of EI >0.5. The strength of our predictive model is also confirmed by the corresponding ROC curves (Fig 4). Although the differences between the WRF and HyperSMURF predictions, as measured by the AUROC, are not large, AUPRC is considered to be more informative than AUROC for imbalanced data [24]. Finally, we applied our predictive algorithm process to the INT and INC subcategories. Table 1 shows that about half of the patients were predicted to be EI; specifically, 54% and 41% in the INT and INC categories, respectively.
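The two summary measures compared above can be computed from predicted probabilities with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: AUROC is computed as the pairwise ranking probability, AUPRC is approximated by average precision (one common step-wise estimate), and the labels and scores are made-up toy values with EI coded as 1.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical): 2 positives among 8 patients.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

print(round(auroc(labels, scores), 3))
print(round(average_precision(labels, scores), 3))
```

In this toy case a single negative ranked above one positive lowers average precision more than it lowers AUROC, which illustrates why precision-recall summaries are often preferred when the positive class is rare.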

S1 Table shows the distribution of the GOLD 2011 and ABCD classifications within the PA categories.

Discussion
The main contribution of this study is to demonstrate the predictive validity of an algorithm for predicting the least active COPD patients from information available in routine pulmonologist practice independently of PA-related measures. The originality and strength of our algorithm lies in its ability to predict EI in patients whose PA level is equivocal or unclear based on the patient's and physician's opinions, thus bringing to light the precise subgroup of COPD patients who are most in need of increased PA. Depending on the options available to the referring pulmonologist, this algorithm will help in deciding the optimal next step for each patient, whether that is accelerometry, as proposed by the PROactive consortium [8], referral to supervised rehabilitation [25], and/or simply encouraging the patient to participate in social activities that include PA [26].

Selection of machine learning methods
In the present study, we demonstrate that a specific random forest machine learning algorithm, which we refer to as the EI algorithm, is effective in predicting the EI or OA status of COPD patients. In addition, the algorithm has the potential to automatically detect the most informative predictors of EI by excluding many irrelevant confounding factors that influence both the dependent variable (EI or OA) and independent variables (explanatory variable), thus causing a spurious association. The EI algorithm outperforms traditional multiple linear/logistic regression models by unmasking predictive potential not apparent in a linear model. We could also have considered using a Bayesian machine learning framework to develop a prediction procedure and simultaneously identify promising subsets of relevant predictors. While the Bayesian framework may have achieved equivalent predictive performance, it would have required a large number of assumptions on independent variables and many successive statistical checks, making it much more difficult to interpret. Because of this complexity, we opted for a frequentist framework that markedly reduces the number of mathematical steps and their validation and obtains a level of predictive validity acceptable for its intended clinical use. Our results confirm that the EI algorithm possesses two critical features of a predictive model: the agreement between observed probabilities and predicted probabilities (i.e., calibration) and the ability to clearly distinguish between categories (i.e., discrimination). Thus, for the intended purpose of guidance in clinical decision-making, our EI algorithm provides an acceptable balance between a high rate of true positives (correctly identified patients) and a low rate of false positives (incorrectly identified patients). As with any predictive algorithm designed to assist in medical decisions, the EI algorithm should be considered a contributing tool that takes into account the potential impact on the patient's health.
Fig 3 (caption fragment): Table 1 for definitions of activity categories. https://doi.org/10.1371/journal.pone.0255977.g003
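Calibration, as defined above, can be inspected by comparing predicted probabilities with observed outcome frequencies. The following Python sketch is an illustration only: the Brier score and the equal-width binning are standard choices assumed here, not procedures reported by the authors, and the data are made-up toy values with EI coded as 1.

```python
def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_table(labels, probs, n_bins=5):
    """Group predictions into equal-width bins and compare, per bin,
    the mean predicted probability with the observed EI fraction."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((y, p))
    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((index, len(members), mean_pred, observed))
    return rows


# Toy data (hypothetical).
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]

print(round(brier_score(labels, probs), 3))
for row in calibration_table(labels, probs):
    print(row)
```

A well-calibrated model shows mean predicted probabilities close to the observed fractions in every bin; discrimination is assessed separately, for example with the AUROC.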

Decision-making process and machine learning
Matching these predicted probabilities with a 0-1 classification, by choosing a threshold above which a new observation is classified as 1 versus 0, is no longer part of the statistics. It is part of the decision-making process that integrates other contingencies or issues than the probabilistic results of the model. Practitioners may ask several pertinent questions that could influence this threshold. For example, will a binary categorization (EI and OA) negatively affect patient care compared with a more detailed determination of daily PA behavior? If so, in what way will it affect care, especially with respect to the design of individualized pulmonary rehabilitation programs or personalized recommendations? Like any diagnostic method implemented in the clinical decision-making process, the predictive validity of the information and the operational impact of the level of precision must be evaluated. This echoes some points raised by Faner and Agusti [27], who questioned the practical use of conclusions based on clustering studies for identifying a clinical phenotype predictive of mortality for a single patient. In that case, the issue was whether a complex analytical approach, as opposed to common sense, was really necessary to know that patients with severe airflow limitation and comorbidities would have a poor prognosis.
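The effect of the threshold on the resulting 0-1 classification can be made concrete with a short Python sketch. It is a generic illustration with made-up probabilities (EI coded as 1), and the three thresholds are arbitrary examples rather than values proposed by the authors.

```python
def sensitivity_specificity(labels, probs, threshold):
    """Classify as EI (1) when prob >= threshold, then return
    (sensitivity, specificity) against the known labels."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): 3 EI and 5 OA patients.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.45, 0.2, 0.2, 0.1]

for threshold in (0.35, 0.5, 0.8):
    sens, spec = sensitivity_specificity(labels, probs, threshold)
    print(threshold, round(sens, 2), round(spec, 2))
```

Lowering the threshold flags more patients as EI, catching more true cases at the cost of more false alarms. Which trade-off is acceptable depends on the clinical consequences of each kind of error, which is why the choice belongs to the decision-making process rather than to the model.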
In real-life practice, the purpose of the EI algorithm would be to alert the physician to the probability of a new patient having EI or OA status. This is particularly important because only a minority of patients who are eligible for pulmonary rehabilitation actually derive benefit [28], partly because the referring pulmonologist may be unaware of the patient's true EI status, which may be sufficiently poor as to predispose them to failure. In support of this, our results suggest that the most extreme inactivity (i.e., EI) is largely underestimated in routine consultations. Indeed, application of the EI algorithm increased the proportion of the total population with EI status from 12.2% (n = 172), detected by the patient and physician estimates, to 33.9% (n = 478), an increase of 21.7 percentage points. Our results compare favorably with those reported by Schneider et al. [5], who examined daily PA in COPD patients using accelerometry. The detailed analysis of the kinetics and intensity of PA by those authors found that 49% (n = 67) of patients could be defined as "active and nonsedentary" and 26% (n = 35) as "non-active and sedentary", which compare with 46.8% OA (n = 660) and 34% EI (n = 478) in our study. Nevertheless, further comparisons between studies based on accelerometry measurements and machine learning using non-PA data are beyond the scope of this analysis.
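The proportions quoted above follow directly from the counts reported in the paper (1409 patients in total, 172 EI by patient and physician estimates, 306 reclassified as EI by the algorithm, 660 OA). A short Python check, with variable names chosen only for illustration:

```python
total_patients = 1409
ei_by_estimates = 172      # EI according to patient and physician estimates
ei_reclassified = 306      # INT/INC patients reclassified as EI
oa_patients = 660

ei_after = ei_by_estimates + ei_reclassified

share_before = 100 * ei_by_estimates / total_patients
share_added = 100 * ei_reclassified / total_patients
share_after = 100 * ei_after / total_patients
share_oa = 100 * oa_patients / total_patients

print(ei_after)
print(round(share_before, 1), round(share_added, 1), round(share_after, 1))
print(round(share_oa, 1))
```

The 306 reclassified patients correspond to 21.7% of the cohort, which raises the EI share from 12.2% to about 34% (478 patients), alongside 46.8% OA.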

Limitations and strengths
The method we have proposed to define EI status may seem too simplistic compared with objective measurements from accelerometry. Our definition was based on two assumptions: that employing both patient- and physician-derived information would compensate for any imprecision resulting from subjectivity; and that EI status could be predicted from routine clinical data (e.g., behavioral, psychological, symptomatic) that are causes and/or consequences of extreme inactivity. It is important to note that whether the EI status used here would be exactly the same as one derived from accelerometry is ultimately not a crucial factor.
The most important intended use of the algorithm is to enable patients with genuine EI status to be identified when the clinical data are equivocal. The best illustration that our assumptions were acceptable is the accuracy of prediction with the test sample of EI and OA patients (n = 166), which had a modest predictive error of 13.7%. Another limitation is that we did not perform accelerometry on the 306 patients with intermediate PA levels who were reclassified by our algorithm as EI. However, various studies have reported that between 10% and 20% of data are routinely missing from accelerometry studies (because of incomplete measurements or other reasons) and the number of patients included per study rarely exceeds about 100. In addition, considering that >200 pulmonologists from throughout France contributed data to the EI algorithm, any attempt to perform comparative accelerometry would undoubtedly have resulted in an even higher rate of lost or unusable data. We propose that the predictive validity of our algorithm will increase as the size and diversity of the COLIBRI-COPD database increases. Moreover, the addition of new variables to the EI algorithm is technically possible, because the machine learning approach developed for the algorithm is an evolutionary and adaptable process. Examples of potentially influential variables for predicting EI status are psychological and social vulnerability, and regional climate and pollution data [9]. The addition of physiological data, such as functional exercise capacity (6-minute walk test, chair-rising test, grip strength, pedometer readings) could also be valuable, even though these parameters have been shown to be individually unreliable for identifying patients with extremely inactive lifestyles [11].

Interpretation
In conclusion, we report that a predictive machine learning algorithm, developed from routine clinical data collected during online consultations, can identify EI status among patients with all stages of COPD severity. Integration of this algorithm within online consultations via an R-Shiny-python interface [29] could alert the clinician to the frequently overlooked patients who urgently require intervention to promote PA. Thus, it is our hope that the approach proposed here will advance the field of medical decision-making and move it further towards the holy grail of predictive and personalized medicine for COPD patients.
Supporting information
S1 Fig. 2D plot of the first two canonical discriminant variables accounting for the greatest variation between physical activity categories (red) relative to error. The two dimensions account for 98.6% of the variance between categories, most (95.8%) of which is due to EI versus OA. The latter is mainly influenced by FEV1/FVC and the former by CAT, DIRECT, and mMRC scores. See Table 1 for definitions of activity categories. (TIFF)
S1