A quantitative approach for the analysis of clinician recognition of acute respiratory distress syndrome using electronic health record data

Importance: Despite its efficacy, low tidal volume ventilation (LTVV) remains severely underutilized for patients with acute respiratory distress syndrome (ARDS). Physician under-recognition of ARDS is a significant barrier to LTVV use. We propose a computational method that addresses some of the limitations of the current approaches to automated measurement of whether ARDS is recognized by physicians.
Objective: To quantify patient and physician factors affecting physicians’ tidal volume selection and to build a computational model of physician recognition of ARDS that accounts for these factors.
Design, setting, and participants: In this cross-sectional study, electronic health record data were collected for 361 ARDS patients and 388 non-ARDS hypoxemic (control) patients in nine adult intensive care units at four hospitals between June 24 and December 31, 2013.
Methods: Standardized tidal volumes (mL/kg predicted body weight) were chosen as a proxy for physician decision-making behavior. Using data-science approaches, we quantified the effect of eight factors (six severity of illness, two physician behaviors) on selected standardized tidal volumes in ARDS and control patients. Significant factors were incorporated in computational behavioral models of physician recognition of ARDS.
Results: Hypoxemia severity and ARDS documentation in physicians’ notes were associated with lower standardized tidal volumes in the ARDS cohort. Greater patient height was associated with lower standardized tidal volumes (which are already normalized for height) in both ARDS and control patients. The recognition model yielded a mean (99% confidence interval) physician recognition of ARDS of 22% (9%-42%) for mild, 34% (19%-49%) for moderate, and 67% (41%-100%) for severe ARDS.
Conclusions and relevance: In this study, patient characteristics and physician behaviors were demonstrated to be associated with differences in ventilator management in both ARDS and control patients. Our model of physician ARDS recognition measurement accounts for these clinical variables, providing an electronic approach that moves beyond relying on chart documentation or resource-intensive approaches.

A hallmark of healthcare quality improvement is the consistent measurement of an outcome (e.g., number of infections or checklist use). In the case of LTVV use for ARDS, measurement of an outcome is challenging for multiple reasons. First, delivering LTVV is a two-step process consisting of i) recognizing ARDS and ii) selecting and adjusting tidal volumes based on patient response. Both steps in this process can be affected by patient characteristics. [3,13,17,20-22] Second, while previous studies have employed LTVV use or physician documentation of ARDS as surrogates for physician recognition of ARDS, these proxies have limitations. The gold standard for measuring physician recognition of ARDS would be to directly ask physicians if a patient has ARDS. While this approach has been used in the past, [3] it introduces important biases such as the observer effect, subjective reporting, and priming clinicians to think about ARDS. [23-25] Furthermore, it is labor- and resource-intensive, making it an infeasible solution for widespread implementation. An alternative method is to collect physician documentation of ARDS from electronic health records (EHR). While this approach is more scalable, physicians may recognize ARDS but not document it in their notes, or may not deliver LTVV despite this recognition. We sought to use data science approaches on EHR data to address the above challenges with measurement of physician recognition of ARDS and build an estimate of physician recognition of ARDS that could be widely implemented. We use tidal volume selection as a proxy for physician decision-making behavior and quantify the factors affecting tidal volume selection in both ARDS patients and a novel control cohort of patients who do not have ARDS. We then build two models of physician recognition of ARDS that account for these factors. Our methods not only address the issues of bias and resource availability, but also provide insights into underlying physician behavior that have implications for effective intervention design.
Data availability: A de-identified dataset containing all data required to reproduce our findings is available on a public repository managed by Northwestern University: https://doi.org/10.21985/n2-33my-2s89. Requests for the full data set including dates of admission, intubation, mortality, and full ventilator data will require approval of the Northwestern Institutional Review Board and should be submitted to the corresponding author (CWeiss@Northshore.org). Full data will be made available for researchers who meet the criteria for access to confidential data as set forth by the Northwestern Institutional Review Board (https://irb.northwestern.edu/).

Cohort development
We have previously described the development of the ARDS cohort used in this study, which includes 362 patients who met the Berlin Definition of ARDS [5] (see summary below) via independent clinician review and were admitted to an ICU at one academic and three community hospitals in the Chicago region between June 24, 2013 and December 31, 2013. [16]
Berlin Definition Criteria Summary [5]
For this study, we developed an additional cohort from the same time period and initial screening population at two of the same hospitals (one academic, one community): 388 patients with acute hypoxemic respiratory failure requiring mechanical ventilation with at least one instance of PaO2/FIO2 ≤ 300 but without ARDS according to the above Berlin Definition ("control cohort"). We excluded patients with missing key information (predicted body weight [PBW], gender, tidal volumes), intubation duration less than 5.67 hours (the shortest duration of intubation in the ARDS cohort), or PBW less than 25 kg (Fig 1).
Patients were not actively recruited for either cohort; instead, all data were mined from the electronic health record. The ARDS and control cohorts were similar across several clinical and demographic measures (S1 Table). These cohorts are representative of the larger population of patients with ARDS and non-ARDS acute hypoxemic respiratory failure because of our broad inclusion criteria and their similarity to larger cohorts (e.g., LUNG SAFE [3]) with respect to height, weight, and hypoxemia severity. This study was approved by the Northwestern University Institutional Review Board (STU00208049) with a waiver of consent on October 30, 2018.

Data acquisition
All patient data were obtained from the electronic health records serving the participating hospitals. We defined study entry as the start of ARDS for the ARDS cohort and the first instance of PaO2/FIO2 ≤ 300 in the control cohort. Study end was defined as the earlier of extubation, death, or ICU discharge. We recorded gender, height, and all PaO2/FIO2 values and weights between ICU admission and study end. We recorded all tidal volumes (VT) and plateau pressures (Pplat) between intubation and study end where available (Pplat was not recorded at two hospitals). Note that 22% of the ARDS cohort and 44% of the control cohort only had one unique tidal volume over their study duration. We recorded which ICU the patient was treated in and whether an ARDS diagnosis was documented in the critical care physician's notes. For the control cohort, we recorded whether or not bilateral infiltrates were present for all chest radiographs or computed tomography scans between intubation and study end. [16] For data availability for both cohorts and all subgroups, see S2 Table. Patients who met cohort inclusion criteria but were missing other data points were only excluded from analyses that required those missing data points.
In this study, we use PBW as a gender-adjusted and gender-neutral measurement of height because LTVV thresholds are defined using PBW. Any references to patient height refer to PBW (kg) and any references to patient weight refer to a patient's weight measured at ICU admission (kg). We calculated PBW according to the ARDS Network definition (males: PBW = 50 + 0.91 × [height in cm − 152.4]; females: PBW = 45.5 + 0.91 × [height in cm − 152.4]) and defined LTVV as a standardized tidal volume (recorded VT divided by PBW) ≤ 6.5 mL/kg PBW.
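The PBW and LTVV definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and it assumes the ARDS Network formula (50 kg for males or 45.5 kg for females, plus 0.91 kg per cm of height above 152.4 cm) together with the 6.5 mL/kg PBW threshold stated in the text.

```python
# Illustrative sketch; function and constant names are hypothetical.
LTVV_THRESHOLD_ML_PER_KG = 6.5  # LTVV threshold used in this study


def predicted_body_weight_kg(height_cm: float, sex: str) -> float:
    """ARDS Network predicted body weight (kg) from height and sex."""
    base = {"male": 50.0, "female": 45.5}[sex]
    return base + 0.91 * (height_cm - 152.4)


def standardized_tidal_volume(vt_ml: float, pbw_kg: float) -> float:
    """Recorded tidal volume (mL) divided by PBW (kg), in mL/kg PBW."""
    return vt_ml / pbw_kg


def is_ltvv(vt_ml: float, pbw_kg: float) -> bool:
    """True when the standardized tidal volume is at or below the threshold."""
    return standardized_tidal_volume(vt_ml, pbw_kg) <= LTVV_THRESHOLD_ML_PER_KG


# Example: a 175 cm male patient ventilated at 450 mL versus 500 mL.
pbw = predicted_body_weight_kg(175.0, "male")
print(round(pbw, 2), is_ltvv(450.0, pbw), is_ltvv(500.0, pbw))
```

For the same patient, a 50 mL change in the set tidal volume moves the standardized value across the LTVV threshold, which is why default tidal volumes matter in the analyses that follow.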

Significance testing
We used α = 0.01 instead of 0.05 to ensure the statistical strength of our findings [26] and applied the Bonferroni correction for multiple hypotheses. In the regression analyses (see Results), there were 33 comparisons in which the standardized VT was the dependent variable; thus, we set p < 0.0003 (0.01/33) as the threshold for statistical significance for these analyses. For the covariate analyses, the threshold was p < 0.005 (0.01/2). For the Kolmogorov-Smirnov tests in Model Approach #2, the threshold was p < 0.003 (0.01/3).
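The three significance thresholds quoted above follow directly from dividing α by the number of comparisons. The short sketch below reproduces that arithmetic; the function and dictionary names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of the Bonferroni arithmetic; names are hypothetical.
ALPHA = 0.01


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


thresholds = {
    "regression analyses (33 comparisons)": bonferroni_threshold(ALPHA, 33),
    "covariate analyses (2 comparisons)": bonferroni_threshold(ALPHA, 2),
    "Kolmogorov-Smirnov tests (3 comparisons)": bonferroni_threshold(ALPHA, 3),
}

for name, value in thresholds.items():
    print(f"{name}: p < {value:.5f}")
```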

Analysis of potential factors in tidal volume selection
Factors assessed. We used the lowest standardized tidal volume (standardized VT, in mL/kg PBW) for each patient as the dependent variable in both univariable and multivariable ordinary least squares (OLS) regressions. Standardized VT was used as a continuous variable. OLS regressions were implemented using the statsmodels (version 0.6.1) Python package.
We determined the relationship between several factors and standardized VT, choosing variables that have been identified previously in the literature as potential barriers or facilitators of LTVV use [11,18-21,27]: first PaO2/FIO2 ≤ 300, lowest PaO2/FIO2, highest Pplat, patient weight at ICU admission, ARDS documentation in the patient chart, presence of bilateral infiltrates (control only), admitting ICU (ARDS only), and patient height (we used the gender-neutral PBW). These factors comprise measures of illness severity (PaO2/FIO2, Pplat, radiographic findings), patient characteristics (height, weight), and physician behaviors (ARDS documentation, patient weight). Plateau pressure was included due to the previously reported practice of physicians not lowering tidal volumes in ARDS patients when Pplat ≤ 30 cm H2O. [19,22] Patient weight was included due to the previously reported barrier of physicians using actual body weight instead of predicted body weight in the LTVV threshold calculation [19]. Note that we use the standardized tidal volume (mL/kg PBW) as opposed to the recorded tidal volume (VT, in mL), and PBW is included as a control variable. Since standardized VT is already normalized for PBW, we expected no remaining relationship between PBW and standardized VT. Input variables were rescaled between 0 and 1 to allow for comparison of coefficients.
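To make the rescaling and univariable regression steps concrete, the following sketch rescales one input variable to the interval from 0 to 1 and fits a single-predictor OLS line. It is a minimal standard-library illustration under stated assumptions, not the statsmodels workflow used in the study; the helper names and the toy data are hypothetical.

```python
# Illustrative sketch with toy data; not the study's statsmodels code.
from statistics import mean


def min_max_rescale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


def univariable_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy example (hypothetical values): PBW in kg and lowest standardized
# tidal volume in mL/kg PBW for four patients.
pbw_kg = [50.0, 60.0, 70.0, 80.0]
std_vt = [9.0, 8.0, 7.0, 6.0]

intercept, slope = univariable_ols(min_max_rescale(pbw_kg), std_vt)
print(intercept, slope)
```

Because the predictor spans exactly 0 to 1 after rescaling, the slope can be read as the change in standardized tidal volume between the smallest and largest observed value of that factor, which is what makes coefficients comparable across factors.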
Univariable analysis. The relationship between each factor and standardized VT was investigated through univariable OLS regressions for the ARDS and control cohorts (Table 1). Standardized VT decreased with worsening hypoxemia (lower PaO2/FIO2) and with ARDS documentation in the ARDS cohort (p < 0.0003), but not in the control cohort (Fig 2, Table 1). In both cohorts, standardized VT decreased with increasing PBW (gender-neutral height, p < 0.0003, Fig 3 and Table 1), a surprising result because standardized VT already takes PBW into consideration. Plateau pressure, weight at ICU admission, PaO2/FIO2 at study start, admitting ICU (ARDS cohort only), and the presence of bilateral infiltrates (control cohort only) were not associated with significant changes in standardized tidal volume in any cohort or subgroup (Table 1, Fig 4).
Covariate analysis. The factors demonstrating a significant association with standardized VT in the univariable analyses were evaluated for covariance with each other using OLS regression. Three factors were evaluated for covariance: PBW, lowest PaO2/FIO2, and documentation of ARDS. PBW was not associated with increasing documentation probability (Fig 3) in either cohort, which was anticipated. Documentation and lowest PaO2/FIO2 were significantly correlated (p < 0.005) in both cohorts (Fig 2). This association was also anticipated, as sicker patients are easier to recognize. To test the strength of the association between documentation and lowest PaO2/FIO2, we repeated the univariable analysis on the three major subgroups (ARDS non-documented, control non-documented, and pooled documented) (Fig 4). Only PBW was associated with lower standardized VT in all three subgroups (Table 1, S3 Table). There was no association between PBW and lowest PaO2/FIO2 in either cohort.
Multivariable analysis. Significant factors from the univariable analyses were included in multivariable regressions comprising all possible linear combinations of the factors and appropriate interaction terms (see S1 Methods). The "best" multivariable model was selected using the Akaike and Bayesian Information Criteria (AIC, BIC). In the ARDS cohort, the multivariable regression model that included PBW, lowest PaO2/FIO2, and documentation as independent variables with no interaction terms resulted in the lowest AIC and BIC (S4 Table). In this model, PBW and documentation were significantly associated with standardized VT (p < 0.0003), while lowest PaO2/FIO2 was not. Of these variables, PBW had the greatest effect on standardized VT (β = -3.7, 99% CI -4.8 to -2.7). For the control cohort, only PBW was associated with standardized VT, and therefore no multivariable analysis was performed.
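Model selection by AIC and BIC can be illustrated with the standard Gaussian-likelihood formulas for an OLS fit. The sketch below is an assumption-laden illustration, not the procedure in S1 Methods: the residual sums of squares, the sample size, and the candidate labels are invented toy values, and the function name is hypothetical.

```python
# Illustrative sketch with invented toy numbers; not results from the study.
import math


def gaussian_aic_bic(rss: float, n_obs: int, n_params: int):
    """AIC and BIC for an OLS model with Gaussian errors.

    Uses -2 log L = n * (ln(2*pi) + ln(rss / n) + 1) evaluated at the
    maximum-likelihood estimates; n_params counts the fitted coefficients
    including the intercept.
    """
    neg2_loglik = n_obs * (math.log(2 * math.pi) + math.log(rss / n_obs) + 1)
    aic = neg2_loglik + 2 * n_params
    bic = neg2_loglik + n_params * math.log(n_obs)
    return aic, bic


N_OBS = 100  # hypothetical sample size
# candidate label -> (toy residual sum of squares, number of coefficients)
candidates = {
    "PBW": (120.0, 2),
    "PBW + documentation": (110.0, 3),
    "PBW + documentation + lowest PaO2/FIO2": (95.0, 4),
}

scores = {
    name: gaussian_aic_bic(rss, N_OBS, k) for name, (rss, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Both criteria reward a lower residual error and penalize extra coefficients; BIC penalizes them more heavily once the sample size exceeds about eight observations, so agreement between the two is a useful check on the chosen model.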
Sensitivity analyses. To test the robustness of our cohort definitions, we conducted two sensitivity analyses: 1) patients with a study duration longer than 12 hours, and 2) patients within the 2.5-97.5 percentiles of PBW. The first sensitivity cohort was intended to capture clinician behavior that may require longer time scales, such as a shift change and/or patient rounds. The second sensitivity cohort was intended to evaluate a potential disproportionate effect of PBW outliers on linear trends. Neither sensitivity analysis yielded any difference in the regression results.

Construction of model of and estimation of physician recognition of ARDS
Sixty patients, 44 (12.2%) in the ARDS cohort and 16 (4.2%) in the control cohort, had a documented diagnosis of ARDS by at least one physician. While documentation of ARDS implies that the physician recognized ARDS, lack of documentation does not imply that a physician did not recognize ARDS. Beyond documentation practices, a physician's clinical management behavior (e.g., selecting a tidal volume) may shed additional light on whether a physician believes that a patient has ARDS.
We used two approaches to more completely characterize physician recognition of ARDS. To this end, we split the two main cohorts (ARDS and control) into three major subgroups: 1) ARDS non-documented (n = 317), 2) Control non-documented (n = 371), and 3) Pooled documented (n = 60, Fig 4). All patients in the pooled documented cohort have a physician-documented diagnosis of ARDS in their chart. All patients in the non-documented cohorts do not have a physician-documented diagnosis of ARDS in their chart. Our two approaches are based on the assumption that the physician behaviors observed in the ARDS non-documented subgroup represent a mixture of patient-care scenarios in which patients are either recognized by their physician as having ARDS or not recognized as having ARDS. If a patient in the ARDS non-documented subgroup was recognized by their physician as having ARDS, we assume the physician tidal volume selection would be the same as the tidal volume selection seen in the pooled documented subgroup. If a patient in the ARDS non-documented subgroup was not recognized as having ARDS, we assume the physician tidal volume selection would be the same as for the control non-documented subgroup. Therefore, the non-documented ARDS subgroup patients can be viewed as a mixture of the pooled documented subgroup and the non-documented control subgroup.

Fig 3. Effects of predicted body weight (gender-neutral height) on standardized tidal volume (standardized VT) and ARDS documentation in ARDS and control cohorts. Top panels show patients with ARDS documented in their chart (purple diamonds) and non-documented patients (tan circles). Gray areas represent LTVV range from current guidelines [7], with dashed line at 6.5 mL/kg PBW at current recommended threshold. Solid lines show linear (standardized VT) and logistic (documentation) fits for scatter plot data (shaded regions, 95% confidence bands). Reported beta coefficients are for standardized inputs. * p < 0.0003. https://doi.org/10.1371/journal.pone.0222826.g003
Quantitative measurement of physician recognition of acute respiratory distress syndrome
Approach #1: Naïve Bayes. We used a Naïve Bayes model for classifying patients in the non-documented ARDS subgroup as either recognized or unrecognized by their care teams. We used multivariate kernel density estimation (KDE) to characterize the PBW vs. standardized VT clusters for the pooled documented and non-documented control subgroups (Fig 5, top panels). Classifying a patient in the non-documented ARDS subgroup as recognized or unrecognized was based on the following conditional probabilities from Bayes' theorem, where VT denotes the standardized tidal volume: P(documented | PBW, VT) = P(PBW, VT | documented) P(documented) / [P(PBW, VT | documented) P(documented) + P(PBW, VT | control) P(control)], with P(control | PBW, VT) defined analogously. In the absence of a reasonable prior for P(documented) and P(control), we assign each term 0.5, assuming equal probability of belonging or not belonging to each subgroup. We were able to define a boundary in the PBW vs. VT space where P(documented | PBW, VT) = P(control | PBW, VT) (Fig 5, top panels, black line). Below this boundary, P(documented | PBW, VT) is greater than P(control | PBW, VT) and the patient was classified as 'recognized'. Above this boundary, P(documented | PBW, VT) is less than P(control | PBW, VT) and the patient was classified as 'unrecognized'. Due to the size discrepancy between the non-documented control and pooled documented subgroups, we bootstrapped (100 iterations) the non-documented control subgroup and repeated this analysis to produce confidence bands (S1 Fig). The KDE clusters for the pooled documented and control non-documented subgroups as well as the estimated probability equality line are shown in Fig 5. The peaks of male and female PBW (gender-neutral height) frequency (Fig 5, bottom panel) align with the two peaks in the pooled documented subgroup (Fig 5, middle panel). Physician recognition of ARDS calculated for each ARDS severity category was: mild, 26%; moderate, 32%; severe, 57% (Table 2).
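The classification rule in Approach #1 can be sketched as a comparison of two kernel density estimates under equal priors. The code below is a simplified illustration and not the authors' implementation: it assumes a product-Gaussian kernel with hand-picked bandwidths (5 kg for PBW, 0.5 mL/kg for standardized tidal volume), and the function names and toy patient coordinates are hypothetical.

```python
# Illustrative sketch with toy data; kernel and bandwidths are assumptions.
import math


def gaussian_kde_2d(points, bw_x: float, bw_y: float):
    """Build a 2-D density estimate from (x, y) points with a product-Gaussian kernel."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bw_x * bw_y)

    def density(x: float, y: float) -> float:
        total = 0.0
        for px, py in points:
            zx = (x - px) / bw_x
            zy = (y - py) / bw_y
            total += math.exp(-0.5 * (zx * zx + zy * zy))
        return norm * total

    return density


def classify(pbw, std_vt, dens_documented, dens_control, prior_documented=0.5):
    """Posterior P(documented | PBW, VT) via Bayes' theorem and the resulting label."""
    numerator = dens_documented(pbw, std_vt) * prior_documented
    denominator = numerator + dens_control(pbw, std_vt) * (1.0 - prior_documented)
    posterior = numerator / denominator if denominator > 0 else 0.5
    label = "recognized" if posterior > 0.5 else "unrecognized"
    return label, posterior


# Toy (PBW kg, standardized VT mL/kg) points for the two reference subgroups.
documented = [(60.0, 6.0), (70.0, 6.2), (75.0, 5.8)]
control = [(60.0, 8.5), (70.0, 8.0), (75.0, 9.0)]

dens_doc = gaussian_kde_2d(documented, bw_x=5.0, bw_y=0.5)
dens_ctrl = gaussian_kde_2d(control, bw_x=5.0, bw_y=0.5)

print(classify(70.0, 6.1, dens_doc, dens_ctrl))
print(classify(70.0, 8.4, dens_doc, dens_ctrl))
```

With equal priors, the decision boundary is simply the set of points where the two estimated densities are equal, which corresponds to the probability equality line described in the text.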
Approach #2: Mixture Model. In the second model, we incorporate standardized VT, hypoxemia severity (lowest PaO2/FIO2), and PBW with the goal of calculating the fraction of recognized patients in each Berlin Definition ARDS severity category (mild, moderate, and severe). [5] To calculate physician recognition of ARDS, we estimated the fraction of patients recognized by physicians in each severity category (f_recognition^i) by treating the distribution for ARDS non-documented patients, P_ARDS(VT, PBW | severity), as a mixture of the pooled documented and control non-documented distributions weighted by f_recognition^i and 1 − f_recognition^i, respectively, where severity can take the values "mild," "moderate," or "severe," as set forth in the Berlin Definition [5] (Table 2). We defined the difference Δ between the observed and mixture probability density functions as the L1 norm, where the sum extends over all bins for values of VT and PBW. We determined the optimal value of f_recognition^i by minimizing Δ. Since the corresponding optimization problem is formulated as a linear programming problem, we used CPLEX (version 12) as a solver. To determine the uncertainty in our estimates of f_recognition^i, we used bootstrapping to generate 1000 samples for P_ARDS(VT, PBW | severity) and repeated the optimization for the bootstrapped samples. As a result, we generated distributions for the optimal value of f_recognition^i for each hypoxemia severity category and tested the null hypothesis that these data were drawn from the same distributions with a Kolmogorov-Smirnov test (Python package scikit-learn (version 0.18.1)).
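The mixture idea in Approach #2 can be sketched as a one-parameter fit: find the weight f in [0, 1] for which f times the documented distribution plus (1 − f) times the control distribution is closest, in the L1 sense, to the binned ARDS distribution. The code below is an illustration under stated assumptions and not the authors' method: it replaces the linear program solved with CPLEX by a simple grid search, and the function names, bin labels, and toy probabilities are hypothetical.

```python
# Illustrative sketch: grid search stands in for the linear program; toy data.
import random
from collections import Counter


def l1_distance(p_ards, p_doc, p_ctrl, f):
    """Sum over bins of |P_ARDS - (f * P_documented + (1 - f) * P_control)|."""
    bins = set(p_ards) | set(p_doc) | set(p_ctrl)
    return sum(
        abs(p_ards.get(b, 0.0) - (f * p_doc.get(b, 0.0) + (1 - f) * p_ctrl.get(b, 0.0)))
        for b in bins
    )


def estimate_recognition_fraction(p_ards, p_doc, p_ctrl, steps=1000):
    """Grid-search the mixture weight f in [0, 1] that minimizes the L1 distance."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: l1_distance(p_ards, p_doc, p_ctrl, f))


def histogram(bin_labels):
    """Convert a list of per-patient bin labels into bin probabilities."""
    counts = Counter(bin_labels)
    return {b: c / len(bin_labels) for b, c in counts.items()}


def bootstrap_fractions(ards_bin_labels, p_doc, p_ctrl, n_boot=1000, seed=0):
    """Re-estimate f on bootstrap resamples of the ARDS patients."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(ards_bin_labels, k=len(ards_bin_labels))
        estimates.append(
            estimate_recognition_fraction(histogram(resample), p_doc, p_ctrl, steps=100)
        )
    return estimates


# Toy binned distributions over (PBW bin, standardized VT bin) pairs.
p_doc = {(0, 0): 0.60, (0, 1): 0.10, (1, 0): 0.25, (1, 1): 0.05}
p_ctrl = {(0, 0): 0.10, (0, 1): 0.50, (1, 0): 0.10, (1, 1): 0.30}
# Construct an ARDS distribution that is exactly a 30% / 70% mixture.
p_ards = {b: 0.3 * p_doc[b] + 0.7 * p_ctrl[b] for b in p_doc}

f_hat = estimate_recognition_fraction(p_ards, p_doc, p_ctrl)
print(f_hat)
```

In practice the fit would be repeated within each severity category, and the bootstrap distributions returned by `bootstrap_fractions` could then be compared between categories with a two-sample Kolmogorov-Smirnov test, for example `scipy.stats.ks_2samp`.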

Discussion
We quantified the potential impact of patient characteristics and physician behaviors on the decision-making behavior for tidal volume selection by physicians for patients with ARDS and a novel control cohort. This quantification allowed for the construction of a model to measure physician recognition of ARDS that is not confounded by these patient characteristics and clinical factors. These analyses have allowed us to establish several important findings.
First, we corroborated prior studies' findings that height, hypoxemia severity, and ARDS documentation are associated with the use of lower tidal volumes in ARDS patients. [3,17,19-21,27] We found no evidence for an association between lower tidal volume use and other clinical factors, such as plateau pressure or patient weight, that have been identified as potential barriers to LTVV use in prior studies. [11,17,19,21] These barriers may still have an impact at the level of the individual physician, but their lack of generalizability to the entire physician population makes them suboptimal targets for future interventions.
Second, our analyses provide additional insight into the previously established relationships between patient height and LTVV use. [21,27] The most common lowest VT values reported in the ARDS and control cohorts were identical (450, 500, and 600 mL), and constituted 51% and 63% of the tidal volumes for the ARDS and control cohorts, respectively. This prevalence of a small number of lowest VT values suggests that clinicians are not following the canonical relationship between height and lung size originally established in animal studies [28], but instead use a simpler heuristic based on where the patient falls on the height spectrum of their particular gender. This theory is supported by the idea that humans select fast and frugal heuristics under time and knowledge limitations [29], which would both be present in clinical medicine and heightened in critical care. The utilization of this heuristic would translate to a general use of a lower standardized tidal volume for taller patients that is closer to or, in some cases, below the LTVV threshold, which would lead to our observation of the strong relationship between PBW and standardized VT, even though standardized VT already includes PBW in its calculation. Our findings are strong evidence that at least some delivery of LTVV may be unintentional, i.e., solely the result of a default VT (450, 500, or 600 mL), and not based on ARDS recognition or other clinical decision-making factors.
While evidence for this physician behavior phenomenon has been previously reported in ARDS patient cohorts [17,27], we observed this behavior in a diverse control cohort, implying that use of the simpler tidal volume selection heuristic is not restricted to ARDS patients alone.
Fig 5 (partial caption). ... documented (below line) and non-documented control (above line). (Bottom panel) Normalized gender frequency across PBW for combined patient population of documented and control non-documented. Male and female peaks align with high density regions in above heatmaps. https://doi.org/10.1371/journal.pone.0222826.g005
Alternative explanations for the association between height and LTVV use in both cohorts include some physicians believing in LTVV for patients in the control cohort or some of those patients being classified by physicians as having ARDS. Supporting the latter possibility, 4.2% of control cohort patients had a physician-documented diagnosis of ARDS. Nonetheless, these alternative explanations are less likely because of the low ARDS documentation rate, low use of LTVV in both cohorts, and the strong correlation between PBW (gender-neutral height) and standardized VT in both cohorts. Another alternative explanation is physicians using a non-linear relationship between tidal volume and PBW, but this is less likely given the low variability in chosen tidal volumes in both cohorts. Our results suggest that the relationship between PBW (gender-neutral height) and standardized VT should be accounted for when measuring LTVV use and when designing implementation strategies to improve LTVV use.
Third, our estimate of physician recognition of ARDS for severe ARDS was comparable to previous studies [3], while our estimated physician recognition of ARDS rates for mild and moderate ARDS were lower. We believe our estimated rates are more plausibly representative of real-world practices. Unlike prior studies, the estimated physician recognition of ARDS rates in our study do not rely on subjective reporting [18,19], the observer effect [3,9,10,13-15], or imposed interventions such as additional training of physicians in ARDS recognition or LTVV use. [3] Our results suggest that the potential impact of these biases was limited to mild and moderate ARDS and was neutralized by the presence of severe ARDS. This finding has implications for the selection of implementation strategies: while the mere presence of severe hypoxemia may be enough to trigger physician recognition of ARDS, additional prompting or training is required to improve recognition of mild or moderate ARDS.

Limitations
Our study has several limitations. First, it was conducted in a single metropolitan area, so we were unable to address regional or national differences. Second, we were limited to the patient data recorded in the EHR, which physicians may overlook in favor of other information, such as a visual estimation of height. [30] Third, we did not evaluate physician knowledge of ARDS or LTVV, specifically the Berlin criteria and what standardized tidal volume threshold they believe qualifies as LTVV. Alternative LTVV thresholds may be justified by the layout of the ARDS Network tidal volume table, which appears to suggest that tidal volumes ranging from 4 to 8 mL/kg PBW qualify as LTVV. [5] Finally, we acknowledge that our application of the Berlin definition may have been biased, leading to misclassification of ARDS or control status; this could also explain why some patients classified as control were documented by their physicians as having ARDS.

Conclusions
Our findings could have implications for the design of implementation strategies to improve LTVV use. First, we believe documentation and physician recognition of ARDS should be unlinked. Whereas previous studies rely on documentation of ARDS as the sine qua non of physician recognition of ARDS [11,13,16,20], there are likely to be patients whose physicians recognized ARDS but did not document it. Our study demonstrates two novel approaches for estimating physician recognition of ARDS that consider additional behaviors beyond documentation (e.g., tidal volume selection), providing a more complete characterization of physician recognition of ARDS. Implementation strategies, which commonly rely on behavior change, should account for the multiple facets involved in physician recognition of ARDS and for the multiple channels of data required to measure recognition.
Second, our approach provides a measurement of physician recognition of ARDS that is not subject to confounding by patient characteristics and clinical factors. This approach could be integrated into EHR systems to evaluate an arbitrary number of physicians and sites, which will allow for the comparison of physician recognition of ARDS not only between individuals and institutions, but also over different points in time. This methodology could be used to accurately drive interventions, like clinical decision-support or feedback, to create an even stronger structure for improving the adoption of evidence-based practices. [31,32] This study offers a compelling example of how data science methods can use EHR data to provide new opportunities for measuring and addressing quality of patient care, specifically in a complex setting such as critical care. [33] While traditional implementation studies allow a broad analysis of the situation as a whole, they are constrained by high costs, logistical complexity, and potential for bias. Our approach achieved similar, and potentially more representative results, while minimizing these disadvantages. Furthermore, our methods may be implementable and sustainable on the local level, providing individual institutions the opportunity to continuously assess or track evidence-based medicine implementation.

Supporting information
S1 Methods. Multivariable model selection. (DOCX)
S1  Table. Multivariable models of lowest standardized tidal volume (mL/kg PBW) in ARDS cohort. (TIF)
S1 Fig. Naïve Bayes boundary between recognized and unrecognized regions with 95% confidence intervals from bootstrapping. Scatter plot shows pooled documented patients (purple diamonds) and control non-documented patients (tan circles). Size of marker represents number of data points. Solid line shows boundary separating region with unequal probability of belonging to documented (below line) and non-documented control (above line) with 95% confidence bands from bootstrapped data (shaded region).