Acute respiratory distress syndrome after SARS-CoV-2 infection on young adult population: International observational federated study based on electronic health records through the 4CE consortium

Purpose In young adults (18 to 49 years old), investigation of the acute respiratory distress syndrome (ARDS) after severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been limited. We evaluated the risk factors and outcomes of ARDS following infection with SARS-CoV-2 in a young adult population. Methods A retrospective cohort study was conducted between January 1st, 2020 and February 28th, 2021 using patient-level electronic health records (EHR), across 241 United States hospitals and 43 European hospitals participating in the Consortium for Clinical Characterization of COVID-19 by EHR (4CE). To identify the risk factors associated with ARDS, we compared young patients with and without ARDS through a federated analysis. We further compared the outcomes between young and old patients with ARDS. Results Among the 75,377 hospitalized patients with positive SARS-CoV-2 PCR, 1001 young adults presented with ARDS (7.8% of young hospitalized adults). Their mortality rate at 90 days was 16.2% and they presented with a similar complication rate for infection than older adults with ARDS. Peptic ulcer disease, paralysis, obesity, congestive heart failure, valvular disease, diabetes, chronic pulmonary disease and liver disease were associated with a higher risk of ARDS. We described a high prevalence of obesity (53%), hypertension (38%- although not significantly associated with ARDS), and diabetes (32%). Conclusion Trough an innovative method, a large international cohort study of young adults developing ARDS after SARS-CoV-2 infection has been gather. It demonstrated the poor outcomes of this population and associated risk factor.

ARDS has a severe impact on patient outcomes. In a cohort study carried out in New York City on COVID-19 patients, the mortality of ARDS patients reached 39% [4]. ARDS has been frequently associated with long-term disabilities [6][7][8][9][10] and represents a heavy care burden for health systems [11] due to long ICU stays and extended rehabilitation [7,9].
Age is an important risk factor for developing ARDS [3]. However, young adults (18-49 years old) represented a third of hospitalized patients [12] and a quarter of patients admitted to the ICU [4]. Based on the Premier Healthcare Database, which includes 1,030 hospitals in the United States, Cunningham et al. [13] reported that 21% of young adults (aged 18 to 34 years) hospitalized with COVID-19 disease were admitted to the ICU and 10% required mechanical ventilation. Similarly, in a separate cohort, young adults represented more than 20% of the patients admitted to ICUs for COVID-19 infection with ARDS [3]. Few studies [12][13][14][15] have investigated the young adult population, mostly were single-center analyses, all exclusively in the U.S. population and none focused on ARDS patients. To our knowledge, there have been no specific studies on ARDS after SARS-CoV-2 infection in the young adult population among an international cohort. This may be due to the difficulty in obtaining a large sample of this population. Key questions remain related to the risk factors of ARDS in young adults, and the difference, in terms of outcomes, compared to an older population.
In this study, we investigate the risk of ARDS among young adults hospitalized with COVID-19 using an international cohort from the international Consortium for Clinical Characterization of COVID-19 (4CE) [16][17][18][19][20][21]. This international consortium collects data from 342 hospitals in 6 countries and develops an innovative federated approach for electronic health records (EHR) analysis.
Through a federated analysis, the objectives were to evaluate the risk factors for developing ARDS following infection with SARS-CoV-2 and hospitalization in young adults and to compare characteristics, care, and outcomes between this population and an older population (greater than 49 years old) who similarly developed ARDS during their COVID-19 hospitalization.

Patients and methods
The 4CE consortium [16][17][18][19][20][21] has developed a framework to extract and standardize data directly from the EHRs of participating healthcare systems (HS) and to streamline federated analyses without sharing patient-level data. A common data model for structuring patientlevel data was adopted to enable identical analyses across all participating HS. Fig 1 presents the workflow from 4CE data collection to ARDS analysis.

Common 4CE data collection by HS
As previously described [16], each participating HS were responsible for and obtained ethics approval, as needed, from the appropriate ethics committee at their institution. IRB protocols were reviewed and approved at APHP (IRB00011591, Project CSE-20-29_ClinicalCOVID), The research was determined to be exempt at University of Michigan (IRB# HUM00184357), Beth Israel Deaconess Medical Center (IRB# 2020P000565), University of Pittsburgh (STUDY20070095), and University of Pennsylvania (IRB#842813). University of California Los Angeles determined that this study does not need IRB approval because research using limited data sets does not constitute human subjects research.
Cohort identification. Across each participating HS, we included all hospitalized patients within 7 days before and up to 14 days after a positive PCR SARS-CoV-2 test. The first hospital admission date within this time window was considered day 0 (the index date). Note that although all patients had a positive PCR test near their admission date, it is possible that for some patients the hospitalization was for reasons other than COVID-19.
Patient-level data collection by HS. Patient-level data were collected by HSs, which can represent one or several hospitals. At each HS, data were extracted directly from the EHR and consisted of time to admission and discharge, survival status, sex and age group [18][19][20][21][22][23][24][25], 50-69, 70-79, and 80+ years old]. Diagnoses were collected from the first 3 digits of the billing code using international classification disease (ICD) version 10. This 3-digit rollup was adopted to account for finer-grained differences in coding practices across hospitals. Procedures related to endotracheal tube insertion or invasive mechanical ventilation were collected and were denoted as severe procedures [17]. Medications administered were collected at the class level (as per the ATC standard nomenclature [22], S1 Appendix). Severe medication [17] refers to sedatives/anesthetics or treatment for shock (classes: SIANES, SICARDIAC).
All patient-level data were standardized to a common format, then stored and analyzed locally at each HS. Several quality controls were conducted iteratively at each HS to ensure the quality of the data.

ARDS analysis
Data aggregation by HS for ARDS analysis. Final data extraction was completed on 30 th August 2021 and included patient hospitalizations occurring from 1 st January 2020 to 28 th February 2021. All patients of 18 years or older were included in the analysis. ARDS patients were identified using the ICD10 code, J80-Acute respiratory distress syndrome.
Using patient-level data, each HS ran an R script locally to classify patients into 3 groups as follows: • ARDS: Patients with an ARDS ICD code • NO_SEVERE: Patients without an ARDS ICD code, severe medication or severe procedure • SEVERE_NO_ARDS: Patients with severe medication or severe procedure but without an ARDS ICD code For the analysis, the cohort was divided into two age groups: patients aged 18 to 49 years and patients older than 49 years (Fig 2). For each group, the number of patients was aggregated in terms of: • Age, sex, mortality at 90 days after the admission • Each ICD code, Elixhauser index (23) and complication class (S2 and S3 Appendices) Aggregate data were centrally collected, and several quality controls were executed before pooling the aggregated data together. Descriptive analysis was presented S1 Table. Statistical analysis. Risk factor: comparison between young patients with and without ARDS.
To identify the risk factors associated with an ARDS after SARS-CoV-2 infection and hospitalization, we compared the young patients with ARDS and the young non severe patients. Patients classified in the "SEVERE_NO_ARDS" group were excluded from this analysis. For comorbidities classified by the Elixhauser Comorbidity Index [23], risk ratios with confidence intervals were calculated from a univariable analysis considering diagnoses recorded between 365 days before (-365) the admission and 90 days after (+90) the admission. First univariable analysis was performed at each HS and aggregated through a random effect metaanalyses to account for heterogeneity between HS. In addition, comorbidities associated with ARDS in this meta univariable analysis and sex were selected for a multivariable analysis. Multivariable analysis was performed at each HS and then aggregated through another meta-analysis with random effect.
Complications and mortality: comparison between young and old adults with ARDS.
The proportion of patients per sex were evaluated and compared between young adults and older adults with ARDS. Complications were identified as novel diagnoses established between the day of admission and +90 days after the admission. To compare complications between young and older patients with ARDS, we performed a univariable analysis and reported estimated risk ratios with confidence intervals. Moreover, mortality was evaluated for both groups at 90 days after the index admission.
Statistical analyses were performed locally at each HS and then aggregated via meta-analysis with the R package metafor [24].

Results
12 HS participated in the analysis: 9 U.S. HS representing 241 hospitals, two French HS representing 42 hospitals, and one German HS representing 1 hospital (Table 1). 75,377 hospitalized patients with biological confirmation of COVID were included in the analysis.

Risk factors: Comparison between young adults with ARDS and young non severe patients (Table 2)
For the risk factor analysis, young ARDS patients (n = 1001) were compared to young non severe patients (n = 10,107). Among young ARDS patients, 43/1001 (4.3%) were aged between 18 to 25 years old. In an univariable analysis, patients aged 26 to 49 years old had an increased risk of developing ARDS compared to those aged 18 to 25 years old (RR = 2.94; 95% CI: [2.11; 4.1]). Due to the low proportion of patients between 18 to 25 years, age class was not included in the multivariable analysis.
In the multivariable analysis, compared to women, men had a higher risk for developing ARDS (RR = Peripheral vascular disease, and renal failure were associated with developing ARDS in univariable analysis, but not in multivariable analysis. AIDS/HIV, alcohol abuse, cancer, drug abuse, hypothyroidism, and psychosis were not associated with higher risk. Nicotine dependency was not associated with a higher risk (p = 0.138).

Complications and mortality: Comparison between young and old adult population with ARDS
6378 patients aged > 49 with ARDS were compared to the young adult population with ARDS. The percentage of males was 67.1% (672/1001) and 75.2% (4797/6378) for the

Data
The aggregated data per site are available here. Sites were anonymized.

Discussion
In a large international EHR-based cohort, we employed a novel federated approach including 241 hospitals in the United States and 43 in Europe, to describe comorbidities, complications, and mortality of young adults developing ARDS after SARS-CoV-2 infection. Even thoughyoung patients with ARDS represent a small proportion of hospitalized patients with COVID (HS range: [0.4%; 3.3%]), we were able to gather a large cohort thanks to this innovative method and demonstrated the poor outcome of young ARDS patients with notable mortality (16.2%).

Mortality and complications
Independently on the etiology, in-hospital mortality for ARDS patients has been reported to be between 30 to 40% [7,25,26]. Mortality at 30 days for ARDS patients of any age with COVID-19 was reported at 39% [4] and corresponds to the mortality for the older ARDS population in our study. The young ARDS population's mortality at 90 days was smaller, around 16.2% with large variability between HS [11.2; 36.8%]. Importantly, it was not possible to assess the attributable COVID-19 mortality from our data. However the mortality appeared high for this young population; in a 2018 study conducted in France, all-cause mortality of ICU patients in the same age range was estimated to be less than 10% [27]. The relatively higher risk of developing pneumonia due to Streptococcus pneumoniae and Streptococcal sepsis in young adults is probably related to their greater survival rate compared to older patients. The high frequency of complications in this young population emphasizes the major impact of ARDS on poor outcomes and mortality.

Risk factors
Although the proportion of the general population is low, ARDS appears in 7.8% of young hospitalized adults with COVID. These percentages are in agreement with those reported by Cummings et al. [3] and Cunningham et al. [13]. Among those young ARDS patients only 4,3% were aged between 18-and 25-years old. Patients developing ARDS in this young adult population had a high prevalence of obesity (53%), hypertension (38%) and diabetes (32%). A limitation of relying on billing codes to identify comorbidities is the challenge of accurately distinguishing comorbidities from complications. In our analysis, comorbidities were considered as those diagnoses from billing codes assigned up to one year before and up to 90 days after the admission. In electronic health records, each code is attached to one specific hospitalization visit. For patients with prior hospitalizations, comorbidities are easily identified with the codes attached to those previous hospitalization. However, for patient without previous hospitalization, the fact that electronic health records do not contains any code, do not mean that patient did not have comorbidities. For example, an obese patient without prior hospitalization would be identified as "obese"only if we take into account the code associated with the index hospitalization. This approach is more sensitive, but it can lead to considering complications as comorbidities. It is particularly true for peptic ulcer disease or paralysis which was identified as a comorbidity associated with ARDS but which is also known to be a common complication of mechanical ventilation [28,29] or prolonged ICU admission. We perform a complementary univariable analysis on the sub population who had previous hospital visits and considering only the ICD code related to those previous visits as comorbidities (one year and-14 days before the admission). In this univariable analysis presented in S2 Table, ARDS was associated with the presence of peptic ulcer disease or paralysis in a previous hospitalization, which explained our choice to keep both in the main multivariable analysis that means considering them as comorbidities. "Paralysis" regroups is related a large diversity of diagnoses. including encephalitis, myelitis and encephalomyelitis, hereditary ataxia, cerebral palsy, hemiplegia and hemiparesis, paraplegia (paraparesis) and quadriplegia (quadriparesis), and other paralytic syndromes (S2 Appendix); but a common co-occurrence is reduced lung capacity which could contribute to its association with ARDS. The association with peptic ulcer as comorbidities remains unclear and requires additional investigations.
Obesity has been identified as a risk factor for poor outcome for ARDS [30] and for SARS-CoV-2 infection [3,14,15,31] and it also appears in this analysis as a risk factor in this young adult population. Diabetes has a controversial association with ARDS [32][33][34] but appears in this population as a risk factor and has also been associated with the severity of SARS-CoV-2 infection in other studies [3,14,15,31]. Despite its association with poor outcomes in several cohorts of COVID-19 patients [2,15], hypertension was not significantly associated with ARDS in our study, possibly due to the choice of the variable included in the multivariable analysis and/or a lack of power.
Congestive heart failure, valvular disease, chronic liver disease, and chronic pulmonary disease are not associated with ARDS in the literature, however, their associations with COVID-19 have been identified as a risk factor for poor outcomes [3,14,15,31]. Through our analysis, it seems that most of the comorbidities associated with ARDS in the young adult population are similar to the ones associated with poor outcomes after SARS-CoV-2 infection in the general population. However, for most of them, it is unclear whether they are truly related to the onset of ARDS or just general comorbidities. Further analysis needs to be carried out to eliminate confounding factors and better understand the potential mechanisms of those associations.

Limitations
Our major limitation is that group membership, comorbidities, and complication analyses are based on billing codes, procedures, and medications directly extracted from EHR. Variation in billing coding practices, especially across international healthcare systems, may result in missing data and related biases [35]. However, multiple quality controls have been established to reduce those potential biases. For the detection of ARDS patients, a correct sensibility is expected as billing code is related to reimbursement in most countries and ARDS is associated with heavy care. Regarding the relation between ARDS and COVID-19 infection, patients included in our analysis had positive reverse transcription PCR tests for SARS-CoV-2 infection 7 days before to 14 days after the date of admission. This inclusion criterion allows us to ensure that included patients had the COVID-19 infection at least at the beginning of the hospitalization. Even if it is not possible to establish a clear temporal relationship or causality between ARDS and COVID-19 with ICD codes, it would be extremely rare that the development of ARDS during the hospitalization of a COVID-19 positive patient had no relation to COVID-19 infection. It is possible that COVID-19 infection was not the primary cause of the ARDS but most likely had an impact on the ARDS development.
To identify comorbidities associated with ARDS following hospitalization with COVID, a comparison was performed considering only non severe patients. Patients with mechanical ventilation, sedatives/anesthetics, or treatment for shock but without ARDS code were not included, which could generate a selection bias. This choice was conducted to eliminate potential miscoded ARDS patients and patients with severe disease or care not related to SARS-CoV-2 infection but with a concomitant infection. Those patients could have been included in the ARDS population, but the objective of this study was to focus precisely on ARDS patients, and this grouping would have resulted in a significant measurement bias. Especially because the number of young SEVERE_NO_ARDS patients is greater than the one of ARDS patients. In addition, we believe that the descriptive analysis of the SEVERE_NO_ARDS brings credit to this choice (S1 Table). Compared to the other groups, SEVERE_NO_ARDS population had the higher percentage of women (52.2%) and of patients with previous contact with the healthcare system (72%). In addition, 15.1% of those patients had a billing code associated with pregnancy and 36.1% with long-term drug therapy. These results suggest that the COVID-19 infection was simply concomitant but not the main cause of these hospitalizations.
Treatment like the use of mechanical ventilation, ECMO or even ICU admission were not collected. The collection of treatment data, described by specific codes in EHR, has proven to be too partial and largely heterogeneous between health systems (even from the same country) to be collected. It also appears that the accuracy of ICU admission in EHR data was poor. It was particularly true at the beginning of the pandemic, where hallways were converted into ad hoc ICUs to support the surge of sick patients, without notification in the chart. This issue has already been discussed in a previous article from the consortium [17].
More detailed analysis on age's threshold was not possible because age was intentionally not collected by the 4CE consortium as a continuous variable. This choice was made to ensure greater security/de-identification on the data collection process which allowed for an easier regulatory process for international aggregated data sharing.

Conclusion
We federated a large EHR-based international cohort of young adults developing ARDS after COVID-19. ARDS appears in 7.8% of hospitalized young patients with COVID and was associated with high mortality (16.2%). Young adults developing ARDS presented a high prevalence of comorbidities, particularly obesity, hypertension (although not being associated with ARDS), and diabetes. ARDS development was associated with peptic ulcer disease, paralysis, obesity, congestive heart failure, valvular disease, diabetes, chronic pulmonary disease, and liver disease.   Table. Number and percentage of patients per Elixhauser comorbidities for young adult patients with ARDS and non severe young adult patients. Risk ratio associated in univariable analysis for the sub population which had previous hospital visits and considering only the ICD code related to those previous visits (one year and-14 before the admission). (DOCX)