Covid-19 and excess mortality in Medicare beneficiaries

We estimated excess mortality in Medicare recipients in the United States with probable and confirmed Covid-19 infections in the general community and amongst residents of long-term care (LTC) facilities. We considered 28,389,098 Medicare and dual-eligible recipients from one year before February 29, 2020 through September 30, 2020, with mortality followed through November 30th, 2020. Probable and confirmed Covid-19 diagnoses, presumably mostly symptomatic, were determined from ICD-10 codes. We developed a Risk Stratification Index (RSI) mortality model which was applied prospectively to establish baseline mortality risk. Excess deaths attributable to Covid-19 were estimated by comparing actual-to-expected deaths based on historical (2017–2019) comparisons and in closely matched concurrent (2020) cohorts with and without Covid-19. Overall, 677,100 (2.4%) beneficiaries had confirmed Covid-19 and 2,917,604 (10.3%) had probable Covid-19. A total of 472,329 confirmed cases were community living and 204,771 were in LTC. Mortality following a probable or confirmed diagnosis in the community increased from an expected incidence of about 4.0% to actual incidence of 7.5%. In long-term care facilities, the corresponding increase was from 20.3% to 24.6%. The absolute increase was therefore similar at 3–4% in the community and in LTC residents. The percentage increase was far greater in the community (89.5%) than among patients in chronic care facilities (21.1%) who had higher baseline risk of mortality. The LTC population without probable or confirmed Covid-19 diagnoses experienced 38,932 excess deaths (34.8%) compared to historical estimates. Limitations in access to Covid-19 testing and disease under-reporting in LTC patients probably were important factors, although social isolation and disruption in usual care presumably also contributed. Remarkably, there were 31,360 (5.4%) fewer deaths than expected in community dwellers without probable or confirmed Covid-19 diagnoses. 
Disruptions to the healthcare system and avoided medical care were thus apparently offset by other factors, representing overall benefit. The Covid-19 pandemic had marked effects on mortality, but the effects were highly context-dependent.


Introduction
The Covid-19 pandemic has profoundly influenced US healthcare, especially among Medicare recipients who are mostly at least 65 years old. By March 1, 2021, SARS-CoV-2, the virus responsible for Covid-19, had already infected more than 29 million people in the United States, and more than 500,000 deaths had been associated with infection [1]. However, many people infected with SARS-CoV-2 are never tested or have false-negative test results; consequently, the true toll of Covid-19 remains uncertain. Furthermore, while the clinical course is sometimes apparent, Covid-19 also kills people by worsening chronic conditions, with those deaths often being attributed to other causes. Especially early in the pandemic, due to limited testing availability, it was difficult to differentiate deaths caused by Covid-19 from those that may have occurred naturally due to underlying health conditions. It is thus evident that many people who died consequent to Covid-19 infections may not have been diagnosed with the condition or may have had their deaths attributed to underlying causes.
Several teams have estimated "excess" mortality due to Covid-19 in the US population by comparing weekly observed death totals with those that occurred in a prior year or an average from multiple years. For example, Chen et al. estimated that from March 1 through August 22, 2020, 146,557 deaths were recorded in California, with an estimated 19,806 (95% CI, 16,210) deaths in excess of those predicted by historical trends [2]. Similarly, Faust et al. estimated that from March 1, 2020, to July 31, 2020, a total of 76,088 all-cause deaths occurred among US adults aged 25 to 44 years, which was 11,899 more than the expected 64,189 deaths based on a previous year (incident rate ratio, 1.19 [95% CI, 1.14-1.23]) [3]. Rossen et al. estimated excess mortality from January 26 through October 3 to have decreased 2% for the youngest subjects (aged <25 years) but increased 14.4% for those 45-64 years, 21.1% for those 65-74 years, 21.5% for those 75-84 years, and 14.7% in subjects ≥85 years old [4]. These reports suggest that all-cause mortality in the first six months of the pandemic increased by about 15-20%. However, historical comparisons do not account for risk at an individual level which may be useful to determine true excess mortality. Furthermore, historical comparisons that incorporate trending over multiple years may provide a more robust baseline estimate of expected deaths over a given time and indicate that Covid had a similar impact in many other high-income countries [5,6].
On a broad population basis, many risk factors for Covid-19 are now well recognized. For example, the US Centers for Disease Control (CDC) identifies eleven conditions that augment risk for severe forms of Covid-19 [7]. Chronic conditions such as cancer and dementia are reported to be among the most important contributors [8][9][10]. It is apparent that older members of the population are at special risk of death, although to some extent age may be a surrogate for accumulated comorbidities. However, it is difficult to extrapolate from population risk to individual risk since many people exhibit various combinations of risk factors for Covid-19 mortality, and individual risks attributable to each condition are not necessarily additive. A robust model that considers relevant individual conditions and predicts mortality risk from Covid-19 infections would therefore be valuable.
Numerous groups have proposed individual risk models based on clinical outcomes in various populations studied early in the pandemic (e.g., [11]); however, a consensus model has yet to emerge [12][13][14]. From a practical perspective, prediction models based on readily available administrative data (e.g., ICD-10 codes) will be most useful since more granular information extracted from clinical health records is neither universally available nor easy to obtain. Our primary goal was therefore to estimate excess risk-specific mortality in people with probable and confirmed Covid-19 infections.
An additional consequence of the Covid-19 pandemic has been public health quarantines that have severely disrupted healthcare delivery. The virus may therefore also have caused indirect mortality because patients with chronic diseases and acute exacerbations avoided seeking care due to fear of infection or because health services were overwhelmed or otherwise limiting access [15,16]. Furthermore, stress related to isolation could increase medical morbidity [17,18]. In addition, some causes of death such as accidents and homicides may have increased [19,20], but not among the elderly. The extent to which delayed and disrupted healthcare for non-Covid-related conditions, along with pandemic-related behavioral changes, contribute to mortality remains unclear. Our secondary goal was therefore to estimate whether changes in mortality occurred in people without probable or confirmed Covid-19 infections.
Because Covid-19 is especially lethal in older people, we considered stratification by various age ranges for both our primary and secondary analyses. We also separately considered people residing in the community from those in Long-Term Care facilities, who are expected to have a higher baseline mortality risk and thus may be especially susceptible to Covid-19 infections.

Methods
Data analysis was conducted on the Centers for Medicare and Medicaid Services (CMS) Research Identifiable File (RIF) data using SAS Enterprise Guide (Version 7.15) under a special Data Use Agreement (DUA). Data was handled consistent with this agreement which 1) prohibits identification of any individual in the database, and 2) requires suppression of metrics in downloaded tables for populations smaller than 11 individuals. This project was determined to be exempt from informed consent requirements by the New England Institutional Review Board. Final data analysis of the full cohort was conducted from January 10 to March 11, 2021. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cohort studies [21].
Individual subject data used for our analysis are available to certain stakeholders as allowed by federal regulations and CMS policy. Requests for access to data to replicate these findings require an approved research protocol and DUA with CMS. For more information, contact the Research Data Assistance Center (ResDAC, http://www.resdac.org). Aggregate data supporting the reported results, tables and figures in this manuscript are available in an Excel workbook on Harvard Dataverse (https://dataverse.harvard.edu), https://doi.org/10.7910/DVN/GFBLK5. Mortality predictions and outcomes were referenced to an anchor date of Feb 29, 2020, just before the initial wave of documented Covid-19 cases and the week when the first case of potential community spread Covid-19 was reported by CDC. We recognize that scattered undiagnosed cases may have occurred previously.
We used full Medicare fee-for-service and dual eligible (Medicaid and Medicare) files one year before the anchor date through September 30, 2020, with mortality outcomes reported through November 30th, 2020 for the primary analysis of Covid-19 outcomes. This dataset contains longitudinal health records for beneficiaries from across the entire US. We identified beneficiaries with confirmed Covid-19 diagnoses consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. Presumably most subjects with Covid-19 diagnoses were symptomatic, although some may have been tested and received a billing code because of risk or exposure. The confirmed Covid-19 diagnostic code U07.1 requires documentation of positive Covid-19 testing but does not specify which type of test was used.
Demographic characteristics (age, sex, ethnicity, location of care, zip code derived measures) and Medicare coverage information (dates of coverage, enrollment history) were extracted for each subject. Beneficiaries were classified as belonging to the long-term care/skilled nursing facility (LTC/SNF) cohort if their claims history indicated that they had received services in an LTC/SNF setting at any time in February 2020. The remaining beneficiaries were designated as community dwelling.
We included all Medicaid and Medicare participants alive as of the study anchor date (N = 65,310,173). We excluded beneficiaries who had: 1) ages outside 18-99 years (214,767); 2) non-continuous coverage of Medicare Part A or B (12,445,567) because our predictor assumes a complete medical record in the year prior to the date of prediction; 3) records from insurance programs not in our source database at the time of the analysis (i.e., Medicare Part C [Medicare Advantage program] coverage) (23,633,391); 4) missing any variable used in the analysis tabulated in results (i.e., age, sex, race, low income or disability status, median household income of beneficiary's zip code, or RSI) (509,932); and 5) inconsistent data from source files comprising the medical records (i.e., inconsistent birth dates, death dates, sex, or race information) (117,418). The resulting 28,389,098 beneficiaries included 677,100 with confirmed and 2,917,604 with probable diagnosis of Covid-19 (S1 Fig). Mortality was assumed to have occurred on the date-of-death listed in the CMS Common Medicare Environment which is continuously updated from various sources including the Social Security Administration. The causes of death were not available for this study.
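The cohort flow described above can be checked by simple subtraction. The following Python sketch applies the five exclusions in the order listed and derives the shares of confirmed and probable cases. The variable names and dictionary labels are illustrative shorthand, not names from the study's SAS code.

```python
# Cohort flow check using only the counts reported in the text.
# Labels are illustrative shorthand, not the study's variable names.
starting_population = 65_310_173

exclusions = {
    "age outside 18-99 years": 214_767,
    "non-continuous Part A or B coverage": 12_445_567,
    "Medicare Part C (Medicare Advantage) coverage": 23_633_391,
    "missing analysis variable": 509_932,
    "inconsistent source data": 117_418,
}

remaining = starting_population
for reason, count in exclusions.items():
    remaining -= count
    print(f"after excluding {reason}: {remaining:,}")

final_cohort = remaining

# Shares of the final cohort with confirmed and probable diagnoses.
confirmed_cases = 677_100
probable_cases = 2_917_604
confirmed_pct = round(100 * confirmed_cases / final_cohort, 1)
probable_pct = round(100 * probable_cases / final_cohort, 1)
print(f"confirmed: {confirmed_pct}%  probable: {probable_pct}%")
```

The remaining count equals the reported analysis cohort, and the two percentages agree with the rounded values given in the abstract.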

Risk stratification
We used an adaptation of the Risk Stratification Index (RSI) to predict nine-month mortality using the prior year's administrative claims. Nine-month mortality was selected based on the availability of curated data at the time of analysis. A model was developed using the full fee-for-service 2018 population for training, with prospective validation of performance on the 2019 dataset as previously described [25]. In brief, RSI calculates a probability of mortality by applying coefficients to sociodemographic factors (i.e., sex, age, Medicaid enrollment (dual eligibility) status) and to indicator variables consisting of: 1) individual diagnostic or procedural codes; and 2) composite classes of clinical conditions or procedures. Specifically, the medical record is represented by individual ICD-10-CM diagnostic and ICD-10-PCS procedure codes, and their respective composite codes as defined by Clinical Classifications Software Refined (CCSR) [26]. More detailed descriptions of the model and prospective testing performance results are provided in S1 File and S2 Fig, respectively. The resulting RSI model was then prospectively applied to all eligible Medicare or dual-eligible beneficiaries as of February 29, 2020 to derive individual RSI scores as of that date, that is, before Covid-19 infections were confirmed in the United States.
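To illustrate what "applying coefficients" to sociodemographic factors and indicator variables can look like, the following Python sketch computes a mortality probability from a weighted sum. It assumes a logistic link, which this text does not specify, and every coefficient, feature name, and the intercept are hypothetical placeholders rather than RSI values.

```python
import math

# Hypothetical values for illustration only. The real RSI coefficients,
# feature list, and link function are not given in this text.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_years_over_65": 0.06,  # continuous sociodemographic factor
    "male": 0.25,               # 1 if male, else 0
    "dual_eligible": 0.40,      # 1 if enrolled in Medicaid as well
    "dx_code_A": 1.10,          # placeholder individual diagnosis indicator
    "ccsr_class_B": 0.90,       # placeholder composite-class indicator
}

def predicted_mortality(features):
    """Return a probability in (0, 1) under the assumed logistic link."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Two made-up beneficiaries: one with few risk factors, one with many.
low_risk = predicted_mortality({"age_years_over_65": 5})
high_risk = predicted_mortality({
    "age_years_over_65": 20,
    "male": 1,
    "dual_eligible": 1,
    "dx_code_A": 1,
    "ccsr_class_B": 1,
})
print(f"low: {low_risk:.3f}  high: {high_risk:.3f}")
```

Features absent from the input contribute nothing to the sum, which mirrors how indicator variables for codes not present in a claims history would behave.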
For comparative purposes, a second model was similarly developed to predict nine-month mortality from the presence of 27 individual chronic conditions as defined by CMS [27]. Specifically, logistic regression (stepwise selection using p-in of 10⁻³, p-out of 10⁻²) was used to select significant predictors from a pool of candidate features (i.e., 27 chronic conditions, age, sex and dual-enrollment status) to create a predictive model using the same training and prospective testing populations described in the previous paragraph. Performance using baseline RSI values or the individual chronic conditions model as predictors of outcomes in 2020 was then tested prospectively (S3

Analysis and selection of study cohorts
We conducted a progression of complementary inquiries. To test our primary hypothesis, we first identified the main study cohorts of beneficiaries with diagnoses of probable or confirmed Covid-19, and then subdivided them based on location of service (community or LTC/SNF) as of February 29, 2020. Within each cohort, we determined 9-month mortality between the anchor date and November 30, 2020.
Our approach was to first define associations between baseline demographic characteristics and health status as characterized by RSI with the risk of mortality following a Covid-19 diagnosis in the overall at-risk population and in pre-defined subpopulations. We initially compared differences in mean baseline RSI scores between survivors and non-survivors, then used univariable and multivariable regression modeling to estimate the relative importance of baseline demographic factors, chronic conditions, and RSI scores as independent predictors of mortality. A similar analysis was conducted to identify risk factors associated with a confirmed diagnosis of Covid-19. We also determined the association between RSI and observed mortality by beneficiary age group and location.

Estimation of expected mortality
Two independent methods were used to estimate expected 2020 mortality in our study population. A historical comparison allowed us to compare year-over-year changes in mortality in Medicare recipients and thus characterize overall effects of Covid-19 and quarantine-induced restrictions in healthcare access on mortality. A case-matched analysis provided an alternate estimate of Covid-19-related excess mortality within the 9 months of 2020 that we considered. A potential advantage of precise case matching (i.e., "digital twins") to form a concurrent control population with similar underlying mortality risk is that it allows attribution of Covid-19 infection as the primary factor responsible for any observed differences in outcome during the study period. This therefore complements the historical comparisons that may require adjustments for variations in population size, demographics, or mortality trends [5].

Historical comparison
As in previous studies [3,4], the first approach estimated expected 2020 mortality figures from historical records. The daily observed mortality for the Medicare population from 2017-2019 was used to prepare a model with optimal fit to capture seasonality and account for annual trends using a three-year moving average adjustment (S2 Fig). This approach better estimates expected mortality than relying on a single year-over-year comparison because the model better captures year-to-year fluctuations consequent to severity of yearly influenza outbreaks and other factors. We calculated predicted mortality for each individual and designated the sum as the historically expected mortality in each subpopulation. Excess deaths thus equaled the difference between observed 2020 deaths and the historical projection of expected deaths (actual minus expected).
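The final step of the historical comparison, summing individual predictions and subtracting the sum from observed deaths, can be sketched as follows. The function name and the toy values are assumptions made for illustration; the seasonal model that produces the individual predictions is not reproduced here.

```python
import math

def historical_excess(predicted_risks, died):
    """Return (observed, expected, excess) for one subpopulation.

    predicted_risks: per-person predicted probability of death (0..1)
    died: per-person outcome flags (True if the person died)
    """
    expected = math.fsum(predicted_risks)  # sum of individual predictions
    observed = sum(1 for flag in died if flag)
    return observed, expected, observed - expected

# Toy subpopulation of five people (made-up values, not study data).
toy_risks = [0.04, 0.04, 0.20, 0.20, 0.02]
toy_died = [False, False, True, False, False]
observed, expected, excess = historical_excess(toy_risks, toy_died)
print(observed, round(expected, 2), round(excess, 2))
```

A positive result indicates more deaths than the historical projection, and a negative result indicates fewer.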

Case matching, digital twins
A second method used case matching or "digital twinning" to estimate excess mortality in exposed subjects compared to concurrent controls who had closely matched health profiles. Beneficiaries receiving a diagnosis of probable or confirmed Covid-19 were pairwise exactly matched 1:1 on Feb 29, 2020 with beneficiaries without a Covid-19 diagnosis based on sex, age (within 1-year), ethnicity, location of services in Feb 2020 (community or LTC/SNF), along with RSI as a propensity matching factor (within 0.1%). Because the eligible Medicare population is large, we successfully matched almost the entire infected population (98.8% overall, 91.22 to 99.77% among subgroups analyzed). Excess deaths were estimated as the difference between the observed number of deaths in probable or confirmed Covid-19 subjects and their matched non-Covid-19 digital twins over the concurrent period. Matching may be more reliable than the historical comparison for estimating true excess mortality because it accounts for population variation over time and accounts for the impact of substantial disruptions in public health and lifestyle changes caused by the pandemic restrictions in 2020.
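The matching criteria above can be sketched in Python as a greedy 1:1 matcher without replacement. This is a minimal illustration and not the authors' SAS procedure, whose algorithm is not described in this text. The `Beneficiary` class, the function names, and the reading of "within 0.1%" as an absolute difference of 0.001 on the probability scale are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beneficiary:
    person_id: int
    sex: str
    age: int
    ethnicity: str
    location: str  # "community" or "LTC/SNF"
    rsi: float     # baseline predicted mortality, 0..1

AGE_TOLERANCE_YEARS = 1
RSI_TOLERANCE = 0.001  # assumption: 0.1 percentage points

def is_match(case, control):
    """Exact match on sex, ethnicity, location; calipers on age and RSI."""
    return (
        case.sex == control.sex
        and case.ethnicity == control.ethnicity
        and case.location == control.location
        and abs(case.age - control.age) <= AGE_TOLERANCE_YEARS
        and abs(case.rsi - control.rsi) <= RSI_TOLERANCE
    )

def match_one_to_one(cases, controls):
    """Greedy matching; each control is used at most once."""
    available = list(controls)
    pairs = {}
    for case in cases:
        candidates = [c for c in available if is_match(case, c)]
        if not candidates:
            continue  # this case stays unmatched
        best = min(candidates, key=lambda c: abs(c.rsi - case.rsi))
        pairs[case.person_id] = best.person_id
        available.remove(best)
    return pairs

def matched_excess_deaths(pairs, died_ids):
    """Deaths among matched cases minus deaths among their controls."""
    case_deaths = sum(1 for case_id in pairs if case_id in died_ids)
    control_deaths = sum(1 for cid in pairs.values() if cid in died_ids)
    return case_deaths - control_deaths

# Made-up records, not study data.
cases = [
    Beneficiary(1, "F", 80, "White", "community", 0.0500),
    Beneficiary(2, "M", 70, "Black", "LTC/SNF", 0.2000),
]
controls = [
    Beneficiary(10, "F", 81, "White", "community", 0.0505),
    Beneficiary(11, "F", 80, "White", "community", 0.0600),
    Beneficiary(12, "M", 70, "Black", "community", 0.2000),
]
pairs = match_one_to_one(cases, controls)
toy_excess = matched_excess_deaths(pairs, died_ids={1})
print(pairs, toy_excess)
```

In this toy data the second case has no control in the same care location, so it stays unmatched, which corresponds to the small unmatched fraction reported above.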

Statistical analysis
Patient demographic and clinical characteristics are summarized descriptively. Baseline characteristics were compared using t or χ² tests, as appropriate. Mortality rates within the study period post-Covid-19 diagnosis are reported as odds ratios with 95% confidence intervals. P values <0.05 defined statistical significance for both the primary and secondary outcomes. No adjustments were made for multiple comparisons. Sample size requirements were not estimated a priori because the intention was to include all qualifying 2020 beneficiaries available in the 100% nationwide Medicare files.
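As a reminder of what an odds ratio with a 95% confidence interval involves, the following Python sketch computes an unadjusted odds ratio from a 2x2 table with a Wald interval on the log scale. This is a textbook calculation offered as an assumption; the study's reported odds ratios may have come from regression models, and the counts below are made up.

```python
import math

def odds_ratio_with_ci(deaths_exposed, survivors_exposed,
                       deaths_unexposed, survivors_unexposed, z=1.96):
    """Unadjusted odds ratio with a Wald interval on the log scale."""
    odds_ratio = (deaths_exposed * survivors_unexposed) / (
        survivors_exposed * deaths_unexposed
    )
    se_log_or = math.sqrt(
        1 / deaths_exposed + 1 / survivors_exposed
        + 1 / deaths_unexposed + 1 / survivors_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts, not study data.
toy_or, toy_lower, toy_upper = odds_ratio_with_ci(30, 70, 10, 90)
print(f"OR {toy_or:.2f} (95% CI {toy_lower:.2f} to {toy_upper:.2f})")
```

An interval that excludes 1 corresponds to statistical significance at the 0.05 threshold used in this study.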

Patient characteristics and outcomes
As of Feb 29, 2020, a total of 28,389,098 Medicare or dual eligible beneficiaries met inclusion criteria for this study. Among them, 677,100 (2.4%) beneficiaries had a diagnosis of confirmed Covid-19 and 2,917,604 (10.3%) had a diagnosis of probable Covid-19 during the study period (S1 Fig). Among the confirmed cases, 472,329 were in the Community group while 204,771 received care in a long-term care setting.
Tables 1 and 2 compare demographic and clinical profiles for various subgroups. Compared to survivors, patients who died after a Covid-19 diagnosis were older, more often male, not white, received Medicaid, lived in zip-codes associated with lower median income, received services in February 2020 in a long-term care facility, and had higher baseline risk of mortality as defined by RSI. Age and baseline RSI scores were both strongly related to risk of infection and adverse Covid-19 outcomes. Residence in an LTC/SNF location and presence of end-stage renal disease were strong risk factors for acquiring a confirmed diagnosis of Covid-19 (S4 Fig). As shown in S5 Fig, RSI scores were associated with increasing mortality in a consistent rank ordered manner across each age group, thereby suggesting that RSI provides a significant and sensitive measure of co-morbidities and mortality risk that is independent of age.

Mortality risk prediction
Fig 1 presents the relative importance of factors that contributed to mortality in both univariable and multivariable models. Quintiles of RSI, age, LTC/SNF services status, sex, and race were the factors most associated with relative risk of mortality. Lung cancer and end-stage renal disease appear to carry meaningful incremental risk after adjustment. Mortality prediction models based primarily on baseline RSI levels performed better than models based on the presence of individual chronic conditions for predicting mortality risk (S3 Fig). Case matching identified a cohort of beneficiaries from the general population who were closely matched with subjects who had a diagnosis of probable or confirmed Covid-19 based on their RSI scores as of Feb 29, 2020 (Table 3). Characteristics of matched and unmatched subjects are tabulated in S1 Table.

Excess mortality estimates
The distribution of observed and expected mortality by diagnosis, category, and location of care is presented in Fig 2. As expected, subjects with high baseline mortality risk in the LTC/SNF cohort had actual mortality that exceeded all other groups. Those with confirmed Covid-19 showed similarly increased mortality above expected levels in both the LTC/SNF and community setting. Among community dwelling subjects, mortality also exceeded expected risk in subjects with possible Covid-19.
There was an estimated excess of 130,702 (historical comparison method) or 101,482 (case matching method) deaths attributable to probable or confirmed Covid-19 across the full population in the 9 months of 2020 that we considered. In the matched analysis, half the deaths (50,793) occurred in patients with a confirmed diagnosis of Covid-19 and half (50,689) occurred in those with a probable Covid-19 diagnosis. In contrast, 31,360 fewer subjects without a Covid-19 diagnosis died than expected, representing a 5.4% mortality reduction (Table 4).

Public access
Our model is highly predictive for mortality in Medicare beneficiaries with documented Covid-19 infections. Because baseline RSI scores can help to identify Medicare beneficiaries at highest risk for mortality due to Covid-19, we make the models publicly available in the following formats:
1. Access to RSI risk calculators will be provided free of charge for authorized non-commercial uses via the HDAI API website (https://www.hda-institute.com/application-for-use-ofthe-hdai-api/).
2. Medicare beneficiaries or their health advocates may access their personalized health history and risk assessment by signing into Health Picture (https://my.healthpicture.com). Health Picture is an easy-to-use tool that gives Medicare beneficiaries and their family members a way to access their health histories and understand their Covid-19 risks.
3. Coefficients for a public version of a one-year RSI mortality model are provided on the Cleveland Clinic website (Risk Stratification Index | Cleveland Clinic).

Discussion
Age, sex, care location, and comorbidities were significant predictors of mortality. The strongest individual predictor following a diagnosis of Covid-19, across all age categories and in both community and long-term care settings, was the integrated measure of patient co-morbidities (RSI). The results demonstrate that, within the Medicare population, Covid-19 had a considerable impact by increasing mortality well above what would have been expected based on age and co-morbidities alone.
Table footnotes: Subjects were categorized as "LTC/SNF" if they received services in either a Long-Term Care (LTC) or Skilled Nursing Facility (SNF) in February 2020; otherwise they were categorized as receiving services in the "Community." Confirmed Covid-19 cases were identified consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. The baseline risk of 9-month mortality was defined by the Risk Stratification Index (RSI) calculated on February 29, 2020. Two independent methods were used to estimate expected 2020 mortality as described in the footnote above. The case matching (digital twin) method utilized the baseline risk of 9-month mortality defined by the Risk Stratification Index (RSI). In this method, beneficiaries receiving a diagnosis of probable or confirmed Covid-19 were pairwise exactly matched 1:1 on Feb 29, 2020 with beneficiaries without a Covid-19 diagnosis based on sex, age (within 1-year), ethnicity, location of services in Feb 2020 (community or LTC/SNF), along with RSI as a propensity factor (within 0.1%).
https://doi.org/10.1371/journal.pone.0262264.t004
Using RSI as a composite measure of baseline mortality risk permitted precise case-control matching, thereby allowing us to estimate excess deaths attributable to Covid-19 by two complementary methods in Medicare recipients with probable or confirmed Covid-19 diagnoses. Using the historical comparison, there was an increase from 215,359 expected to 346,062 actual deaths, representing 130,702 excess deaths and a 61% increase. Using matching, mortality increased from 236,119 expected to 337,601 observed deaths, representing 101,482 excess deaths and a 43% relative increase. Differences between these estimates result largely from excess deaths that occurred in the No Covid subgroup in LTC facilities from which matches were drawn. Nevertheless, both estimates far exceed the 15-20% excess mortality estimated in previous analyses that included younger populations. Our results are therefore consistent with the conclusion that older people with more comorbidities are at much higher risk for developing severe Covid-19, and of dying from it.
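The excess-death figures quoted in this paragraph follow from subtracting expected from observed deaths and dividing by the expected count. The Python sketch below reproduces that arithmetic from the reported totals; the function name is illustrative.

```python
def excess_and_relative_increase(expected, observed):
    """Return excess deaths and the percentage increase over expected."""
    excess = observed - expected
    return excess, 100.0 * excess / expected

# Totals as reported in the text for probable or confirmed Covid-19.
matched_excess, matched_pct = excess_and_relative_increase(236_119, 337_601)
historical_excess, historical_pct = excess_and_relative_increase(215_359, 346_062)

print(f"matching:   {matched_excess:,} excess, {matched_pct:.0f}% increase")
print(f"historical: {historical_excess:,} excess, {historical_pct:.0f}% increase")
```

The matching totals reproduce the reported 101,482 excess deaths exactly. The historical totals differ from the reported 130,702 by one death, which presumably reflects rounding of the expected count, since it is a sum of individual probabilities.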
Overall, our historical model indicated that mortality following a probable or confirmed diagnosis in the community increased from an expected incidence of about 4.0% to actual incidence of 7.5%. In LTC/SNF facilities, the corresponding increase was from 20.3% to 24.6%. Therefore, the absolute increase in mortality was similar at 3-4% in the community and in long-term care residents. However, baseline risk (RSI) associated with all individuals in a care setting varied greatly, being only about 2.6% in the community versus 20.5% in long-term care facilities. As a percentage, the relative increase in mortality was thus far greater in the community (89.5%) than among patients in Long-Term Care facilities (21.1%).
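The contrast between absolute and relative increases can be made concrete with the rounded rates quoted in this paragraph. The Python sketch below uses an illustrative function name and only the rounded percentages from the text.

```python
def absolute_and_relative(expected_pct, actual_pct):
    """Return the absolute change (percentage points) and relative change (%)."""
    absolute = actual_pct - expected_pct
    relative = 100.0 * absolute / expected_pct
    return absolute, relative

community_abs, community_rel = absolute_and_relative(4.0, 7.5)
ltc_abs, ltc_rel = absolute_and_relative(20.3, 24.6)

print(f"community: +{community_abs:.1f} points, {community_rel:.1f}% relative")
print(f"LTC/SNF:   +{ltc_abs:.1f} points, {ltc_rel:.1f}% relative")
```

The rounded inputs give relative increases of roughly 87.5% and 21.2%, close to the reported 89.5% and 21.1%, which were presumably calculated from unrounded rates. Either way, a similar absolute rise of 3 to 4 percentage points is a far larger relative change when the baseline is low.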
Somewhat remarkably, overall mortality decreased in Medicare participants without probable or confirmed Covid-19 diagnoses. In fact, among community dwellers, there were 31,360 fewer deaths than expected, representing about a 5.4% reduction. Disruptions to the healthcare system and avoided medical care were thus apparently offset by other factors, representing overall benefit. Obvious health benefits of pandemic isolation include reduced exposure to other airborne illnesses such as influenza, fewer driving accidents and fewer homicides in the over 65-year-old population. However, none seems sufficient to explain the reduction alone. More subtle effects including reduced work or stress-related illness might contribute more, although there is no obvious reason to believe that the pandemic would reduce stress, especially in an over-65-year-old population. Cause of death information would help explain the reasons for decreased mortality in our cohort of community dwellers without Covid-19.
The causes of reduced mortality in community dwelling Medicare participants remain unclear. However, our results suggest that inadequate care for chronic conditions and delayed care of acute events did not produce the feared outcome of higher nine-month mortality in the general population without Covid-19. But due to limited follow-up, we caution that disruptions in healthcare delivery may yet result in adverse longer-term outcomes due to delays in the diagnosis and treatment of new and existing chronic conditions. An additional consideration is that prolonged sequelae after severe Covid-19 infections (Long Covid syndrome) appear substantial and are an area requiring urgent further study [30].
There was a distinct disparity between community dwellers and those in long-term care facilities with respect to historical mortality comparisons. In contrast to community Medicare participants, the long-term care population without probable or confirmed Covid-19 diagnoses experienced 38,932 excess deaths (35%) compared to historical estimates. We believe that limitations in access to Covid-19 testing and disease under-reporting in long-term care patients probably were responsible for this finding. It seems likely that many of the excess deaths in this vulnerable population were consequent to undiagnosed Covid-19 infections. But it is also probable that social isolation and disruption in usual care may have contributed as well. The higher-than-expected level of excess deaths observed in this cohort (subjects without a probable or confirmed Covid diagnosis) is reflected in our case matching results, which indicate a modest relative reduction in deaths in subjects with a Covid related diagnosis. This is most likely due to undiagnosed Covid cases included in the control population, but we cannot rule out the possibility that the focus on care for the Covid patients had an unintended adverse impact on the remaining population.

Limitations
We excluded less than 2.2% of the available population because of missing and inconsistent values. Because data were missing non-systematically, exclusion of these subjects was unlikely to introduce meaningful bias. We relied on administrative diagnostic claims for Covid-19 to assign exposure. Surely these are inexact, especially during our study period early in the pandemic. Furthermore, a new diagnostic code for confirmed Covid-19 (U07.1) was introduced on April 1, 2020, and we assume that there was some uncertainty regarding its proper application. However, Kadri et al. recently reported that this Covid-19 specific code showed high sensitivity and specificity compared with the PCR test results [31]. We elected to address this uncertainty in coding by analyzing subjects with a confirmed Covid-19 code separately from those with a probable or unconfirmed Covid-19 code. A second limitation is that we did not account for temporal changes in risk of exposure to Covid-19 in either setting, nor for improvements in treatment of infected individuals over time [32,33].
We assigned individuals to either community dwelling or long-term care subgroups based on coding in February 2020. Some participants undoubtedly changed their care settings during the analysis period. Skilled nursing facilities, for example, include patients who remain semipermanently along with others who stay for short periods, such as during rehabilitation from major orthopedic procedures before resuming community life. But among patients who died, 79% of those who were in a long-term care facility on the anchor date of February 29, 2020 had long-term care charges within two months of death.
Our analysis was based on 28,389,098 adults enrolled in the US fee-for-service and Medicare/Medicaid program (43% of the total Medicare-eligible population in the US). The results are therefore broadly applicable to Medicare-eligible adults. One potential selection bias to consider is that excluded subjects, especially those without continuous coverage and those enrolled in Medicare Advantage programs, may have had a higher baseline risk profile than those included in our sample. A consequence would be underestimating the full impact of Covid-19 across the entire US Medicare population. Although our sample included a fair number of dual-eligible subjects below age 65, our results should only be cautiously generalized to younger and healthier populations.

Summary
Mortality following a probable or confirmed Covid-19 diagnosis in the community increased from an expected incidence of about 4.0% to actual incidence of 7.5%. In long-term care facilities, the corresponding increase was from 20.3% to 24.6%. The absolute increase was therefore similar at 3-4% in the community and in long-term care residents. But the percentage increase was far greater in the community (89.5%) than among patients in chronic care facilities (21.1%) who had high baseline risk.
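The distinction between absolute and relative increase can be checked with simple arithmetic. The sketch below uses only the rounded percentages quoted above; the function name is a hypothetical helper. Because the expected community mortality is given as "about 4.0%", the relative increase recomputed from rounded inputs (roughly 87.5%) differs slightly from the reported 89.5%, which was presumably derived from unrounded values.

```python
def mortality_increase(expected_pct, actual_pct):
    """Return (absolute increase in percentage points, relative increase in %)."""
    absolute = actual_pct - expected_pct
    relative = 100.0 * absolute / expected_pct
    return absolute, relative


# Rounded values quoted in the Summary.
community = mortality_increase(4.0, 7.5)
ltc = mortality_increase(20.3, 24.6)

for label, (absolute, relative) in (("Community", community), ("Long-term care", ltc)):
    print(f"{label}: +{absolute:.1f} percentage points, +{relative:.1f}% relative")
```

The absolute increases are close in the two settings, whereas the relative increase is several times larger in the community because the denominator (expected mortality) is much smaller.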
The long-term care population without probable or confirmed Covid-19 diagnoses experienced 38,932 excess deaths (34.8%) compared to historical estimates. Limitations in access to Covid-19 testing and disease under-reporting in long-term care patients probably contributed, although social isolation and disruption in usual care presumably also contributed. Remarkably, there were 31,360 fewer deaths than expected in community dwellers without probable or confirmed Covid-19 diagnoses, representing about a 5.4% reduction. Disruptions to the healthcare system and avoided medical care were thus apparently offset by other factors, representing overall benefit.
The Covid-19 pandemic had marked effects on mortality, but the effects were highly context-dependent. Among community-dwelling Medicare participants with suspected or confirmed Covid-19 diagnoses, mortality nearly doubled, but from a relatively low baseline. Patients in long-term care facilities had a similar absolute increase in mortality, but because their baseline mortality was 20.5%, the relative increase was smaller. In contrast, community-dwelling Medicare participants without Covid-19 diagnoses had about 5.4% lower-than-expected mortality.
Supporting information
S1 Fig. Consort style waterfall flowchart detailing population selection methodology.
Confirmed Covid-19 cases were identified consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. Subjects were excluded for missing data if values for any baseline characteristic used in the study were missing (i.e., age, sex, ethnicity, location of care, zip code derived measures, dates of coverage, or baseline risk of 9-month mortality assessed with the Risk Stratification Index (RSI)). Additionally, we excluded subjects whose records had inconsistent values among source files containing similar variables such as birth date and sex.

The calibration plot displays the mean actual vs predicted 1-year mortality for populations clustered in increments of 1% probability of mortality. Dark green, light green, and red dots represent populations in the lowest 95%, 95%-99%, and top 1% of mortality risk. The diagonal line identifies the domain of ideal performance where actual and expected mortality rates are equal for a population. The performance of this index is very close to ideal performance for approximately 99% of the population. Tabulated metrics: The sample size in this test set (N) was 11,871,985 with an incidence of 1-year mortality (Event_Test) of 4.4%. The Slope and Intercept (INT) fit of the data are 0.94 and 0.01, respectively. The area under the receiver operating characteristic (ROC) curve was 0.88. The Mean Average Error (MAE) from cluster coordinates (i.e., (expected, actual) couplets) to the identity line was calculated for the database divided into populations grouped from the riskiest to least risky subjects using cluster sizes ranging from 1 (i.e., each individual as a cluster) to 1000 neighboring subjects (e.g., MAE to MAE_1000). The 95% Confidence Interval (CI) for the fits of these populations to the identity line is tabulated (i.e., AE_CI to AE_CI_1000). Rsq_unit is a goodness of fit measure of individual results to the ideal line.

Covid-19 populations. Confirmed Covid-19 cases were identified consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. (A,B) ROCs display the sensitivity vs. 1 - specificity in detecting patients who died within 9 months after prediction from February 29, 2020 (baseline). The areas under each ROC, with their corresponding 95% confidence intervals, are tabulated in the lower right of each figure. Predictions using RSI yielded better performance (A) than those using a model based on age, sex and chronic conditions (B). (TIF)

S4 Fig. Forest plot showing the relative risk and 95% CI of significant predictors of confirmed Covid-19 infection.
Confirmed Covid-19 cases were identified consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. Subjects were categorized as "LTC/SNF" if they received services in either a Long-Term Care (LTC) or Skilled Nursing Facility (SNF) in February 2020; otherwise they were categorized as receiving services in the "Community." Predictors were assessed at baseline (February 29, 2020) and include quintiles of Risk Stratification Index (RSI), presence of chronic conditions, location of services (LTC/SNF vs Community), and demographic variables (i.e., age, sex, race, and quintiles of median household income imputed by zip code according to 2015 Census data). Variables not remaining in the adjusted model are indicated by the presence of empty parentheses under the adjusted odds ratio. Location of services, age, status of end-stage renal disease (ESRD) and RSI were the strongest (unadjusted) predictors of infection. Location of services and ESRD remained strong predictors following adjustment; however, risks associated with having chronic conditions were typically reduced when adjusted by the presence of other factors. (TIF)

S5 Fig. Observed mortality rates by age and RSI quintiles.
Rates of mortality within 9 months following baseline (February 29, 2020) in Medicare subpopulations categorized by age, location of services, infection status, and quintiles of the baseline risk of mortality assessed using the Risk Stratification Index (RSI). Confirmed Covid-19 cases were identified consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. Subjects were categorized as "LTC/SNF" if they received services in either a Long-Term Care (LTC) or Skilled Nursing Facility (SNF) in February 2020; otherwise they were categorized as receiving services in the "Community." As expected, subjects in quintiles with higher baseline risk of mortality had higher rates of observed mortality. For subjects without a Covid-19 diagnosis, mortality rates were lower in the community setting compared to those in the LTC/SNF; however, for subjects with confirmed or probable Covid-19 infection, mortality rates were typically higher in the community setting than in the LTC/SNF. (TIF)

S1 Table. Comparison of characteristics for Covid-19 subjects matched versus unmatched with controls in community and LTC/SNF subgroups.
Subjects were categorized as "LTC/SNF" if they received services in either a Long-Term Care (LTC) or Skilled Nursing Facility (SNF) in February 2020; otherwise they were categorized as receiving services in the "Community." Confirmed Covid-19 cases were identified consistent with CMS guidance using ICD-10-CM codes for Covid-19 (B97.29 before April 1, 2020 and U07.1 thereafter) as a primary or secondary diagnosis between March 1, 2020 and September 30, 2020 [22]. Probable Covid-19 infection cases were identified using ICD-10-CM codes consistent with the CDC guidance (Z20.828) and WHO recommendations (U07.2) [23,24]. The baseline risk of 9-month mortality was defined by the Risk Stratification Index (RSI) calculated on February 29, 2020. Beneficiaries receiving a diagnosis of probable or confirmed Covid-19 were pairwise exactly matched 1:1 on February 29, 2020 with beneficiaries without a Covid-19 diagnosis based on sex, age (within 1 year), ethnicity, location of services in February 2020 (community or LTC/SNF), along with RSI as a propensity factor (within 0.1%). The tabulated results show the expected pattern, in which the percentage of unmatched cases is generally highest in the subpopulations within a category that are either the smallest (e.g., American Indians/Alaskan Native) or that generally do not use LTC/SNF services (e.g., the youngest population). (PDF)

S1 File. RSI development method for 9-month predictions.
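
The 1:1 matching rule described in the S1 Table caption (exact on sex, ethnicity, and location of services; age within 1 year; RSI within 0.1%) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Beneficiary` record, the `match_one_to_one` function, and the greedy nearest-RSI strategy are all assumptions, as is the reading of "within 0.1%" as an absolute difference of 0.001 in predicted mortality probability.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Beneficiary:
    # Hypothetical record layout for illustration only.
    id: str
    sex: str
    ethnicity: str
    location: str  # "Community" or "LTC/SNF"
    age: int
    rsi: float     # baseline predicted mortality probability (0 to 1)


def match_one_to_one(cases, controls, age_caliper=1, rsi_caliper=0.001):
    """Greedy 1:1 matching without replacement (assumed strategy)."""
    # Group controls by the variables that must match exactly.
    pools = defaultdict(list)
    for control in controls:
        pools[(control.sex, control.ethnicity, control.location)].append(control)

    used = set()
    pairs, unmatched = [], []
    for case in cases:
        candidates = [
            c for c in pools[(case.sex, case.ethnicity, case.location)]
            if c.id not in used
            and abs(c.age - case.age) <= age_caliper
            and abs(c.rsi - case.rsi) <= rsi_caliper
        ]
        if candidates:
            # Prefer the control with the closest baseline risk.
            best = min(candidates, key=lambda c: abs(c.rsi - case.rsi))
            used.add(best.id)
            pairs.append((case.id, best.id))
        else:
            unmatched.append(case.id)
    return pairs, unmatched


# Toy data, invented for the example.
cases = [
    Beneficiary("case1", "F", "White", "Community", 72, 0.0500),
    Beneficiary("case2", "M", "Black", "LTC/SNF", 85, 0.3000),
]
controls = [
    Beneficiary("ctl1", "F", "White", "Community", 73, 0.0505),
    Beneficiary("ctl2", "F", "White", "Community", 72, 0.0800),
    Beneficiary("ctl3", "M", "Black", "Community", 85, 0.3000),
]
pairs, unmatched = match_one_to_one(cases, controls)
print(pairs, unmatched)
```

In this toy example the second case stays unmatched because its only candidate differs in location of services, mirroring the caption's observation that small or atypical subgroups have the highest share of unmatched cases. A greedy pass is order-dependent, so a production analysis would likely use an optimal matching procedure instead.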