Predicting patients with false negative SARS-CoV-2 testing at hospital admission: A retrospective multi-center study

Importance: False negative SARS-CoV-2 tests can spread infection in the inpatient setting to other patients and to healthcare workers. However, the population of patients with COVID who are admitted with false-negative test results has not been studied.
Objective: To characterize, and develop a model to predict, true SARS-CoV-2 infection among patients who initially test negative for COVID by PCR.
Design: Retrospective cohort study.
Setting: Five hospitals within the Yale New Haven Health System between 3/10/2020 and 9/1/2020.
Participants: Adult patients who received diagnostic testing for SARS-CoV-2 within the first 96 hours of hospitalization.
Exposure: We developed a logistic regression model from readily available electronic health record data to predict SARS-CoV-2 positivity. The model was trained on patients who tested positive for COVID and on patients who tested negative and were never retested.
Main outcomes and measures: The model was applied to patients who tested negative for SARS-CoV-2 and were retested within the first 96 hours of hospitalization. We evaluated its ability to discriminate between patients who would subsequently retest negative and those who would subsequently retest positive.
Results: We included 31,459 hospitalized adult patients. Of these, 2,666 tested positive for COVID and 3,511 initially tested negative for COVID and were retested. Of the retested patients, 61 (1.7%) had a subsequent positive COVID test. Higher age, vital sign abnormalities, and lower white blood cell count were strong predictors of COVID positivity in these patients. The model had moderate performance for predicting which patients would retest positive, with a test-set area under the receiver operating characteristic curve (AUROC) of 0.76 (95% CI 0.70–0.83). Using a cutpoint at the 90th percentile of predicted probability, the model captured 35/61 (57%) of the patients who would retest positive. This cutpoint corresponds to a number needed to retest of between 15 and 77 patients.
Conclusion and relevance: We show that a pragmatic model can predict which patients should be retested for COVID. Further research is required to determine whether this risk model can be applied prospectively in hospitalized patients to prevent the spread of SARS-CoV-2 infection.


Introduction
Coronavirus disease 2019 (COVID-19), the illness caused by the SARS-CoV-2 virus, has had widespread global effects and has placed significant strain on both inpatient and outpatient healthcare institutions [1,2]. Reports from the early phase of the pandemic showed significant nosocomial transmission of disease [3][4][5]. Therefore, a major consideration for health systems is mitigating the spread of the virus within the hospital setting to uninfected patients and to healthcare workers. Other unique challenges of COVID-19 have been managing personal protective equipment and maintaining adequate rooms and facilities for patients hospitalized with the illness [6].
Many hospitals have enacted strategies to test patients directly in the emergency room before admission to a hospital unit, with the goal of rooming COVID-positive patients on COVID-specific wards and providing appropriate personal protective equipment to healthcare workers [7]. One unstudied yet important population is patients who initially test negative for COVID and later retest positive for the virus [8]. Although COVID tests used in hospital settings are very specific, their sensitivity is much lower and varies substantially with the timing of viral shedding; moreover, a recent systematic review reports a false negative rate of 13%, a number high enough to be clinically meaningful [9][10][11]. Such patients may pose a significant risk, especially in the hospital setting: they may be roomed with uninfected patients and thus expose other patients, visitors, and healthcare workers to SARS-CoV-2. Moreover, nosocomial SARS-CoV-2 infections in hospitalized patients are particularly concerning because hospitalized patients are often older, immunocompromised, and have multiple comorbidities, all of which are risk factors for severe COVID [12].

PLOS ONE
In this retrospective study, we evaluate patients who initially test negative for COVID by nasopharyngeal polymerase chain reaction (PCR) testing but subsequently retest positive, to identify patient characteristics, vital signs, and laboratory tests that may predict a subsequent positive test for COVID. We develop a risk model for predicting a patient's COVID 'positivity' and apply it to the broader COVID-negative cohort to identify patients who will later have a positive test. We hypothesized that a model could discriminate which patients who initially test negative for COVID actually have the infection, identifying a population for targeted retesting.

Patients and setting
We included adult patients hospitalized at one of five hospitals within the Yale New Haven Health System (YNHHS) between 3/10/2020 and 9/1/2020 who received nasopharyngeal PCR testing for SARS-CoV-2 during their hospitalization. SARS-CoV-2 tests included several multiplex real-time RT-PCR tests (GeneXpert-Cepheid; Simplexa-DiaSorin; TaqPath-Thermo Fisher), a transcription-mediated amplification test (Panther-Hologic), and a singleplex real-time RT-PCR test (CDC-lab developed). Data regarding the specific test used for each sample were not available for this study. YNHHS includes 6 hospitals across Connecticut and Rhode Island and spans a variety of settings, including academic/community, urban/suburban, and teaching/non-teaching.
The first 96 hours of each patient's hospitalization served as the observation period, to limit analyses to patients who likely had COVID on presentation rather than patients who developed nosocomial COVID during their hospitalization. Patients who did not have any COVID test during the observation period were excluded from analysis.
This study operated under a waiver of informed consent and was approved by the Yale Human Investigation Committee (HIC # 2000027733).

Variables and outcomes
We collected longitudinal data from the electronic health record (EHR), including demographics, comorbidities, procedures, medications, laboratory results, and vital signs. All data were extracted from the data warehouse of our EHR vendor, Epic (Verona, WI).
Patient variables were chosen pragmatically, favoring those that would be simple to embed into a clinical decision support platform, either directly in the EHR or as a web service. They were also chosen because they had very low missingness (<10%) among hospitalized patients within the first 24 hours of hospitalization. Model variables included demographics (age, sex, race), comorbidities (congestive heart failure, chronic pulmonary disease, diabetes, obesity, history of arrhythmia, hypertension, alcohol use disorder, metastatic cancer, stroke, transient ischemic attack, HIV, and the Elixhauser comorbidity index), laboratory values (sodium, potassium, chloride, bicarbonate, blood urea nitrogen, creatinine, glucose, hemoglobin, platelet count, white blood cell count, and lymphocyte percentage), and vital signs (temperature, systolic blood pressure, diastolic blood pressure, respiratory rate, and oxygen saturation). Comorbidities were defined according to the Elixhauser comorbidity index using International Classification of Diseases, Tenth Revision (ICD-10) codes [13]. The first measurement of each variable was used in analyses.
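The 'first measurement' rule and the <10% missingness screen can be sketched as follows. This assumes a hypothetical long-format extract of `(patient_id, variable, time, value)` tuples that has already been restricted to the first 24 hours of hospitalization; the function names are illustrative and not taken from the study's code.

```python
# Threshold from the text: variables were retained only if <10% missing.
MAX_MISSING_FRACTION = 0.10


def first_measurements(observations):
    """observations: iterable of (patient_id, variable, time, value) tuples,
    assumed already restricted to the early-hospitalization window.
    Returns {patient_id: {variable: value}} keeping the earliest value."""
    earliest = {}
    for patient_id, variable, time, value in observations:
        key = (patient_id, variable)
        if key not in earliest or time < earliest[key][0]:
            earliest[key] = (time, value)
    features = {}
    for (patient_id, variable), (_, value) in earliest.items():
        features.setdefault(patient_id, {})[variable] = value
    return features


def missing_fraction(features, patient_ids, variables):
    """Fraction of patients with no recorded value for each variable."""
    n = len(patient_ids)
    return {
        v: sum(1 for p in patient_ids if v not in features.get(p, {})) / n
        for v in variables
    }


def low_missingness_variables(fractions, threshold=MAX_MISSING_FRACTION):
    """Variables whose missing fraction is below the threshold."""
    return [v for v, frac in fractions.items() if frac < threshold]
```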

Statistical methods
We used descriptive statistics to compare three populations: patients who initially tested positive, those who initially tested negative and later tested positive, and those who initially tested negative and remained negative throughout the hospitalization. Chi-square tests were used to compare categorical variables, and the Kruskal-Wallis test was used for continuous covariates.
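For the categorical comparisons, the Pearson chi-square statistic for an r x c table of counts can be sketched as follows. A binary characteristic compared across the three testing groups gives a 2 x 3 table with 2 degrees of freedom. This standard-library-only sketch uses closed-form tail probabilities for integer degrees of freedom in place of a statistics package. It does not cover the Kruskal-Wallis test, and it omits the small-expected-count checks a full analysis would apply.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_p_value(x, dof):
    """Upper-tail probability of the chi-square distribution for integer dof,
    using the closed-form expressions for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Q = exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Q = erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(dof-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (dof - 1) // 2 + 1):
        if i > 1:
            term *= half / (i - 0.5)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total
```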
We trained a logistic regression model to predict COVID positivity using patients with an initial positive COVID test (+/0) and those with an initial negative COVID test who were never retested (-/0). We then tested the model's performance among individuals with an initial negative COVID test who were retested within the first 96 hours of their hospitalization and were either negative (-/-) or positive (-/+) on retest. This allowed evaluation of model performance among individuals who could clearly be classified as 'false negative' or 'true negative' at the time of initial testing. Variable importance in the logistic regression model was determined by the absolute value of each coefficient's z-score.
The area under the receiver operating characteristic curve (AUROC) and the precision-recall curve (PRC) are reported for model performance on the validation set. Quantiles of predicted probabilities from the logistic model were computed in the training set and then applied to test-set probabilities to determine cutpoints for prediction. We report the probability quantile that was chosen clinically to maximize sensitivity for identifying patients who truly had COVID while minimizing the 'number needed to test'.
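The evaluation quantities above can be sketched as follows. AUROC is computed through its equivalence with the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The cutpoint is a percentile of the training-set probabilities, applied unchanged to the test set. Precision and recall are then computed for the patients who score above that cutpoint. The helper names are hypothetical. `statistics.quantiles` with `method="inclusive"` is used because it interpolates the same way as R's default `quantile` (type 7), although the quantile type used in the study is not stated.

```python
import statistics


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in positives:
        for n in negatives:
            concordant += 1.0 if p > n else 0.5 if p == n else 0.0
    return concordant / (len(positives) * len(negatives))


def training_cutpoint(training_probabilities, percentile=90):
    """Percentile of the training-set predicted probabilities ('inclusive'
    interpolation, equivalent to R's default quantile type 7)."""
    cuts = statistics.quantiles(training_probabilities, n=100, method="inclusive")
    return cuts[percentile - 1]


def metrics_at_cutpoint(scores, labels, cutpoint):
    """True positives, number flagged, precision (PPV) and recall
    (sensitivity) when patients scoring above the cutpoint are flagged."""
    flagged = [y for s, y in zip(scores, labels) if s > cutpoint]
    true_positives = sum(flagged)
    precision = true_positives / len(flagged) if flagged else float("nan")
    recall = true_positives / sum(labels)
    return true_positives, len(flagged), precision, recall
```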
All analyses were performed using R (Version 4.0.0, Vienna, Austria) [14]. Logistic regression models were developed using the glm function from the 'stats' package in R. We defined statistical significance at P<0.05.
This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

Results
A total of 40,030 patients were hospitalized at the five Yale New Haven Health System hospitals between 3/10/2020 and 9/1/2020. Of these, 31,459 adult patients had a COVID test during the first 96 hours of hospitalization and were included in analyses (Fig 1). Among these patients, 2,666 tested positive for COVID and 25,382 tested negative and were never retested. This group of 28,048 patients served as the training population for modeling. The validation set comprised 3,511 patients who initially tested negative for COVID and were retested, of whom 61 (1.7%) retested positive.
We compared patients who were initially COVID-positive with those whose initial test was falsely negative (Table 1). These two populations were similar in demographics, baseline vital signs, comorbidities, and initial laboratory values. On admission, patients with a false-negative initial test had a higher Elixhauser comorbidity score, more diabetes, slightly elevated creatinine, and slightly lower hemoglobin. Characteristics of all patients are presented in S1 Table. Manual chart review was performed for the 61 patients who retested positive. Reasons for the subsequent test included high clinical suspicion despite a negative test (51%), testing as part of disposition planning (5%) or before a procedure (7%), testing before hospital transfer (3%), an inconclusive first COVID test (2%), and an unclear reason for testing (31%). Clinical suspicion reflected a wide variety of symptoms and findings, including abnormal imaging, new-onset fever, hypoxia, shortness of breath, and known contact with a patient with COVID-19. Forty percent of patients who retested positive did not have symptoms on admission. The mean interval between the first and second test was 2.5 days (IQR: 1-2 days).
A multivariable logistic regression was fit to predict initial COVID positivity. The full model equation and covariates are provided in S1A and S1B Fig. The most important variables for predicting increased risk of COVID positivity, as measured by the absolute value of their z-scores, were higher age, Black race, lower initial oxygen saturation, higher initial temperature, and lower white blood cell count.
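The study fit its model with R's `glm`. To illustrate where the z-scores come from, the following standard-library-only Python sketch fits a logistic regression by Newton-Raphson, which coincides with the iteratively reweighted least squares that `glm` uses for the logit link. It takes standard errors from the inverse Fisher information and ranks covariates by |coefficient / standard error|. It is intended for small examples, not as a replacement for a statistical package, and it has no safeguards against separation or collinearity.

```python
import math


def _sigmoid(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _invert(matrix):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _information(X, beta):
    """Fisher information X' W X with W = diag(p * (1 - p)); also returns p."""
    p = [_sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
    k = len(beta)
    info = [[sum(row[i] * row[j] * pi * (1.0 - pi) for row, pi in zip(X, p))
             for j in range(k)] for i in range(k)]
    return info, p


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Each row of X must start with 1.0 (intercept).
    Returns (coefficients, standard errors)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        info, p = _information(X, beta)
        cov = _invert(info)
        score = [sum(row[i] * (yi - pi) for row, yi, pi in zip(X, y, p)) for i in range(k)]
        step = [sum(cov[i][j] * score[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors from the inverse information at the converged estimate.
    cov = _invert(_information(X, beta)[0])
    return beta, [math.sqrt(cov[i][i]) for i in range(k)]


def rank_by_abs_z(names, beta, se):
    """Covariates ordered by |z| = |coefficient / standard error|, largest first."""
    z = [(name, b / s) for name, b, s in zip(names, beta, se)]
    return sorted(z, key=lambda item: abs(item[1]), reverse=True)
```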
The model was then applied to predict which patients in the validation cohort would retest COVID-positive. The AUROC for this outcome was 0.76 (95% CI 0.70-0.83); the ROC curve is displayed in Fig 2. The precision-recall curve is provided in the Supporting Information.
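The text does not state how the 95% CI for the AUROC was obtained. One common option is a percentile bootstrap; DeLong's method is another. The sketch below is an assumption rather than the study's method. It resamples positive and negative patients separately so that every replicate contains both classes, and it uses the rank-based (Mann-Whitney) form of the AUROC to keep each replicate fast.

```python
import random


def auroc(positive_scores, negative_scores):
    """Rank-based (Mann-Whitney U) AUROC; tied scores receive average ranks."""
    pooled = sorted([(s, 1) for s in positive_scores] + [(s, 0) for s in negative_scores])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum += average_rank * sum(label for _, label in pooled[i:j])
        i = j
    n_pos, n_neg = len(positive_scores), len(negative_scores)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auroc_ci(positive_scores, negative_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each class separately."""
    rng = random.Random(seed)
    replicates = sorted(
        auroc([rng.choice(positive_scores) for _ in positive_scores],
              [rng.choice(negative_scores) for _ in negative_scores])
        for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * (n_boot - 1))]
    upper = replicates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```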

Based on the precision-recall curve, a cutpoint at >90th percentile of predicted probability from the logistic model was used to predict whether a patient who initially tested negative for COVID would retest positive. At this cutpoint, the model predicted that 536 patients in the validation cohort were COVID-positive. Of these, 35/536 (6.5%), or one of every 15 patients, were indeed COVID-positive on retest; notably, this captures 57% of all false-negative patients. If this threshold is applied to all initially COVID-negative patients, 35/2,680 (1.3%) flagged patients would be captured, equating to one true positive per 77 tests.
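The reported figures follow directly from the counts above: the number needed to retest is the reciprocal of the positive predictive value. A short arithmetic check in Python, using only counts given in the text:

```python
def ppv_and_number_needed(true_positives, flagged):
    """Positive predictive value and number needed to retest (= 1 / PPV)."""
    ppv = true_positives / flagged
    return ppv, flagged / true_positives


# Counts reported in the text for the >90th-percentile cutpoint.
best_ppv, best_nnt = ppv_and_number_needed(35, 536)     # retested validation cohort: ~0.065, ~15.3
worst_ppv, worst_nnt = ppv_and_number_needed(35, 2680)  # all initially negative patients: ~0.013, ~76.6
sensitivity = 35 / 61                                   # share of the 61 false negatives captured: ~0.57
```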

Discussion
In this study, we assessed the performance of a model for predicting which patients initially deemed COVID-negative may retest positive. Our model used variables that are routinely measured in hospitalized patients and showed moderate performance in discriminating which patients, when retested, would retest positive. Several variables appeared important for predicting which patients may need to be retested for COVID: higher age, lower oxygen saturation, higher temperature, and lower white blood cell count were associated with COVID positivity. These predictive variables are concordant with previous models of COVID positivity [15,16].
We chose a cutpoint of model risk prediction that maximized sensitivity for identifying false-negative patients while minimizing the number of patients who would need to be tested. At the 90th percentile of model risk score, the 'number needed to test' ranged from 15 in the best-case scenario to 77 in the worst-case scenario. The worst case assumes the unlikely scenario in which none of the patients who initially tested negative and were never retested (-/0) truly had COVID; thus, the true number needed to test is very likely lower than this upper bound.
Our study has several strengths. First, our model was built and tested on a very large patient dataset with data from five hospitals, capturing a broad diversity of patients and clinical settings. Second, we used readily available EHR data elements, which eases integration of such a model, rather than more complicated modeling approaches that may require non-EHR solutions such as cloud computing to apply. Our model does not require measurement of biomarkers, cytokines, or other specialized clinical measurements. Third, our model performed robustly despite being trained on a very broad population of hospitalized adults with COVID tests, and it was validated in a fundamentally different population from the one in which it was derived. We therefore argue that the model is broadly generalizable to hospitalized patients. To deploy the model, institutions may apply the model formula presented in S1A Fig and select a cutpoint that aligns with their goals and testing capabilities (as described above, we highlight a cutpoint at the 90th percentile).
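As a hedged sketch of such a deployment, the risk score is the inverse logit of the model's linear predictor. The coefficients in S1A Fig are not reproduced here, so the intercept, coefficient names, and cutpoint in this sketch are placeholders that the implementing institution would supply. The cutpoint should be the chosen percentile of training-set predicted probabilities, computed as described in the Statistical methods section.

```python
import math


def predicted_probability(intercept, coefficients, patient):
    """Inverse logit of intercept + sum(coefficient * value).

    `coefficients` maps covariate name -> fitted coefficient (placeholders
    here; the study's values are in S1A Fig). `patient` maps the same names
    to values coded exactly as in the fitted model (e.g., 0/1 indicators)."""
    missing = sorted(set(coefficients) - set(patient))
    if missing:
        raise ValueError(f"missing covariates: {missing}")
    eta = intercept + sum(beta * patient[name] for name, beta in coefficients.items())
    # Numerically stable inverse logit.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def flag_for_retest(probability, cutpoint):
    """Flag patients whose predicted probability exceeds the institution's cutpoint."""
    return probability > cutpoint
```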
Our study should be viewed in light of several weaknesses. First, our risk model demonstrated moderate performance; we acknowledge that many patients would need to be retested to find a single COVID-positive patient. Second, our model was built from and applied to patients who had vital signs, a basic metabolic panel, and a complete blood count measured on admission. Thus, the model would not generalize to patients who may not have vital signs or laboratory values obtained (e.g., psychiatric patients or routine obstetric patients). Third, our study is retrospective, so we cannot assess the efficacy of implementing this model to guide retesting. Fourth, our model was evaluated on patients who were tested twice for COVID. Many patients were COVID-negative on presentation and never retested, and some of them may have been false negatives; we are therefore unable to provide a precise number needed to test.
We propose that by building and embedding a model that uses variables commonly available in the EHR, hospitals could flag patients for targeted retesting, potentially reducing nosocomial spread of COVID-19. Testing between 15 and 77 patients to find a single COVID-negative patient who is truly positive should be weighed against several logistical concerns. On one hand, this is a large amount of testing, which may produce false-positive COVID tests and require significant expenditure of resources. On the other hand, if a health system has ample COVID testing capacity or can use pooled COVID testing, this approach may be reasonable. We also argue that the consequences of a missed COVID-positive patient may be profound for an institution. Other patients on the ward may be infected, as may healthcare workers and other hospital staff who believe the patient has been 'ruled out' for COVID. Further investigation is warranted to determine the cost-effectiveness of an algorithm-guided retesting approach.

Conclusions
Our study is the first description of, and model development for, patients who initially test negative for COVID on hospitalization but are later retested and found to be COVID-positive. We show that a pragmatic model can be constructed to predict which patients should be retested for COVID, and we found a reasonable number needed to test of between 15 and 77 hospitalized patients. Further research is needed to determine the cost-effectiveness of implementing a retesting approach as well as its efficacy in clinical practice.
Precision is the positive predictive value of a positive COVID test, and recall is the sensitivity of a positive COVID test. Sensitivity in our testing cohort is 0.57 and the positive predictive value is 0.61, as can be seen on this graph. (TIF)
S1 Table. Demographic and clinical patient characteristics in cohort by COVID testing status.