Real-life clinical sensitivity of SARS-CoV-2 RT-PCR test in symptomatic patients

Background Understanding the false negative rates of SARS-CoV-2 RT-PCR testing is pivotal for the management of the COVID-19 pandemic and has implications for patient management. Our aim was to determine the real-life clinical sensitivity of SARS-CoV-2 RT-PCR. Methods This population-based retrospective study was conducted in March–April 2020 in the Helsinki Capital Region, Finland. Adults who were clinically suspected of SARS-CoV-2 infection and underwent SARS-CoV-2 RT-PCR testing, with sufficient data in their medical records for grading of clinical suspicion, were eligible. In addition to examining the first RT-PCR test of repeat-tested individuals, we also used high clinical suspicion for COVID-19 as the reference standard for calculating the sensitivity of SARS-CoV-2 RT-PCR. Results All 1,194 inpatients (mean [SD] age, 63.2 [18.3] years; 45.2% women) admitted to COVID-19 cohort wards during the study period were included. The outpatient cohort of 1,814 individuals (mean [SD] age, 45.4 [17.2] years; 69.1% women) was sampled from epidemiological line lists by systematic quasi-random sampling. The sensitivity (95% CI) for laboratory confirmed cases (repeat-tested patients) was 85.7% (81.5–89.1%) for inpatients, 95.5% (92.2–97.5%) for outpatients, and 89.9% (88.2–92.1%) for all. When patients who were graded as high suspicion but never tested positive were also included in the denominator, the sensitivity (95% CI) was 67.5% (62.9–71.9%) for inpatients, 34.9% (31.4–38.5%) for outpatients, and 47.3% (44.4–50.3%) for all. Conclusions The clinical sensitivity of SARS-CoV-2 RT-PCR testing was only moderate at best. The relatively high false negative rates of SARS-CoV-2 RT-PCR testing need to be accounted for in clinical decision making, in epidemiological interpretations, and when using RT-PCR as a reference for other tests.


Introduction
During the study period (4 March–15 April 2020), 22,821 individuals underwent SARS-CoV-2 RT-PCR testing, with a total of 1,938 test-positive specimens, at the HUSLAB laboratory, Helsinki University Hospital, Finland, which serves the Helsinki Capital Area. S1 Fig shows the number of daily specimens and the proportion of positive specimens at HUSLAB during the study period.
We reasoned that the pretest probability, i.e. the probability of testing positive, would differ between inpatients on the COVID-19 cohort wards and outpatients, and decided to study these two populations separately.
Outpatient cohort. During the study period, the tested patients were generally symptomatic but the criteria which prompted testing varied slightly over time (S1 Table). Initially, persons returning from recognized epidemic areas and exhibiting respiratory symptoms within 14 days of return were primarily tested. The criteria were soon expanded to include symptomatic persons with risk factors, and all symptomatic healthcare workers. Outpatients fulfilling testing criteria were recorded manually with some clinical details on a line list. These lists were the most systematically collected dataset for the tested outpatients so we chose to sample our outpatient cohort from these lists. We performed systematic (quasi-random) sampling by including every fifth individual from the line lists. Along with practical advantages, this approach decreased the probability of sampling dependent individuals, such as members of the same family. Besides the clinical details on the line lists, we checked electronic medical records for comorbidities and other demographic details. Other exclusion criteria for outpatients were age below 18 years and residence outside of the Helsinki and Uusimaa Hospital District. Altogether 1,814 eligible outpatients (mean [SD] age, 45.4 [17.2] years; 69.1% women; 41.2% healthcare workers) were included in the study (Fig 1, Table 1).
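The every-fifth-individual selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the study team ran: the function name `systematic_sample`, the starting offset of zero, and the placeholder identifiers are all hypothetical.

```python
def systematic_sample(line_list, step=5, start=0):
    """Return every `step`-th entry of `line_list`, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if not 0 <= start < step:
        raise ValueError("start must satisfy 0 <= start < step")
    return line_list[start::step]


# Hypothetical line list of 23 tested outpatients (placeholder identifiers only).
line_list = [f"patient_{i:03d}" for i in range(1, 24)]

# Assumed starting offset of 0: rows 1, 6, 11, ... of the list are selected.
sampled = systematic_sample(line_list, step=5, start=0)
```

Because adjacent rows on a line list are more likely to belong to the same household, taking rows a fixed distance apart is one simple way to reduce the chance of sampling dependent individuals.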
Inpatient cohort. Patients with fever, respiratory or gastrointestinal symptoms, and/or difficulty in breathing were suspected of having COVID-19 and treated in designated cohort wards: 11 wards and 6 ICUs in eight hospitals (list of wards in S2 Table). All patients aged >18 years admitted to one of the cohort wards were eligible for the study, and only patients without a SARS-CoV-2 RT-PCR performed at the HUSLAB laboratory were excluded. These inpatients formed a consecutive case series of 1,194 individuals (mean [SD] age, 63.2 [18.3] years; 45.2% women) (Fig 2, Table 1).
Altogether 12.3% (147/1194) of inpatients and 14.1% (256/1814) of outpatients were sampled for SARS-CoV-2 RT-PCR testing more than once during a specific disease episode (Table 1). The descriptive statistics of the tested individuals and subgroups are presented in Tables 1 and 2.

Index testing
SARS-CoV-2 RT-PCR testing was conducted by one of the following methods (gene targets): Cobas® SARS-CoV-2 test kit on the Cobas® 6800 system (orf1ab and E) (Roche Diagnostics, Basel, Switzerland), Amplidiag® COVID-19 test (orf1ab and N) (Mobidiag, Espoo, Finland), and a laboratory-developed test based on a protocol recommended by WHO (N) (non-exponential amplification curves and amplification with cycle threshold values >34 in the laboratory-developed test were reanalysed with either Cobas or Amplidiag) [16]. The specifics and analytical performance of these methods in our laboratory setting have been described previously [4]. Samples were mainly collected with nasopharyngeal swabs (FLOQSwab, Copan, Brescia, Italy) (proportion of nasopharyngeal samples: 60% inpatients, 58% outpatients), but oropharyngeal swabs were used in a proportion of patients (11.1% inpatients, 16.4% outpatients) due to a global shortage of nasopharyngeal swabs. Other specimen types (0.7% inpatients, 0.2% outpatients) were tracheal, bronchial, and sputum specimens, as well as sinus and lung biopsies. The specimen type was unknown in 28.2% of inpatients and 25.5% of outpatients.
Samples were analysed a median of 24 hours after collection. As per our laboratory's standard operating procedure, samples with failed results were reanalysed and only qualified results were included. Thirteen weak positive results, which became positive after >35 PCR cycles (7 outpatients and 6 inpatients), were included.

PLOS ONE
Clinical sensitivity of SARS-CoV-2 RT-PCR

Reference standard used in the study
Since no gold standard for COVID-19 diagnosis exists, we decided to use high clinical suspicion for COVID-19 as the reference standard for the RT-PCR test. We systematically graded the clinical suspicion for COVID-19 based on a combination of symptoms, clinical findings, and recorded exposure to laboratory confirmed COVID-19 cases or travel history to epidemic areas. The criteria were based on CDC's and ECDC's case definitions for COVID-19 in April 2020. Electronic patient records or line lists were reviewed by a team consisting of senior residents in Infectious Diseases and Clinical Microbiology, medical students, and research nurses. Patients' medical history, symptoms, and epidemiological information were collected into a Microsoft Access® database according to the pre-defined criteria. Chest X-ray and CT findings indicating chest infection were recorded according to the radiologist's interpretation. The team collecting the data were aware of the SARS-CoV-2 RT-PCR test result when collecting the data. The clinical suspicion for COVID-19 disease was graded as follows:
1. 'Not suspected' patients were deemed by the clinician as non-compatible with COVID-19 disease or were diagnosed with another acute disease.
2. 'Not excluded' patients had no other diagnosis recorded explaining their current symptoms, and COVID-19 disease could not be excluded.
3. 'High suspicion' patients were considered to suffer from a probable COVID-19 if the physician in charge of the treatment recorded the suspicion on clinical grounds to the electronic patient record, OR the patient fulfilled at least one of the following criteria:
a. respiratory symptoms and/or fever and/or diagnostic finding for infection in chest X-ray/CT and travel history to epidemic regions at the time of the study, i.e. Tirol/Austria, Northern Italy, Spain, Iran, South Korea, or China, during the preceding 14 days.
b. respiratory symptoms and fever and diagnostic finding in chest X-ray/CT during April 2020 (time criterion based on the changed epidemiological situation).
c. respiratory or gastrointestinal symptoms or fever or diagnostic finding in chest X-ray/CT and a close contact with a laboratory confirmed COVID-19 patient during the preceding 14 days prior to disease onset.
4. 'Laboratory confirmed' patients (regardless of their clinical presentation) were those individuals that tested positive for SARS-CoV-2 RT-PCR during the study period. At the time of the study, only symptomatic individuals were tested, so the laboratory confirmed group does not include any asymptomatic cases.

Patient demographic and clinical characteristics by clinical suspicion of COVID-19 in inpatients (A) and outpatients (B).
Three patients in the inpatient cohort could not be classified. 'Not suspected': deemed by the clinician as non-compatible with COVID-19 disease or were diagnosed with another acute disease; 'Not excluded': no other diagnosis recorded explaining their current symptoms, and COVID-19 disease could not be excluded; 'High suspicion': physician in charge of the treatment recorded the suspicion on clinical grounds to the electronic patient record, OR the patient fulfilled a set of pre-defined clinical and exposure criteria (see Methods); 'Laboratory confirmed': tested positive from the first sample taken or with repeated testing with SARS-CoV-2 RT-PCR.

Sample size calculation
We estimated the minimum sample size needed for outpatients based on Bujang et al. [17], with a minimal statistical power of 80% and type I error <0.05. Sample size calculation for sensitivity requires a prevalence estimate for the target population. During the study period, the median positivity rate was 9.6% (S1 Fig), so we estimated a 10% prevalence for the tested population. Published estimates from small cohorts [18,19] available at the time reported the sensitivity of the SARS-CoV-2 RT-PCR to be on average 70%. Based on these estimates, the minimum sample size of outpatients for a null hypothesis of sensitivity of 70% was 1,550. We performed another sample size calculation by using the nomogram described by Carley et al. [20], which accounts for confidence intervals (CI): with CI of 93% and prevalence 10%, 70% sensitivity would require a minimum sample size of 1,600.
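To show how prevalence and expected sensitivity drive the required sample size, the sketch below uses Buderer's closed-form formula, a widely used approach for diagnostic accuracy studies. It is not necessarily the calculation behind either of the two published methods cited above. The function name `buderer_sample_size` and the CI half-width of 0.07 are assumptions chosen purely for illustration; the sensitivity (70%) and prevalence (10%) are the values stated in the text.

```python
import math
from statistics import NormalDist


def buderer_sample_size(sensitivity, prevalence, half_width, confidence=0.95):
    """Total subjects needed to estimate sensitivity within +/- half_width.

    Buderer's formula: the number of diseased subjects required is
    z^2 * Se * (1 - Se) / W^2, and dividing by prevalence scales this up
    to the total number of tested subjects.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_diseased = z ** 2 * sensitivity * (1 - sensitivity) / half_width ** 2
    return math.ceil(n_diseased / prevalence)


# Sensitivity and prevalence from the text; half_width=0.07 is an assumed value.
n_total = buderer_sample_size(sensitivity=0.70, prevalence=0.10, half_width=0.07)
```

With these illustrative inputs the result lands in the neighbourhood of the sample sizes quoted above, but it should not be read as a reproduction of either calculation. The key point is structural: at 10% prevalence, roughly ten tested individuals are needed for every true case that contributes to the sensitivity estimate.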

Group comparisons
To assess whether the high suspicion and the laboratory confirmed groups were comparable, and whether there were significant confounding factors between the groups that were used for sensitivity calculations, we compared demographic and clinical characteristics between them (Tables 1 and 2). To compare these two groups with respect to the categorical variables, we used Pearson's Chi-squared test without or with Yates' correction for continuity or Fisher's exact test, as appropriate. For the extensive contingency tables with an excess of small (expected) frequencies, we assessed the simulated p-value of Fisher's test based on 20,000 replicates. The differences in the age distribution were assessed using the Mann-Whitney U test. These comparisons were performed separately within the inpatients and outpatients.
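For readers unfamiliar with Fisher's exact test, the following is a minimal standard-library sketch of the two-sided test for a single 2x2 table. It does not cover the simulated p-value for larger tables mentioned above, and it is not the code used in the study (the analyses were run in R). The function name `fisher_exact_2x2` and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Small relative tolerance so ties are not lost to floating-point noise.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: febrile / not febrile in two groups of four patients.
p_value = fisher_exact_2x2(3, 1, 1, 3)
```

The exact test is preferred over the Chi-squared approximation when expected cell counts are small, which is the situation the text describes for the sparser tables.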

Analysis of sensitivity
Two approaches were used to calculate SARS-CoV-2 RT-PCR sensitivity: repeat-tested laboratory confirmed patients, and patients with a high clinical suspicion of COVID-19. For the laboratory confirmed patients, the sensitivity values were calculated based on the first RT-PCR test of each patient. All patients who tested RT-PCR positive during a specific disease episode were considered laboratory confirmed. Of these, the first samples with a negative RT-PCR test result were considered false negatives, while the first samples with a positive result were considered true positives. The same disease episode would include samples taken ≤14 days apart.
For the high clinical suspicion group, the sensitivity was calculated by considering those patients that were graded as high suspicion but never tested positive as false negative cases.
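The two denominators can be made concrete with a small sketch. It is an illustration of the definitions given above, not the study's analysis code; the `Patient` record, the function names, and the toy cohort are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    first_test_positive: bool  # result of the first RT-PCR in the episode
    ever_positive: bool        # any positive RT-PCR within the episode
    high_suspicion: bool       # graded 'high suspicion' on clinical grounds


def sensitivity_lab_confirmed(patients):
    """First-test sensitivity among laboratory confirmed patients only."""
    confirmed = [p for p in patients if p.ever_positive]
    if not confirmed:
        raise ValueError("no laboratory confirmed patients")
    true_pos = sum(p.first_test_positive for p in confirmed)
    return true_pos / len(confirmed)


def sensitivity_with_high_suspicion(patients):
    """As above, but high-suspicion, never-positive patients count as false negatives."""
    true_pos = sum(p.first_test_positive for p in patients if p.ever_positive)
    fn_repeat = sum(not p.first_test_positive for p in patients if p.ever_positive)
    fn_clinical = sum(p.high_suspicion and not p.ever_positive for p in patients)
    denominator = true_pos + fn_repeat + fn_clinical
    if denominator == 0:
        raise ValueError("no reference-positive patients")
    return true_pos / denominator


# Hypothetical toy cohort (counts invented for illustration).
cohort = (
    [Patient(True, True, True)] * 8         # positive on the first test
    + [Patient(False, True, True)] * 2      # negative first, positive on repeat
    + [Patient(False, False, True)] * 5     # high suspicion, never positive
    + [Patient(False, False, False)] * 10   # not suspected or not excluded
)

sens_confirmed = sensitivity_lab_confirmed(cohort)       # 8 / (8 + 2)
sens_combined = sensitivity_with_high_suspicion(cohort)  # 8 / (8 + 2 + 5)
```

The numerator is identical in both calculations; only the set of patients treated as truly diseased changes, which is why the second estimate can never exceed the first.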
The 95% CIs for (binomial) sensitivity were calculated by using the Wilson score method, which is based on inverting the z-test for a single proportion and provides more reliable coverage than the alternatives. We performed comparisons of sensitivity between the subgroups by using the independent sample tests for binomial proportions, including the Chi-squared test without or with Yates' correction for continuity or Fisher's exact test, as appropriate. For the extensive contingency tables with an excess of small (expected) frequencies, we assessed the simulated p-value of Fisher's test based on 20,000 replicates. We set the significance level at 5%. All calculations were performed using the R software.
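As a sketch of the Wilson score interval, the function below implements the standard formula without continuity correction, using only the Python standard library. The study's calculations were performed in R, so this is an independent illustration rather than the authors' code. The name `wilson_interval` and the example counts (90 true positives among 100 reference-positive patients) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical example: 90 true positives among 100 reference-positive patients.
lower, upper = wilson_interval(90, 100)
```

Unlike the simple Wald interval, the Wilson interval is not centred on the observed proportion and stays within [0, 1], which matters for sensitivities close to 100% or for small denominators.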

Demographics of the study population and clinical comparison between study groups
In all, 3,008 individuals were eligible for this study (Figs 1 and 2): 1,814 outpatients and 1,194 inpatients. Altogether 83 eligible outpatients (4.6%) and 3 inpatients (0.3%) were excluded from the final analysis due to insufficient data for the grading of clinical suspicion.
The inpatients were on average older than outpatients, comorbidities were more common, and the male sex was slightly overrepresented (Table 1). Healthcare workers and women were overrepresented in the outpatient population reflecting the distribution of the whole tested population, as reported before [21].
All patients were categorized by a clinical grade of suspicion (not suspected / not excluded / high suspicion / laboratory confirmed) for COVID-19 based on criteria described in Methods (Table 1). To assess whether our grading created systematic bias or whether significant confounding factors were present between the groups, we compared test negative patients that were deemed as high suspicion to laboratory confirmed patients (Table 2). There were no significant differences in sex or age distribution between these groups, but patients treated in the intensive care unit were overrepresented in the laboratory confirmed hospitalized patients (Table 2A). Laboratory confirmed patients were also more often febrile and had more often had contact with laboratory confirmed symptomatic COVID-19 cases (Table 2A and 2B). Laboratory confirmed patients also more often had gastrointestinal symptoms than patients in the high suspicion group (Table 2A and 2B).
In the outpatients, the high suspicion group had a higher proportion of females and healthcare workers compared to laboratory confirmed cases. This was expected based on the overall higher testing rate of both [21]. Again, the laboratory confirmed cases were more often febrile. Since our grading criteria included exposure to symptomatic COVID-19 patients or travel to epidemic areas, these factors were more common in the high suspicion group that tested negative than in the laboratory confirmed group (Table 2B).

Sensitivity of the first SARS-CoV-2 RT-PCR in inpatients and outpatients
The sensitivity of SARS-CoV-2 RT-PCR was calculated with two different denominators (Table 3). We first calculated the sensitivity with laboratory confirmed cases, i.e. repeat-tested patients, as the denominator, yielding the highest sensitivity estimates in this study: 85.7% for inpatients, 95.5% for outpatients, and 89.9% for all. Due to the low number of repeat-tested patients (N = 11), the calculation for outpatients here is unreliable.
The sensitivity was then calculated by including in the denominator patients that were graded as high suspicion but never tested positive (from one or more tests conducted within the study period), yielding the following sensitivity values: 67.5% for inpatients, 34.9% for outpatients, and 47.3% for all. Thus, the lowest calculated sensitivity estimate in this study was for outpatients with high suspicion.
The delay between disease onset and testing was longer for inpatients than outpatients (Table 2). We could not detect a significant difference in the delay to first test between the laboratory confirmed cases and the high suspicion group in either cohort (S3 Fig, Fisher's Exact Test p = 1 when "No data" category excluded). However, for inpatients, information on the delay was missing more often in the high suspicion group (2.3%) compared to the laboratory confirmed group (0.9%, p = 0.026) (Table 2). For outpatients, information on the delay was missing less often in the high suspicion group (10.5%) as compared to the laboratory confirmed group (12.2%, p = 0.026) (Table 2).
We could not detect a significant difference between the sensitivity of nasopharyngeal and oropharyngeal samples in the inpatients (p = 0.51, Chi-squared test), outpatients (p = 0.22), or all patients (p = 0.66) (S3 Table; S4 Fig). However, data on the specimen type were missing in 20.4% (inpatients) and 17.4% (outpatients) of the cases.

Delay between symptom onset and positive test result
To estimate the delay from disease onset for highest clinical sensitivity for SARS-CoV-2 RT-PCR, we calculated sensitivities for different time frames. To achieve reliable group sizes, both cohorts were pooled together. There was no significant difference in the test sensitivity according to delay from onset, calculated for the laboratory confirmed cases alone and with the high clinical suspicion group included (P = 0.1013, Fisher's Exact Test for Count Data with simulated p-value based on 20,000 replicates; Fig 3). Detailed sensitivity calculations per delay from disease onset are presented in S4 Table.

Discussion
Wide-spread testing and contact tracing together with social distancing have been promoted as the tool that prevents new lockdowns, without a clear understanding of how well the SARS-CoV-2 RT-PCR test performs. Here, we used clinical suspicion as the gold standard to estimate the clinical sensitivity of the test. We also included a sensitivity calculation based on the repeat-tested individuals where RT-PCR acts as the gold standard for itself.
A previous large-scale sensitivity estimate for SARS-CoV-2 molecular testing was based only on repeat-tested individuals [9,22]. This approach overestimates the sensitivity. Repeated testing is done mostly on inpatients who have a strong clinical suspicion, rendering high pretest probability. We sought to overcome this limitation by including a large cohort of outpatients. From an epidemiological point of view, understanding the clinical sensitivity for mild cases is important. An RT-PCR sensitivity of 64% for exposed family members systematically tested with serology was recently reported [23]. This is in line with our sensitivity estimation for inpatients. Another small study found an 86.2% RT-PCR sensitivity in symptomatic COVID-19 patients in comparison with convalescent antibody [24]. Interestingly, a recent Cochrane meta-analysis on thoracic imaging of COVID-19 patients found an 87.9% pooled sensitivity for chest CT and 80.6% for chest X-ray [25]. However, diagnostic imaging is mainly performed on inpatients, which can lead to overestimation of the sensitivity.
Our analysis was done in a low prevalence setting. Thus, the negative predictive value for the RT-PCR test was high (89%) for the outpatients even though the clinical sensitivity was low (35%), assuming all COVID-19 excluded cases were true negatives. High false negative rates reduce the negative predictive value of testing. This is particularly problematic when the prevalence of the disease increases. In such settings, it will impair effective use of wide-spread testing. For health-care facilities, the message of our data is different: a single negative result cannot be trusted to rule out COVID-19 in patients with suitable symptoms. Our data show that the sensitivity of the repeat-tested inpatients was high (86%), and in line with previous reports on repeated testing [9,22]. When the sensitivity of the COVID-19 PCR test was judged based on the laboratory confirmed and high clinical suspicion patients, the estimated sensitivity of the test dropped to around 68%. Our results emphasize the importance of repeated sampling and of carefully evaluating the patient's clinical presentation.
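The relationship between prevalence, sensitivity, and negative predictive value discussed above follows directly from Bayes' rule and can be sketched as follows. The sensitivity of 0.35 is the outpatient estimate from the text; the perfect specificity and the list of prevalence values are assumptions made for illustration only, and the function name `negative_predictive_value` is hypothetical.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = P(no disease | negative test), derived from Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Sensitivity from the outpatient estimate; specificity of 1.0 is an assumption.
SENSITIVITY = 0.35
SPECIFICITY = 1.0

# Hypothetical prevalence values among tested individuals.
npv_by_prevalence = {
    prevalence: negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    for prevalence in (0.05, 0.10, 0.20, 0.40)
}
```

Under these assumptions the NPV falls steadily as prevalence rises, which is the mechanism behind the statement that a low-sensitivity test becomes more problematic for ruling out disease when the epidemic grows.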
This study estimated test sensitivity both with repeat-tested patients and by using clinical suspicion as a gold standard. The estimated sensitivity (89.9%) for repeat-tested patients is probably an overestimation: samples from a single individual are not independent, and there is often a clear clinical rationale, i.e. high clinical suspicion for COVID-19, that necessitates the repeated testing, leading to increased pretest probability. In addition, hospitalized patients often present with a more severe disease with higher viral loads and longer viral nucleic acid shedding than individuals with mild symptoms [26], again leading to overestimation of test sensitivity. Our data show that laboratory confirmed cases in both outpatients and hospitalized patients were more often febrile than the high clinical suspicion cases, even though almost all other symptoms were comparable between the groups. This could indicate that the more severe cases were more often detected with RT-PCR. In contrast, the estimate which included both laboratory confirmed and high suspicion outpatients (34.9%) is likely an underestimation, as COVID-19 symptoms are shared with other respiratory infections. In all, we conclude that the group which included both laboratory confirmed and high suspicion inpatients likely yielded the most realistic sensitivity estimate (67.5%).
Generally, the first symptomatic days are considered best for virus detection from the upper airways [27,28]. Due to the limited sample size, our analysis could not detect a definitive timepoint for highest sensitivity. However, even with this under-powered estimation, we should have detected major trends.
The study had several limitations. All patients were considered symptomatic so the estimates cannot be generalized to asymptomatic patients. The clinical criteria were set based on the information available in April 2020. While the core symptoms have remained the same, understanding of COVID-19 presentations has since increased. The clinical diagnosis of COVID-19 is notoriously hard as the symptoms are variable and overlap with many other similar conditions. In the hospitalized patients the testing coverage for other viral pathogens was extensive, and circulation of influenza and RSV was very limited at the time. However, most outpatients in this study were not tested for other potential viral pathogens. Potential information bias was introduced by the sometimes undetailed clinical records of outpatients. Reporting bias for more detailed symptoms most likely exists, especially for the laboratory confirmed outpatient cases. The specimen types recorded in the sample referrals may have contained errors. While pre-defined clinical criteria were used for grading, the data were collected retrospectively and the data collectors were aware of the index test result.
Large scale molecular testing has permanently changed the practice of clinical microbiology. RT-PCR for SARS-CoV-2 detection has many limitations as a labor intensive test with a relatively slow throughput. This has led to unbearable delays in results. Multiple solutions are being developed: point-of-care viral antigen detection [29], sample pooling [30], and self-sampling [31]. All these approaches, which use RT-PCR as a reference, quite consistently report lower sensitivity than RT-PCR. It is thus evident that all our current testing options are far from optimal in detecting all COVID-19 cases. In controlling the ongoing pandemic, we need focused research to find an appropriate balance in the tradeoff between test sensitivity and the speed and ease of testing in each epidemiological setting.

S2 Table. List of COVID-19 cohort wards. Inpatients were admitted to these COVID-19 cohort wards during the study period. Both laboratory confirmed and suspected COVID-19 patients were admitted to the cohort wards, except for two wards (COV_KNKINF and COV_KITEHO) to which only laboratory confirmed cases were admitted. (DOCX)
S3 Table. Estimated SARS-CoV-2 RT-PCR sensitivity values in the laboratory confirmed and high suspicion group combined according to specimen type in the first SARS-CoV-2 RT-PCR test.