Performance verification of the Abbott SARS-CoV-2 test for qualitative detection of IgG in Cali, Colombia

Background: Adequate testing is critically important for control of the SARS-CoV-2 pandemic. Antibody testing is an option for case management and epidemiologic studies, with high specificity and variable sensitivity. However, characteristics of local populations may affect performance of these tests. For this reason, the National Institute of Health (INS) and regulatory agencies in Colombia require verification of diagnostic accuracy of tests introduced to the Colombian market. Methods: We conducted a validation study of the Abbott SARS-CoV-2 test for qualitative detection of IgG using the Abbott Architect i2000SR. Prospective participants and retrospective samples were included from patients with suspected SARS-CoV-2 infection, age ≥18 years, and ≥8 days elapsed since initiation of symptoms. Pre-pandemic plasma samples (taken before October 2019) were used as controls. We estimated the sensitivity, specificity and agreement (kappa) of the Abbott IgG test compared to the gold standard (RT-PCR). Results: The overall sensitivity was 83.1% (95% CI: 75.4–100). Sensitivity among patients with ≥14 days since the start of symptoms was 85.7%, reaching 88% in samples collected from patients >60 days after onset of COVID-19 symptoms. Specificity was 100% and the kappa index of agreement was 0.804 (95% CI: 0.642–0.965). Conclusions: Our findings show high sensitivity and specificity of the Abbott IgG test in a Colombian population, which meet the criteria set by the Colombian INS to aid in the diagnosis of COVID-19. Data from our patient groups also suggest that IgG response is detectable in a high proportion of individuals (88.1%) during the first two months following onset of symptoms.

In order to provide critical information for the use of serological tests in Colombia, we evaluated the performance, in terms of sensitivity and specificity, of the Abbott IgG test to aid in the diagnosis of COVID-19 in patients who consult at the Fundación Valle del Lili (FVL) University Hospital in Cali, Colombia. Our primary aim was to determine whether the Abbott IgG test achieved the goals of sensitivity ≥85% and specificity ≥90% for detection of SARS-CoV-2 infection, with reference to the gold standard of RT-PCR, as mandatory performance parameters defined by the Colombian INS. In addition, we sought to estimate the concordance between the performance of the Abbott IgG test and the gold standard of RT-PCR.

Ethics statement
This research was approved and monitored by the Institutional Ethical Review Boards (IRBs) of Fundación Valle del Lili (FVL; approval number: 286-2020) and Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM; approval number: 10-2020). Written informed consent was obtained from all prospectively recruited participants, or from their legal representatives when participants were unable to consent for clinical reasons (e.g., ICU hospitalization). A waiver of consent for retrospective stored samples was approved by the IRBs. Medical decisions were not affected by participation in this study.

Study design and population
We conducted a diagnostic secondary validation (verification) study [11] with prospective and retrospective participants. Prospective participants were enrolled at FVL, and retrospective participants (stored samples) were selected at CIDEIM and FVL in Cali, Colombia. Reporting of the study follows the Standards for Reporting of Diagnostic Accuracy Studies (STARD 2015) guidelines (S1 Table) [19]. Two groups of participants were planned for enrolment in this study:
Group 1 (patients with suspected SARS-CoV-2 infection): inpatients and outpatients ≥18 years of age seeking healthcare in FVL, with suspected COVID-19 and ≥8 days after symptom onset, were invited to participate in the study (prospective participants). In addition, stored serum samples from patients with RT-PCR confirmed infection with SARS-CoV-2 were selected for inclusion in the study if relevant clinical information was available (retrospective participants). Exclusion criteria included immunosuppression (including HIV infection), autoimmune diseases and immunosuppressant drug treatment.
Group 2 (pre-pandemic controls): plasma samples obtained from EDTA anti-coagulated peripheral blood from patients consulting between January 2015 and September 2019 for unrelated pathologies (cutaneous ulcers) were selected from the biological specimen collection of CIDEIM. These corresponded to samples from patients ≥18 years of age, with diagnosis of cutaneous leishmaniasis (CL) and a negative HIV test.
Clinical samples: nasopharyngeal (NP) aspirate samples were obtained from all patients with suspected SARS-CoV-2 infection (i.e., Group 1), using 3 mL of sterile saline buffer. One 15 mL sample of venous blood was collected from each study participant with ≥14 days since onset of symptoms. All serum samples were kept refrigerated until analysis (time to assay ranged between 1 and 6 hours after sample procurement).
... chemiluminescent reaction measured by relative light units (RLU). The presence of SARS-CoV-2 IgG antibodies is determined by comparing the RLUs in the reaction containing the evaluated sample vs. the RLUs in the calibration control. The signal/cut-off index (S/C) used for this study was 1.4, as recommended by the manufacturer (<1.4 negative, and ≥1.4 positive for anti-SARS-CoV-2 antibodies). Assay performance metrics met or exceeded manufacturer specifications as per the product insert.
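The qualitative decision rule described above can be sketched in a few lines. This is only an illustration of the stated cut-off (S/C <1.4 negative, ≥1.4 positive); the function name and the example S/C values are hypothetical and are not taken from the study data or from the analyzer software.

```python
def classify_sc_index(sc_index: float, cutoff: float = 1.4) -> str:
    """Classify a signal/cut-off (S/C) index using the manufacturer cut-off.

    Values at or above the cut-off are reported as positive, values below
    it as negative, following the rule stated in the text.
    """
    return "positive" if sc_index >= cutoff else "negative"


# Hypothetical S/C values chosen only to show behaviour around the cut-off.
example_values = (0.30, 1.39, 1.40, 5.20)
results = [classify_sc_index(value) for value in example_values]
print(results)
```

Note that a value exactly equal to the cut-off is treated as positive, matching the "≥1.4 positive" rule.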

Reference standard: SARS-CoV-2 RT-PCR
The RT-PCR assays were performed according to routine procedures at the FVL clinical laboratory. One of the following SARS-CoV-2 RT-PCR assays was used as the reference standard: Allplex™ SARS-CoV-2 (Seegene, run on Bio-Rad CFX96 platform), AccuPower® SARS-CoV-2 Real-Time RT-PCR (Bioneer, run on Bioneer ExyStation™ 16 platform), GeneFinder™ COVID-19 Plus RealAmp (OSANG Healthcare, run on ELITe InGenius™ platform), BD SARS-CoV-2 (Becton, Dickinson, run on BD MAX™ platform), or VIASURE SARS-CoV-2 Real Time PCR™ (CerTest Biotec, run on QIAGEN Rotor-Gene-Q platform). All RT-PCR assays complied with INS and FVL laboratory performance criteria. Only samples with a signal above the threshold in the relevant RT-PCR viral gene target regions for each assay were considered positive, as per the manufacturer instructions and internal standardized operating procedures (SOPs) at FVL. All assays and reagents were stored and handled following manufacturers' instructions. All operators performed the assays following the internal SOPs of FVL. Staff performing the index test were blinded to the results of RT-PCR while staff performing the reference standard were blinded to results of the index test.

Clinical and laboratory data collection
Clinical and demographic data, including time since start of symptoms, disease severity and outcomes, history of autoimmune diseases and immunosuppression, were collected from the clinical records of patients or at the time of enrollment using a questionnaire. Laboratory results were entered into a case report form, sourced from the original laboratory information system records by a research assistant. All data were captured in a dedicated database utilizing Research Electronic Data Capture (REDCap™, www.project-redcap.org), hosted at the FVL data center.

Sample size
Based on the Colombian INS guidelines for performance verification studies for SARS-CoV-2 diagnostic tests [20,21], a minimum of 16 symptomatic RT-PCR positive patient samples are required to meet regulatory criteria, considering goals of sensitivity ≥85% and specificity ≥90% for detection of SARS-CoV-2 infection, with reference to the gold standard of RT-PCR. For the power calculation, we used parameter values from the test insert [13]: 100% for sensitivity and 99.6% for specificity. We also assumed that study participants would be recruited prospectively from a group in which 9% would be symptomatic and RT-PCR positive. This group needed to number 178 in order to achieve the required sample size for sensitivity (9% of 178 is 16).
Time constraints imposed by the unfolding pandemic required this recruitment plan to be changed, and assessment of specificity was expedited by using a set of 61 banked pre-pandemic plasma samples as described above. Simulation of 100,000 repeated samples of this size, with 99.6% specificity, showed a power of 99.8% to establish specificity above the 90% threshold, using the statistical methods described in the following section.
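The power figure above can be cross-checked with a short calculation. The sketch below is not the authors' simulation (they report 100,000 simulated samples, analyzed in R); it swaps in an exact binomial sum and assumes that "establishing specificity" means the one-sided 95% Wilson lower bound exceeds 90%. The function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def wilson_lower(successes: int, n: int, conf: float = 0.95) -> float:
    """One-sided Wilson lower confidence bound for a binomial proportion."""
    z = NormalDist().inv_cdf(conf)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width


def exact_power(n: int, true_spec: float, threshold: float) -> float:
    """Probability that the Wilson lower bound exceeds the threshold.

    Sums the binomial probability of every count of true negatives whose
    lower bound clears the threshold (assumed success criterion).
    """
    power = 0.0
    for negatives in range(n + 1):
        if wilson_lower(negatives, n) > threshold:
            power += (comb(n, negatives)
                      * true_spec**negatives
                      * (1 - true_spec)**(n - negatives))
    return power


# Inputs taken from the text: 61 banked samples, 99.6% assumed specificity.
power = exact_power(n=61, true_spec=0.996, threshold=0.90)
print(f"Power: {power:.3f}")
```

Under these assumptions the criterion is met whenever at most two of the 61 samples test falsely positive, and the resulting power should land close to the 99.8% reported.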

Statistical analysis
For the primary objective, one-sided 95% confidence intervals are presented (since the hypotheses about sensitivity and specificity were one-sided, ≥85% and ≥90%, respectively), as well as two-sided 90% confidence intervals, using Wilson's method [22]. This analysis was repeated for subgroups defined in terms of time since the onset of symptoms. To assess agreement between the index test and the reference standard, we calculated the crude agreement (proportion of cases in which the tests agree) and the kappa index, the latter being a chance-adjusted measure of agreement [23]. Landis & Koch's descriptors for values of kappa were used [24]. Predictive values were calculated as functions of sensitivity, specificity and prevalence by standard identities [25]. Patients with invalid results or missing data on results of reference or index tests were excluded from analysis. Analysis was done using the R software, version 3.6.3.
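As a rough illustration of these methods, the following sketch computes a Wilson interval, the crude agreement and Cohen's kappa. The analysis in the study was done in R; this Python version is only a stand-in. The 2x2 counts are an assumption reconstructed from figures given elsewhere in the text (83 Group 1 patients, of whom 14 were RT-PCR positive and IgG negative; 59 pre-pandemic controls, all IgG negative and treated here as reference negative).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, z: float) -> tuple:
    """Wilson score interval for a binomial proportion with normal quantile z."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def agreement(a: int, b: int, c: int, d: int) -> tuple:
    """Crude agreement and Cohen's kappa for a 2x2 table.

    a = reference+/index+, b = reference+/index-,
    c = reference-/index+, d = reference-/index-.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return observed, (observed - expected) / (1 - expected)


# z for a one-sided 95% bound; the same z gives the two-sided 90% interval.
z = NormalDist().inv_cdf(0.95)

# Assumed counts reconstructed from the text, not copied from Tables 2 and 3.
sens_low, sens_high = wilson_interval(69, 83, z)
spec_low, _ = wilson_interval(59, 59, z)
crude, kappa = agreement(a=69, b=14, c=0, d=59)

print(f"Sensitivity lower bound: {sens_low:.3f}")
print(f"Specificity lower bound: {spec_low:.3f}")
print(f"Crude agreement: {crude:.3f}, kappa: {kappa:.3f}")
```

With these assumed counts the sensitivity lower bound and kappa come out close to the reported 75.4% and 0.804, which suggests the reconstruction is consistent with the published tables, although it does not confirm them.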

Results
Between September 17th and October 9th, 2020, participants with presumed SARS-CoV-2 infections were invited to participate in this study. Overall, 260 potentially eligible retrospective participants were identified, of whom 48 had stored and available serum or plasma samples for IgG testing. In addition, 87 patients with RT-PCR confirmed SARS-CoV-2 infection were eligible to participate by donating a blood sample; among them, 49 were included in the study. Reasons for exclusion are detailed in Fig 1. No adverse events from performing the index test or the reference standard were reported by the specimen donors.
Most of the Group 1 participants (prospectively or retrospectively recruited patients with symptoms of COVID-19) were male (60.2%), with a mean age of 51.5 years (SD = 16.0). Fever, cough and difficulty in breathing were the most common symptoms (Table 1). Time from the initial symptoms until serologic testing was ≥14 days for 92.8% of participants, and >60 days for 50.6% of participants. Median time between symptom initiation and respiratory specimen sampling for molecular testing was 7 days. Regarding the clinical presentation, the largest group of patients was classified as uncomplicated COVID-19 cases (35/83, 42.1%), followed by patients with severe disease (36.1%). Uncomplicated cases were defined as patients who did not consult the emergency room and were managed on an outpatient basis. Mild cases were defined as patients who were hospitalized but did not require intensive care, and severe cases as patients who required intensive care. Four patients (4/83; 4.8%) had a fatal outcome.
For the pre-pandemic group, 61 eligible plasma samples from adult patients diagnosed with cutaneous leishmaniasis were identified in the biological collection of CIDEIM; of these, 59 met the quality standards for inclusion in the study. Most plasma samples (75%) were obtained from male participants, with an average age of 38 years (Fig 1).

Sensitivity, specificity and agreement
The sensitivity of the index test (in Group 1), relative to RT-PCR, is shown in Table 2: overall it was 83.1%. The pre-specified subgroups, in terms of time elapsed since symptom onset, were 8-13 days and ≥14 days. A majority (93%) were in the latter category, and among these the sensitivity was 85.7%. Of samples collected from patients >60 days after onset of COVID-19 symptoms, 88% tested positive by the index test (Table 2), indicating that IgG responses can be tracked in a high proportion of individuals during the first two months following onset of symptoms.
The specificity of the test (in Group 2) was 100%, with all 59 samples testing negative on the index test. This means that the point estimate of the positive predictive value (PPV) is 100%. For a prevalence up to 33.3%, the negative predictive value (NPV) is over 90%.
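The dependence of predictive values on prevalence follows from the standard identities for sensitivity, specificity and prevalence. The sketch below applies them to the point estimates quoted above (sensitivity 83.1%, specificity 100%); the function name and the choice of prevalence value are illustrative, and the result depends entirely on those inputs.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence.

    Uses the expected fractions of true/false positives and negatives
    in a population with the given prevalence (0 < prevalence < 1).
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Point estimates from the text, evaluated at a prevalence of one third.
ppv, npv = predictive_values(sensitivity=0.831, specificity=1.0, prevalence=1 / 3)
print(f"PPV: {ppv:.3f}, NPV: {npv:.3f}")
```

With a specificity of 100% there are no expected false positives, so the PPV is 100% at any non-zero prevalence, while the NPV falls as prevalence rises because false negatives make up a growing share of negative results.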
As shown in Table 3, the kappa index of agreement was 0.804 (95% CI 0.642-0.965), which is in the range considered by Landis & Koch as "almost perfect" [24]. The crude agreement

Post-hoc analyses: Exploring potential contributors to discrepancies between tests
As of October 2020, Abbott released technical information for implementation of a "grey zone" S/C index threshold (≥0.49 and <1.4) in defining positive samples [26].
Fig 2. ... Table 2 (83% and 100%, respectively), as a function of prevalence as determined by RT-PCR [25]. https://doi.org/10.1371/journal.pone.0256566.g002
We also explored possible relationships between Ct values, IgG S/C index and surrogates of disease severity (C-reactive protein [CRP], D-dimer, lactate dehydrogenase [LDH], ferritin, and interleukin 6 [IL6]), as possible contributors to the discrepant RT-PCR and Abbott IgG test results, but no correlations were found. Sample quality was excluded as a possible contributor to discrepant RT-PCR and IgG results, since positive and significant correlations were found between different parameters of disease severity [27] within our patient cohort (Fig 3): CRP levels were positively and significantly correlated to the neutrophil/lymphocyte index (NLI), LDH and D-dimer (p<0.00001, Spearman ρ = 0.556, 0.703, 0.538, respectively), while ferritin levels were positively and significantly correlated to LDH and IL6 (p<0.01, Spearman ρ = 0.622, 0.536, respectively).

Discussion
With the increasing availability of anti-SARS-CoV-2 vaccines and the growing implementation of national vaccination strategies, quantitative and qualitative detection of anti-SARS-CoV-2 antibodies has re-emerged as a central tool for epidemiological surveillance, in addition to supporting the diagnosis of infection. Here, we have verified the performance of the Abbott IgG test to aid in diagnosis of COVID-19 in adult patients consulting FVL, a high-level referral university hospital in Cali, Colombia. Our data show high sensitivity and specificity parameters when compared to the standard RT-PCR, indicating that the Abbott IgG test meets the criteria set by the Colombian INS to aid in the diagnosis of COVID-19. It also meets the WHO criteria for specificity (≥97%) in tests aimed to detect prior SARS-CoV-2 infection [28]. Agreement with the molecular test was very good (kappa 0.804, 95% CI: 0.642-0.965), and the estimated sensitivity of the test remained above the threshold of ≥85% (as established by the Colombian guidelines for antibody testing), two months after infection, supporting its use in this population.
These results are similar to those reported in another study conducted by the Colombian INS, which showed sensitivity of 85.2% of the Abbott IgG test in RT-PCR positive symptomatic patients (n = 260), increasing to 97.2% in those patients with history of hospitalization due to COVID (n = 147) [20]. This study followed the protocol for secondary validation of the INS [11,29], which sets a standard for the evaluation of diagnostic technologies for SARS-CoV-2 in the country and contributes to comparability of evaluations of tests across laboratories. Disease severity is a known factor related to the ability of current tests to detect antibody against SARS-CoV-2 [30]; in our study, most of the cases were mild and only 36% had severe disease, which may partly explain the estimated sensitivity in this study.
Another factor influencing the agreement between RT-PCR and serological tests is the time of sampling for each of the tests: samples obtained during the first 5-10 days after onset of symptoms are better suited for RT-PCR-based diagnosis as viral loads will likely be near their peak, while informative serum samples for IgG detection are best obtained after 14 days following onset of symptoms, in accordance with the time required for mounting an antibody response to SARS-CoV-2 infection [3,31,32]. In our cohort, among the 14 patients with RT-PCR positive/IgG negative results, only 3 corresponded to samples obtained between 8-13 days after onset of symptoms. Thus, other factors beyond time-to-sampling may contribute to this difference.
Based on the point estimates of sensitivity and specificity from our study, for a prevalence up to 33.3%, the NPV is over 90%. Colombia has recently completed the first nationwide multicenter seroprevalence study of total (IgG/IgM) anti-SARS-CoV-2 antibodies, conducted between October and November 2020 [33]. Estimated seropositivity frequencies ranged between 27% and 59% [33,34], using the SARS-CoV-2 Total (COV2T) Advia Centaur-Siemens chemiluminescent immunoassay [34]. Considering the result of seroprevalence in Cali, which was 27% (95% CI: 22-32%) [34], we estimate the negative predictive value of this test to be currently close to 93%.
Decline of antibody titers and seropositivity has been reported for different tests, including the Abbott IgG [30,35-38]. In our study, sensitivity of the test in the subset of patients >60 days from symptom onset was 88%, indicating that IgG responses can be tracked in individuals during the first two months. This coincides with previous reports of a median half-life of antibodies detected by Abbott of 86 days [35], and a decay in seropositivity after 90 days reported in a Brazilian population evaluated with this test [39]. These aspects should be considered when analyzing and interpreting serosurveys conducted globally with the Abbott assay, since the true cumulative number of SARS-CoV-2 infections may be underestimated. However, knowledge of the test's performance and antibody dynamics would help to account for these aspects in estimation of seroprevalence [36,40].
An important aspect of performance verification studies is the selection of the "reference standard". For antigen-based as well as antibody-based diagnostic tools for COVID-19, the reference standard has been defined as RT-PCR by national and international guidelines [29,41]. However, a number of RT-PCR gene targets, commercial (and in-house) kits and amplification platforms are available for viral detection. Due to variability in the methodological conformation of these processes, these tools also differ in their amplification efficiencies and limits of detection (e.g., lower limit of detection [LOD] or lower limit of quantitation [LLOQ]) which, as an example, can range from 3.8 to 23 copies/mL (LOD95) [42]. These differences introduce limitations that need to be considered for generalizability of the results and for comparisons with other studies.
Ten of the 14 patients with discordant results were diagnosed by amplification of E and RdRP targets, while the Abbott IgG test detects antibodies against the viral nucleocapsid protein. Differences in the molecular targets could contribute to the observed discrepancies and call for a more refined definition of what the reference standard for COVID-19 diagnostics is.
Limitations of our study include the use of separate groups of samples for the calculation of sensitivity and specificity (the latter coming from patients without respiratory symptoms) and the lack of asymptomatic patients, which limits our ability to make inferences in this important group. Plasma and serum samples were used for this study, as indicated by the manufacturer. These samples are aligned with the acceptable characteristics of the WHO Target Product Profiles for antibody tests. However, samples that are easier to collect (e.g., blood spots) can be developed as innovations to facilitate sampling and accessibility of the technology [28]. Presence of SARS-CoV-2 variants, and their potential effects on test performance, was not assessed in this study. Despite this, the population enrolled for testing was representative of the spectrum of patients and age groups seeking care in a reference facility in Colombia, with a breadth of clinical presentations (from mild to severely ill). Results from this study provide information on the specific use of this antibody test in the population for which it is intended to be used and highlight some important limitations in the interpretation of results based on the current reference standard definitions and duration of seropositivity.
Supporting information S1