Evaluation of SARS-CoV-2 antibody point of care devices in the laboratory and clinical setting

SARS-CoV-2 antibody tests have been marketed to diagnose previous SARS-CoV-2 infection and as a test of immune status. There is a lack of evidence on the performance and clinical utility of these tests. We aimed to carry out an evaluation of 14 point of care (POC) SARS-CoV-2 antibody tests. Serum from participants with previous RT-PCR (real-time polymerase chain reaction) confirmed SARS-CoV-2 infection and pre-pandemic serum controls were used to determine specificity and sensitivity of each POC device. Changes in sensitivity with increasing time from infection were determined on a cohort of study participants. Corresponding neutralising antibody status was measured to establish whether the detection of antibodies by the POC device correlated with immune status. Paired capillary and serum samples were collected to ascertain whether POC devices performed comparably on capillary samples. Sensitivity and specificity varied between the POC devices and in general did not meet the manufacturers’ reported performance characteristics, which underlines the importance of independent evaluation of these tests. Sensitivity peaked at ≥20 days following onset of symptoms; however, evaluation of 3 of the POC devices at extended time points showed that sensitivity declined with time. This was particularly marked at >140 days post infection. This is relevant if the tests are to be used for sero-prevalence studies. Neutralising antibody data showed that positive antibody results on POC devices did not necessarily indicate high neutralising antibody titres, and that these POC devices cannot be used to determine immune status to the SARS-CoV-2 virus. Comparison of paired serum and capillary results showed that there was a decline in sensitivity using capillary blood. This has implications for the utility of the tests as they are designed to be used on capillary blood by the general population.


Introduction
The emergence of SARS-CoV-2 has resulted in rapid research and development of commercial diagnostic tests by both laboratories and commercial manufacturers. The unusually fast pace of this development risks potential compromises on quality in the absence of rigorous independent evaluation and validation. It is therefore essential to ensure adequate performance of tests for population wide or individual use to prevent roll out of devices which add no, or at best, minimal clinical value to individual patients or the wider population. At worst, inadequately performing diagnostic tests can produce misleading clinical information with potential for harmful consequences.
The current most widely used diagnostic test for SARS-CoV-2 is based on real-time polymerase chain reaction (RT-PCR) amplification of viral RNA from an upper respiratory tract sample [1,2]. However, due to a limited time window of active infection, capacity constraints, and access to these tests mostly restricted to symptomatic patients, cases determined using this method underestimate the true burden of infection. In contrast, serological assays test for previous infection and are therefore a key additional tool for monitoring prevalence of infection within the population. Antibody tests can also be used as an aid in diagnosis where COVID-19 is suspected clinically but the PCR time window has passed [3]. Significant interest exists in the potential for use of these tests at an individual level to provide an indication of immune status and act as an 'immunity passport'. For countries where vaccine availability is limited, prescreening the population with antibody tests in order to identify individuals who may either not require vaccination or be suitable for a reduced vaccine dosing regimen may help optimise the use of limited vaccine resources.
Reliable antibody tests also have the potential to identify hospitalised COVID-19 patients who may benefit from the use of monoclonal antibody treatment. Ronapreve, a combination of Casirivimab and Imdevimab, is a monoclonal antibody treatment directed at the spike protein receptor binding domain on SARS-CoV-2 [4] which has been shown to significantly reduce 28 day mortality in seronegative hospitalised patients [5].
A large number of commercially available immunoassays have been developed to detect SARS-CoV-2 IgG, IgM, IgA and total antibodies [6]. Although the majority of antibody production is directed towards the more abundant N (nucleocapsid) protein, the S (spike) protein contains the receptor-binding domain responsible for host cell attachment, and antibodies to the S protein are therefore predicted to be neutralising [7].
In contrast to laboratory-based immunoassays, which require venous sampling and transport to centralised testing sites, lateral flow immunoassays (LFA) offer the potential to allow rapid, cheap, mass population antibody testing on capillary samples in the home environment. In order to offer clinical utility at a home population level, and relieve pressure on clinical services, the LFA must be able to reliably operate using capillary whole blood samples; the test must also offer sufficient ease of use and interpretation of results to be acceptable to the general 'lay-person' population. Additionally, if LFAs are to be used for population sero-surveillance they will have to be sensitive enough to detect the presence of antibodies in those who only suffered from mild disease or were asymptomatic. Of even greater importance is the test specificity when testing at a population level where the pre-test probability is low. Without adequate specificity the chance of a positive result being a false positive is considerable and this is of particular concern if these tests were to be used as evidence of immune status or for 'immunity passports'. It is currently unclear if the detection of antibodies to the COVID-19 virus by LFA indicates immunity to the virus, and some manufacturers fail to disclose whether the assay is directed towards the N or S protein.
Worldwide, over 200 LFAs have been produced to date [6]; for the majority of these devices only manufacturer reported performance data is available. CE marking of these devices does not require external validation, and the manufacturer's in-house validation process is often not publicly available. The UK Medicines and Healthcare Products Regulatory Agency (MHRA) advise that any SARS-CoV-2 LFA should have sensitivity and specificity > 98% (95% CI 96-100%) in samples collected ≥ 20 days post symptom onset irrespective of whether they are performed as a home self test kit or by a trained health care professional [8]. Despite claims by many manufacturers to have achieved this target (see S1 Table), UK external validations of LFAs have so far failed to reproduce this [9][10][11].
As part of a nationwide evaluation of SARS-CoV-2 diagnostic tests, we carried out an independent validation of 14 POC SARS-CoV-2 antibody tests, including 13 LFAs. We assessed their specificity and sensitivity against the manufacturer's claims and determined compatibility with MHRA criteria. For tests that performed well against the MHRA criteria, we studied changes in sensitivity with increasing time from infection. To investigate suitability for home use, we evaluated ease of use and interpretation of these tests when performed directly by patients. Neutralising antibody status was determined for samples to establish whether the detection of antibodies by the LFA correlated with immune status.

Point of care (POC) device selection and the evaluation process
National Service Scotland, a Non-Departmental Public Body which provides advisory services to NHS Scotland, was approached by a series of commercial manufacturers for external evaluation of their diagnostic devices. The manufacturer claims and costing were reviewed and manufacturers who passed these initial checks were invited to send up to 100 POC test kits for initial evaluation. 14 POC devices designed to detect antibodies to the SARS-CoV-2 virus were evaluated. The tests included in the evaluation were: AbC-19, Alpha Pharma, Biomerica, Biozek, Fortress, Jiangsu, Lepu, Menarini Healgen, Mologic, Pharmact, Roche, Wuhan Easy Diagnosis, Wuhan Life Origin Biotech (Syzbio), and LumiraDX. 13 of the POC devices were LFAs, consisting of immunochromatography based cassettes, while the LumiraDX assay used a POC microfluidic immunofluorescence assay. Each device measured IgG and IgM except AbC-19, which measured IgG only, and LumiraDX, which gave an antibody result of unspecified subclass. Target antigens (either S or N) were not always disclosed by the manufacturers. The tests were performed as per the manufacturer's instructions. Typically, 2.5μL to 10μL of serum or capillary sample was pipetted into the sample well followed by a pre-specified volume of buffer supplied by the manufacturer (see S1 Table). Serum samples were analysed at room temperature in the laboratory at the Royal Infirmary of Edinburgh. The read time varied from 10 to 20 minutes depending on manufacturer's instructions. Photographic records of serum results were taken. Capillary samples were run as part of a research clinic. Antibody results were visually read and recorded as positive, weak positive, or negative for IgG and IgM. Positive and weak positive results were deemed an overall positive result.
In the first stage of the process up to 50 serum samples from RT-PCR confirmed SARS-CoV-2 positive individuals (mainly hospitalised patients) and up to 50 pre-pandemic SARS-CoV-2 negative samples were run. The number of days the sample was collected post symptom onset was recorded for the positive cases. Initial sensitivity and specificity were calculated and if the test kit performed within agreed parameters (typically defined as an IgG specificity of >98% and sensitivity of >95% at ≥ day 20 post symptom onset) the company was taken forward to the second stage of evaluation and asked to provide further test kits. Although the MHRA definition suggests that SARS-CoV-2 LFA devices should achieve >98% sensitivity, the cut-off was reduced to >95% sensitivity for the purposes of this study as only one manufacturer's device was able to achieve the >98% cut-off. Further stages of the evaluation process which were carried out for a subset of kits included more stringent specificity testing, determination of batch to batch variation, sensitivity as time from infection increased, comparison of capillary to serum results, and evaluation of ease of use by study participants in a research clinic.

Overview of participants and samples
Serum samples from hospitalised patients with RT-PCR confirmed SARS-CoV-2 were collected between March and November 2020. Excess serum was stored at the point of discard from hospitalised COVID-19 patients using ethical permissions obtained through NRS BioResource.
Out-patients with previous RT-PCR confirmed SARS-CoV-2 infection were also recruited to two COVID convalescent research studies where participants were invited to donate serial blood samples (SR1407 BioResource study, n = 112 participants) and to provide capillary and serum blood samples (COVID-19 Antibody Test Evaluation (CATE) Study, n = 82 participants). For both studies the date of symptom onset and positive RT-PCR test result were recorded. Inclusion criteria were those aged >16 years old with previous RT-PCR confirmed diagnosis of COVID-19. Only 5 participants required hospitalisation following infection, and none of these required intensive care. Due to the limited access to COVID-19 testing at the start of the pandemic when most study recruitment occurred many study participants were health care workers. Frail or shielding individuals were excluded due to the study requirements for travel. Each participant provided between 1 and 5 serum samples. For participants providing serial samples these were collected between 29 and 224 days post PCR confirmed infection. Capillary and paired serum samples were collected as part of the CATE study research clinic between September and November 2020. Capillary samples were tested in the clinic setting with paired serum samples subsequently run on the same POC devices in the laboratory to assess concordance on capillary and serum samples. A subset of CATE study participants were asked to carry out one of the POC tests themselves including finger prick testing with the aid of the manufacturer's printed instructions. Health care workers intervened if the participant needed support in completing the testing and any intervention was recorded. Afterwards participants were asked to complete a questionnaire on their experience of performing the test. For a subset of other test kits capillary results were interpreted by both the participant (with the aid of a labelled diagram) and the researcher.
Negative controls consisted of venous samples collected prior to December 2019. For more stringent specificity testing, samples positive for rheumatoid factor, seasonal coronaviruses, other respiratory pathogens, CMV (cytomegalovirus) or EBV (Epstein-Barr virus) were used as negative controls along with samples from the national antenatal screening programme and SNBTS (Scottish National Blood Transfusion Service).
Wherever possible the same panel of samples was run on each POC device, but due to limitations in sample volume not all devices were assessed with the same panel. However, a similar variety of sample types was used to test each kit.

Neutralisation assay
Neutralising antibody assays were performed on serum samples using a pseudotyped chimeric virus, expressing the SARS-CoV-2 spike protein, as described elsewhere [12].

Statistical analysis
The primary outcome was the sensitivity and specificity of each POC test. For sensitivity, tests were compared against PCR-confirmed SARS-CoV-2 infection. Specificity of each POC test was evaluated against pre-pandemic negative samples, with all positives counting as false positives. The analysis included all available data for the relevant outcome and is presented with the corresponding 95% confidence intervals (Clopper Pearson).
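As an illustration of the calculation described above, the following Python sketch computes a proportion such as sensitivity or specificity together with an exact Clopper-Pearson 95% confidence interval. It is not the analysis code used in the study (which was run in PRISM and SPSS); the function names and the example counts are hypothetical, and the interval bounds are found by simple bisection on the binomial distribution.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) == target, where f decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= successes) = alpha / 2
        lower = _bisect_decreasing(
            lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= successes) = alpha / 2
        upper = _bisect_decreasing(lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


def proportion_with_ci(successes, n):
    """Return (point estimate, lower bound, upper bound)."""
    lower, upper = clopper_pearson(successes, n)
    return successes / n, lower, upper


# Hypothetical example: 47 of 50 PCR-confirmed samples read as IgG positive.
estimate, lower, upper = proportion_with_ci(47, 50)
print(f"sensitivity = {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The same function applies to specificity by passing the number of pre-pandemic samples that tested negative and the total number of pre-pandemic samples run.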
For samples where neutralising antibody levels were available, positive predictive value (PPV) and negative predictive value (NPV) were calculated at two NT 50 thresholds: ≥ 50 and ≥ 160.
For the comparison of the performance of each POC test between clinic capillary and serum samples we calculated the sensitivity and 95% confidence interval. The McNemar test was used to assess for significant difference between dependent groups. Agreement between serum and capillary samples was measured using the Cohen's Kappa. Results of the Kappa were interpreted as previously described (<0, poor agreement; 0.00-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and >0.8, almost perfect agreement) [13].
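The paired comparison described above can be sketched in Python as follows. The text does not state which form of the McNemar test was used, so the exact binomial version shown here is an assumption, and the function names and example counts are hypothetical. The kappa bands follow the interpretation quoted above.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant pair counts.

    b: pairs positive on serum but negative on capillary
    c: pairs negative on serum but positive on capillary
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table of paired results.

    a: both positive, b: serum+/capillary-, c: serum-/capillary+, d: both negative
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if expected == 1.0:
        return 1.0  # all results fall in a single category
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa to the agreement bands quoted in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical paired table: 20 both positive, 9 serum-only positive,
# 1 capillary-only positive, 10 both negative.
p_value = mcnemar_exact(9, 1)
kappa = cohens_kappa(20, 9, 1, 10)
print(f"McNemar p = {p_value:.3f}, kappa = {kappa:.2f} ({interpret_kappa(kappa)})")
```

Only the discordant pairs enter the McNemar test, whereas kappa uses the full table, which is why the two statistics can point in different directions for the same device.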
All data were analysed using PRISM (Version 9) and SPSS, and a p value<0.05 was considered significant.

Ethics
Ethical approval for access to pre-pandemic stored samples and hospitalised COVID-19 patient samples was obtained through the NRS BioResource (

Comparison of POC device performance using serum samples
14 POC devices were evaluated for sensitivity and specificity including 13 LFAs and one microfluidic immunofluorescence assay (LumiraDX). In the first stage the sensitivity of the devices was assessed using a panel of 50 serum samples from RT-PCR positive individuals. The specificity of the device was assessed using a panel of 50 serum samples collected prior to December 2019 for routine virological investigations. Adequately performing devices (IgG specificity > 98% and IgG sensitivity >95% ≥ 20 days post symptom onset) then progressed to stage 2, where they were evaluated against an expanded panel of serum samples. In particular, a more stringent specificity panel was used, including serum samples taken from individuals with a positive PCR result for the endemic coronaviruses, other respiratory pathogens, and samples with positive rheumatoid factor, CMV or EBV serology.
The overall results for each device are summarised in Table 1. Only the LumiraDX POC assay achieved the MHRA criteria for sensitivity and specificity when all testing was considered (> 98% for both), although a number of the LFA devices (Biomerica, Biozek, Fortress, Menarini and Roche) surpassed the specificity criteria, even when tested against the high stringency specificity serum samples. One empirical finding was that there was extensive variation in the strength of the staining between the kits, which could affect
For each manufacturer the n, overall sensitivity, sensitivity at ≥ 20 days post symptom onset, and specificity is shown, with 95% confidence intervals (Clopper-Pearson). The devices are grouped according to which stage of the evaluation they reached. For kits that progressed further than stage 1, the additional analyses performed are

PLOS ONE
infection for the Biomerica, Biozek, Menarini, Roche and LumiraDX devices and the results are shown in Table 2 and Fig 2. AbC-19 and Fortress were not included in this part of the analysis as sensitivity data at proximal time points to infection were not available for these kits. This reflected the times the different kits became available, the amount of serum sample that was available when the kits became available, and our wish not to duplicate the work that was being done by other groups at the time. All the devices tested showed an initial increase in sensitivity, peaking at ≥20 days post symptom onset, with sensitivities of ≥ 90% obtained for all devices from this time point onwards. Between 8-20 days post symptom onset, performance was more variable, with Roche showing the lowest sensitivity, and Menarini the highest. It was important to determine the sensitivity of the POC devices at later time points post onset of symptoms, as it has been demonstrated that antibody levels wane over time following the primary antibody response to SARS-CoV-2 [14][15][16]. In order to examine this, we used a set of samples taken from the non-hospitalised convalescent cohort, since the majority of patients in the general population were not hospitalised with SARS-CoV-2 infection, and serial samples were available. Sensitivity was analysed across three time windows ≥20 days post symptom onset for the AbC-19, Biomerica, Roche and LumiraDX devices, predominantly using samples from the convalescent cohort, and the results are shown in Table 3 and Fig 3. The LumiraDX assay was the only assay evaluated that maintained its sensitivity at 98% across the time interval studied. The three LFA devices (AbC-19, Biomerica, Roche) showed variable drops in sensitivity with increasing time from symptom onset, with Biomerica dropping from 91% to 54% over the time points studied.
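The time-window analysis described above amounts to grouping samples by days post symptom onset and computing sensitivity within each group. A minimal Python sketch is given below. The function name and the sample data are hypothetical. The 21-79 day and 140 day or later windows are reported in this article, whereas the 80-139 day middle window is an assumption inferred from those two.

```python
def sensitivity_by_window(samples, windows):
    """Compute sensitivity within each time window.

    samples: iterable of (days_post_onset, poc_positive) for confirmed cases
    windows: list of (label, first_day, last_day); last_day=None means open-ended
    Returns {label: (positives, total, sensitivity or None if no samples)}.
    """
    samples = list(samples)
    summary = {}
    for label, first_day, last_day in windows:
        results = [
            positive
            for days, positive in samples
            if days >= first_day and (last_day is None or days <= last_day)
        ]
        positives = sum(results)
        total = len(results)
        summary[label] = (positives, total, positives / total if total else None)
    return summary


# The middle window boundaries are assumed, not taken from the article.
WINDOWS = [("21-79", 21, 79), ("80-139", 80, 139), (">=140", 140, None)]

# Hypothetical results for one device on convalescent serum samples.
example = [(25, True), (60, True), (79, False), (100, True),
           (139, False), (140, False), (200, True)]

for label, (positives, total, sens) in sensitivity_by_window(example, WINDOWS).items():
    print(f"{label} days: {positives}/{total} positive")
```

Because participants provided serial samples, one individual can contribute to more than one window; a fuller analysis would need to account for that.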
For the convalescent cohort samples, neutralising titres were also available, and the POC result (for AbC-19, Biomerica, Roche and LumiraDX) could be compared with the half maximal neutralising titre (NT 50). Initially, the NT 50 result for each serum sample was plotted, with the results for each device divided by positive and negative IgG (or for LumiraDX, antibody) result (Fig 4).
There was a statistically significant difference in NT 50 level between the positive and negative result for all assays. However, for the positive result, a large range of NT 50 values was observed. To examine this further, the sensitivity and specificity of each device were evaluated at two NT 50 thresholds: ≥ 50 and ≥ 160 (Table 4). These two thresholds were selected based on a detectable NT 50 (≥ 50) and a threshold which may confer protection from reinfection (≥ 160) [17].
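To make the threshold analysis concrete, the sketch below treats an NT 50 at or above a chosen threshold as the reference standard and scores a POC result against it. This is an illustration rather than the study's analysis code; the function name and the sample values are hypothetical.

```python
def metrics_against_nt50(poc_positive, nt50, threshold):
    """Score POC results against neutralising titre at a given NT50 threshold.

    poc_positive: list of booleans (POC device read as positive)
    nt50: list of NT50 values for the same samples, in the same order
    threshold: samples with NT50 >= threshold count as reference positives
    """
    tp = fp = tn = fn = 0
    for positive, titre in zip(poc_positive, nt50):
        neutralising = titre >= threshold
        if positive and neutralising:
            tp += 1
        elif positive and not neutralising:
            fp += 1
        elif not positive and neutralising:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical data for six samples run on one device.
poc = [True, True, True, False, False, True]
titres = [200, 60, 30, 20, 55, 170]

for threshold in (50, 160):
    print(threshold, metrics_against_nt50(poc, titres, threshold))
```

Raising the threshold reclassifies some POC-positive samples from true positives to false positives, so sensitivity and NPV tend to rise while specificity and PPV tend to fall.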

The sensitivity of all the POC devices for detecting an NT 50 ≥ 50 (of the 280 serum samples, 222 had an NT 50 ≥ 50 and 58 had an NT 50 < 50) was at least 90%, and for Roche and LumiraDX it was > 98%. However, the NPV varied between 40 and 80%, as a result of the small number of samples that were negative by the POC devices relative to the number of positive samples (Fig 4). Specificity performance was also poor with a large number of false positive results (POC positive, but NT 50 < 50). In particular, the LumiraDX assay had a specificity of < 5%, whereas the AbC-19 assay had the highest specificity of 62%. These results impacted the PPV in this high prevalence setting, which ranged from 84% for the LumiraDX assay to 90% for the AbC-19 assay.
When the NT 50 threshold was raised to ≥ 160 (where 142 of the serum samples had an NT 50 ≥ 160 and 138 had an NT 50 < 160), the sensitivity for all the POC devices rose to at least 96%, with a consequential increase in NPV to at least 90%. However, the specificity of all of the devices fell, with the AbC-19 kit having the highest specificity at 42%. This affected the PPV, which also fell relative to the lower NT 50 threshold, to between 55 and 64%.


Comparison of POC device performance in serum and capillary samples
A major advantage of the POC devices over immunoassays performed within accredited laboratories is the lack of requirement for phlebotomy. However, many evaluations of POC performance have not compared serum and capillary samples. Furthermore, it has been suggested that LFA could be suitable for home use, outside of a healthcare setting, and this would require the lay participant to interpret their test result. In this study, paired serum and capillary samples collected on the same study visit were available for a number of participants in the convalescent cohort. The performance of seven POC assays (AbC-19, Biomerica, Biozek, Fortress, Menarini, Roche and LumiraDX) was compared. The serum sample was processed in the laboratory, and the result read by a health care worker (HCW), while the capillary sample result was read by the study participant (with the aid of a labelled diagram or instruction leaflet) and a second HCW in the research clinic. Sensitivity findings for the 3 matched interpretations are shown in Table 5 and Fig 5. A common feature of the POC results in capillary samples was a reduction in sensitivity compared to the serum samples. This was seen in 5 out of the 7 devices analysed, although the magnitude of the effect varied. Both Biozek and Roche showed significant decreases between serum and capillary sensitivity when interpreted by either HCW or participants. This decrease in sensitivity ranged from 19.1 to 34.2%. AbC-19 showed a significant decrease in sensitivity between serum and capillary results read by participants, decreasing from 73.3% to 33.3%. The Fortress LFA showed a small increase in sensitivity in capillary samples compared to serum but this was not statistically significant. The LumiraDX assay showed 100% sensitivity for both samples (n = 11) but a large number of test fails (n = 18) occurred on capillary samples at the research clinic.
The concordance between serum, HCW and participant read capillary results is displayed in Table 5. Concordance varied between the devices. For example, of the 45 AbC-19 results, 24 were interpreted as positive by a HCW but only 15 by the study participants. In contrast, for the Fortress LFA, there was a single discrepancy between the HCW and participant interpretations, where the participant reported a positive result, while the HCW reported a negative result.
Participants who performed self-testing using the AbC-19 LFA in the clinic also completed a questionnaire that addressed aspects of self testing using the kits (n = 35 participants). Their responses are summarised in Fig 6. Overall, 89% of participants reported it was "very easy" or "easy" to understand the leaflet explaining the results. 75% reported it was "very easy" or "easy" to understand the instructions. Any issues appeared to be predominantly associated with taking the capillary samples and using the test kit, with 20% of participants reporting it was "difficult" or "very difficult" to take the sample.

Discussion
The SARS-CoV-2 pandemic has generated unprecedented demand for testing both to confirm acute infection and past virus exposure. In particular, serological assays measure prior exposure to the virus and are being used in high volume settings to measure seroprevalence at a population level. The role for these tests at an individual level is still poorly understood, and there are huge risks associated with the use of poorly performing tests. For example, using a test with inadequate specificity in a low prevalence population would result in a higher proportion of people with a false positive result compared to a true positive result, leading people to believe they were antibody positive when they are not.
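The dependence of false positive burden on prevalence follows directly from Bayes' theorem, and a short Python sketch may help to show the scale of the effect. The function name and the sensitivity, specificity and prevalence figures below are illustrative assumptions and are not results from this evaluation.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive (Bayes' theorem)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Illustrative values only: a test with 95% sensitivity at 5% seroprevalence.
# With 95% specificity, roughly half of all positive results are false positives.
for specificity in (0.95, 0.98, 0.995):
    ppv = ppv_from_prevalence(0.95, specificity, 0.05)
    print(f"specificity {specificity:.1%}: PPV {ppv:.1%}")
```

At a fixed prevalence, small gains in specificity produce large gains in PPV, which is why specificity targets matter most when the test is used for population screening.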
This study was designed as part of a national evaluation of SARS-CoV-2 antibody POC tests within NHS Scotland, and aimed to address several questions regarding the performance of the POC tests. To be effective, POC devices should have similar performance in samples from patients who experience relatively mild illness compared to patients requiring hospitalisation. This was addressed using a cohort of convalescent patients, the majority of whom did not require hospital treatment during their illness. This cohort was also used to assess the performance of the POC device with increasing time from infection, and to examine how well they were able to predict neutralising antibody titres.
Another major requirement is for POC devices to perform well in capillary samples compared to serum. This point is critical since a major strength of the POC devices is that they could be used on capillary samples, thus reducing the requirement for phlebotomy.
In this work, 14 POC devices were evaluated on serum samples, and a number of these tests underwent further evaluation on capillary samples. From the serum evaluation, it became clear that specificity performance of many of the POC devices was good, with 11 out of 14 reaching the MHRA specificity target of > 98% for IgG, or in the case of LumiraDX, total antibody. However, only a single device (LumiraDX) reached the MHRA sensitivity criteria of >98% ≥ 20 days post symptom onset. The MHRA criteria state that both sensitivity and specificity should be ≥98%; therefore only LumiraDX met these standards. The sensitivity panel that was used to assess these POC devices in the first instance consisted predominantly of hospitalised patients, who tend to have higher antibody titres than patients with milder disease [18][19][20][21][22][23], making disease severity a less likely explanation for these observations. The reduced sensitivity of the devices compared to manufacturer's claims is not unique to the methodology; the sensitivity of a laboratory analyser immunoassay on this cohort was 93% at > 20 days post symptom onset. Therefore at relatively early time points post symptom onset, a negative result cannot rule out SARS-CoV-2 exposure. Indeed, for the subgroup of POC devices where sensitivity with time was examined it was clear that, at early time points, sensitivity increased with time, reaching maximal sensitivity after 20 days post symptom onset. This was in keeping with other reports that have studied the time to reach maximal antibody titres [18,[24][25][26].
An equally important question regarding timing of testing was how long post infection the device could detect an antibody response. The issue of sero-reversion is important for seroprevalence studies, and the ideal test for this purpose would have a high sensitivity for a long period of time. Otherwise, sero-reversion may result in underestimation of seroprevalence, and models to account for this trend of waning antibodies themselves risk over- or under-correcting for this effect [27,28]. To examine this, a cohort of SARS-CoV-2 convalescent individuals for whom longitudinal serum samples were available was used. Crucially, these patients were most representative of the majority of SARS-CoV-2 infections in so far as only 5 participants providing serial samples required hospitalisation during their initial COVID-19 infection, and none required treatment in intensive care. In this study, the LumiraDX assay was the only assay to maintain a high sensitivity at time points from 21 to >140 days post symptom onset; the other assays showed variable levels of declining sensitivity. This was most marked for the Biomerica LFA, which had a sensitivity of 91% at 21-79 days post symptom onset, dropping to 54% at ≥ 140 days post symptom onset.
The relative importance of specificity compared to sensitivity at least in part depends on the intended use. For example, for serosurveillance, particularly in low prevalence populations, specificity should be maximised to reduce the false positive rate. However, in clinical settings, specificity may be compromised in favour of sensitivity if there were adequate follow up tests to identify true positives.
A major question remains over the significance of a positive antibody test result in terms of protection from reinfection, or protection from infection following vaccination. This is pertinent given the continued discussions regarding "immunity passports" and identifying nonimmune individuals, through antibody testing, for priority vaccination where vaccine supply is limited. In this study, we compared the POC device result with the NT 50 level (Fig 4 and Table 4). It was apparent that a positive result was associated with a wide range of neutralising antibody levels, and that a number of positive samples had low NT 50 levels (< 50). This adds to the concern that a positive POC result may not provide relevant information about the likelihood of protection from infection. The risk of using these tests for the purpose of 'immunity passports' is further exacerbated by concerns over specificity and the circulation of new variants that may be able to evade existing humoral responses [29,30]. Whilst some of the better performing devices may be suitable for sero-prevalence studies, they should not be used to draw conclusions on an individual's protection from reinfection.
A major advantage of POC devices stems from their potential to relieve requirement for phlebotomy and to be performed outside of a healthcare setting. However, for this potential to be realised, additional factors including POC device performance in capillary samples and ease of use should be considered. In this study, a head to head comparison of POC device performance on serum and capillary samples was performed, and for the LFAs HCW and participant interpretation of the test result was also compared. There were differences in performance on serum and capillary samples, with the majority of the devices showing reduced sensitivity on capillary samples compared to serum. Concordance between serum and capillary results was sometimes poor, indicating that the performance of a test on serum under laboratory conditions cannot be assumed to equate to performance on a capillary sample. Only the LumiraDX device was equally sensitive on both capillary and serum samples.
There was good agreement between capillary results interpreted by HCW and participants for the Roche, Menarini, Biozek and Fortress kits, indicating that these LFA tests may be best suited for use as home test kits. For other LFAs, such as AbC-19, many study participants had difficulty noticing the faint lines produced by the test kit and interpreted the result as being negative when it was interpreted as positive by a HCW.
A disadvantage of the POC devices compared to laboratory immunoassays is that they are strictly qualitative (with the exception of LumiraDX, where a numeric value is converted to a qualitative result). This means the possibility of introducing equivocal zones is not available. Further issues that came to light during the course of this evaluation work included the possibility of batch-to-batch variation in kits from the same manufacturer. For example, two different batches of Menarini kits showed marked differences in sensitivity (76.7% on one batch versus 94.9% on another batch). Whilst we did not observe any failed tests during our evaluation of the LFA using serum or capillary samples, a high failure rate was seen using the LumiraDX device with capillary samples, with 18 out of 29 participants having a failed test result. Possible explanations for this include sample clotting in the device prior to analysis or unrefined design leading to inadequate capillary action to draw up sample from the application point. Unpublished data available from another POC device evaluation site in Scotland (NHS Tayside), where a newer version of the LumiraDX device was being trialled, indicated that there were no issues with failed SARS-CoV-2 antibody capillary tests. This suggests that this issue has now been resolved by the company. However, strategies to identify such issues should continue to be employed by the manufacturer and/or the end user to ensure accurate performance on capillary samples.
The strength of this study lies in its contribution to our understanding of POC device performance; in particular performance at different time points post infection, confirmed evidence of batch to batch variation with some POC kits, and demonstration of the general lack of correlation between a positive device result and the presence of neutralising antibodies. The data also demonstrate that the results obtained in the lab using serum cannot automatically be assumed to be representative of finger-prick capillary test results. Furthermore, a significant proportion of this research used a convalescent cohort of patients with relatively mild disease, making the findings applicable to community-based studies.
However, the study does have limitations. Due to sample availability constraints and the large number of devices evaluated, not all of the kits could be evaluated on the same panel of positive and negative samples. In particular, the number of samples where paired serum and capillary sample results were available to assess concordance was limited for the majority of the devices examined. For one of the test kits ease of use was assessed, but as many of the study participants providing capillary samples in this study were healthcare workers, the data obtained through this may not be representative of the general population.

Conclusion
Our results highlight a wide variation in performance of SARS-CoV-2 antibody test kits and illustrate the importance of verifying multiple different aspects of test performance. Batch to batch variation, changes in sensitivity as time from infection increases, correlation with neutralising antibodies, and performance on capillary samples should all be evaluated. This is essential prior to considering the utilisation of these tests for 'immunity passports' and identification of hospitalised patients who would benefit from monoclonal antibody treatment, as well as utilisation of these devices to enable targeted vaccine distribution in areas of the world affected by vaccine inequity.