Interlaboratory comparison for the Filovirus Animal Nonclinical Group (FANG) anti-Ebola virus glycoprotein immunoglobulin G enzyme-linked immunosorbent assay

The need for an efficacious vaccine against highly pathogenic filoviruses was reinforced by the devastating 2014–2016 outbreak of Ebola virus (EBOV) disease (EVD) in Guinea, Sierra Leone, and Liberia that resulted in over 28,000 cases and over 11,300 deaths. In addition, the 2018–2020 outbreak in the Democratic Republic of the Congo currently has over 3,400 cases and over 2,200 deaths. A fully licensed vaccine and at least one other investigational vaccine are being deployed to combat this EVD outbreak. To support vaccine development and pre-clinical/clinical testing, a Filovirus Animal Nonclinical Group (FANG) human anti-EBOV GP IgG ELISA was developed to measure anti-EBOV GP IgG antibodies. This ELISA is currently being used in multiple laboratories. Reported here is an interlaboratory statistical analysis of the human anti-EBOV GP IgG ELISA, conducted as part of a collaborative study among five participating laboratories. Each laboratory used similar method protocols and reagents to measure anti-EBOV GP IgG levels in human serum samples from a proficiency panel consisting of ten serum samples created by the differential dilution of a serum sample positive for anti-GP IgG antibodies (BMIZAIRE105) with negative serum (BMI529). The total assay variability (inter- and intra-assay variability) %CVs observed at each laboratory ranged from 12.2 to 30.6. Intermediate precision (inter-assay variability) for the laboratory runs ranged from 8.9 to 21.7%CV and repeatability (intra-assay variability) %CVs ranged from 7.2 to 23.7. The estimated slope for the relationship between log10(Target Concentration) and the log10(Observed Concentration) across all five laboratories was 0.95 with a 90% confidence interval of (0.93, 0.97).
Equivalence test results showed that the 90% confidence interval for the ratios for the sample-specific mean concentrations at the five individual labs to the overall laboratory consensus value were within the equivalence bounds of 0.80 to 1.25 for each laboratory and test sample, except for six test samples from Lab D, two samples from Lab B1, and one sample from Lab B2. The mean laboratory concentrations for Lab D were less than those from the other laboratories by 20% on average across the serum samples. The evaluation of the proficiency panel at these laboratories provides a limited assessment of assay precision (intermediate precision, repeatability, and total assay variability), dilutional linearity, and accuracy. This evaluation suggests that the within-laboratory performance of the anti-EBOV GP IgG ELISA as implemented at the five laboratories is consistent with the intended use of the assay based on the acceptance criteria used by laboratories that have validated the assay. However, the assessment of between-laboratory performance revealed lower observed concentrations at Lab D and greater variability in assay results at Lab B1 relative to other laboratories.


Introduction
The filoviruses (family Filoviridae) from the genera Ebolavirus and Marburgvirus are etiologic agents of sporadic viral hemorrhagic fever outbreaks in humans with high mortality rates. An unprecedented outbreak of Ebola virus (EBOV; species Zaire ebolavirus) disease that began in Guinea during December 2013 [1] subsequently spread into neighboring West African countries of Sierra Leone and Liberia, prompting the World Health Organization (WHO) to declare the epidemic a public health emergency of international concern (http://www.who.int/mediacentre/news/statements/2014/ebola-20140808/en/). Phylogenetic analysis of viral isolates from this epidemic suggests a single transmission event introduced the virus, named the EBOV Makona variant [2], from an undetermined natural reservoir into humans in Guinea, followed by transmission between humans to spread the virus throughout Guinea and into Sierra Leone and Liberia [3]. Implementation of containment measures such as patient isolation and improved burial practices eventually controlled the epidemic, which resulted in 28,616 reported cases with a mortality rate of approximately 40% (http://www.who.int/csr/disease/ebola/en/).
The severity of this epidemic and its principal mode of transmission, from human to human, underscored the need for efficacious vaccines (and therapeutics) against EBOV, accelerating the placement of candidate EBOV vaccines into clinical safety trials [4][5][6]. This need for safe and efficacious vaccines was again evident with the onset of the 10th and largest outbreak in the Democratic Republic of the Congo (DRC) from 2018–2020. The 11th outbreak of EVD continues in the Western DRC.
The characteristics of filovirus infection, where infected patients are contagious only after manifestation of symptoms, allow a ring vaccination strategy to be used for disease containment. A ring vaccination strategy relies on the combination of contact tracing for case identification and a rapid, effective vaccine for use in contacts and contacts of contacts of infected patients. The application of this strategy led to the approval of rVSV-ZEBOV (ERVEBO®), a single-dose vaccine, by the Food and Drug Administration in December 2019, based on the safety and efficacy data from the clinical trial conducted during the 2014 outbreak in West Africa (https://www.fda.gov/news-events/press-announcements/first-fda-approved-vaccine-prevention-ebola-virus-disease-marking-critical-milestone-public-health). The effectiveness of ERVEBO in a ring vaccination response provides an important countermeasure for public health but does not address all unresolved questions in filovirus vaccine utilization, including duration of protection, alternate dosing regimens, and the effectiveness of filovirus vaccines based on other viral platforms or alternative strategies. The development of multiple countermeasures against a disease necessitates the use of a common assay based on a surrogate of protection, which can be used to compare the elicited immune response between vaccines and provide valuable information as to the effectiveness and durability of protection. Ideally, this assay is not only informative but also simple, reproducible, species independent, and transferable between labs. For example, during the development of countermeasures against anthrax, a lethal toxin neutralization assay was developed and used by many laboratories [7].
The development of vaccine candidates for Ebola virus disease prophylaxis [8] continues today, including deployment of a heterologous prime boost vaccine with European Commission Market Authorization during the last outbreak. However, the demonstration of efficacy for new filovirus vaccines will be complicated in the absence of a large outbreak and may require evaluation under the FDA Animal Rule or via non-inferiority trials against ERVEBO. Regulatory evaluation using these approaches is only possible with a correlate of protection and a well-developed assay that can measure the response in well-characterized animal challenge models as well as in human clinical trials. The species-neutral ELISA is ideal for bridging data between humans and animal models. Also, since the assay likely will be utilized in multiple experiments at many sites, it is important to demonstrate that the assay is reproducible among different laboratories.
In order to facilitate the development of additional vaccine countermeasures and to address such questions as the durability of immunity, the FANG has supported the development of a human anti-EBOV GP IgG ELISA. This study describes the FANG efforts to determine if the performance of the human anti-EBOV GP IgG ELISA [9] is acceptable for sample evaluation across five participating laboratories. Each laboratory used an anti-EBOV GP IgG ELISA to measure levels of binding in human serum samples from a FANG designed human proficiency panel. The panel consisted of ten human serum samples created by the differential dilution of human serum lot number BMIZAIRE105 (pool of serum with an approximate anti-GP IgG concentration of 1,000 ELISA units/mL) with control human serum (BMI529) without antibody activity. The concentration of the proficiency panel samples ranged from 0.00 ELISA units/mL to approximately 800 ELISA units/mL.
Each participating laboratory received sufficient volume of the proficiency panel for initial testing plus repeats and used its own established anti-EBOV GP IgG ELISA. The assay was validated at some laboratories and qualified at others [9]. Data from the participating laboratories were compared by statistical analysis. Both intra-laboratory and inter-laboratory analyses were performed to evaluate repeatability, intermediate precision, dilutional linearity, and accuracy. This paper summarizes both the intra- and inter-laboratory analysis of the results generated in the five separate laboratories. Results from the laboratories are de-identified in the analysis and reported as Laboratory A through E. The repeatability estimate for Laboratory B was greater than the acceptance criterion established in laboratories that validated the anti-EBOV GP IgG ELISA with human serum, and, as a result, the proficiency panel assay runs were repeated. Results from both the original and repeated runs were included in the analysis and labeled as being from Laboratory B1 and B2, respectively.

Assay method
A common assay method [9] was tech-transferred to the participating laboratories, but there were minor variations in equipment/materials/procedures between laboratories. The analysis of the proficiency panel in the ELISA was performed similarly at Labs A, B1, and B2. All three used two separate operators on separate days. Samples were analyzed using a starting dilution of 1:62.5 and followed the plate layout as illustrated in Table 1. These plate layouts represent 15 plates with specific proficiency panel samples on each plate. All 15 plates were run twice for a total of 30 plates for each of Labs A, B1, and B2.
The analysis of the proficiency panel in the ELISA was performed at Lab C by two separate operators over three days and at Lab D by two separate operators over five days. Samples were analyzed using a starting dilution of 1:50 and followed the plate layout as illustrated in Table 2. These plate layouts represent 12 plates with specific proficiency panel samples on each plate. All 12 plates were run at least twice for a total of 24 plates for each of Labs C and D.
The analysis of the proficiency panel in the ELISA was performed at Lab E by two separate operators over four days. Samples were analyzed using a starting dilution of 1:50 and followed the plate layout as illustrated in Table 3. This plate layout represents six plates with specific proficiency panel samples on each plate. The six plates were each run four times for a total of 24 plates. For all laboratories, some samples were analyzed three times on the same plate [denoted with "X (3)" in the plate layouts]. These contributed to assay repeatability.
Samples on a given plate were excluded from analysis if the within-assay CV of at least three dilution-adjusted concentrations determined for that sample was greater than 20%. Samples were also excluded if the plate including that sample failed to meet system suitability criteria. Some samples and plates that failed to meet the sample suitability criteria or system suitability criteria were repeated on later days. The ELISA concentrations of each qualification test sample by laboratory are provided in the supplemental information (S1-S6 Tables). This study, and specifically the use of human serum samples, was approved in writing by the Battelle Institutional Review Board in April of 2015 (approval number HSRE 0223-100062052). Human serum samples were collected from subjects by the sponsor (Crucell Holland) via written consent according to their IRB-approved protocol. These samples were not specifically collected for this interlaboratory study but rather for a different study. Neither Battelle nor any of the authors were affiliated with this initial study. The sponsor subsequently provided Battelle volumes of these samples for the purposes of conducting the study described in this manuscript. Throughout its analysis of human biological materials and reporting, Battelle had no access to volunteer subjects' identifiers or to any code-key that would allow Battelle researchers to attribute any results of analysis to the original volunteer human research subjects.
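The two exclusion rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the use of the sample standard deviation (n - 1 denominator) for the CV, and the example concentrations are assumptions, not details taken from the assay protocol.

```python
from statistics import mean, stdev

def within_assay_cv(concentrations):
    """Percent CV of the dilution-adjusted concentrations for one sample on one plate.

    Assumption: the CV uses the sample standard deviation (n - 1 denominator).
    """
    if len(concentrations) < 3:
        raise ValueError("need at least three dilution-adjusted concentrations")
    return 100.0 * stdev(concentrations) / mean(concentrations)

def sample_is_reportable(concentrations, plate_passed_suitability, cv_limit=20.0):
    """Apply both rules: the plate must pass system suitability, and the
    within-assay CV must not exceed the limit (20% in the text)."""
    if not plate_passed_suitability:
        return False
    return within_assay_cv(concentrations) <= cv_limit

# Hypothetical dilution-adjusted concentrations (ELISA units/mL)
tight = [410.0, 395.0, 402.0]    # CV of roughly 2%
spread = [410.0, 250.0, 520.0]   # CV well above 20%

print(sample_is_reportable(tight, plate_passed_suitability=True))
print(sample_is_reportable(spread, plate_passed_suitability=True))
```

A sample on a plate that failed system suitability is rejected regardless of its CV, which mirrors the order in which the two rules are stated.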

Statistical methods
Inter-laboratory analysis was performed using the combined results across all laboratories. A mixed-effects analysis of variance (ANOVA) model was fitted to the base-10 log-transformed concentrations to evaluate both inter-laboratory precision (i.e., between lab precision) and intra-laboratory precision (i.e., within-laboratory precision). The model included a fixed effect for test sample and random effects for laboratory, test date nested within laboratory, and plate nested within day. Here, test operator was excluded as a random effect because this variable was indistinguishable from test day in most laboratories. Because of this confounding of effects, any variability attributable to test day may also be due to the different test operators.
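One way to write the model described above is shown here. The notation is not taken from the paper: the symbols $y$, $\mu$, $\tau$, $L$, $D$, $P$, and $\varepsilon$, and the normality assumptions, are introduced only for illustration.

```latex
y_{ijklm} = \mu + \tau_i + L_j + D_{k(j)} + P_{l(jk)} + \varepsilon_{ijklm},
\qquad
L_j \sim N(0, \sigma_L^2),\quad
D_{k(j)} \sim N(0, \sigma_D^2),\quad
P_{l(jk)} \sim N(0, \sigma_P^2),\quad
\varepsilon_{ijklm} \sim N(0, \sigma_\varepsilon^2)
```

Here $y_{ijklm}$ is the base-10 log concentration for replicate $m$ of test sample $i$ measured on plate $l$, on test day $k$, in laboratory $j$; $\tau_i$ is the fixed test sample effect, and the remaining terms are the random laboratory, day-within-laboratory, plate-within-day, and residual effects. Because operator and test day are confounded, $\sigma_D^2$ in this sketch absorbs any operator-to-operator variability.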
Results were screened for outliers within each laboratory separately. Deleted studentized residuals were computed for each observation. If the absolute value of the deleted studentized residual was greater than four, then the observation was considered a statistical outlier and removed from the inter-laboratory analysis.
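The screening rule can be illustrated with a deliberately simplified stand-in for the paper's model. The sketch below computes deleted (externally) studentized residuals for a one-way model with one mean per test sample, not the mixed-effects model actually fitted, and flags observations whose absolute value exceeds four. The function names and the data are hypothetical.

```python
import math
from collections import defaultdict

def deleted_studentized_residuals(values, groups):
    """Deleted studentized residuals for a one-way (group-means) model.

    Simplifying assumption: each group (e.g. test sample) has its own mean
    and there are no random effects.
    """
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    resid = [v - means[g] for v, g in zip(values, groups)]
    sse = sum(e * e for e in resid)
    n, p = len(values), len(by_group)
    out = []
    for e, g in zip(resid, groups):
        h = 1.0 / len(by_group[g])            # leverage in a group-means model
        if h >= 1.0 or n - p - 1 <= 0:
            out.append(float("nan"))          # undefined for singleton groups
            continue
        # Residual variance re-estimated with this observation left out
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        if s2_deleted <= 0.0:
            out.append(math.copysign(math.inf, e) if e else 0.0)
            continue
        out.append(e / math.sqrt(s2_deleted * (1.0 - h)))
    return out

def flag_outliers(values, groups, cutoff=4.0):
    """Indices of observations with |deleted studentized residual| > cutoff."""
    t = deleted_studentized_residuals(values, groups)
    return [i for i, ti in enumerate(t) if abs(ti) > cutoff]

# Hypothetical log10 concentrations for two test samples; index 5 is aberrant
log_conc = [2.00, 2.01, 1.99, 2.02, 1.98, 2.60,
            1.50, 1.51, 1.49, 1.52, 1.48, 1.50]
samples = ["s1"] * 6 + ["s2"] * 6
print(flag_outliers(log_conc, samples))
```

Leaving each observation out when estimating the residual variance keeps a single extreme value from inflating the scale against which it is judged.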
Variability associated with the random effects as well as intermediate precision, repeatability, and total assay variability were estimated separately for each lab using model-based percent coefficient of variation (CV). The percent CV for each source of variance was calculated using Tan's [10] relative standard deviation as $100 \times \sqrt{e^{\ln(10)^2 \sigma^2} - 1}$, where $\sigma^2$ is the model-estimated variance for the specific variance source. The percent CV associated with the residual variance served as an estimate for the assay repeatability. The percent CV associated with the test day and plate effects served as an estimate for the intermediate precision of the assay. Total assay variability was estimated using all variance components from the model (both inter- and intra-run variability).
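As a worked illustration of the conversion from a variance on the log10 scale to a percent CV, 100 × sqrt(exp(ln(10)² · σ²) − 1), here is a minimal Python sketch. The function names and the example variance components are hypothetical, and summing the components before converting is an assumption about how the combined estimates were formed.

```python
import math

def percent_cv(variance_log10):
    """Percent CV implied by a variance estimated on base-10 log concentrations."""
    return 100.0 * math.sqrt(math.exp(math.log(10.0) ** 2 * variance_log10) - 1.0)

def combined_percent_cv(*variance_components):
    """Percent CV for several variance components taken together.

    Assumption: independent components are summed on the log10 scale
    before the conversion is applied.
    """
    return percent_cv(sum(variance_components))

# Hypothetical variance components on the log10 scale
var_day, var_plate, var_residual = 0.002, 0.001, 0.004

repeatability = percent_cv(var_residual)
intermediate_precision = combined_percent_cv(var_day, var_plate)
total_variability = combined_percent_cv(var_day, var_plate, var_residual)
print(round(repeatability, 1), round(intermediate_precision, 1), round(total_variability, 1))
```

Because the conversion is nonlinear, the total percent CV is not the sum of the component percent CVs; the variances are combined first.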
The model intercept was obtained for each test sample from the mixed effects ANOVA model to serve as test sample consensus values across the laboratories. Agreement among laboratories was evaluated by comparing individual assay results from each laboratory to the consensus values. Boxplots were produced for each test sample to show the distribution of concentrations by laboratory in relation to the corresponding consensus value. The ratio of individual test results to consensus values was calculated by test sample to evaluate the level of agreement among laboratories based on two one-sided tests (TOST) of equivalence.
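At the 5% level, a TOST of equivalence amounts to checking that the 90% confidence interval lies entirely within the equivalence bounds, which is how the results are reported in this paper (bounds of 0.80 to 1.25). The sketch below shows that check. The function names, the critical value passed in by the caller, and the example inputs other than the reported slope interval are assumptions for illustration.

```python
def ratio_ci(log10_ratio, se_log10, critical_value):
    """Back-transform a confidence interval for a ratio estimated on the log10 scale.

    critical_value is supplied by the caller (e.g. a t quantile for the
    model's degrees of freedom); it is not computed here.
    """
    lower = 10.0 ** (log10_ratio - critical_value * se_log10)
    upper = 10.0 ** (log10_ratio + critical_value * se_log10)
    return lower, upper

def is_equivalent(ci, bounds=(0.80, 1.25)):
    """True when the whole confidence interval lies within the equivalence bounds."""
    lower, upper = ci
    return bounds[0] <= lower and upper <= bounds[1]

# Overall slope interval reported in the paper
print(is_equivalent((0.93, 0.97)))

# Hypothetical interval whose upper bound exceeds 1.25
print(is_equivalent((1.05, 1.30)))

# Hypothetical ratio estimate: log10 ratio 0.0, standard error 0.01
print(is_equivalent(ratio_ci(0.0, 0.01, critical_value=1.645)))
```

An interval can fail this check either because the mean is shifted (as with lower concentrations at one site) or because the interval is wide (as with higher variability), so a failure alone does not say which of the two is responsible.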
To assess dilutional linearity, a random coefficients linear regression model was fitted to the log-transformed observed concentrations versus the log-transformed target concentrations. The model included both a random intercept and slope effect for each laboratory, along with random effects for laboratory, test day nested within laboratory, and plate nested within laboratory. The random slope coefficients were modeled as laboratory-specific differences from the overall slope. The overall slope was used to assess the dilutional linearity based on a test of equivalence (TOST) and random slope coefficients were used to evaluate the level of agreement among the laboratories.
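The core of the dilutional linearity check is the slope of log10(observed concentration) on log10(target concentration), where a slope of 1.00 corresponds to perfect linearity. The sketch below uses ordinary least squares on pooled data, which is a simplification of the random coefficients model described above (no laboratory, day, or plate effects). The function name and the data are hypothetical.

```python
import math

def loglog_slope(target, observed):
    """Ordinary least-squares slope and intercept of log10(observed) on log10(target).

    All concentrations must be positive, so a negative (zero-concentration)
    sample has to be left out before calling this.
    """
    x = [math.log10(t) for t in target]
    y = [math.log10(o) for o in observed]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical panel: observed values follow an exact power law with exponent 0.95
target = [12.5, 25.0, 50.0, 100.0, 200.0, 400.0, 800.0]
observed = [1.2 * t ** 0.95 for t in target]

slope, intercept = loglog_slope(target, observed)
print(round(slope, 3), round(intercept, 3))
```

A slope below one on the log-log scale means that observed concentrations rise proportionally less than target concentrations across the dilution series.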

Results
Across all six laboratory runs, there were some false positive observations for Sample 18, a known negative sample. All reportable values from Sample 18 were excluded from the statistical models. Table 4 lists the five outliers that were removed from their respective intra-laboratory analyses and were also removed from this inter-laboratory analysis. One outlier each was removed from Laboratories B1 and B2, and three outliers were removed from Laboratory C. These observations were deleted from both intra- and inter-laboratory analyses.

[...] (8.9), while Laboratory A had the lowest %CV for repeatability (7.2) and total assay variability (12.2). Laboratory B1 had the highest %CV for repeatability and total assay variability (23.7 and 30.6, respectively), while Laboratory D had the highest %CV for intermediate precision (21.7).

Table 6 shows the consensus values (geometric means) along with 95% confidence intervals for each test sample generated from the mixed model ANOVA fitted to the data. Boxplots by sample of the reportable values from each laboratory, with each plot including a horizontal line for the consensus value estimate for the given sample, are provided in the supplemental information (S1-S9 Figs). Table 7 shows the ratio of the mean concentration for each of the six individual laboratory runs to the consensus value for a given sample along with a 90% confidence interval for the ratio. Agreement among laboratories implies that these ratios should be close to one, indicating that the average concentrations are about the same as the consensus values. The ratios range from 0.95 to 1.08 for Laboratory A; from 0.96 to 1.19 for Laboratory B1; from 0.83 to 1.12 for Laboratory B2; from 0.96 to 1.16 for Laboratory C; from 0.71 to 0.97 for Laboratory D; and from 0.90 to 1.06 for Laboratory E.
Applying the equivalence bounds of 0.80 to 1.25 to these 90% confidence intervals: two intervals from Laboratory B1 (corresponding to BMI-ZPP-13 and BMI-ZPP-20) had an upper bound greater than the upper acceptance limit of 1.25 (1.30 and 1.39); one interval from Laboratory B2 (corresponding to BMI-ZPP-19) had a lower bound less than the lower acceptance limit of 0.80 (0.79); and six intervals from Laboratory D (corresponding to BMI-ZPP-12, BMI-ZPP-13, BMI-ZPP-14, BMI-ZPP-16, BMI-ZPP-17, and BMI-ZPP-20) had a lower bound less than the lower acceptance limit of 0.80. Furthermore, three of these six intervals from Laboratory D are entirely below the lower acceptance bound of 0.80. These findings indicate that mean concentrations observed at Laboratory D are not equivalent to those at the other laboratories for six of the nine test samples.

Table 8 presents the estimated slope across the five laboratories and the corresponding 90% confidence interval obtained from the random regression model fit to assess the relationship between log10(observed concentration) and log10(target concentration). The overall slope was estimated to be 0.95 with a 90% confidence interval of (0.93, 0.97). An equivalence test was conducted to determine if the overall slope was equivalent to 1.00 (perfect dilutional linearity). An equivalence interval of 0.80 to 1.25 for the overall slope was used. Because the 90% confidence interval for the overall slope was completely within the interval (0.80, 1.25), the concentrations were found to be dilutionally linear across the laboratories. The slope estimates specific to each laboratory ranged from 0.94 to 0.96 (Table 8) and were consistent with the overall slope.

Discussion
The value of an assay as a regulatory tool is dependent on its accuracy, consistency, simplicity, and reproducibility. An assay that is relevant, species independent, and replicable among laboratories is a powerful tool for product development. The data from a number of clinical trials utilizing ERVEBO strongly suggest that the anti-EBOV GP IgG ELISA provides data that correlate with product efficacy against Ebola infection. The development of new vaccines, or the evaluation of durability or alternative dosing regimens, will be based on interpretation of data using the human anti-EBOV GP IgG ELISA. Our ability to use or trust the data generated from non-clinical studies in different laboratories, and from clinical trials carried out with sera evaluated at different sites, will require an understanding of the consistency and reproducibility of the assay among laboratories. In particular, assays using material from animal studies may be performed in laboratories different from those where the assay was performed to evaluate clinical trials. If the assay performance is not consistent among species and across laboratories, then data interpretation will not be possible. This interlaboratory study provided a direct head-to-head comparison of the ELISA performance in five different laboratories. The results from this study confirm the assay can be a universal tool for Ebola virus vaccine evaluation since results were similar when using the assay at multiple labs. However, the small differences in assay performance reinforce that for regulatory purposes, it is still ideal to rely on only one test site where the assay is fully validated.
Intermediate precision for the six laboratory runs ranged from 8.9 to 21.7%CV and repeatability ranged from 7.2 to 23.7%CV. The total assay variability %CVs range from 12.2 to 30.6. As a point of reference, laboratories that validated the anti-EBOV GP IgG ELISA have used the following precision acceptance criteria: (1) The intermediate precision of the assay for samples within the analytic range of the assay must be no larger than 25% CV; and (2) the repeatability of the assay for samples within the analytic range of the assay must be no larger than 20% CV. The repeatability estimate for Laboratory B1 was greater than the upper acceptance bound as established in laboratories that validated the anti-EBOV GP IgG ELISA with human serum. However, a repeat of the proficiency panel run at this laboratory following additional training of laboratory staff resulted in a repeatability estimate less than the upper acceptance bound; thus, illustrating the importance of rigorous training of laboratory staff and the strict adherence to assay procedures to ensure consistent results between runs.
Similarly, laboratories that validated the anti-EBOV GP IgG ELISA have used the following dilutional linearity (relative accuracy) acceptance criteria: the 90% confidence interval for the slope from the random regression model fit to data between the limits of quantitation and relating log10(concentration) to log10(spike level) will be entirely within (-1.20, -0.80). The interlaboratory study models dilutional linearity as log10(observed concentration) to log10(target concentration), resulting in a positive relationship between the two variables. Therefore, to conclude that dilutional linearity is acceptable in relation to the validation in human serum, the 90% confidence interval for the slope should be positive and fall entirely between 0.80 and 1.20. The overall slope was 0.95 and has a 90% confidence interval estimate of (0.93, 0.97); thus, the dilutional linearity is within the acceptance criteria as established in the assay validation with human serum.
Agreement among laboratories implies that the ratios of the mean concentration for the five individual labs to the overall laboratory consensus value for a given sample should be close to one. The ratios range from 0.95 to 1.08 for Laboratory A; from 0.96 to 1.19 for Laboratory B1; from 0.83 to 1.12 for Laboratory B2; from 0.96 to 1.16 for Laboratory C; from 0.71 to 0.97 for Laboratory D; and from 0.90 to 1.06 for Laboratory E. Equivalence test results showed that the 90% confidence intervals for the ratios were within the equivalence bounds of 0.80 to 1.25 for each laboratory except for samples BMI-ZPP-13 and BMI-ZPP-20 in Laboratory B1, BMI-ZPP-19 in Laboratory B2, and six samples in Laboratory D.
The assessment of between-laboratory performance revealed lower observed concentrations at Lab D and greater variability in assay results at Lab B1 relative to the other laboratories. The lower observed concentrations at Lab D illustrate the importance of monitoring assay performance and harmonizing across laboratories. Given the inherent differences from subject-to-subject in clinical trials and animal-to-animal in non-clinical studies, these differences observed at Lab D relative to the other laboratories are not likely to affect interpretation of study results. The variability in assay results at Lab B1 was mitigated by additional laboratory staff training.
The evaluation of the proficiency panel at these laboratories provides a limited assessment of assay precision (intermediate precision, repeatability, and total assay variability), dilutional linearity, and accuracy. This limited evaluation suggests that the within-laboratory performance of anti-EBOV GP IgG ELISA as implemented at the five laboratories is performing consistently with the intended use of the assay based on the acceptance criteria used by laboratories that have validated the assay.