Multiplex Microsphere Immunoassays for the Detection of IgM and IgG to Arboviral Diseases

Serodiagnosis of arthropod-borne viruses (arboviruses) at the Division of Vector-Borne Diseases, CDC, employs a combination of individual enzyme-linked immunosorbent assays and microsphere immunoassays (MIAs) to test for IgM and IgG, followed by confirmatory plaque-reduction neutralization tests. Based upon the geographic origin of a sample, it may be tested concurrently for multiple arboviruses, which can be a cumbersome task. The advent of multiplexing represents an opportunity to streamline these types of assays; however, because serologic cross-reactivity of the arboviral antigens often confounds results, it is of interest to employ data analysis methods that address this issue. Here, we constructed 13-virus multiplexed IgM and IgG MIAs that included internal and external controls, based upon the Luminex platform. Results from samples tested using these methods were analyzed using 8 different statistical schemes to identify the best way to classify the data. Geographic batteries were also devised to serve as a more practical diagnostic format, and further samples were tested using the abbreviated multiplexes. Comparative error rates for the classification schemes identified a specific boosting method based on logistic regression “Logitboost” as the classification method of choice. When the data from all samples tested were combined into one set, error rates from the multiplex IgM and IgG MIAs were <5% for all geographic batteries. This work represents both the most comprehensive, validated multiplexing method for arboviruses to date, and also the most systematic attempt to determine the most useful classification method for use with these types of serologic tests.


Introduction
Arthropod-borne viruses (arboviruses) are responsible for considerable morbidity and mortality worldwide. Those most heavily affected live at tropical latitudes where mosquitoes are most active and difficult to control [1]. Human vaccines are available for yellow fever (YF), Japanese encephalitis (JE) and tick-borne encephalitis (TBE) viruses, and long-sought vaccine candidates for dengue are in various stages of clinical trials [2]. However, for most of the world's population, vaccines for these viruses are currently either unavailable or too expensive.
Clinical presentations can be ambiguous and diagnoses notoriously difficult based on symptoms alone. Laboratory confirmation is therefore often critical for diagnosis. While arboviral infections could potentially be treated using antivirals such as Ribavirin [3], and is occasionally treated with IVIG, currently the usual treatment is supportive therapy only. The presence of viral RNA in blood is typically fleeting, so antibody testing is often the method of choice to provide a laboratory diagnosis or to help rule in or rule out other more treatable infections.
A variety of techniques have been developed over the past 40 years for the serodiagnosis of arboviruses. These include immunofluorescence assay, complement fixation test, hemagglutination inhibition assay, plaque reduction neutralization test (PRNT) [4], and IgM and IgG enzyme-linked immunosorbent assays (ELISAs) [5,6]. The most recent addition to the menu of tests is the microsphere immunoassay (MIA) [7,8]. Currently, ELISAs and MIAs are generally used as screening tools to separate those specimens that are negative to the arboviral antibody tested for, from those that should receive confirmatory testing. In a known outbreak situation, IgM and IgG assays are sometimes performed without using confirmatory methods. A combined approach enables the broadest spectrum of information to be captured and interpreted in light of the clinical picture, any travel history of the patient, and timing of specimen collection.
A critical part of arboviral laboratory diagnosis pertains to the serologic testing for related viruses. Antibodies to one virus of a particular genus will frequently cross-react with heterologous antigens within the genus [7]. Much effort has been put into the development of tests and reagents that reduce or remove this cross-reactivity [9]. If successful, such methods would reduce the need for confirmatory testing with PRNT. However, the cross-reactivity seen using currently available reagents can be taken advantage of. The possession of an understanding of the cross-reactivities of these viruses both inform diagnoses, and help in the recognition of viruses formerly absent from a particular geographic region [10]. At the CDC Arboviral Diseases Branch, laboratory diagnosis typically employs the geographic approach to testing. The initial screening incorporates a number of arboviruses known to be present in the region of the world where the patient resides or has recently traveled to.
Microsphere-based immunoassays (MIAs) have been used as screening tools for arboviruses over the past 5 years. A number of US State and government labs including the CDC have used a duplex IgM tests for detection of antibodies to West Nile (WN) and St. Louis encephalitis (SLE) viruses [7], and have participated in proficiency testing using this method. This test was recently adapted for the detection of IgG to WN and SLE viruses (unpublished data). The speed and ease of use of these methods have made them attractive for expansion to other arboviruses, where viral antigens of interest can be incorporated into the testing battery. The body of work presented here elaborates on these methods and capitalizes on the multiplexing capability of the Luminex platform. The creation of IgM and IgG multiplex MIAs allows for a comprehensive array of arboviral infections to be tested for concurrently. The large and complex data set that results from extensive multiplexing necessitates a thorough investigation of classification methods in order to identify the most effective technique. It must allow both flexibility of use and the ability to accommodate the inherent cross-reactivity of these viruses.
Here we report the development of multiplex microsphere immunoassays for detection of IgM and IgG to 6 flaviviruses, 6 alphaviruses, and 1 bunyavirus of human importance, incorporating validation results for its practical use in geographic batteries.

Ethics statement
The Division of Vector-Borne Diseases Human Subjects Advisor to the Centers for Disease Control and Prevention Institutional Review Board reviewed the procedures for "Multiplex Microsphere Immunoassays for the Detection of IgM and IgG to Arboviral Diseases" and confirmed that they do not meet the definition of research involving human subjects specified by 45 CFR 46.102(f). CDC IRB review was not required because specimens involved in this study were originally collected as part of standard CDC diagnostic operations and are archived expressly for development and testing. These specimens had all donor identification material removed at the time they entered the archive. Because data will be non-identifiable, this activity does not involve human subjects.
The suckling mouse brain antigens used in this study were made at the Centers for Disease Control and Prevention under the guidance of the Centers for Disease Control and Prevention-Fort Collins Institutional Animal Care and Use Committee (IACUC), protocol 11-013. Pain and suffering was minimized by hypothermia to effect during inoculation followed by return of the animals to their mother; euthanasia was performed at the first signs of illness including reduced milk intake. Animals were euthanized using isofluorane by inhalation to effect and hypothermia to effect as specified by the IACUC. Antigens produced under this protocol were not made specifically for this study but were made in accordance with the specific mission of the Centers for Disease Control and Prevention to provide reference quantities of reagents for arboviruses, and are made widely available.

Specimens
Identifying information was removed from serum and cerebrospinal fluid (CSF) specimens obtained from the DVBD diagnostic archives (Tables 1 and 2). The samples had previously been screened for anti-arboviral antibodies using one of the currently employed diagnostic IgM protocols (specifically IgM-capture ELISA (MAC-ELISA) [5], WN/SLE IgM MIA [7], or EEE IgM MIA (unpublished method)) or one of the IgG protocols (IgG-ELISA [6] or WN/SLE IgG MIA (unpublished method)) for serum only. A confirmatory plaque reduction neutralization test (PRNT) [4] was previously performed for all serum samples that produced positive IgM screening results, and for all samples that were submitted specifically for vaccine titer confirmation. Panels of serum samples were assembled to determine the initial classification parameters for the multiplex IgM microsphere immunoassay (multiplex IgM-MIA) and multiplex IgG-MIA. Infecting viruses were: flaviviruses dengue (DEN), Japanese encephalitis (JE), Powassan (POW), SLE, WN, and yellow fever (YF); alphaviruses eastern equine encephalitis (EEE), Mayaro (MAY), Chikungunya (CHIK), Ross River (RR), Venezuelan equine encephalitis (VEE), and California serogroup bunyavirus La Crosse encephalitis (LAC). While alphavirus western equine encephalitis virus (WEE) antigen was included in the tests, samples were unavailable. A panel of samples previously resulting in negative arboviral diagnoses was assembled (NEG). Numbers of sera per virus are listed in Table 1 in the "Initial" serum columns. Additional samples were assembled to investigate the use of geographic batteries (Table 1 "Geo Val" columns) and finally, samples submitted to the DVBD Arbovirus Diseases Activity diagnostic lab in the summer of 2011 were analyzed to validate the methods (Table 1 "2011" serum columns). No samples were available for WEE virus, and the VEE and YF samples were from vaccinees. Additional panels of sera were compiled for syndromes clinically similar to some arboviruses as follows: syphilis (N=58), Lyme disease IgM (N=10), Lyme disease IgG (N=10), rheumatoid factor (N=13) and anti-nuclear antibody (N=22). To determine if the multiplex IgM method was useful for the testing of CSF a panel of 131 samples comprising IgMpositives to WN (N=24), LAC (N=11), and JE (N=3) and POW (N=3) viruses and YF vaccine (N=1), plus antibody-negative samples (N=89) was assembled.

Assay controls
Controls were developed to confirm that components had been added correctly to the tests and to confirm sample characteristics and system integrity. Flavivirus group-reactive MAb DEN 4G2 [11], alphavirus group-reactive MAb EEE 1A4B-6 [12], and California serogroup bunyavirus groupreactive MAb LAC 10G5.4 [13] were each conjugated to phycoerythrin by Prozyme Inc., San Leandro, CA. These were used to test for antigen reactivity by adding to a well containing the other test components. A negative serum sample, previously tested for antibodies to all arboviruses, was added to a well on the plate containing all the other test components to serve as a baseline for the assay. The following internal control sets of microspheres, made by Radix BioSolutions, Georgetown, TX, were added to all wells of the assay: control to monitor nonspecific binding of detection reagents, instrument reporter laser control, serum addition control (IgM + IgG), IgM reporter addition control (IgM test only), IgG reporter addition control (IgG test only) and rheumatoid factor control (RF, IgM test only). The internal controls were used with the summer 2011 evaluation samples only.

Coupling of microspheres to monoclonal antibodies
Because purified antigens were not available for many of the viruses involved in the multiplex MIA, capture of the antigens was achieved using monoclonal antibodies (MAbs) coupled to the microspheres. Three MAbs were used: flavivirus groupreactive 6B6C-1 [14], alphavirus group-reactive 2A2C-3 [15], and anti-LAC 807-22 [13]. For each viral antigen, 25 µg of the appropriate purified antibody was covalently coupled to 5.4 million carboxylated MicroPlex® microspheres (Luminex Corporation, Austin, TX), using standard carbodiimide methodology. Hence, 6 microspheres sets of different spectral addresses were coupled to 6B6C-1 to accommodate the flaviviral antigens, 6 sets were coupled to 2A2C-3 (for the alphaviral antigens), and a single set was coupled to 807-22 (for the bunyaviral antigen). All coupled microsphere sets were adjusted to a concentration of 5x10 6 microspheres/ml and stored at 4°C for up to 18 months thereafter, as determined empirically.

Addition of antigens to coupled microspheres
Viral antigens were prepared in either suckling mouse brain (SLE, POW, YF, VEE, MAY, RR, CHIK, EEE, WEE, LAC) or were engineered recombinants expressed in COS-1 cells (WN [16], DEN 2 [17] combined with DEN 3 (unpublished), and JE [18]. Antigens were produced at the CDC/DVBD with the exception of the WN viral recombinant antigen, which was provided as a gift by Hennessy Research, Shawnee, KS. The optimal amount of viral antigen to add to the coupled microsphere sets varied depending on the individual antigen and varied between lots. This was initially determined by titration using known antibody-positive serum controls to each virus and compared to a negative serum control, and the working dilution was chosen to yield a median fluorescent intensity (MFI) of approximately 2000 for the positive controls. For the IgM assays, positive and negative sera were IgGdepleted using protein G sepharose prior to use. Antigen volumes per ml of antigen/coupled microsphere stock preparation ranged from 0.75 µl to 160 µl. In addition, negative (mock) antigens (suckling mouse brain (NSMB) and recombinant (NREC)) were added to separate sets of microspheres. The volumes used for the negatives were equivalent to those of the antigens for each virus family that required the most volume and were included in the test to identify nonspecific background reactions of the serum with both the coupling monoclonal antibody and any non-viral protein in the antigen preparations [19]. Coupled microspheres and antigens were mixed together in Candor Antibody Stabilizer (Boca Scientific, Boca Raton, FL) on a rotating platform for 1 h at room temperature, after which they were stored in the dark at 4°C for a minimum of 12 hours prior to use and up to 6 months thereafter, as determined by previous stability experiments (unpublished). Microspheres were at a concentration of 5 x 10 5 /ml in these stock solutions, which were generally made in 2 ml batches. The same preparation of antibody-coupled microspheres/antigen was used for both IgM and IgG MIAs for each virus. The microsphere regions (sets) used in these experiments were as follows:

Preparation of samples
For the IgM-MIA, it was desirable to remove potentially interfering IgG from the serum. The samples were diluted 1:20 in PBS then reacted with protein G sepharose in a 96-well filter plate Millipore Corporation, Billerica, MA) for 30 min at room temperature as previously described [7]. The plate was filtered using a vacuum manifold (Pall Corporation, Ann Arbor, MI) and the filtrates captured in a round-bottomed 96-well plate. Samples were further diluted to 1:400 [7] using 50% Low Cross Buffer in PBS pH 7.2 (Boca Scientific, Boca Raton, FL) prior to use. For the IgG-MIA, no prior treatment was necessary and samples were diluted to 1:400 in 50% LCB in PBS prior to use. To minimize pipetting errors and to make transfer to the MIA plates as convenient as possible, samples were placed identically on the serum dilution plates for IgM and IgG, in the order and position that they would appear on the MIA plates. Cerebrospinal fluid samples required no protein G treatment and were diluted 1:5 in 50% LCB in PBS.

IgM and IgG-MIA
To maximize efficiency and conserve supplies, IgM and IgG-MIAs were prepared concurrently. Two filter plates were prewetted with 150 µl PBS, one for the IgM and one for the IgG assay. A cocktail of viral antigens/antibody-coupled microspheres was made that included all 13 regions of microspheres associated with the viral antigens. A volume of 5 µl per microsphere region for each assay well was added to a single polypropylene tube, and undiluted LCB was used to make up the total volume so that 150 ul/well of the cocktail could be added to both plates. Similarly a cocktail containing the negative antigens/antibody-coupled microspheres was made that included all 4 negative antigen sets, using 5 µl of each set for each well, and undiluted LCB was used to make up the volume so that for 50 µl/well could be added to both plates. The negative cocktail was vortexed thoroughly and divided into 2 equal parts: one for the IgM assay and one for the IgG assay. For the IgM test 0.25 µl/well each of the internal controls: nonspecific control (region 53), serum verification M +G control (region 30), instrument reporter laser control (region 97), rheumatoid factor (RF) control (region 42), and reporter control M (region 47) was added to the negative antigen cocktail. For the IgG test, 0.25 µl/well each of the internal controls: nonspecific control (region 53), sample control M+G (region 30), reporter laser control (region 97), and reporter control G (region 33) was added to the negative antigen cocktail. Antigen detection controls were prepared in 50% LCB in PBS (4G2-PE at 8 µg/ml; 1A4B-6-PE and 10G5.4-PE at 4 µg/ml). A negative control serum was diluted to 1:400 in 50% LCB in PBS. The PBS was suctioned from the plates using a vacuum manifold. During all vacuum and wash steps, care was taken so that the filters did not completely dry out, which can cause aggregation of the microspheres and inconsistent results. The viral antigen/antibody-coupled microsphere cocktail was vortexed, and 150 µl was added to all control and test wells on both plates. This was immediately suctioned through the plate and the wells washed twice with 150 µl of PBS. Fifty microliters per well of vortexed IgM negative antigen cocktail plus internal controls were added to the IgM-MIA plate, and similarly the IgG negative antigen cocktail plus internal controls were added to the IgG plate. The addition of the negative antigens as a separate step was performed in order to avoid any contamination of the negatives with unbound viral antigens that would occur if the cocktails of viral and negative antigens on their respective antibody/beadsets were prepared in one tube. The wells were washed twice with PBS using the vacuum manifold, and the undersides of the plates were blotted to prevent capillary leakage in the next steps. To the IgM plate, 50 µl per well of 4 µg/ml donkey anti-human IgM Rphycoerythrin (Jackson Immunoresearch, West Grove, PA) in 50% LCB in PBS was added. Fifty microliters of the antigen detection controls 4G2-PE, 1A4B-6-PE and 10G5.4-PE were added to the first 3 wells on the plate in that order, and 50 µl of the negative serum control was added to the 4 th well. The IgGdepleted test serum samples at 1:400 and the CSF samples at 1:5 were transferred from the preparation plate to the subsequent wells on the plate at a rate of 50 µl/well. To the IgG plate, the antigen detection and negative controls were added to the first 4 wells, and the test serum samples at 1:400 were transferred to the subsequent wells. The undersides of both plates were blotted and the wells were covered with plate sealer. The plates were vortexed for 10 seconds on a flat surface vortexer to mix the well contents, the undersides blotted, the plates covered with aluminum foil-lined lids, and placed on a rotary plate shaker. The IgM plate was shaken for 1.5 hours at room temperature. The IgG plate was shaken for 45 minutes at room temperature, washed twice with PBS, and 50 µl/well of donkey anti-human IgG R-phycoerythrin (Jackson Immunoresearch, West Grove, PA) in 50% LCB in PBS was added. The underside of the plate was blotted followed by vortexing to mix the contents of the wells. The plate was shaken a further 15 minutes then washed twice with PBS. The underside was again blotted and 100 µl/well of BioPlex sheath fluid (BioRad, Hercules, CA) was added. The contents of the wells were resuspended and the median fluorescent intensity (MFI) values were obtained for the individually identifiable microsphere sets corresponding to the different antigens in each well using a calibrated and validated BioPlex 100 machine (BioRad, Hercules, CA). During results acquisition for the IgG plate, the incubation step for the IgM-MIA was completed and wells were washed twice with PBS, the underside of the plate blotted, and 100 µl/well BioPlex sheath fluid was added. The plate was placed in the dark until the IgG-MIA results acquisition was finished, after which the contents of the IgM plate were resuspended, the plate blotted, and results acquired. This method was used to test the initial serum samples and CSF samples detailed in the specimens section (Table 1).

Analysis methods
We implemented and evaluated 8 classification methods to select the approach that would provide the best performance over the range of data generated for the initial serum sample set. The methods considered were 1) simple classification by determining which antigen yielded the highest V/N (MFI of sample reacted on viral antigen /MFI of sample reacted on negative antigen) (MAX.V); 2) highest P/N (MFI of sample reacted on viral antigen /MFI of negative control reacted on viral antigen) (MAX.P); 3) individual antigen receiver operating characteristic (ROC) curve analysis; 4) linear discriminant analysis (LDA); 5) multinomial logistic regression (MLR); 6) support vector machines with linear basis (SVM-LIN); 7) support vector machines with radial basis (SVM-RAD); and 8) LogitBoost, a specific boosting method (LOGITBOOST) [20]. All methods were implemented in the R statistical software package (R Development Core Team (2012)) [21]. In order to apply the different classification schemes, each sample in the dataset was labeled according to its known infecting virus based on previous diagnostic results using IgM-ELISA, IgG-ELISA, and PRNT. The known negatives were so labeled. These labels constituted "truth" with respect to classification status. Models were developed for each classification method using truth versus the multiple, multivariate MFI measurements for the samples. Each classification method was fit to the full dataset and was then used to predict the infecting virus for each sample. A prediction error occurred when the model predicted a result other than "truth". To provide a further measure of predictive performance, cross-validation was used, where the full dataset was divided in half randomly, ensuring that each virus set was equally split. One half of the data, the test set, was removed, and all the classification methods were fit on the remaining data -the training set. The resulting fits were then used to predict the infecting virus or negativity of the samples in the test set, and the resulting prediction was recorded and compared with the truth. This random split was repeated 10 times, each of which provided an estimate of the expected error rate for all the classification methods. Finally, these were averaged and recorded as cross-validation error estimates (and associated confidence intervals) for each of the methods. The method with the lowest cross-validation errors for both true positives (sensitivity) and true negatives (specificity) as well as for the overall error rate was chosen as the classification method for the multiplex MIA. If two methods were effectively tied in this performance, the method that was easier to interpret and to implement in an Excel Add-in (macro) for data manipulation was chosen.

Validation based on geographic batteries
The IgM and the IgG-MIAs as described above contain 13 viral antigens. As a preliminary investigation into the use of geographic batteries, the panel of antigens in each multiplex was divided into 3 smaller panels: WN, SLE, POW, EEE, WEE and LAC for United States of America and Canada (US); WN, POW, DEN, JE, YF and CHIK for Asia/Africa/Europe (AAE); WN, SLE, DEN, YF, VEE, MAY, EEE and WEE for Central/ South America (CSAM). The RR antigen was not included in any of the batteries. The data from the classifying sample set were allocated to all batteries containing the infecting virus. All 8 classification methods were applied, where the data used included only the information pertinent to the antigens in the assigned geographic battery. Separate classification rules were obtained for each analysis method for each geographic battery. An additional smaller sample set obtained from the DVBD diagnostic archives containing samples that were not included in the original serum set was assembled as a means to evaluate a modified laboratory methodology and to determine classification parameters specific to the geographic batteries (see Table 1 "Geo Val" columns for specimen details). The methods described above were modified to include only the specific geographic battery antigens in the antigen/bead cocktails. Hence, 3 viral antigen cocktails were prepared. Because the AAE and CSAM batteries did not include LAC, the bunyavirus control 10G5.4-PE was not included for these tests. The only other modifications for this set were that the reduced number of antigens allowed for the volumes of viral antigen cocktails per well to be reduced to 100 µl, and the wash volumes to be reduced similarly. For diagnostic purposes the usual setup would be to incorporate all 3 batteries on one IgM-MIA and one IgG-MIA plate in series, with the antigen detection and negative controls preceding each sample set for each battery. Results were analyzed based on the classification parameters created by splitting the initial sample set into geographic batteries.

Summer 2011 samples used to validate the multiplex MIAs
Diagnostic serum and CSF submissions during the period May 2011 to September 2011 were tested for evidence of arboviral antibodies using current methodology to confirm positive reactions. After testing and reporting was complete, remaining samples were archived, de-identified, and used to validate the multiplex MIAs. A total of 419 samples were tested in the IgM multiplex MIAs; 228 samples were tested in the IgG multiplex MIAs (see Tables 1 and 2 for details). Results were analyzed using the most successful of the classification schemes (Logitboost and MLR) described above.

Results
The multiplex MIAs were initially performed to analyze serum and CSF antibodies reactive with 13 arboviral antigens, and to determine which classification method might be most successful for these data. The raw untransformed data showed that for many samples there was a discernible MFI difference between reactions with the homologous virus and the other arboviral antigens. Examples of these reactions for the IgM multiplex can be seen in Table 3. In the IgM test, approximately 91% of samples in the negative group had MFIs of less than 100 to viral antigens, whereas samples in the viral-positive groups had MFIs on a continuum from 100 to 10,000 to homologous viral antigens. The inherent background of the IgG test was higher than for the IgM test, with only 62% of negative samples exhibiting MFIs less than 100 to viral antigens, with the remainder varying up to 2,000. Samples in the viral-positive groups had MFIs of roughly 500 to 24,000. The overlap in MFI values between high negative and low positive samples, and between reactions to antigens of heterologous viruses are the scenarios in which choosing an appropriate classification scheme becomes particularly important. The overall variation and separation of the V/N and P/N measurements of samples tested on individual antigens in both IgM and IgG tests is illustrated in Figure S1a-d.

Choice of classification method
To obtain an initial indication of which classification method would fit the multiplex MIA data the best, we surveyed 8 approaches that ranged from simple (such as highest V/N wins) to more rigorous methods that require complex computation. The methods evaluated were: 1) MAX.P; 2) MAX.V; 3) ROC; 4) LDA; 5) MLR; 6) SVM-LIN; 7) SVM-RAD; and 8) LOGITBOOST. Methods 3-8 used both V/N and P/N values, as it appeared that some viruses performed better with V/N than P/N and vice-versa, as illustrated in Figure S1a-d. Each measurement (V/N and P/N) was included so that their contributions to classification could be evaluated. Prediction error rates for the IgM multiplex MIA generated using these methods (except for ROC to which overall error rates do not pertain) showed that for the full dataset, the ranking of overall error rates were: LOGITBOOST < SVM-RAD < MLR < SVM-LIN < MAX.P < MAX.V < LDA. When cross-validation was applied to the data, overall error rates rankings were: LOGITBOOST < SVM-LIN < MAX.P < MAX.V < SVM-RAD < MLR < LDA. For the IgG multiplex MIA, the order of error rates were LOGITBOOST < MLR < SVM-RAD < SVM-LIN < LDA < MAX.P < MAX.V. Error rates from cross-validation of the IgG test were ranked: LOGITBOOST < SVM-LIN < LDA < SVM-RAD < MAX.P < MLR < MAX.V. Logitboost therefore yielded the lowest error rates for both assays for both the full data set and the cross-validation of the data sets. Using Logitboost, overall error rates of 0.4% and 0% resulted from the full data set for IgM and IgG assays respectively, and error rates of 11.5% and 8.3% resulted from the cross-validation for the IgM and IgG assays respectively. The higher error rates seen for cross-validation ("test") are expected and provide a measure of anticipated prediction error for future use. Individual error rates for the viral antigens and overall rates for all the methods except ROC are shown in Tables 4 and 5. Based on this initial survey of methods where 13 viral antigens are included in the IgM and the IgG analyses, LOGITBOOST was the most successful of the classification methods that considers all virus groups simultaneously. It should be noted that to report sensitivities and specificities for these methods would be misleading because these metrics do not pertain to analyses of multiple viruses; hence error rates are a more informative and intuitive representation of truth. Because ROC compares individual virus groups to the negative group only, it differs from the other classification methods, and error rates for this method are reported in Table  6. The ROC analysis included individual error rates calculated based on calculated cutoff values for V/N and P/N results, and also results using V/N and P/N values of 2 as a cutoff, in order to compare the calculated cutoffs to the traditional ELISA cutoff of 2 for all viruses. In the event that geographic batteries were not appropriate to use, ROC would be the method of choice. The decision to use V/N or P/N with a calculated ROC cutoff or a cutoff of 2 for the IgM test is virus-dependent, although V/N with a calculated ROC cutoff was generally more successful than the other variants. By contrast, clearly the best ROC method across all viruses in the IgG multiplex was to use V/N with a calculated cutoff. As might be anticipated, the instances where a cutoff of 2 improved the error rates compared to the data-derived cutoff, the calculated cutoff was close to 2. This indicated that as a general rule, cutoffs calculated for the individual viruses were more useful.

Geographic batteries
Immunoglobulin M and IgG multiplex MIA data for each of the samples from the initial sample set (used to derive the classification parameters) were assigned to one of the 3 geographic batteries (US, AAE or CSAM). A map of the proposed batteries is shown in Figure 1. Only the data from the antigens within the batteries were used. New classification parameters (cutoffs or probabilities) were derived for each of the antigens within the batteries, and the error rates for each method using the full and test (cross-validation) sets are listed in Table 7 in rows labeled "Init (full/test).". Error rates were calculated using both V/N and P/N values together. For the both the IgM and IgG analyses of the geographic batteries, LOGITBOOST gave the lowest error rates for both the full data set and cross-validations. Error rates for LOGITBOOST ranged from 0.4% (US)-1.9% (AAE) for the full set and 8.94% (AAE)-11.67% (CSAM) for the cross-validation in the IgM test; and from 1% (US)-1.2% (CSAM) for the full set and 7.9% (AAE)-11.6% (CSAM) for the cross-validation in the IgG test.
A separate group of samples with roughly equal numbers spread between the geographic batteries were used to validate the classification parameters derived above (Table 1 columns "Geo Val"). Results in Table 7 (rows labeled "GeoVal") show that the error rates are higher than those of the initial data set for most of the methods.

Summer 2011 evaluation
The classification parameters derived for the geographic batteries on the initial sample set were applied to IgM multiplex MIA results from 327 archived samples tested initially by the traditional methods over the summer of 2011, and to 228 IgG multiplex MIA results. The samples were divided into geographic groups based on their origins (Table 2). This dataset included samples that were equivocal, had dual infection, had unknown vaccination status, or were indeterminate. Analyses were performed on those samples that were diagnostically-conclusive using the 7 multivariate classification methods in order to generate error rates for the sample set (Table 7, rows "2011"). This served to identify any specific trouble spots. Multiplex MIA results from diagnostically indeterminate samples were reserved for later analysis (see the following section). Overall, LOGITBOOST gave the lowest error rates, but these rates were notably high for some of the viruses within the batteries, in particular LAC in the US IgM battery, and the negative groups in the US and CSAM batteries. The source of these errors is illustrated in Figure 2, where the V/N values of each sample in the original set, the geographic validation set, and the summer 2011 set for the IgM test are plotted for each virus. Using Lac as an example, the V/N's of the original and geographic validation samples are higher overall than for the summer 2011 set. This caused several false negatives to be generated. False positives to SLE and EEE were also observed.

Combination of samples to form the final models
To address the errors illustrated in Figure 2, all 3 datasets were combined, effectively doubling the number of data points. In addition, the amassed data were reviewed manually to identify any samples that yielded obviously spurious results, to avoid creating final classification parameters that would be unduly biased resulting in unnecessary error. These samples were removed from the analysis. The models were refitted on the resulting larger sets. Error rates for the geographic batteries using LOGITBOOST and MLR were improved considerably ( Table 7 rows "Full set"), and are depicted in Figure 3. Error rates are also shown to illustrate the negative consequences of over-fitting the data using LOGITBOOST ("Iterations"), to confirm that the number of fitting iterations used in the final model was near-optimal. Because LOGITBOOST can result in a tie, error rates are also shown for when ties predict the correct (tie right) and incorrect (tie wrong) infecting virus. Indeterminate samples from 2011 (equivocals and dual infections) were analyzed using LOGITBOOST and MLR and results are illustrated for IgM only (Table S1). The applications of post-processing methods are needed to identify these types of samples during clinical use of the multiplex MIAs. Overall, LOGITBOOST (where both V/N and P/N values were included in the model as separate measures) was more successful than MLR, and was therefore selected as the classification method of choice.

Cerebrospinal fluid
Thirty seven CSF samples were tested in the 13-virus IgM multiplex MIA format and analyzed using LOGITBOOST and MLR in the geographic batteries in which the infecting virus appeared, less two that were diagnostically indeterminate. In addition, 71 (US), 7 (AAE) and 3 (CSAM) CSF's that were part of the summer 2011 validation sample set were analyzed using LOGITBOOST and MLR, with 12 indeterminate samples omitted. The infecting virus was identified correctly for 91% (US), 94% (CSAM) and 83% (AAE) of the samples using MLR. The correct virus identification was made for 90 (tie wrong)-92% (tie right) (US), 89-97% (CSAM) and 86-95% (AAE) of samples using LOGITBOOST. Details are shown in Table 8.

Non-arbovirus serum samples
Serum samples known to be positive for syphilis, Lyme disease IgM, Lyme disease IgG, rheumatoid factor and antinuclear antibody were tested using the multiplex IgM and IgG MIAs using the entire 13-virus panel to look for any crossreactivity. No samples showed evidence of reactivity with any of the viral antigens in either of the multiplexes.

Assay controls
Assay controls were added to the IgM and IgG MIAs for each geographic battery. A negative serum control served as the denominator for P/N calculations. The negative serum controls from the initial sample plates were compared to determine the variability between plates (intra-class correlation (ICC)) and whether there would be a need for subsequent test plates to be standardized against the historical controls. We evaluated the V/N values for the negative controls on all the viral antigens for within plate and among plate consistency by computing ICCs and associated 95% CIs. ICC values ranged from 0.87-0.99 with the confidence limits varying from 0.67 (WN) to 0.97 (LAC).
Genus-specific monoclonal antibodies coupled to phycoerythrin served as antigen verification controls. Means and standard deviations (SDs) were calculated for these controls on the viral antigens of their genus, and acceptable ranges were established (data not shown). These ranges will form the basis for quality control of subsequent diagnostic assays, once the method is introduced into the laboratory on a routine basis, and will be included in the assay analysis software that is currently under construction. Means, SD's and 95% content and 95% upper and lower tolerance limits of the MFIs were calculated for internal control bead sets that were placed in each test well. Any sample MFI that fell below the lower 95% tolerance limit for one or more of the instrument reporter laser, serum or conjugate controls would be repeated. Sample MFIs for the nonspecific bead reaction control that were greater than the 95% confidence limit for that bead region would also be repeated. MFI values for RF were informational only (See Table S2 for details). Of the 419 serum and CSF samples tested for IgM in 2011, numbers of samples out of range were 2 (serum), 12 (RF), 2 (reporter control) and 2 (nonspecific bead reaction). Of the 228 serum samples tested for IgG in 2011, numbers of samples out of range were 1 (serum), 7 (reporter control) and 0 (nonspecific bead reaction).

Dengue and yellow fever
The antigens used for DEN were a combination of recombinant DEN 2/3 serotypes; YF was 17D (vaccine strain) antigen made in suckling mouse brain. The individual performances of the known positive dengue and yellow fever (vaccine) serum samples in the13-virus multiplex MIA tests were assessed to determine a) whether the dengue antigen combination is sufficient to detect all serotypes from both primary and secondary infections, and b) to assess whether yellow fever vaccine recipients could be misclassified as being infected with alternate flaviviruses. Results from dengue serum samples of all 4 serotypes comprising both primary and secondary infections were analyzed using LOGITBOOST. Correct classification was achieved for 98% and 99% of results from the IgM and the IgG multiplexes, respectively, indicating that all 4 serotypes, regardless of whether they are primary or secondary, are capable of being classified correctly using LOGITBOOST. The single false classification for each test resulted from a secondary infection (IgM) and a primary infection (IgG). In addition, 91% of IgM samples gave classification probabilities of >90% for DEN. In the IgG test, 89% of samples gave classification probabilities >90% for DEN. Thus, the likelihood of a DEN infection being misclassified as another flavivirus was shown to be minimal using this analysis method. Using ROC (the individual method of analysis) 94% and 97% DEN-positive sera had values of V/N and P/N respectively that were greater than the calculated ROC cutoffs (Table 6) in the IgM assay. In the IgG assay, 95% and 84% had V/N and P/N values respectively that were greater than the DEN ROC cutoffs. In the IgM test, 1 primary and 4 secondary samples had V/N values below the cutoff; 1 primary and 1 secondary were below the P/N cut off. In the IgG test, 4 primary samples gave V/N values below the cutoff; in the IgG test 9 primary and 0 secondary samples were below the P/N cutoff. To obtain a measure of cross-reactivity between the flaviviruses as measured by ROC analysis, V/N values for the known DEN-positive samples tested on heterologous flavivirus antigens were analyzed. Values were found to be greater than the ROC cutoffs for WN (68%), SLE (35%), POW (2%), JE (60%), and YF (60%). When the absolute V/N values were compared, DEN V/N's were 7 to 18 times higher to the dengue antigen than to the other viruses. The same trend was true for the IgG test, where 100% of DEN V/N's were 2.5 to 5 times

Cross-reactivity of flaviviruses in the multiplex MIA versus ELISA
To partially evaluate whether the multiplex MIA demonstrated flavivirus cross-reactivity greater or less than the standard screening ELISA, an Arbovirus Diseases Branch database search was performed for IgM and IgG ELISA results of sera from confirmed SLE cases, because SLE represents one of the most serologically cross-reactive viruses in the genus. Results were compared from SLE antibody-positive samples where antibodies to DEN and WN viruses were also tested for, and where any samples with P/Ns of >2 were considered positive. The cross-reactivity's seen for IgM ELISA using SLE-IgM positive samples were: DEN 42% (N=54); WN 85% (N=108). The cross-reactivity's seen for IgG ELISA using SLE-IgG positive samples were: DEN 31% (N=26); WN 85% (N=64). By comparison, the multiplex MIA results using ROC cutoff's shown in Table 6 gave cross-reactivity's using SLE-IgM positive samples of: DEN 78% (N=54); WN 68% (N=54). The cross-reactivity's seen for multiplex MIA IgG using ROC cutoff's with SLE-IgG positive samples were: DEN 86% (N=59); WN 78% (N=59). LogitBoost results for the IgM multiplex MIA where the probabilities were >20% of the 59 SLE-positive samples being classified as DEN and WN were 13% and 0% respectively and where the probabilities were >10% of being classified as DEN and WN were 20% and 0% respectively. For the IgG multiplex MIA, the probabilities at the 20% Logitboost level were 2% and 2% for DEN and WN respectively; at the 10% level, probabilities were 17% and 5% respectively.

Discussion
The multiplexing capability of the BioPlex (Luminex) platform allows for a single small sample to be simultaneously tested against multiple viral antigens, which is advantageous over methods such as ELISA because results are generated at the same time under the same conditions. The ability of these assays to incorporate internal controls further validates the results. From a practical standpoint, the ability to prepare reagents for several months of testing at one time streamlines the routine use of the multiplex MIAs. To facilitate the practical setup of these multiplex tests, an Excel® workbook was devised to calculate the amounts of reagents needed per test based on sample origin, to guide sample/plate orientation, to track lot numbers of reagents and to provide specific operating procedures. The multiplexes also reduce buffer usage and plastics consumption.
The challenge when dealing with the large amount of data produced by these assays is to devise a method that successfully harnesses the power of the multiplexing arrangement, produces an accurate result output, and is programmable for everyday use. Quadratic discriminant analysis, used previously [7], was unsuitable in the context of these expanded multiplexes. Therefore we compared 8 different analysis methods in IgM and IgG multiplexes, and evaluated the best of these by using additional samples assigned to geographic batteries based on the sample origin.
The method that emerged as most useful was LogitBoost. For practical purposes, the probabilities generated by this method can be used not only to indicate the infecting virus, but also to rank close contenders that might have probabilities greater than for being negative. This informs the decision of which cross-neutralization tests should be performed, if applicable. LogitBoost has the advantage of potentially producing tied results, useful for identification of dual infections or equivocals. The resulting algorithm is relatively easy to program in Visual Basic® in Excel® (Microsoft Corp., Redmond, WA). The comprehensive output generated for the multiplexing method by LogitBoost captures and presents the data in a way that individual antigen analysis methods cannot accomplish, and the comparative error rates for the analysis methods underscore this utility.
The serodiagnostic portion of the clinical case definition adopted by the CDC for these viruses takes the following general format: Fourfold or greater change in virus-specific serum antibody titer (in quantitative tests between acute and convalescent specimens), or virus-specific IgM in cerebrospinal fluid (CSF), or virus-specific IgM demonstrated in serum and confirmed by demonstration of virus-specific IgG in the same or a later specimen by a different type of serological assay. A case will be classified as probable if confirmatory test results are not obtained [22]. Results from IgG testing using ELISA or MIA do not factor into the clinical case definitions for these viruses. However, the data from IgG tests are useful in a) corroborating results of the IgM tests, b) providing evidence of a previous arboviral infection when IgM is not detectable, c) indicating an anamnestic response, especially in secondary DEN viral infections, and d) pointing the diagnosis toward an unsuspected arboviral infection due to the presence of intragenus cross-reactive IgG antibodies. For the past 15 years, the Arboviral Diseases Branch at CDC has used ELISA to test for IgM and IgG to arboviruses [5,6]. Testing involves the reaction of an aliquot of diluted serum on a separate plate for each arbovirus indicated by the domestic or travel history of the patient, which may amount to 6 or more plates. This can be an inefficient use of time, reagents, sample and supplies. Recently, MIAs have been developed to detect IgM antibodies to WN and SLE viruses in a duplex arrangement [7], and also to detect IgM to EEE virus (unpublished). To perform the IgM and IgG multiplex MIAs in their entirety for each sample on a routine basis would be excessive and wasteful of reagents. The method used for triaging samples in the Arbovirus Diseases Branch diagnostic lab at the CDC is to test them according to a battery of arboviruses known to circulate in particular geographic locations. Currently, there are 9 geographic batteries (western US, eastern US, Europe, Asia, Central America, Africa, South America, South Pacific and Australia). To strike a balance between efficiency in use of reagents and complexity of test setup, the number of batteries for the MIAs was reduced to 3 (US, Asia/Africa/Europe (AAE), and Central/South America (CSAM), which took advantage of the large degree of duplication of viruses within these expanded batteries. Australia was not included as the only antigen in the multiplex specific to Australia is RR, and this can be performed ad hoc. An antigen preparation suitable for detecting anti-Murray Valley encephalitis virus antibodies using MIA was not available, so this and some other rarely needed tests will remain as ELISAs for the time being in our laboratory. The new geographic battery virus allocation appeared to work well and will simplify workflow.
The increased error rates seen when the geographic validation sets were tested is largely due to the relatively small numbers of samples for each virus within the groups; therefore one wrong result can make a large difference in error rate. As with any statistically-based model, the more data points there are in a set, the more accurate the predictions. To achieve this, all 3 data sets were combined to produce a final working model for use in the laboratory. This improved the error rates considerably for some groups. It should be noted that some viruses such as MAY were poorly represented and the strength of the models for these were not as great. In situations where only a few known positives with high V/N and P/N's were available to establish the classification rules, there is the possibility that true positives with much lower values could occur. These may be classified incorrectly as negative, as the cutoff, regardless of classification method, would be impossible to determine accurately. To address this issue and those of background reactions and equivocal results, post-processing of results will be implemented within the context of an Excel® add-in which is currently in development. This will integrate with the Excel output of the BioPlex instrument to generate probabilities by using LOGITBOOST. Results for a specimen reacted on each viral antigen in the test batteries will be ranked, where the antigen with the greatest classification probability is reported as the infecting virus. The ROC data reported here gives individual V/N and P/N cutoffs for each viral antigen, and these can be used as a secondary measure to cross-check the results. For viruses where ROC cutoffs were derived from very small numbers of samples (e.g., MAY, POW, WEE) V/N and P/N cutoffs of 2.0 may be used, as confidence was low in the empirically-derived cutoffs and 2.0 is a number that has traditionally been used with ELISA, despite the low error rates seen with the calculated cutoffs. Additional postprocessing will be used to identify the following categories: a) background reactions due to nonspecific activity of the samples to antigens causing false positive results, where V/N < ROC cutoff and P/N > ROC cutoff; b) indeterminate results where V/N > ROC cutoff but P/N < ROC cutoff; c) V/N and P/N are both > ROC cutoff, but LOGITBOOST probabilities are too close to call such that the infecting virus cannot be identified; d) equivocal results where the highest probability is close to that of the negative group. It should be noted that the outputs of the multiplexed arboviral MIAs are not quantitative in terms of comparing the amount of specific antibody in a sample.
The reasonably common situation arises where a sample needs testing for a virus that does not appear in the geographic battery related to its origin or is newly recognized as being important, for example Jamestown Canyon in the US and Canada [23] or recently, where DEN has been shown to be transmitted in the Florida Keys [24]. In this type of situation, viral antigens can easily be tested for on an ad hoc basis in addition to any geographic battery, and ROC can be used to determine reactivity to these antigens outside of the Excel addin. Arboviral serosurveys that usually involve the testing of samples with only one or two antigens can utilize ROC cutoffs alone. However, LogitBoosting can conceivably be used under these circumstances with some further programming for use with flexible batteries, an ideal we are pursuing.
The ICC data suggested a small degree of plate to plate variation but the decision to normalize plates to mitigate this effect will be made when the Excel® Add-in had been completed, so that results can be compared for some positive samples using the finalized algorithm. It was observed that when the magnitude of the MFIs of the negative controls on a plate varied, the test samples varied similarly; hence the need for plate to plate comparison may be mitigated.
This study contains data regarding previously-understood but unpublished information regarding the degree of crossreactivity between arboviruses, which is useful for purposes of test development and interpretation of results. This was discussed briefly in relation to DEN and YF. In addition, a glimpse into the cross-reactivity of ELISA versus multiplex MIA was illustrated by looking at the results when SLE-positive samples were tested in WN and DEN assays. The ROC cutoff method for MIA showed that cross-reactivity is detected in the multiplex as much if not more than in the ELISA (possibly due to a marginally greater sensitivity of the MIA), but that LogitBoost can be expected to yield greater viral specificity when applied to the MIA. In addition, these data may provide insight regarding the capability of combined IgM and IgG testing results to reduce the need for confirmatory PRNT's in some instances. An in-depth analysis of both of these facets, while of great interest, is outside the scope of the current paper.
The Luminex platform has been used extensively for multiplexed testing for viruses in the diagnostic arena. The vast majority of these tests are for identification of virus-specific nucleic acid material [25,26]. Recently, multiplexing methods for serodiagnosis have been developed [27,28]. Many of these, such as Lammie et al. 2012 [29], take advantage of the method to test for etiologic agents of a disparate nature where antibodies are easily identified from one another. The choice of classification method may not be particularly critical under these circumstances. Here, we focus on very closely-related arboviruses, the antibodies to which frequently cross-react among the genuses, thus rendering analysis more challenging. This work represents both the most comprehensive, validated multiplexing method for arboviruses to date, and also the most systematic attempt at determining the most useful classification method for use with these serologic tests.