Critical Appraisal of Four IL-6 Immunoassays

Background
Interleukin-6 (IL-6) contributes to numerous inflammatory, metabolic, and physiologic pathways of disease. We evaluated four IL-6 immunoassays in order to identify a reliable assay for studies of metabolic and physical function. Serial plasma samples from intravenous glucose tolerance tests (IVGTTs), with expected rises in IL-6 concentrations, were used to test the face validity of the various assays.
Methods and Findings
IVGTTs, administered to 14 subjects, were performed with a single infusion of glucose (0.3 g/kg body mass) at time zero, a single infusion of insulin (0.025 U/kg body mass) at 20 minutes, and frequent blood collection from time zero to 180 minutes for subsequent IL-6 measurement. The performance metrics of four IL-6 detection methods were compared: Meso Scale Discovery immunoassay (MSD), an Invitrogen Luminex bead-based multiplex panel (LX), an Invitrogen Ultrasensitive Luminex bead-based singleplex assay (ULX), and R&D High Sensitivity ELISA (R&D). IL-6 concentrations measured with MSD, R&D and ULX correlated with each other (Pearson correlation coefficients r = 0.47–0.94, p<0.0001), but only ULX correlated with LX (r = 0.31, p = 0.0027). MSD, R&D, and ULX, but not LX, detected increases in IL-6 in response to glucose. All plasma samples were measurable by MSD, while 35%, 1%, and 4.3% of samples were out of range when measured by LX, ULX, and R&D, respectively. Based on representative data from the MSD assay, baseline plasma IL-6 (0.90±0.48 pg/mL) increased significantly as expected by 90 minutes (1.29±0.59 pg/mL, p = 0.049), and continued rising through 3 hours (4.25±3.67 pg/mL, p = 0.0048).
Conclusion
This study established the face validity of IL-6 measurement by MSD, R&D, and ULX but not LX, and the superiority of MSD with respect to dynamic range. Plasma IL-6 concentrations increase in response to glucose and insulin, consistent with both an early glucose-dependent response (detectable at 1–2 hours) and a late insulin-dependent response (detectable after 2 hours).


Introduction
Interleukin-6 (IL-6) is a cytokine that is released from a multitude of sites under a wide range of conditions. IL-6 secreted by immune cells, adipocytes and endothelial cells plays a well-known role in the chronic low-grade inflammation characteristic of obesity [1,2], diabetes and cardiovascular disease [3], as well as the acute immunological crises of infection and sepsis [4]. However, more recent studies have challenged the notion that the actions of IL-6 are either entirely immunological or wholly detrimental. IL-6 is released from contracting skeletal muscle before and after exercise, including moderate "non-damaging" exercise recommended by health professionals [5]. Furthermore, increases in plasma IL-6 concentrations directly stimulate both glucose [6] and lipid metabolism [7]. The additional finding that plasma IL-6 also rises in response to both acute hyperglycemic clamp and pulse [8], as well as hyperinsulinemia [9], highlights the potential role of this cytokine in substrate metabolism.
In normal healthy subjects free of inflammation, IL-6 concentrations are typically quite low, in the range of 0.2–7.8 pg/mL [10,11], but can exceed 1600 pg/mL in sepsis [12]. More modest increases in IL-6 concentrations are associated with age [13], hyperglycemia [8] and the physiologic stress of acute exercise [5]. Because IL-6 is detectable in plasma, it has the potential to reflect systemic inflammatory, metabolic, and physiologic stimuli. To elucidate the multiple biological pathways in which IL-6 is involved, it is essential to be able to quantify it precisely across a broad dynamic concentration range and to have confidence in the face validity of the measure, i.e., that it is measuring what it is purported to measure.
Therefore we designed a study to assess the performance metrics and face validity of cytokine concentrations generated by four different IL-6 immunoassays. In designing this study, we proposed three criteria a priori to judge the performance of each immunoassay. First, the dynamic range of the assay needed to be broad enough to measure both the low levels of IL-6 found in normal healthy individuals and the high levels characteristic of altered homeostasis associated with many disease or pre-disease conditions, ideally without the need for diluting samples to bring their values into range. Second, it was particularly important that assay reproducibility be high, not just with minimal variability within and between plates, but across different kit lots produced at different times, in order to permit meta-analyses of data derived from multiple studies over time. Third, it was important that the values produced by a particular assay fulfill face validity criteria by showing the ability to detect potentially biologically relevant changes in plasma IL-6 in response to appropriate stimuli. This assessment required a sample set in which IL-6 concentrations would be expected to change in a predictable way in response to physiologic stimulation. Hyperglycemia [8] and hyperinsulinemia [9] have each been shown to raise plasma IL-6 levels, although with different response times. Therefore we chose to evaluate IL-6 in a set of serial samples obtained from healthy, but obese, middle-aged subjects during a frequently sampled intravenous glucose tolerance test (IVGTT). We hypothesized that single infusions of glucose and insulin would result in a measurable elevation in plasma IL-6 concentrations. Additionally, our goal was to identify a method for IL-6 quantification that met all three pre-specified assessment criteria.

Participants and Ethics Statement
On the basis of availability of sufficient volumes, samples from a total of 14 subjects were selected from the control arm (no exercise intervention) of the Studies Targeting Risk Reduction Interventions through Defined Exercise (STRRIDE) [14]. The purpose of STRRIDE was to assess the effect of the volume and intensity of exercise training on insulin sensitivity in a population of overweight, sedentary, non-diabetic, middle-aged adults. Informed written consent was obtained from all subjects, and all procedures were approved by the Institutional Review Board of Duke University Medical Center.
Pooled plasma from four healthy subjects served as a control specimen. For all assays, the acceptable precision limits were defined as the mean of the pooled control sample plus or minus 2 SDs. Any plate in which the control fell outside this range was to be repeated; no repeat plate analyses were required in this study based on this criterion. The dynamic range of each assay was defined by the highest and lowest concentrations of calibrators specified in each kit. Of note, the pooled control sample was within the manufacturer's published dynamic range for each assay (Table 1). The range of sample measurements was defined by the highest and lowest IL-6 concentrations in IVGTT plasma samples obtained via each method. Quantifiability, or the percentage of samples that were in the range of each assay, was defined as the ratio of the number of samples yielding concentrations within the assay range to the total number of samples assayed; samples outside the measurable range were those that were either above the upper limit or below the lower limit of quantification. For purposes of graphical representation only, samples with IL-6 values above or below the range of detection were substituted with values twice the upper limit of quantification or one-half the lower limit of quantification, respectively, as determined by the highest and lowest concentrations of the standard curve.
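The definitions above can be illustrated with a minimal sketch. The kit limits, the example readings, the "below"/"above" flags, and all function names are hypothetical and are not taken from the study; the control check assumes that the acceptance limits are derived from a set of earlier readings of the pooled control, which the text does not specify.

```python
from statistics import mean, stdev

# Hypothetical kit limits in pg/mL (lowest and highest calibrators); not from the paper.
LLOQ, ULOQ = 0.2, 10.0

def quantifiability(readings):
    """Percent of readings within the assay range.

    In-range readings are floats; out-of-range readings are flagged
    with the strings "below" or "above" (an assumed convention).
    """
    in_range = [r for r in readings if isinstance(r, float)]
    return 100.0 * len(in_range) / len(readings)

def plot_value(reading, lloq=LLOQ, uloq=ULOQ):
    """Value used for plotting only: LLOQ/2 if below range, 2*ULOQ if above."""
    if reading == "below":
        return lloq / 2
    if reading == "above":
        return 2 * uloq
    return reading

def control_acceptable(value, previous_controls):
    """True if a control reading lies within mean +/- 2 SD of prior control readings."""
    m, s = mean(previous_controls), stdev(previous_controls)
    return m - 2 * s <= value <= m + 2 * s

readings = [0.9, 1.3, "below", 4.2, "above"]  # made-up example values
print(quantifiability(readings))              # 60.0
print([plot_value(r) for r in readings])      # [0.9, 1.3, 0.1, 4.2, 20.0]
```

The substituted values are meant for display only; they would not be appropriate inputs for the statistical comparisons described later.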
Reproducibility was reported as percent coefficient of variation (%CV), calculated as 100*SD/Mean. Intra-plate variability was calculated using duplicate measures of manufacturers' calibrators and plasma IVGTT samples, based on availability as follows: for MSD, all calibrator curves and 163 IVGTT samples; for R&D, all calibrator curves and 162 IVGTT samples; for LX, all calibrator curves (except ULX) and 22 IVGTT samples; and for ULX, all calibrator curves except LX. Inter-plate and inter-lot variability was assessed using a pooled plasma sample (collected from four individuals) measured in duplicate on MSD and R&D and by measurement of 100 beads from individual wells on LX and ULX. Two lots were compared for MSD and LX, three lots were compared for R&D, and only one lot was available for ULX. To assess responsiveness of IL-6 to IVGTT, serum samples were measured in duplicate for MSD and R&D (unless sample was limited, as indicated), and in the case of the bead-based assays, LX and ULX, a minimum of 100 beads were analyzed from individual wells.
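As an illustration of the %CV calculation, the following sketch computes 100*SD/Mean for each duplicate pair and averages across pairs. It assumes the sample (n − 1) standard deviation and simple averaging of per-pair values, neither of which is stated in the text; the duplicate readings and function names are made up.

```python
from statistics import mean, stdev

def percent_cv(values):
    """%CV = 100 * SD / mean, using the sample (n - 1) standard deviation."""
    return 100.0 * stdev(values) / mean(values)

def mean_intra_plate_cv(duplicates):
    """Average %CV over a list of replicate groups (e.g. duplicate wells)."""
    return mean(percent_cv(pair) for pair in duplicates)

# Made-up duplicate readings in pg/mL, for illustration only.
duplicates = [(0.92, 0.88), (1.30, 1.26), (4.10, 4.40)]
print(round(mean_intra_plate_cv(duplicates), 2))
```

The same `percent_cv` function could be applied to repeated readings of the pooled control across plates or lots to obtain inter-plate and inter-lot %CV.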
Pearson correlation coefficients were calculated using IL-6 measurements from plasma samples. To assess agreement between assays we performed Bland-Altman tests [16] of z score normalized data, with z = (x − μ)/σ, where x is the raw concentration and μ and σ are the mean and standard deviation of all concentrations for that assay. This was necessary because the values measured by LX were much greater than those of the other assays. To evaluate responsiveness to glucose, untransformed IL-6 concentrations at t = 180 minutes were compared to baseline for each subject using the paired t test. In four subjects (1, 7, 8, and 13) the baseline sample was unavailable for assay by ULX; therefore comparisons were made to the sample collected at 2 or 6 minutes. Statistical analysis was performed using GraphPad Prism, with significance defined as p<0.05.
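A minimal sketch of the between-assay comparison follows. The analysis itself was run in GraphPad Prism; the code below only illustrates the arithmetic. It assumes the sample standard deviation for the z scores and the conventional 1.96 SD limits of agreement, and the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def z_scores(xs):
    """Normalize one assay's concentrations: z = (x - mean) / SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired measurements."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def bland_altman_z(a, b):
    """Bias and 95% limits of agreement for z-normalized paired measurements."""
    za, zb = z_scores(a), z_scores(b)
    diffs = [x - y for x, y in zip(za, zb)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Made-up paired readings from two assays on very different scales.
assay_a = [0.8, 1.1, 1.9, 3.5, 4.2]
assay_b = [95.0, 140.0, 180.0, 410.0, 390.0]
print(pearson_r(assay_a, assay_b))
print(bland_altman_z(assay_a, assay_b))
```

Because each assay is centered on its own mean, the bias of the z-normalized differences is zero by construction, so the informative quantity in this variant is the width of the limits of agreement.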
Reproducibility (both within plates and between plates and lots) was similar and acceptable for both MSD and R&D, but lower for LX and ULX. MSD, R&D, and ULX (but not LX) consistently detected changes in IL-6 concentration upon stimulation by glucose administered during the IVGTT (Table 1 and Figures S1–S14). Although the assays were similar in many respects, a notable difference between them was the dynamic range. While both R&D and ULX were designed to detect the very low levels of IL-6 typically found in healthy individuals, both assays failed to measure one low value (0.6% and 1% of samples, respectively). Additionally, R&D was constrained by an upper limit of 10 pg/mL, yielding out-of-range (too high) values for 6 (3.7%) samples. Although the upper range of the LX assay was presumably sufficient to capture these high values, it was constrained by a lower limit of detection of 9.47 pg/mL, thus failing to measure 35% of the samples assayed here. Only MSD detected IL-6 in all samples measured, and had the capacity to yield results on the first determination for samples with very high concentrations, thus minimizing the need for sample dilution and reassay. Obtaining high-end measurements with R&D and ULX could potentially require dilution of samples, with the consequences of both higher costs (due to the need for multiple repeat measurements) and higher technical variability in the studies.
Calibrators (IL-6 standards) from each manufacturer were tested on each of the other assays (Figure 1) and produced generally parallel standard curves. However, there was notable variability in the measured signals between different calibrators that were expected to contain similar concentrations of IL-6. This variability was generally least at low IL-6 concentrations and greater at higher concentrations. Variability between calibrators was lowest for R&D compared to the three other immunoassays, and highest for ULX.
Correlations of IL-6 concentrations (Table 2) were strongest between MSD and both the R&D and ULX assays (Pearson correlation coefficient r = 0.94, p<0.0001; r = 0.90, p<0.0001, respectively), and weaker between R&D and ULX (r = 0.47, p<0.0001). LX correlated poorly with ULX (r = 0.31, p = 0.0027) and not at all with MSD and R&D (r = 0.15, p = 0.13; r = −0.17, p = 0.097). While Bland-Altman plots (displaying the means vs. the differences of sample measurements) are the standard method of assessing agreement between assays, the hundreds-fold higher values returned by LX compared with the other three assays necessitated comparison across assays using z scores, which normalize each set of values and express them as standard deviations from the mean. Bland-Altman plots revealed the highest agreement (narrower limits of agreement) between MSD, R&D, and ULX, and essentially no agreement between LX and any of the other assays (Figure 2).
To assess the responsiveness of the IL-6 immunoassays to changes in IL-6, we measured IL-6 in serial plasma samples derived from IVGTTs. We hypothesized that an assay capable of detecting the modest changes in IL-6 concentrations would meet criteria for biological plausibility and face validity; namely, that the assay would be capable of detecting biologically relevant variation in IL-6 concentrations under a wide range of conditions and would be measuring what it purports to measure.
Over the course of the 180-minute IVGTT, IL-6 increased significantly compared to baseline, as detected by MSD, R&D, and ULX, but not LX. These increases in IL-6 were discernible when the IVGTT time course of each individual subject was plotted (Figures S1–S14). To further characterize the concentrations of IL-6, the time course of mean glucose and insulin concentrations during the IVGTT was plotted (Figure 3, representative data derived from MSD assay). As expected, mean glucose rose after glucose infusion, peaking three minutes after the beginning of the IVGTT at a mean (SD) concentration of 253 (45.8) mg/dL before returning to baseline within one hour. Insulin, via endogenous release, increased immediately following glucose infusion, reaching an initial mean (SD) peak concentration of 72.3 (38.0) pmol/L at 4 minutes and a subsequent mean (SD) peak concentration of 270.4 (99.5) pmol/L at 22 minutes following insulin infusion at 20 minutes, then returned to a concentration equivalent to baseline by 90 minutes. The mean (SD) IL-6 concentration, 0.90 (0.49) pg/mL, was equivalent to baseline until 60 minutes after the start of the IVGTT, became significantly different from baseline at 90 minutes (1.29±0.59 pg/mL, p = 0.049), and continued to rise steadily until 180 minutes (4.25±3.67 pg/mL, p = 0.0048), when the IVGTT was terminated. These characteristic changes in IL-6 during an IVGTT confirm the face validity of the MSD immunoassay and of the R&D and ULX assays, which showed a similar pattern of IL-6 change.
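The comparison of a later time point with baseline uses a paired t test on untransformed concentrations. The sketch below shows only how the t statistic and degrees of freedom are obtained from within-subject differences; the function name and the example concentrations are hypothetical and are not the study data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(baseline, followup):
    """Paired t statistic and degrees of freedom for per-subject differences."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up IL-6 concentrations (pg/mL) for five subjects, illustration only.
il6_t0 = [0.6, 0.9, 1.4, 0.7, 1.1]
il6_t180 = [2.1, 3.8, 6.0, 1.5, 4.9]
t_stat, df = paired_t(il6_t0, il6_t180)
print(t_stat, df)
```

Converting the statistic to a two-sided p value requires the t distribution with `df` degrees of freedom, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_rel`) or the GraphPad Prism software used in the study reports it directly.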

Discussion
In this validation study, we sought not only to quantify the dynamic range and reproducibility of each method but, importantly, to establish the face validity of the IL-6 immunoassays through demonstration of biologically plausible change during the course of an IVGTT. Based on the results of Esposito et al. [8], and the fact that we used similar glucose pulse conditions, we expected to see increases in plasma IL-6 levels during the course of an IVGTT study. MSD, R&D and ULX all detected changes in IL-6 concentrations in response to glucose and insulin, and were comparable with regard to other assay metrics, with the exception that MSD had a broader dynamic range than ULX or R&D. Correlation of concentrations, and agreement as assessed by modified Bland-Altman tests, were strong between the three assays but weak with LX, suggesting that the three assays are indeed measuring the same analyte, i.e., IL-6. The variability in standard curves of different manufacturers (Figure 1), while not extreme, was nevertheless noteworthy. Some variability in concentrations and the potential presence of impurities in different formulations, even from the same manufacturer, are to be expected. Calibrator variation is likely to be the source of systematic bias in measurements between different assays, although differences in the recognizing antibodies may also play a role. These differences are to be expected, since immunoassays are neither capable of yielding nor designed to yield absolutely precise concentrations of analytes; ultimately this variability could be adjusted with an international standard.
The IL-6 results derived from the IVGTTs suggested two phases of an IL-6 response that appear to reflect distinct but coordinated regulation by glucose and insulin. Furthermore, the extended duration of IL-6 elevation suggests that gene expression, protein synthesis and release, and clearance may all be involved in IL-6 regulation by glucose and insulin, demonstrating an interaction between immunological and metabolic pathways. In our study, both glucose and insulin were infused intravenously, the former at the start of the IVGTT and the latter after 20 minutes. Although the infusions of glucose and insulin in our study were episodic rather than continuous, and insulin was infused at lower concentrations than previously tested [9], we nevertheless detected a significant increase in plasma IL-6 concentrations in response to both stimuli. IL-6 rose steadily after the first 60 minutes through the end of the study at 3 hours, correlating temporally with both glucose and insulin infusion.
These data complement and expand the existing data on IL-6 variation in response to change in glucose homeostasis. In one previous study [8], hyperglycemic clamp (with inhibition of insulin release) led to a phasic (rapid rise by one hour, then return to baseline by three hours) response of IL-6. In contrast, sustained hyperinsulinemia (with glucose held at fasting levels) led to a slow and continuous rise in IL-6 beginning after 2-3 hours and continuing at least 6 hours [9]. It remains to be seen how changes in glucose tolerance and insulin sensitivity, as impaired by obesity or ameliorated by exercise, might be reflected in the pattern of IL-6 response.
As the number and type of molecular assays proliferate, it becomes increasingly important for research groups to thoughtfully choose and validate the methods by which they generate biomarker data. Earlier efforts may have dispensed with this step for the simple reason that only one assay may have been available for a particular analyte, but years of product development have increased options as well as the responsibility to deliberately select a method that optimizes the criteria required by the research objectives. Regarding the technical aspects of this study, of the four assay methods, the MSD Ultra Sensitive Immunoassay proved preferable for quantifying IL-6. The dynamic range accommodated both the very high and low concentrations, variability within and between plates and between lots was sufficiently low, and the assay required only small volumes of sample. We experienced problems in measuring IL-6 with the Invitrogen Luminex assay in the context of a multiplex panel, including poor reproducibility (particularly between plates and between lots) and inconsistent detection of analyte changes in response to physiologic stimuli. We have had more consistent results with, and continue to use, Invitrogen Luminex for other analytes.
The Invitrogen Ultrasensitive Luminex provides an acceptable alternative, although its dynamic range and reproducibility are more limited than MSD. Perhaps most important, the face validity of the MSD, R&D, and ULX assays, i.e., that they are actually