Comparability of thyroid-stimulating hormone immunoassays using fresh frozen human sera and external quality assessment data

Background This study aimed to assess the comparability among thyroid-stimulating hormone (TSH) assays using freshly frozen human sera and external quality assessment (EQA) data in China. Methods Twenty-nine serum samples and two commercial EQA materials, obtained from the National Center for Clinical Laboratories (NCCL), were analyzed in triplicate using eight routine TSH assays. The commutability of the commercial EQA materials (NCCL materials) was evaluated in accordance with CLSI EP30-A and the IFCC difference in bias analysis. Median values obtained for the NCCL EQA materials were used to determine the systematic and commutability-related biases among immunoassays through back-calculation. The comparability of TSH measurements from a panel of clinical samples and NCCL EQA data was determined on the basis of Passing–Bablok regression. Furthermore, human serum pools were used to perform commutable EQA. Results NCCL EQA materials displayed commutability among three or five of seven assay combinations according to the CLSI or IFCC approach, respectively. The mean systematic bias ranged from -13.78% to 9.85% for the eight routine TSH assays. After correcting for systematic bias, averaged commutability-related biases ranged between -42.26% and 12.19%. After correction for systematic and commutability-related biases, the slopes indicating interassay relatedness ranged from 0.801 to 1.299 using individual human sera, from 0.735 to 1.254 using NCCL EQA data, and from 0.729 to 1.115 using pooled human serum EQA (the commutable EQA). Conclusions The harmonization of TSH measurement is challenging; hence, systematic and commutability-related biases should be determined and corrected for accurate comparisons among assays when using individual human sera and commercial EQA materials.



Introduction
Thyroid disease is a global health problem that can impact well-being, particularly during pregnancy and childhood [1][2][3]. Although widespread thyroid function testing has resulted in a reduction in the prevalence of undiagnosed thyroid disease [4], there remains a lack of consistency in inter-laboratory results and consensus. Thyroid-stimulating hormone (TSH) has long been used to evaluate thyroid function [5], and some studies have developed robust factor analysis models in an effort to harmonize TSH measurements [6]. Moreover, in 2010, the International Federation of Clinical Chemistry Working Group for standardization of Thyroid Function Tests published three reports on the standardization of these tests [7][8][9].
Currently, external quality assessment (EQA) is not only an essential component of laboratory management systems but is also used as an index for monitoring the status of standardization. Commutable EQA samples, particularly human serum pools, are used to determine whether different measurement procedures and laboratories show the same numeric relationship as that expected for patient samples [10]. However, such commutable materials are difficult to obtain, owing to limitations related to concentration, quantity, and transportation [11].
In this study, we obtained data to assess TSH measurement, namely (1) the comparability of TSH immunoassays, for which systematic biases were corrected, using a panel of clinical patient samples in China; (2) the commutability of National Center for Clinical Laboratories (NCCL) commercial EQA materials; (3) the comparability of TSH immunoassays, for which commutability-related biases were corrected, using NCCL EQA materials; and (4) the comparability of TSH immunoassays using pools of fresh human serum. Furthermore, we established a method to determine precise relationships among TSH immunoassays based on a panel of individual human serum samples and non-commutable commercial EQA materials (NCCL EQA materials).

Serum panel and EQA materials
Twenty-nine serum samples containing different TSH concentrations were obtained from the clinical laboratory of Beijing Chaoyang Hospital, Capital Medical University (Beijing, China), after approval from its human ethics committee (document 2018-2-26-1). Each of these samples was aliquoted into eight portions and frozen at -80°C until use. All 29 specimens, collected from January 2018 to July 2018, were non-hemolyzed and non-lipemic and contained TSH ranging from 0.09 to 84.03 μIU/mL, as determined using a Siemens ADVIA CentaurXP immunoassay (Siemens Healthineers, Tarrytown, NY, USA). The authors had no access to information that could identify individual participants during or after data collection. Two commercial EQA materials, which were randomly interspersed among the 29 patient samples, were obtained from NCCL. All samples were analyzed in triplicate after internal quality controls were passed. In addition, the commutable EQA materials (human serum pools), which were prepared as previously described [12] but not frozen, were measured in cities in the provinces of Beijing, Tianjin, Hebei, and Shandong.

Data analysis
After excluding outliers on the basis of three standard deviations, the median values of the NCCL EQA materials were used to determine the systematic biases. The commutability of the EQA materials was evaluated following Clinical and Laboratory Standards Institute (CLSI) guideline EP30-A and the difference in bias approach recommended by the IFCC Working Group on Commutability [12][13][14][15]. Commutability-related biases were determined on the basis of back-calculation among assays after correcting for systematic biases via Deming regression.
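As a rough illustration of the outlier screening and summary statistics described above, the following Python sketch drops results lying more than three standard deviations from the mean and then reports the median and coefficient of variation of what remains. The function names, the use of the sample standard deviation, the single-pass screening, and the example values are assumptions made for illustration; they are not taken from the study.

```python
import statistics


def exclude_outliers(values, n_sd=3.0):
    """Drop values more than n_sd sample standard deviations from the mean.

    Single-pass screening around the mean is an assumption; the text only
    states that outliers were excluded on the basis of three SDs.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def peer_group_summary(values):
    """Return the retained count, median, and CV (%) after outlier exclusion."""
    kept = exclude_outliers(values)
    median = statistics.median(kept)
    cv_percent = 100.0 * statistics.stdev(kept) / statistics.mean(kept)
    return {"n": len(kept), "median": median, "cv_percent": cv_percent}


# Hypothetical peer-group results (uIU/mL) with one gross outlier.
example_results = [4.9, 5.0, 5.1] * 6 + [5.0, 50.0]
summary = peer_group_summary(example_results)
```

Note that with small peer groups a single extreme value may not exceed three standard deviations, so a rule of this kind is only meaningful when enough results are available.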
The commutability assessment was first performed by linear regression analysis: the log-transformed mean concentration of each serum sample was plotted against the log-transformed mean concentration obtained with the ADVIA CentaurXP. The Deming regression line and the 95% prediction interval around it were plotted using the formulas described in CLSI EP30-A. The log-transformed median concentrations of the NCCL EQA materials were also plotted, and a material was considered commutable if its data point lay inside the 95% prediction interval. The commutability-related biases were calculated with Eq 1.
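The regression step can be sketched in Python as follows. This is a minimal closed-form Deming fit on log10-transformed means, assuming an error-variance ratio of 1 (orthogonal regression); the ratio, the function names, and the example concentrations are illustrative assumptions, and the 95% prediction interval of CLSI EP30-A is deliberately not reproduced here.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming regression of y on x.

    delta is the assumed ratio of the y-error variance to the x-error
    variance; delta=1.0 is an assumption, not a value from the study.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def log_scale_fit(comparator_means, other_means):
    """Fit the regression on log10-transformed mean concentrations."""
    log_x = [math.log10(v) for v in comparator_means]
    log_y = [math.log10(v) for v in other_means]
    return deming_regression(log_x, log_y)


# Hypothetical mean concentrations (uIU/mL): comparator vs. another assay.
beta, alpha = log_scale_fit([0.1, 1.0, 10.0, 100.0], [0.2, 2.0, 20.0, 200.0])
```

In this contrived example the second assay reads exactly twice the comparator, so the fitted slope on the log scale is 1 and the intercept equals log10(2).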
$$\mathrm{Bias}(\%) = \frac{\left(C_{\mathrm{mean,other}} - 10^{\,\log C_{\mathrm{mean,cen}} \times \beta + \alpha}\right) \times 100}{10^{\,\log C_{\mathrm{mean,cen}} \times \beta + \alpha}} \qquad (1)$$
where C_mean,other is the mean TSH concentration measured with the other system, C_mean,cen is the mean TSH concentration measured with the ADVIA CentaurXP, and β and α are the slope and intercept of the Deming regression, respectively.
The commutability assessment was then performed by difference in bias analysis. The bias of each serum sample, calculated as the difference between the ln-transformed mean concentration obtained with the other method and the ln-transformed mean concentration obtained with the ADVIA CentaurXP, was plotted against the mean of the other method and the ADVIA CentaurXP. The plot shows the average bias for the human serum samples (black line) and the commutability criterion (C) for the EQA materials (dashed lines). A C value of 0.237 (about 23.7% in concentration, based on the desirable specification for total error from the biological variation of TSH) was used. The uncertainty of the difference in bias between the human serum samples and the EQA material is shown for each EQA material as error bars. The uncertainty consists of two components: the uncertainty of the estimate of bias for the human serum samples and the uncertainty of the estimate of bias for each EQA material (substituted by the uncertainty for the human serum samples because position effects could not be obtained in this study). The associated expanded uncertainty was calculated by multiplying the uncertainty by a coverage factor of 1.9. When the expanded uncertainty interval was inside the C lines, the EQA material was considered commutable; when it was outside the C lines, the EQA material was considered non-commutable.
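To make the two calculations above concrete, the Python sketch below evaluates Eq 1 and applies the difference in bias criterion with C = 0.237 and a coverage factor of 1.9, as stated in the text. The function names, the example inputs, and the "inconclusive" outcome for intervals that straddle a C line are assumptions added for illustration; the standard uncertainty is taken as an input because its derivation is not fully specified here.

```python
import math


def commutability_related_bias(c_other, c_cen, beta, alpha):
    """Eq 1: percent bias of the other assay relative to the value
    back-calculated from the comparator via the log10-scale regression."""
    predicted = 10 ** (math.log10(c_cen) * beta + alpha)
    return (c_other - predicted) * 100.0 / predicted


def difference_in_bias_verdict(patient_biases, eqa_bias, u, c=0.237, k=1.9):
    """Classify one EQA material from ln-scale biases.

    patient_biases: ln(other) - ln(comparator) for each patient sample.
    eqa_bias: the same quantity for the EQA material.
    u: standard uncertainty of the difference in bias (supplied by caller).
    """
    mean_patient_bias = sum(patient_biases) / len(patient_biases)
    d = eqa_bias - mean_patient_bias
    lower, upper = d - k * u, d + k * u
    if -c <= lower and upper <= c:
        return "commutable"
    if lower > c or upper < -c:
        return "non-commutable"
    # Assumption: intervals overlapping a C line are reported separately.
    return "inconclusive"


# Hypothetical inputs, not data from the study.
bias_percent = commutability_related_bias(12.0, 10.0, beta=1.0, alpha=0.0)
verdict = difference_in_bias_verdict([0.08, 0.10, 0.12], eqa_bias=0.15, u=0.05)
```

With a slope of 1 and an intercept of 0, the predicted value equals the comparator result, so the example bias is simply the relative difference between 12 and 10.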
After correcting for systematic and commutability-related biases, the comparability of TSH immunoassays using individual serum samples and commercial EQA data (from 2016 to 2019) was determined via Passing-Bablok regression using MedCalc Statistical Software version 19.0.7 (MedCalc Software bvba, Ostend, Belgium). The comparability among assay measurements was also assessed using commutable EQA. Pearson correlation coefficients were also calculated. Coefficients of variation (CVs) of each EQA material were calculated for each of the eight platforms in the two EQA programs after outliers (any laboratory result beyond 3 standard deviations) were removed. The within-laboratory CV was calculated from triplicate measurements. All comparisons used the ADVIA CentaurXP as the comparative assay because this platform is routinely used for clinical sample measurement in our laboratory, which is accredited under International Organization for Standardization (ISO) 15189 and by the College of American Pathologists (CAP). A workflow chart is shown in
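For readers unfamiliar with Passing-Bablok regression, the point estimates can be sketched in a few lines of Python. This is a minimal version of the shifted-median-of-pairwise-slopes estimator, assuming positively correlated methods; it is not MedCalc's implementation, it omits confidence intervals and the linearity test, and the function name and example data are hypothetical.

```python
import math
import statistics


def passing_bablok(x, y):
    """Return (slope, intercept) point estimates of Passing-Bablok regression."""
    n = len(x)
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slopes.append(math.inf if dy > 0 else -math.inf)
                continue
            s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are excluded
            slopes.append(s)
    slopes.sort()
    n_slopes = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if n_slopes % 2 == 1:
        slope = slopes[(n_slopes - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[n_slopes // 2 - 1 + k] + slopes[n_slopes // 2 + k])
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Hypothetical comparator (x) and test-assay (y) results in uIU/mL.
comparator = [1.0, 2.0, 3.0, 4.0, 5.0]
test_assay = [0.8 * v + 0.5 for v in comparator]
pb_slope, pb_intercept = passing_bablok(comparator, test_assay)
```

A slope above 1 in such a fit would indicate that the test assay reads proportionally higher than the comparator, and a slope below 1 that it reads lower, which is how the slopes reported in this study are interpreted.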

Descriptive statistics
The concentrations of TSH in the 29 individual samples and EQA samples determined using eight different immunoassays, after biases were corrected, are detailed in Tables 1 and 2. Almost all samples were measured in triplicate, with just a few samples being tested twice owing to an insufficient volume of serum obtained from one individual. Furthermore, measurements were not obtained for one sample, as its TSH concentration exceeded the upper limit of the analytical measurement range (>75 μIU/mL) of the Immulite 2000 platform. From 2016 to 2019, 1666 to 2453 laboratories participated in the NCCL EQA program, and from 2017 to 2019, 145 to 477 laboratories participated in the commutable EQA program. The numbers of peer groups are listed in S2 and S3 Tables. The two EQA programs yielded similar CVs, with medians of 3.75%~6.25% and 2.82%~6.33%, respectively (Tables 2 and 3). Furthermore, using individual human sera, we obtained within-laboratory median CV values of 1.17%~4.09% (Table 1).

Systematic and commutability-related biases
Median TSH concentrations of the NCCL EQA materials obtained from 2016 to 2019 and measured using different assays are listed in Table 2. The means of systematic and commutability-related biases for the two lots of EQA materials (201811 and 201812) were -13.84%~9.85% and -38.83%~18.65%, respectively. After correcting for systematic bias, the mean commutability-related bias was between -42.26% and 12.19% (S4 and S5 Tables). The commutability of the materials among different assays is shown in Fig 3. Based on the CLSI approach, the NCCL EQA materials were commutable for three of the seven assay pairs (Fig 3A, 3B and 3D), which showed smaller commutability-related biases (-11.55%~-6.58%) after correction for the systematic bias. According to the IFCC method, however, these NCCL EQA materials were commutable for five of the seven assay pairs (Fig 4A, 4B, 4D, 4F and 4G).

Comparability of the analytical results for TSH
Herein, we analyzed patient samples using eight assays to evaluate the comparability of different routine clinical laboratory measurement procedures after correcting for the systematic bias. The medians and ranges of the measured TSH concentrations and the measurement imprecision for the patient samples are shown in Table 1. Among these eight assays, compared with the ADVIA CentaurXP assay, the Cobas 601 assay yielded the highest measurements, with a median concentration of 6.193 μIU/mL and a slope (indicating interassay relatedness) of 1.299. In contrast, the Architect i2000sr assay yielded the lowest measurements, with a median concentration of 4.202 μIU/mL and a slope of 0.801 (Table 1). Among the 29 samples analyzed, we detected a 1000-fold difference between the lowest and highest concentrations of TSH (e.g., from 0.07 to 82.625 μIU/mL using the ADVIA CentaurXP assay), and the Pearson correlation coefficients ranged from 0.990 to 0.999. The NCCL EQA and commutable EQA data revealed similar results, with the Cobas assay yielding higher median values (29.988 μIU/mL and 3.42 μIU/mL, respectively) and larger slopes (1.254 and 1.105, respectively), and the Architect assay yielding lower medians (17.035 μIU/mL and 2.36 μIU/mL, respectively) and smaller slopes (0.735 and 0.729, respectively) (Tables 2 and 3). An approximately 10-fold difference was observed between the lowest and highest concentrations of TSH in the two EQA programs (e.g., NCCL EQA: between 0.610 and 55.670 μIU/mL; commutable EQA: between 1.32 and 16.80 μIU/mL, using the ADVIA CentaurXP assay).
The scatter plots for the assay pairs illustrating the linearity and the distribution of the data were presented in

Discussion
TSH is currently the most sensitive and widely used marker of thyroid status in individuals with thyroid disease [4]. Nevertheless, although the functional sensitivity of TSH measurements has improved considerably with the introduction of "third"-generation systems, further studies are required to standardize these measurements [6,16,17]. Achievement of comparable results and standardization among different instrument platforms and measurement procedures has been anticipated by clinicians and laboratory personnel; however, these goals have yet to be realized over the mid-term, owing primarily to the lack of accepted reference measurement procedures for these measurements [17]. In the present study, measurements were characterized by large systematic biases (-13.84%~9.85%). However, the correction of such systematic biases helped determine the actual relationships among immunoassays (Table 1 and Fig 2B). Meanwhile, we identified similar relationships using NCCL EQA data after correcting for commutability-related biases (-42.26%~12.19%) (Table 2, S5 Table, and Fig 2D). For all data sources (a panel of individual serum samples, NCCL EQA data, or commutable EQA data), compared with the ADVIA CentaurXP assay, the Cobas 601 and Maglumi2000 plus assays yielded higher measurements and the Architect i2000sr assay yielded lower measurements (Fig 5), which is broadly consistent with previous findings [17,18].
Even though commutability is one of the most important properties of a Proficiency Testing (PT)/EQA sample, most commercial PT/EQA samples are non-commutable owing to modifications in their preparation [10,19]. The NCCL EQA materials used herein were commutable in only three or five of seven assay combinations (Figs 3 and 4), depending on which of the two approaches was used. This relatively poor commutability contrasts with the findings of Clerico et al. [18], which could be explained by differences between the two studies with respect to the source of the EQA materials and the statistical methods used to assess commutability. Nevertheless, we believe that the merit of our study lies in the fact that we evaluated and corrected for commutability-related biases after correcting for systematic biases, which are common procedures used in routine laboratory practice. Accordingly, we observed similar relationships among immunoassays when using a patient serum panel, serum pools, and commercial EQA materials. In this study, we also described a method of harmonization using non-commutable EQA materials, which is important for EQA organization. Although we could not demonstrate the commutability of the human serum pools (commutable EQA), some studies have provided evidence regarding the commutability of materials prepared using the same technique [12,14,19,20].
Nonetheless, this study has some notable limitations. First, we used EQA materials with only two concentrations of TSH to evaluate the systematic and commutability-related biases. However, for practical reasons, the evaluation of commutability of all lots of PT/EQA materials, which were obtained using the same preparation procedure, was unnecessary and not feasible [5]. Furthermore, systematic biases tended towards zero (S5 Table) and the relationships among immunoassays for the NCCL EQA materials were similar to those for patient samples (Fig 5), thereby providing evidence of the feasibility of this method. Second, owing to the insufficient volume of a sample obtained from one patient, we used only 29 samples to evaluate the comparability of TSH assays; however, using the ADVIA CentaurXP assay, the concentration range was well characterized from 0.076 μIU/mL to 82.625 μIU/mL. Third, we assessed only eight TSH immunoassays in this study; hence, future studies are required to verify the reliability of this method with a larger number of platforms.
In conclusion, this study shows systematic differences among the TSH immunoassay methods most widely used in China. Our data may help clinicians and experts in laboratory medicine to better compare and more correctly interpret patient results. The same relationships can also be clarified from EQA data, but only if the commutability-related biases are evaluated and corrected, which will make a valuable contribution to the harmonization and standardization of TSH measurements.