
Comparison of thyroid hormones reference intervals based on thyroid antibody levels: A multicenter study

Abstract

Objectives

This study aimed to determine reference intervals (RI) for thyroid hormones based on thyroid antibody levels using different autoanalyzer kits.

Methods

RIs for thyroid-stimulating hormone (TSH), free thyroxine (fT4), and free triiodothyronine (fT3) were determined both according to and independently of thyroid antibody status, using the refineR algorithm in the R statistical environment.

Results

Significant differences in RIs were found between antibody-positive (Ab(+)) and antibody-negative (Ab(–)) individuals. TSH RI varied most notably in Abbott and Siemens analyzers. In females, Abbott showed higher TSH RIs in the Ab(+) group (0.41–7.44 mU/L) than in Ab(–) (0.24–3.50 mU/L). In males, Roche and Beckman exhibited the greatest differences (Roche Ab(+): 0.19–5.77; Ab(–): 0.44–3.63; Beckman Ab(+): 0.12–5.23; Ab(–): 0.39–3.96 mU/L). For fT4, Roche showed increased RIs in females with Ab(+) status (11.42–20.42 vs. 10.34–19.35 pmol/L). In males, Beckman and Siemens autoanalyzers also indicated notable differences.

Conclusion

Antibody status significantly affects thyroid hormone RI, particularly for TSH. These findings highlight the need for antibody-specific RI and further standardization in establishing reference intervals.

Introduction

The thyroid gland is a key endocrine organ that synthesizes essential hormones such as triiodothyronine (T3) and thyroxine (T4), playing a crucial role in regulating metabolism. These hormones influence various biological processes, including lipid metabolism, cardiovascular function, bone health, and cognitive performance, necessitating an accurate assessment of thyroid function [1,2]. Thyroid-stimulating hormone (TSH), secreted by the pituitary gland, regulates thyroid hormone production and is considered one of the most important biomarkers of thyroid function. In addition to TSH, T3, T4, thyroglobulin antibodies (Tg-Ab), and thyroid peroxidase antibodies (TPO-Ab) are essential screening tests for evaluating thyroid function [3,4].

Thyroid disorders are among the most prevalent endocrine disorders worldwide, and their incidence increases with age. Various physiological and pathological factors can influence thyroid hormone levels, including age, sex, pregnancy, acute or chronic illnesses, dietary iodine content, seasonal variations, and medication use [5–7].

Thyroid disorders are often associated with autoimmune diseases such as Hashimoto’s thyroiditis and Graves’ disease, which are characterized by the presence of TPO-Ab and Tg-Ab autoantibodies [8]. Although thyroid antibody positivity (Ab+) is an indicator of autoimmune thyroid disease, antibody-positive individuals are neither automatically classified as patients nor treated immediately [9]. Currently, treatment decisions are based primarily on TSH levels rather than on antibody status [8].

Autoimmune thyroid diseases develop gradually. In the early stages only antibody positivity is detected, whereas in later phases thyroid function deteriorates and TSH levels change substantially [10,11]. The accuracy of the RIs used to assess thyroid function is therefore crucial throughout this process. However, most laboratories use reference intervals that were established without considering thyroid antibodies (Ab). This is despite the fact that antibodies are early biochemical markers of thyroid autoimmunity and can substantially affect thyroid function.

In the current literature, RIs for thyroid function are often determined without considering antibody status, and many studies establish these intervals without excluding antibody-positive individuals. To better represent a healthy population, we propose that reference intervals be derived from antibody-negative (Ab−) individuals. Antibody negativity is therefore a critical selection criterion: restricting the reference population to Ab− individuals should yield more accurate and clinically meaningful reference ranges. In this context, a healthy reference population should be defined as antibody-negative, and TSH levels measured in healthy Ab− individuals should form the basis of the reference range to improve the accuracy of thyroid function assessment.

In our study, we aim to emphasize the importance of defining reference intervals for thyroid function tests based on thyroid antibody status. Our hypothesis is that the reference intervals for antibody-positive (Ab+) individuals differ from those for antibody-negative (Ab−) individuals, and this difference may have significant implications for clinical diagnosis and treatment. We argue that thyroid function test reference intervals should be re-evaluated considering thyroid antibody status to ensure accurate interpretation of test results.

Moreover, reference intervals for thyroid function tests may vary not only with antibody status but also with demographic factors (such as ethnicity, age, and sex), body mass index, specific medication use, iodine status, and methodological differences [11–16]. Organizations such as the American Thyroid Association (ATA), the International Federation of Clinical Chemistry (IFCC), and the Clinical and Laboratory Standards Institute (CLSI) recommend that each laboratory establish population-specific reference intervals [12,17,18]. However, many laboratories and hospitals continue to use manufacturer-provided reference intervals, which are typically based on American and European populations. Previous studies have demonstrated significant ethnic differences in thyroid hormone reference intervals, emphasizing the need for population-specific ranges [11,15]. Therefore, RIs used for evaluating thyroid function should be tailored to the characteristics of the studied population [19,20].

In this study, we aimed to establish reference intervals for thyroid hormones in the Turkish population aged 18–50 using four different autoanalyzers (Architect i2000sr, Atellica IM, Modular E170, and DxI 800 Unicel). We evaluated the impact of thyroid antibody levels on these reference intervals and analyzed measurement differences between autoanalyzers. Our findings highlight the importance of considering antibody status and demographic factors when defining reference intervals for thyroid function tests.

Materials and methods

Study participants

This retrospective study was based on laboratory records covering the period from 01/01/2016 to 31/12/2023. Data were accessed for research purposes on 01/02/2024. Thyroid function test results were collected from the laboratory records of the participating hospitals. The study was carried out as a multicenter study across different regions of Turkey. The included centers, which comprise cosmopolitan cities such as Ankara and Istanbul, represent the country’s general demographic structure. Thyroid hormone and thyroid antibody levels were measured using four different autoanalyzers: the Architect i2000sr (Abbott Laboratories, Abbott Park, Illinois, USA), Atellica IM (Siemens Diagnostics, Tarrytown, NY), Modular E170 Analyzer (Roche Diagnostics, Germany), and DxI 800 Unicel (Beckman Coulter, USA). A total of 44,671 individuals with measurements performed in different laboratories were included in this study, of whom 8,818 were male and 35,851 were female.

Laboratory differences and measurement methods

In this study, thyroid parameters were measured with different autoanalyzers and immunoassay methods across several medical centers. Within the Acıbadem group, some hospitals used Roche analyzers and others used Siemens analyzers. At Acıbadem Labmed Clinical Laboratory, thyroid parameters were measured on the Siemens Healthineers Atellica IM Analyzer by chemiluminescent immunoassay (CLIA). Data from 22,918 individuals (4,611 males and 18,307 females) collected between 2013 and 2023 were included in the analysis. During the same period, measurements were also performed at the same laboratory on the Roche Diagnostics Modular E170 Analyzer by electrochemiluminescent immunoassay (ECLIA), and data from 10,334 individuals (1,915 males and 8,417 females) were included. At Ankara Keçiören Training and Research Hospital, thyroid parameters were measured on the Architect i2000sr Analyzer by CLIA; this group comprised 1,591 individuals (303 males and 1,288 females). Finally, at Ankara Hacettepe University Faculty of Medicine Hospital, thyroid parameters were measured on the Beckman Coulter Access DxI 800 Unicel by CLIA; this group comprised 9,828 individuals (1,989 males and 7,839 females).

Study design

In this study, the cut-off values used to classify thyroid antibody levels were taken from the kit inserts of the four autoanalyzers. The manufacturer-reported cut-off values were as follows:

Abbott; TG: 4.11 U/mL, TPO: 5.61 U/mL,

Beckman; TG: < 1 IU/mL, TPO: < 10 IU/mL

Roche; TG: 115 IU/mL, TPO: 34 IU/mL

Siemens; TG: > 4.5 IU/mL, TPO: > 60 U/mL

In Turkey, all laboratory reference intervals undergo verification before being implemented in clinical practice. The manufacturer-provided values were therefore verified before being used in this study. Thyroid antibodies were classified as positive or negative according to the manufacturers’ cut-off values.

Data selection and exclusion criteria

This multicenter study used routine laboratory data from adults (aged 18–50 years) who underwent thyroid function testing, including serum TSH, fT₃, and fT₄.

The dataset was initially screened to exclude individuals based on the following criteria:

  (i) missing or incomplete hormone data,
  (ii) a documented diagnosis of cirrhosis, pulmonary failure, or chronic kidney disease,
  (iii) confirmed pregnancy in female participants,
  (iv) a documented diagnosis of thyroid disease in the hospital LIS–HIS records and/or use of medications known to affect thyroid function (e.g., levothyroxine, methimazole, propylthiouracil),
  (v) inpatient status, including patients from intensive care units and from endocrinology, nephrology, and emergency departments (Fig 1).
Fig 1. Flow diagrams for participants examined using Abbott kits (upper left panel), Beckman kits (upper right panel), Roche kits (lower left panel), and Siemens kits (lower right panel).

https://doi.org/10.1371/journal.pone.0344197.g001

Subgroups were stratified by gender (male/female) and thyroid antibody status. Individuals were classified as antibody-negative if both Tg-Ab and TPO-Ab were below the assay cut-offs, and as antibody-positive if at least one antibody exceeded its cut-off.
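The classification rule above can be sketched in Python as a minimal illustration (the study itself used R). The cut-offs are the Roche kit-insert values listed earlier (Tg-Ab 115 IU/mL, TPO-Ab 34 IU/mL). The text does not say how values lying exactly at a cut-off were handled, so treating them as negative is an assumption. The function names `antibody_status` and `subgroup_label` are hypothetical.

```python
# Hypothetical helper: Roche kit-insert cut-offs from the list above (IU/mL).
# Assumption: a value strictly above its cut-off is positive; a value at the cut-off is negative.
ROCHE_CUTOFFS = {"tg_ab": 115.0, "tpo_ab": 34.0}


def antibody_status(tg_ab, tpo_ab, cutoffs=ROCHE_CUTOFFS):
    """Return 'Ab+' if at least one antibody exceeds its cut-off, otherwise 'Ab-'."""
    tg_positive = tg_ab > cutoffs["tg_ab"]
    tpo_positive = tpo_ab > cutoffs["tpo_ab"]
    return "Ab+" if (tg_positive or tpo_positive) else "Ab-"


def subgroup_label(sex, tg_ab, tpo_ab, cutoffs=ROCHE_CUTOFFS):
    """Combine sex ('M' or 'F') with antibody status into labels such as 'F-' or 'M+'."""
    status = antibody_status(tg_ab, tpo_ab, cutoffs)
    return sex + status[-1]  # the last character of 'Ab+'/'Ab-' is the sign


print(subgroup_label("F", tg_ab=20.0, tpo_ab=12.0))  # F-
print(subgroup_label("M", tg_ab=20.0, tpo_ab=80.0))  # M+
```

Swapping in another analyzer only requires a different cut-off dictionary, which mirrors the per-platform processing described in the text.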

Outliers were detected separately within each subgroup using the Tukey method, and the proportion of removed observations did not exceed 5% in any subgroup. Data from each analyzer platform (Siemens, Abbott, Roche, and Beckman) were processed and analyzed independently.
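Tukey outlier exclusion within one subgroup might look like the minimal Python sketch below. The study does not report the fence multiplier or the quartile definition, so k = 1.5 (the conventional Tukey value) and the "inclusive" quartile method are assumptions. The TSH values are hypothetical.

```python
import statistics


def tukey_filter(values, k=1.5):
    """Keep values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumptions: k = 1.5 (conventional Tukey multiplier) and quartiles from
    statistics.quantiles(method="inclusive"); the study specifies neither.
    Returns the kept values and the fraction removed.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower_fence <= v <= upper_fence]
    return kept, 1 - len(kept) / len(values)


# Hypothetical TSH values (mU/L) for one subgroup, with one extreme value.
tsh = [0.6, 0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7,
       1.8, 1.9, 2.0, 2.2, 2.3, 2.5, 2.7, 3.0, 3.4, 9.8]
kept, removed_fraction = tukey_filter(tsh)
print(len(kept), round(removed_fraction, 3))  # 19 0.05
```

Returning the removed fraction makes it easy to check the 5% ceiling reported above for each subgroup.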

Ethical principles

This study was conducted retrospectively using medical records and laboratory data collected from participating hospitals and laboratories. All data were fully anonymized before analysis. This study was approved by the Acıbadem Mehmet Ali Aydınlar University Medical Research Evaluation Board (ATADEK) with decision number 2023/07 dated April 28, 2023.

Statistical analysis

After outlier exclusion, the distribution of each hormone within each subgroup was assessed with the Shapiro–Wilk test. Because not all subgroups met the normality assumption (p < 0.05), group comparisons of TSH, fT₃, and fT₄ were performed for all subgroups with the non-parametric Mann–Whitney U test.
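The Mann–Whitney U test used for these group comparisons can be sketched with the Python standard library. This version uses average ranks for ties and the normal approximation with tie-corrected variance. It is a teaching sketch under those assumptions, not the study's R code. R's `wilcox.test` may use exact p-values for small samples and applies a continuity correction by default, so p-values can differ slightly. The fT4 values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (U, p) with U = min(U1, U2). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical
        return min(u1, u2), 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return min(u1, u2), p


# Hypothetical fT4 values (pmol/L) for Ab(-) and Ab(+) subgroups.
ft4_ab_neg = [12.1, 13.4, 14.0, 14.8, 15.2, 15.9, 16.3, 17.0]
ft4_ab_pos = [13.0, 14.6, 15.5, 16.1, 16.8, 17.4, 18.2, 18.9]
u, p = mann_whitney_u(ft4_ab_neg, ft4_ab_pos)
print(u, round(p, 3))
```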

Reference intervals (RIs) for thyroid hormones were estimated using the refineR algorithm, which is available as an open-source R package (https://CRAN.R-project.org/package=refineR) and has been described and evaluated in detail in previous studies [21]. The method applies a Box–Cox transformation to approximate normality, followed by iterative exclusion of outliers and density-based modeling to isolate the central distribution. The final reference limits are obtained as the central 95% interval of this modeled distribution, back-transformed to the original scale.
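The final step described above can be illustrated in Python: take the central 95% of a normal model on the Box–Cox scale, then back-transform the limits. This is a simplified sketch. It uses the standard Box–Cox transform (refineR uses a modified variant), and it does not reproduce refineR's estimation of the model parameters. The values of λ, μ, and σ below are hypothetical.

```python
import math
from statistics import NormalDist


def box_cox(x, lam):
    """Standard Box-Cox transform (simplified; refineR uses a modified variant)."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inverse_box_cox(y, lam):
    """Back-transform a value from the Box-Cox scale to the original scale."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def reference_limits(lam, mu, sigma, coverage=0.95):
    """Central `coverage` interval of N(mu, sigma) on the transformed scale,
    back-transformed to the measurement scale."""
    tail = (1 - coverage) / 2
    model = NormalDist(mu, sigma)
    lower = inverse_box_cox(model.inv_cdf(tail), lam)
    upper = inverse_box_cox(model.inv_cdf(1 - tail), lam)
    return lower, upper


# Hypothetical parameters: log scale (lam = 0) centred on 1.5 mU/L.
ll, ul = reference_limits(lam=0.0, mu=math.log(1.5), sigma=0.45)
print(f"{ll:.2f}-{ul:.2f} mU/L")
```

The back-transformed interval is asymmetric around the median. This is why right-skewed analytes such as TSH are modeled on a transformed scale rather than with a plain mean ± 1.96 SD.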

The refineR algorithm was applied separately to each defined subgroup using 250 bootstrap replicates. For each subgroup, the lower and upper limits of the reference interval were determined along with the 90% confidence intervals of those limits. A 90% confidence interval was selected to maintain narrower interval widths, improving the detection of subgroup differences, in line with bootstrap-based RI methodology. The choice of 250 replicates was based on the original refineR publication, which used this number as a practical balance between computational cost and statistical robustness [21].
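A percentile bootstrap for the 90% confidence intervals of the reference limits can be sketched as follows. Running refineR itself is out of scope here, so the sketch substitutes a plain nonparametric estimator (the 2.5th and 97.5th percentiles) for refineR's model-based limits. Only the resampling logic (250 replicates, 90% CI) follows the text. The log-normal TSH data are simulated and hypothetical.

```python
import math
import random


def percentile(values, p):
    """Linear-interpolation percentile of `values`, with p in [0, 1]."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_limits(values):
    """Nonparametric 2.5th/97.5th percentiles: a stand-in for refineR's estimator."""
    return percentile(values, 0.025), percentile(values, 0.975)


def bootstrap_limit_cis(values, n_boot=250, ci=0.90, seed=1):
    """Percentile-bootstrap CIs for the lower and upper reference limits."""
    rng = random.Random(seed)
    lower_limits, upper_limits = [], []
    for _ in range(n_boot):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        ll, ul = percentile_limits(resample)
        lower_limits.append(ll)
        upper_limits.append(ul)
    tail = (1 - ci) / 2
    return {
        "LL": (percentile(lower_limits, tail), percentile(lower_limits, 1 - tail)),
        "UL": (percentile(upper_limits, tail), percentile(upper_limits, 1 - tail)),
    }


# Hypothetical, log-normally distributed TSH values (mU/L) for one subgroup.
gen = random.Random(0)
tsh = [gen.lognormvariate(0.4, 0.45) for _ in range(2000)]
cis = bootstrap_limit_cis(tsh)
print(percentile_limits(tsh), cis)
```

A 90% rather than 95% level takes the 5th and 95th percentiles of the bootstrap distribution, which gives the narrower intervals mentioned in the text.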

To compare the reference intervals between subgroups (e.g., female Ab(−) vs. female Ab(+)), the bias ratio (BR) was calculated for both the lower and upper limits. The bias ratio is a standardized measure that quantifies the difference between two reference limits by expressing it relative to the spread of a reference interval. It is particularly useful for evaluating whether observed differences between subgroups are statistically and clinically meaningful. The bias ratios were calculated as follows:

$$\mathrm{BR}_{LL} = \frac{LL - LL_0}{(UL_0 - LL_0)/3.92}, \qquad \mathrm{BR}_{UL} = \frac{UL - UL_0}{(UL_0 - LL_0)/3.92}$$

where LL and UL represent the lower and upper limits of the subgroup being tested, and LL₀ and UL₀ refer to the corresponding limits in the reference group. The denominator, (UL₀ − LL₀)/3.92, corresponds to the standard deviation across the reference interval, because a central 95% interval of a normal distribution spans 3.92 SD. A |BR| value ≥ 0.375 was considered to indicate a significant difference (S) and smaller values a non-significant difference (NS), as recommended by the IFCC C-RIDL and applied in previous validation studies [22]. All statistical analyses were performed using R version 4.4.1 (R Foundation for Statistical Computing, Vienna, Austria).
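A minimal Python sketch of the bias-ratio comparison between a test subgroup and a reference subgroup follows. It assumes that the denominator is the reference subgroup's SD, estimated as (UL₀ − LL₀)/3.92, and that the 0.375 threshold applies to |BR|. The function name and the limits in the example are hypothetical.

```python
def bias_ratios(ll, ul, ll0, ul0, threshold=0.375):
    """Bias ratios of a test subgroup's limits (ll, ul) against a reference
    subgroup's limits (ll0, ul0), each labelled 'S' or 'NS'.

    Assumption: the denominator is the reference subgroup's SD, estimated as
    (ul0 - ll0) / 3.92 because a central 95 % normal interval spans 2 x 1.96 SD.
    """
    sd_ref = (ul0 - ll0) / 3.92
    br_ll = (ll - ll0) / sd_ref
    br_ul = (ul - ul0) / sd_ref

    def verdict(br):
        return "S" if abs(br) >= threshold else "NS"

    return {"LL": (br_ll, verdict(br_ll)), "UL": (br_ul, verdict(br_ul))}


# Hypothetical limits (mU/L): Ab(+) subgroup vs. an Ab(-) reference subgroup.
result = bias_ratios(ll=0.45, ul=4.90, ll0=0.40, ul0=4.00)
print(result)
```

In this example, the lower limits differ by about 0.05 SD (NS) while the upper limits differ by about 0.98 SD (S). This is the same LL: NS / UL: S pattern used to report the results below.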

Results

The lower and upper reference limits and their 95% CIs, determined according to thyroid antibody status on the four autoanalyzers, are summarized in Tables 1–3 for serum TSH, fT4, and fT3, respectively.

Table 1. Reference intervals and CI 95% determined for TSH according to and independently of antibody status.

https://doi.org/10.1371/journal.pone.0344197.t001

Table 2. Reference intervals and CI 95% determined for fT4 according to and independently of antibody status.

https://doi.org/10.1371/journal.pone.0344197.t002

Table 3. Reference intervals and CI 95% determined for fT3 according to and independently of antibody status.

https://doi.org/10.1371/journal.pone.0344197.t003

In addition, Figs 2–4 compare the thyroid hormone RIs and CIs calculated separately for the Ab(+) and Ab(−) groups for serum TSH, fT4, and fT3, respectively.

Fig 2. Comparative Reference Intervals and CI 95% for TSH in Different Autoanalyzers.

*The yellow background indicates the reference intervals recommended by each manufacturer.

https://doi.org/10.1371/journal.pone.0344197.g002

Fig 3. Comparative Reference Intervals and CI 95% for fT4 in Different Autoanalyzers.

*The yellow background indicates the reference intervals recommended by each manufacturer.

https://doi.org/10.1371/journal.pone.0344197.g003

Differences in the lower (LL) and upper (UL) reference limits of thyroid hormones between antibody-status groups were evaluated using the BR method.

Serum TSH reference intervals according to gender and antibody status

TSH (Abbott): In males, the RIs determined by antibody status (M− vs. M+) differed significantly at both the lower limit (LL: S) and the upper limit (UL: S). In females, no significant difference was observed in the LL (LL: NS), but the UL differed significantly (UL: S) between the antibody-negative (F−) and antibody-positive (F+) groups.

TSH (Beckman): In males, the comparison by antibody status (M− vs. M+) revealed no significant difference in the LL (LL: NS) but a significant difference in the UL (UL: S). Similarly, in females, the LL did not differ significantly (LL: NS), whereas the UL did (UL: S) between the F− and F+ groups.

TSH (Roche): In males, significant differences were observed in both the LL and UL (LL: S, UL: S) between the antibody-negative (M−) and antibody-positive (M+) subgroups. In females, the LL did not differ significantly (LL: NS), whereas the UL did (UL: S) between the F− and F+ groups.

TSH (Siemens): In males, the LL did not differ significantly (LL: NS), but the UL did (UL: S) between the M− and M+ groups. In females, the LL was not significantly different (LL: NS), whereas the UL differed significantly (UL: S) between F− and F+ individuals (Table 1, Fig 2).

Serum fT4 reference intervals according to gender and antibody status

fT4 (Abbott): In both genders (M− vs. M+ and F− vs. F+), the RIs showed no significant difference in the LL (LL: NS) but a significant difference in the UL (UL: S).

fT4 (Beckman): In both genders (M− vs. M+ and F− vs. F+), the RIs showed no significant difference in the LL (LL: NS) but a significant difference in the UL (UL: S).

fT4 (Roche): In males, significant differences were observed in both the LL and UL (LL: S, UL: S). In females, the LL differed significantly (LL: S), but the UL did not (UL: NS).

fT4 (Siemens): No significant differences were detected in either the LL or UL (LL: NS, UL: NS) for any subgroup comparison (M−/M+, F−/F+) (Table 2, Fig 3).

Serum fT3 reference intervals according to gender and antibody status

fT3 (Abbott): In males, the lower limit of the RI differed significantly (LL: S) between the antibody-negative (M−) and antibody-positive (M+) groups, whereas the upper limit did not (UL: NS). In females, both the LL and UL differed significantly (LL: S, UL: S) between the F− and F+ groups.

fT3 (Beckman): In males, the lower limit of the RI differed significantly (LL: S) between the antibody-negative (M−) and antibody-positive (M+) groups, whereas the upper limit did not (UL: NS). In females, both the LL and UL differed significantly (LL: S, UL: S) between the F− and F+ groups.

fT3 (Roche): In males, neither RI limit differed significantly between antibody status groups (LL: NS, UL: NS). In females, a significant difference was observed only in the LL (LL: S), with no significant difference in the UL (UL: NS).

fT3 (Siemens): In males, a significant difference was detected in the LL (LL: S) but not in the UL (UL: NS) of the RI. In females, neither RI limit differed significantly between antibody status groups (LL: NS, UL: NS) (Table 3, Fig 4).

Fig 4. Comparative Reference Intervals and CI 95% for fT3 in Different Autoanalyzers.

*The yellow background indicates the reference intervals recommended by each manufacturer.

https://doi.org/10.1371/journal.pone.0344197.g004

Serum thyroid hormone reference intervals independent of antibody levels

The TSH, fT4, and fT3 RIs calculated independently of thyroid antibody status for females and males are presented in Tables 1, 2, and 3, respectively. The corresponding comparison graphs of RIs and CIs are shown in Figs 2, 3, and 4, respectively.

Differences between males and females in the LL and UL of thyroid hormone RIs determined independently of antibody status were evaluated using the BR method.

Serum TSH reference intervals independent of antibody levels

When gender-specific RIs were compared independently of thyroid antibody status (males vs. females, Ab+ and Ab− combined), TSH (Abbott, Beckman, Roche) showed significant differences in the UL (UL: S) but not in the LL (LL: NS). In contrast, TSH (Siemens) showed no significant gender-based difference in either limit (Table 1, Fig 2).

Serum fT4 reference intervals independent of antibody levels

fT4 (Roche): When gender-specific RIs were compared independently of thyroid antibody status (males vs. females, Ab+ and Ab− combined), statistically significant differences were observed in both the LL and UL (LL: S, UL: S).

In contrast, fT4 (Abbott, Beckman) showed no significant gender-based difference in either the LL or the UL (LL: NS, UL: NS). fT4 (Siemens) showed no significant gender difference in the LL (LL: NS) but a statistically significant difference in the UL (UL: S) (Table 2, Fig 3).

Serum fT3 reference intervals independent of antibody levels

fT3 (Beckman, Roche, Siemens): Statistically significant gender differences were observed in both the LL (LL: S) and UL (UL: S) when males and females were compared independently of thyroid antibody status (Ab+ and Ab− combined). In contrast, fT3 (Abbott) showed no significant gender difference in the LL (LL: NS) but a statistically significant difference in the UL (UL: S) (Table 3, Fig 4).

Discussion

In this study, the influence of variables such as thyroid antibody status and gender on the RI of TSH, fT4 and fT3 hormones was systematically investigated in the Turkish population aged 18–50 years using four different autoanalyzers. Our findings demonstrate that, particularly for TSH, RI vary significantly depending on both thyroid antibody levels and gender. These results highlight the necessity of considering both antibody status and demographic variables when establishing RI for thyroid function tests, and clearly underscore the need for standardization across different analytical platforms.

Evaluation of thyroid hormone reference intervals based on thyroid antibody status and gender

In the current literature, most studies defining RI for thyroid hormones typically present uniform RI for the general population, often disregarding thyroid antibody status. However, previous studies have emphasized that elevated anti-TPO and anti-Tg levels are significant risk factors for the development of subclinical hypothyroidism [23]. Despite this, the impact of antibody status on RIs has not been adequately investigated to date. Our study addresses this critical gap in the literature by being one of the few multicenter studies to establish antibody-specific RI for thyroid hormones across four different autoanalyzers.

The accurate determination of RIs plays a critical role in the clinical interpretation of thyroid function tests. Our most striking finding was the consistent and statistically significant difference in TSH RIs according to antibody status across analyzers. For instance, in males, antibody-stratified TSH RIs differed at both limits on the Abbott and Roche autoanalyzers. Furthermore, the upper limit of antibody-stratified TSH RIs was higher in females than in males (e.g., Roche analyzer: Ab− females 0.42–4.68 mU/L vs. Ab− males 0.44–3.63 mU/L). This finding further highlights the potential role of sex hormones in the regulation of TSH.

One of the major strengths of our study is its multicenter design across different regions of Turkey. The inclusion of metropolitan cities such as Ankara and Istanbul, which reflect the general demographic structure of the country, enhances the representativeness of our findings for the broader Turkish population. Notably, in regions with endemic goiter, TSH RIs were significantly affected in both antibody-negative and antibody-positive individuals. This finding suggests that the standard TSH upper reference limits recommended by current guidelines (e.g., 4.0–5.0 mU/L) may be inadequate for Ab+ populations.

Another important finding of our study is the variation in RIs among autoanalyzers, despite the use of the same algorithm (refineR) for RI estimation on all platforms. These discrepancies underscore the need for analytical harmonization. For instance, the Beckman and Abbott analyzers showed greater sensitivity to antibody status in fT3 measurements, whereas the Roche analyzer showed the most prominent RI variation for fT4. Additionally, the Roche analyzer yielded the widest TSH interval for Ab+ males, whereas the Abbott analyzer yielded the narrowest. These differences are likely due to variations in measurement methodologies, calibration protocols, and reagent compositions. From a methodological perspective, the fact that antibody status affected TSH RIs on all four analyzers (Abbott, Beckman, Roche, Siemens) supports the reliability of our findings. At the same time, the inter-analyzer variability emphasizes the need for laboratories to establish their own population-specific RIs rather than relying solely on manufacturer-provided or generalized RIs.

Our findings contribute to clinical practice from three distinct perspectives:

  (i) Antibody status should always be taken into account when interpreting thyroid function tests.
  (ii) Particularly in Ab+ female patients, the currently used reference intervals (RI) may be inadequate.
  (iii) In countries like Turkey, where endemic goiter is prevalent, laboratories should establish region- and analyzer-specific RI.

International organizations such as the ATA and the IFCC have also recommended including gender as a factor in the interpretation of thyroid function tests [12,17,18]. Our findings support these recommendations and further emphasize the importance of considering both antibody status and gender in clinical interpretation. In particular, the broader TSH intervals observed among Ab+ individuals suggest that autoimmune processes may affect males and females differently. Therefore, the use of a uniform RI may result in misdiagnosis and unnecessary interventions.

In conclusion, this study highlights the necessity of considering analyzer-specific differences, thyroid antibody status, and gender when establishing RI for thyroid function tests. Even when using the same algorithm (RefineR), RI values varied significantly across different analyzers, indicating that such differences may have clinical relevance beyond mere analytical variability. This finding underscores the importance of inter-analyzer variation not only from an analytical perspective but also in clinical interpretation.

The need for harmonization and standardization across analytical platforms has long been emphasized in the literature [24–29]. Our study supports the notion that this need must also be addressed during RI determination processes. Furthermore, the consistently elevated TSH levels observed in Ab+ individuals indicate that incorporating antibody status into RI determination may enhance diagnostic accuracy. Ignoring these differences, especially in patients with borderline hormone levels, could potentially lead to misdiagnosis or inappropriate treatment decisions.

Overall, our findings suggest that developing personalized RI accounting for analyzer-specific characteristics as well as individual variables such as gender and antibody status may significantly improve the diagnostic reliability of thyroid function testing.

Evaluation of thyroid hormone reference intervals: comparison with manufacturer data and literature independent of thyroid antibody status

In the second phase of this study, RI were established separately for males and females without considering thyroid antibody status. In clinical practice, physicians often rely on manufacturer-recommended RI, which serve as the basis for diagnosis and treatment decisions. However, it has long been debated to what extent these reference values reflect the characteristics of local populations [30]. In this section of the discussion, the RI established without taking antibody status into account were evaluated in comparison with both population-based studies in the literature and the reference values provided by manufacturers, which are often based on American or European populations.

The findings demonstrated that gender significantly influences thyroid hormone levels. Notably, higher upper limits of TSH were observed in females, underscoring the importance of using gender-specific RIs in the diagnosis of subclinical hypothyroidism. In line with the literature reporting greater variability in thyroid hormone levels among females [31,32], this study also showed clear effects of gender on reference intervals. Similarly, the variability of fT4 and fT3 levels across analyzers highlights the need to report gender-specific and locally derived reference values in laboratory reports. These findings support the recommendations of international organizations such as the ATA and IFCC regarding the use of population-specific RIs in thyroid testing [12,17,18].

Gender-based differences also varied according to the autoanalyzer used. Our results revealed significant gender-specific differences in TSH RIs across all analyzers, with wider TSH RIs in females than in males (Abbott, Beckman, Siemens, Roche). For fT4 (Abbott), no significant gender difference was observed, whereas on the Roche and Beckman analyzers males exhibited broader RIs. For fT3 (Abbott, Beckman, Roche, Siemens), males generally showed wider intervals. These results emphasize the importance of considering both analyzer-specific and gender-related differences to improve diagnostic accuracy in thyroid function testing.

Another important finding was the inconsistency between RIs calculated without considering antibody status and those provided by manufacturers based on American/European populations. This discrepancy may be attributed to factors such as iodine intake, genetic variability, and ethnic differences. For example, the UL of TSH calculated for females using the Abbott analyzer (7.44 mU/L) was considerably higher than the manufacturer’s recommended value (4.94 mU/L). This difference suggests that individuals who might be diagnosed with hypothyroidism in clinical practice could in fact be within physiological limits, raising concerns about overdiagnosis.

In conclusion, our study demonstrates that manufacturer-provided RIs, often derived from American and European populations, may fail to adequately reflect local biological variability. Furthermore, our findings are consistent with previous literature highlighting population-specific and gender-related differences in thyroid hormone RIs [12,20,21,33]. These results suggest that laboratories should consider developing their own locally derived reference intervals rather than relying solely on manufacturer data, thereby ensuring more accurate and reliable clinical decision-making.

In this section of the discussion, the RI determined independently of antibody levels were compared with existing population-based studies in the literature.

Architect i2000sr (Abbott Laboratories, Abbott Park, Illinois, USA)

In studies conducted by Motor et al. (2010) and Örkmez et al. (2023), narrower TSH RIs were reported for females using the Architect i2000sr (Abbott Laboratories, Abbott Park, Illinois, USA) analyzer [34,35]. However, in our study, we identified wider TSH RI for females. These discrepancies may stem from differences in sample size, age distribution, and methodological approaches. Although the manufacturer’s recommended ranges are generally broader, we observed narrower RI for fT4 and fT3 in our study, and gender-specific differences for TSH were also evident when using the Abbott autoanalyzer.

Studies using the same autoanalyzer in different populations have reported significant variations in the lower and upper limits of thyroid hormone RI [33,36,37]. Therefore, the discrepancies observed between our findings and those in the literature highlight the necessity for each country to establish its own population-specific RI.

Modular E170 Analyzer (Roche Diagnostics, Germany)

The gender-specific differences in TSH RI observed in our study contradict the findings reported by Yıldız et al. (2022) [38]. However, another study emphasized the diagnostic relevance of gender-specific RI, which supports our results [12]. Regional studies conducted in China and Korea reported fT4 RI for both males and females that were consistent with ours, although their TSH RI were wider than ours [14,39]. These results illustrate how strongly demographic and methodological factors influence RI. Furthermore, the RI established in our study differed from those recommended by the manufacturer.

Lewis et al. reported no clinically meaningful differences in TSH measurements between the Roche Cobas and Siemens Atellica platforms and therefore supported harmonized TSH reference intervals [40]. For fT4, by contrast, a substantial positive inter-analyzer bias was observed, which precluded recommending harmonized reference intervals for this analyte.
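Inter-analyzer bias of the kind described above is often summarized as a mean percentage difference between paired results from the same samples. The sketch below assumes that simple summary and a user-chosen allowable-bias threshold. Both the threshold and the paired fT4 values are hypothetical. This is not the specific statistical procedure used by Lewis et al. or Bohn et al.

```python
from statistics import mean


def mean_percent_bias(reference, comparator):
    """Mean per-sample percentage difference of comparator relative to reference."""
    if not reference or len(reference) != len(comparator):
        raise ValueError("need non-empty, equal-length paired results")
    return mean(100.0 * (c - r) / r for r, c in zip(reference, comparator))


def within_allowable_bias(reference, comparator, allowable_pct):
    """True if |mean % bias| does not exceed a user-chosen allowable bias."""
    return abs(mean_percent_bias(reference, comparator)) <= allowable_pct


if __name__ == "__main__":
    # Hypothetical paired fT4 results (pmol/L) for the same samples on two platforms.
    platform_a = [12.0, 14.0, 16.0, 18.0]
    platform_b = [13.2, 15.4, 17.6, 19.8]  # each result about 10% higher
    bias = mean_percent_bias(platform_a, platform_b)
    print(f"mean bias: {bias:.1f}%")
    # The 5% allowable bias is an illustrative assumption, not a published criterion.
    print("harmonized RI plausible:", within_allowable_bias(platform_a, platform_b, 5.0))
```

A platform-independent summary like this is what a harmonization decision rests on. A consistent bias well above the allowable limit, as in this toy fT4 example, argues against a shared RI.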

Our findings are consistent with recent large-scale studies employing similar methodological approaches. Bohn et al. derived reference intervals for thyroid hormones with the refineR method across multiple autoanalyzers and large population datasets, an approach methodologically similar to ours [41]. An important distinction, however, is that thyroid autoantibody status was not used as a stratification variable in that study. Because of significant inter-analyzer differences, Bohn et al. did not recommend harmonized RI for fT4, whereas no statistically significant inter-analyzer differences were observed for TSH.

In contrast to these studies, our findings reveal a more nuanced picture: RI for thyroid function tests are influenced not only by the analytical platform but also substantially by thyroid antibody status and demographic factors. These results support establishing analyzer-specific, autoantibody-stratified RI validated in the local population, providing a more precise and clinically relevant basis for interpreting thyroid function tests.

Atellica IM (Siemens Diagnostics, Tarrytown, NY)

The TSH and fT4 RI identified in our study were narrower than those reported by Köseoğlu et al. (2010) but consistent with those reported by Enli et al. (2004) [42,43]. In contrast, our fT3 RI were broader than those reported in both studies. Comparisons with studies in Polish and Serbian populations also revealed discrepancies, likely due to differences in indirect RI methods and in demographic and ethnic characteristics [44,45]. These findings underscore the importance of standardized methodologies and population-specific RI.

Access DxI 800 Unicel (Beckman Coulter, USA)

The lower limit of the TSH RI in our study was consistent with that reported by Çelebiler et al. (2010), whereas our upper limit was higher [33]. Similarly, a study conducted in Italy reported comparable upper limits for TSH in females. In contrast, a study in a Pakistani population reported broader fT4 RI than ours [46]. These discrepancies may be attributed to differences in sample size, age range, and demographic characteristics, all of which should be considered when interpreting results.

In our study, the thyroid hormone RI determined on four different automated analyzers differed significantly between platforms. These differences can be attributed primarily to the methodological characteristics of the immunoassays, particularly the specificity and sensitivity of the antibodies used. The literature indicates that differences in assay methodology can influence reference intervals, even in healthy populations, and thereby affect clinical interpretation [47,48]. For instance, Barth et al. (2020), following the IFCC C-RIDL protocol, analyzed samples from healthy individuals without systemic disease on different autoanalyzers and identified significant method-related differences between them [49]. Similarly, a more comprehensive study of eight immunoassay kits found high correlation across kits (R > 0.99) but slopes ranging from 0.75 to 1.06. This shows that thyroid hormone results, especially TSH, can vary substantially with the kit used, which may affect medical decision thresholds [50]. Thus, even in healthy individuals, the choice of autoanalyzer or assay kit can influence RI and, in turn, clinical decision-making. The IFCC likewise emphasizes that different autoanalyzers and immunoassay methodologies can have a significant impact on measurement results [51]. Its working group on hormone assay harmonization highlights the global need for standardization so that hormone measurements such as TSH are comparable across laboratories and analytical platforms.
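A slope range of 0.75 to 1.06 can move a single result across a decision threshold. The sketch below illustrates this by assuming a purely proportional relationship (zero intercept) between each kit and a common comparison method; real method comparisons usually also have an intercept. The comparison value and the decision limit are hypothetical.

```python
# Slope range reported across eight TSH immunoassay kits (from the text).
KIT_SLOPES = (0.75, 1.06)


def kit_reading(comparison_value, slope, intercept=0.0):
    """Approximate a kit's TSH result from an assumed linear method comparison."""
    return slope * comparison_value + intercept


if __name__ == "__main__":
    comparison_tsh = 4.4   # hypothetical comparison-method TSH, mU/L
    decision_limit = 4.0   # hypothetical decision threshold, mU/L
    for slope in KIT_SLOPES:
        reading = kit_reading(comparison_tsh, slope)
        side = "above" if reading > decision_limit else "at or below"
        print(f"slope {slope}: {reading:.2f} mU/L ({side} {decision_limit} mU/L)")
```

Under these assumptions, the same sample reads about 3.30 mU/L on the kit with slope 0.75 and about 4.66 mU/L on the kit with slope 1.06. The two readings fall on opposite sides of a 4.0 mU/L limit despite near-perfect correlation between the kits.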

Limitations

This study has several limitations that should be acknowledged. Firstly, the 90% CIs of the RI determined for different groups on the Atellica IM autoanalyzer were notably close to one another. Increasing the number of bootstrap iterations to 800 did not change this result significantly.
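The sketch below uses a plain nonparametric approach to show what a bootstrapped 90% CI for an RI limit involves. It resamples the data with replacement and recomputes the 2.5th and 97.5th percentiles each time. It then takes the 5th and 95th percentiles of those estimates as a 90% CI. This is not the refineR procedure used in this study. It assumes already-cleaned reference data, and the simulated log-normal values are hypothetical.

```python
import random
from statistics import quantiles


def percentile_ri(values):
    """Nonparametric 2.5th and 97.5th percentiles (a simplified direct RI)."""
    cuts = quantiles(values, n=40, method="inclusive")  # cut points in 2.5% steps
    return cuts[0], cuts[-1]


def ninety_percent_ci(estimates):
    """5th and 95th percentiles of bootstrap estimates, i.e. a 90% CI."""
    cuts = quantiles(estimates, n=20, method="inclusive")  # cut points in 5% steps
    return cuts[0], cuts[-1]


def bootstrap_ri_ci(values, n_boot=800, seed=1):
    """Percentile-bootstrap 90% CIs for the lower and upper RI limits."""
    rng = random.Random(seed)
    lowers, uppers = [], []
    for _ in range(n_boot):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        low, high = percentile_ri(resample)
        lowers.append(low)
        uppers.append(high)
    return ninety_percent_ci(lowers), ninety_percent_ci(uppers)


if __name__ == "__main__":
    sim = random.Random(42)
    # Hypothetical, roughly log-normal TSH-like values (mU/L); not study data.
    tsh = [sim.lognormvariate(0.5, 0.5) for _ in range(500)]
    print("RI:", percentile_ri(tsh))
    for n_boot in (200, 800):
        lower_ci, upper_ci = bootstrap_ri_ci(tsh, n_boot=n_boot)
        print(n_boot, "iterations -> lower-limit CI", lower_ci, "upper-limit CI", upper_ci)
```

Raising `n_boot` mainly stabilizes the CI estimates themselves. It does not make intervals from genuinely similar groups any more distinct.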

Secondly, the age range of 18–50 years was selected as the primary focus for determining reference intervals because of its widespread use in clinical practice and the characteristics of our dataset. However, individuals younger than 18 and older than 50 years also need to be studied. Studies addressing this limitation are ongoing and should provide a more comprehensive understanding of thyroid function across all age groups.

Thirdly, our study population included more female than male participants. This reflects the well-documented higher prevalence of thyroid disease among females. Although this gender imbalance reflects clinical reality, it may limit the generalizability of our findings, particularly for the male subgroups. Future studies with a more balanced gender distribution are needed to validate and refine these reference intervals.

Fourthly, as this study was conducted over several years, reagent lot-to-lot variation and seasonal variability may have influenced test results. While the same analyzers were used across all centers, the use of different reagent lots is considered a potential limitation of the study.

Fifthly, because of national regulations, reflex testing for the assessment of thyroid function could not be applied, which we consider a further limitation of the study.

Finally, iodine status was not measured in the patient population and therefore could not be included as a variable in the analysis. Although iodine is supplied through salt fortification and dietary sources in our country, iodine deficiency still exists in the population.

Conclusion

In this study, we showed that thyroid antibody status and gender significantly influenced thyroid hormone RI across four different autoanalyzers. These findings highlight the importance of considering both thyroid antibody status and gender when determining RI for thyroid function tests, particularly for TSH levels, in clinical practice. To date, most RI studies have been conducted without measuring thyroid antibody levels, which may lead to inaccurate or misleading RI. Therefore, we recommend that future prospective studies focusing specifically on TSH measure thyroid antibody levels and determine RI based on these measurements.

Additionally, our study revealed significant variability in thyroid hormone RI across the four autoanalyzers, even when the same algorithm (refineR) was used. These differences emphasize the challenges in standardizing thyroid function tests and suggest that such variability may have important implications for clinical practice and patient management. Clinicians should be aware of potential discrepancies between autoanalyzers when interpreting thyroid function test results, as these differences could influence diagnostic accuracy and treatment decisions.

In conclusion, our findings demonstrate that thyroid antibody levels significantly influence RI for thyroid hormones, particularly TSH, and that these effects vary by gender and autoanalyzer type. These results highlight the critical need to consider antibody status and demographic factors when establishing RI for thyroid function tests. Moreover, the observed discrepancies between autoanalyzers underscore the urgent need for standardization in thyroid hormone testing. We recommend that future studies establish separate RI for antibody-positive (Ab+) and antibody-negative (Ab−) groups and address existing deficiencies in standardization efforts to improve the accuracy, reliability, and clinical utility of thyroid function assessments.
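The recommendation to derive separate RI for Ab(+) and Ab(−) groups can be sketched as a partition step followed by a per-group interval. The sketch assumes that an individual counts as Ab(+) if either TPO-Ab or Tg-Ab exceeds its cutoff. It uses a simple nonparametric percentile interval instead of refineR. The cutoffs and records are placeholders, since real positivity cutoffs are assay-specific. The function names are hypothetical.

```python
from statistics import quantiles


def percentile_ri(values):
    """Nonparametric 2.5th-97.5th percentile interval (simplified direct RI)."""
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def antibody_positive(tpo_ab, tg_ab, tpo_cutoff, tg_cutoff):
    """Assumed rule: positive if either antibody exceeds its assay-specific cutoff."""
    return tpo_ab > tpo_cutoff or tg_ab > tg_cutoff


def stratified_ri(records, tpo_cutoff, tg_cutoff):
    """Split (tsh, tpo_ab, tg_ab) records by antibody status and compute an RI per group."""
    groups = {"Ab(+)": [], "Ab(-)": []}
    for tsh, tpo_ab, tg_ab in records:
        positive = antibody_positive(tpo_ab, tg_ab, tpo_cutoff, tg_cutoff)
        groups["Ab(+)" if positive else "Ab(-)"].append(tsh)
    # Skip groups too small to compute percentiles at all; real partitions need far more data.
    return {name: percentile_ri(vals) for name, vals in groups.items() if len(vals) >= 2}


if __name__ == "__main__":
    # Synthetic records: (TSH mU/L, TPO-Ab, Tg-Ab). Cutoffs below are placeholders.
    records = [(0.1 * i, 10.0, 20.0) for i in range(1, 42)]
    records += [(0.2 * i, 90.0, 20.0) for i in range(1, 42)]
    print(stratified_ri(records, tpo_cutoff=35.0, tg_cutoff=115.0))
```

Each partition in practice needs far more data than this toy example. The CLSI guidance for the nonparametric approach conventionally calls for at least 120 reference individuals per partition.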

Supporting information

S1 File. Supplementary Tables and Data.

(TSH, fT4, fT3 bias data).

https://doi.org/10.1371/journal.pone.0344197.s001

(XLSX)

Acknowledgments

We thank all participants.

References

  1. Nilsson M, Fagman H. Development of the thyroid gland. Development. 2017;144(12):2123–40. pmid:28634271
  2. Yamada S, Horiguchi K, Akuzawa M, Sakamaki K, Yamada E, Ozawa A, et al. The Impact of Age- and Sex-Specific Reference Ranges for Serum Thyrotropin and Free Thyroxine on the Diagnosis of Subclinical Thyroid Dysfunction: A Multicenter Study from Japan. Thyroid. 2023;33(4):428–39. pmid:36772798
  3. Li C, Zhou J, Huang Z, Pan X, Leung W, Chen L, et al. The Clinical Value and Variation of Antithyroid Antibodies during Pregnancy. Dis Markers. 2020;2020:8871951. pmid:33144894
  4. Babić Leko M, Gunjača I, Pleić N, Zemunik T. Environmental Factors Affecting Thyroid-Stimulating Hormone and Thyroid Hormone Levels. Int J Mol Sci. 2021;22(12):6521. pmid:34204586
  5. Dhatt GS, Griffin G, Agarwal MM. Thyroid hormone reference intervals in an ambulatory Arab population on the Abbott Architect i2000 immunoassay analyzer. Clin Chim Acta. 2006;364(1–2):226–9. pmid:16098499
  6. Jonklaas J, Razvi S. Reference intervals in the diagnosis of thyroid dysfunction: treating patients not numbers. Lancet Diabetes Endocrinol. 2019;7(6):473–83. pmid:30797750
  7. Jonklaas J, Bianco AC, Bauer AJ, Burman KD, Cappola AR, Celi FS, et al. Guidelines for the treatment of hypothyroidism: prepared by the american thyroid association task force on thyroid hormone replacement. Thyroid. 2014;24(12):1670–751. pmid:25266247
  8. Caturegli P, De Remigis A, Rose NR. Hashimoto thyroiditis: clinical and diagnostic criteria. Autoimmun Rev. 2014;13(4–5):391–7. pmid:24434360
  9. Vanderpump MPJ. The epidemiology of thyroid disease. Br Med Bull. 2011;99:39–51. pmid:21893493
  10. McLeod DSA, Cooper DS. The incidence and prevalence of thyroid autoimmunity. Endocrine. 2012;42(2):252–65. pmid:22644837
  11. Keestra S, Högqvist Tabor V, Alvergne A. Reinterpreting patterns of variation in human thyroid function: An evolutionary ecology perspective. Evol Med Public Health. 2020;9(1):93–112. pmid:34557302
  12. Wang X, Li Y, Zhai X, Wang H, Zhang F, Gao X, et al. Reference Intervals for Serum Thyroid-Stimulating Hormone Based on a Recent Nationwide Cross-Sectional Study and Meta-Analysis. Front Endocrinol (Lausanne). 2021;12:660277. pmid:34140930
  13. Park SY, Kim HI, Oh H-K, Kim TH, Jang HW, Chung JH, et al. Age- and gender-specific reference intervals of TSH and free T4 in an iodine-replete area: Data from Korean National Health and Nutrition Examination Survey IV (2013-2015). PLoS One. 2018;13(2):e0190738. pmid:29390008
  14. Horowitz GL. Defining, establishing, and verifying reference intervals in the clinical laboratory; approved guideline. Clinical and Laboratory Standards Institute. 2010.
  15. Hubl W, Schmieder J, Gladrow E, Demant T. Reference intervals for thyroid hormones on the architect analyser. Clin Chem Lab Med. 2002;40(2):165–6. pmid:11939490
  16. Shokripour M, Imanieh MH, Garayemi S, Omidifar N, Shirazi Yeganeh B, Althabhawee F. Thyroid Stimulating Hormone, T3 and T4 Population-based Reference Range and Children Prevalence of Thyroid Dysfunction: First Report from South of Iran. Iran J Pathol. 2022;17(4):427–34. pmid:36532638
  17. Evgina S, Ichihara K, Ruzhanskaya A, Skibo I, Vybornova N, Vasiliev A, et al. Establishing reference intervals for major biochemical analytes for the Russian population: a research conducted as a part of the IFCC global study on reference values. Clin Biochem. 2020;81:47–58. pmid:32278594
  18. Lu Y, Zhang W-X, Li D-H, Wei L-H, Zhang Y-J, Shi F-N, et al. Thyroid Hormone Reference Intervals among Healthy Individuals In Lanzhou, China. Endocrinol Metab (Seoul). 2023;38(3):347–56. pmid:37312257
  19. Xing D, Liu D, Li R, Zhou Q, Xu J. Factors influencing the reference interval of thyroid-stimulating hormone in healthy adults: A systematic review and meta-analysis. Clin Endocrinol (Oxf). 2021;95(3):378–89. pmid:33662155
  20. Ammer T, Schützenmeister A, Rank CM, Doyle K. Estimation of Reference Intervals from Routine Data Using the refineR Algorithm-A Practical Guide. J Appl Lab Med. 2023;8(1):84–91. pmid:36610416
  21. Ozarda Y, Ichihara K, Jones G, Streichert T, Ahmadian R, IFCC Committee on Reference Intervals and Decision Limits (C-RIDL). Comparison of reference intervals derived by direct and indirect methods based on compatible datasets obtained in Turkey. Clin Chim Acta. 2021;520:186–95. pmid:34081933
  22. Fukui S, Ikeda Y, Kataoka Y, Yanaoka H, Tamaki H, Tsuda T, et al. Clinical significance of monitoring hypothyroidism in patients with autoimmune rheumatic disease: a retrospective cohort study. Sci Rep. 2021;11(1):13851. pmid:34226611
  23. Cowper B, Lyle AN, Vesper HW, Van Uytfanghe K, Burns C. Standardisation and harmonisation of thyroid-stimulating hormone measurements: historical, current, and future perspectives. Clin Chem Lab Med. 2024;62(5):824–9. pmid:38295422
  24. Thienpont LM, Van Uytfanghe K, Beastall G, Faix JD, Ieiri T, Miller WG, et al. Report of the IFCC Working Group for Standardization of Thyroid Function Tests; part 2: free thyroxine and free triiodothyronine. Clin Chem. 2010;56(6):912–20. pmid:20395623
  25. Van Uytfanghe K, Ehrenkranz J, Halsall D, Hoff K, Loh TP, Spencer CA, et al. Thyroid Stimulating Hormone and Thyroid Hormones (Triiodothyronine and Thyroxine): An American Thyroid Association-Commissioned Review of Current Clinical and Laboratory Status. Thyroid. 2023;33(9):1013–28. pmid:37655789
  26. Vesper HW, Myers GL, Miller WG. Current practices and challenges in the standardization and harmonization of clinical laboratory tests. Am J Clin Nutr. 2016;104 Suppl 3(Suppl 3):907S-12S. pmid:27534625
  27. Faix JD, Miller WG. Progress in standardizing and harmonizing thyroid function tests. Am J Clin Nutr. 2016;104 Suppl 3(Suppl 3):913S-7S. pmid:27534642
  28. Thienpont LM, Van Uytfanghe K, De Grande LAC, Reynders D, Das B, Faix JD, et al. Harmonization of Serum Thyroid-Stimulating Hormone Measurements Paves the Way for the Adoption of a More Uniform Reference Interval. Clin Chem. 2017;63(7):1248–60. pmid:28522444
  29. Koerbin G, Sikaris KA, Jones GRD, Ryan J, Reed M, Tate J, et al. Evidence-based approach to harmonised reference intervals. Clin Chim Acta. 2014;432:99–107. pmid:24183842
  30. Naous E, Hijazi R, Chalhoub M, El Ghorayeb N, Gannagé-Yared M-H. Reference intervals for thyroid-stimulating hormone in the adult Lebanese population: Results from the last decade. Heliyon. 2025;11(3):e42453. pmid:39995938
  31. Chen X, Zheng X, Ding Z, Su Y, Wang S, Cui B, et al. Relationship of gender and age on thyroid hormone parameters in a large Chinese population. Arch Endocrinol Metab. 2020;64(1):52–8. pmid:31576967
  32. Çelebiler A, Bilgili S, Erkizan EÖ, Serin H, Karaca MB. Thyroid hormone reference intervals and the prevalence of thyroid antibodies. Turk J Med Sci. 2010.
  33. Sahillioğlu B, Motor S, Erden G, Erdoğan S, Yıldırımkaya MM. Direct versus indirect strategies for thyroid hormone reference intervals established in a middle-aged and elderly population on an immunoassay analyzer. J Clin Exp Invest. 2010;1(3).
  34. Örkmez M, Tarakcıoglu M. Determination of Reference Intervals of Biochemistry Parameters in healthy individuals in Gaziantep Province. Eur J Ther. 2023;29(2):173–8.
  35. Evgina S, Ichihara K, Ruzhanskaya A, Skibo I, Vybornova N, Vasiliev A, et al. Establishing reference intervals for major biochemical analytes for the Russian population: a research conducted as a part of the IFCC global study on reference values. Clin Biochem. 2020;81:47–58.
  36. Raverot V, Bonjour M, Abeillon du Payrat J, Perrin P, Roucher-Boulez F, Lasolle H, et al. Age- and Sex-Specific TSH Upper-Limit Reference Intervals in the General French Population: There Is a Need to Adjust Our Actual Practices. J Clin Med. 2020;9(3):792. pmid:32183257
  37. Yildiz Z, Dağdelen LK. Reference intervals for thyroid disorders calculated by indirect method and comparison with reference change values. Biochem Med (Zagreb). 2023;33(1):010704. pmid:36627974
  38. Xie T, Su M, Feng J, Pan X, Wang C, Tang T. The reference intervals for thyroid hormones: A four year investigation in Chinese population. Front Endocrinol (Lausanne). 2023;13:1046381. pmid:36686466
  39. Köseoğlu M, Işleten F, Dursun S, Çuhadar S, Adresi Y. Yaş arası sağlıklı bireylerde referans aralıklarının saptanması. Turk J Biochem. 2010;35(3):215–24.
  40. Lewis CW, Raizman JE, Higgins V, Gifford JL, Symonds C, Kline G, et al. Multidisciplinary approach to redefining thyroid hormone reference intervals with big data analysis. Clin Biochem. 2024;133–134:110835. pmid:39442856
  41. Bohn MK, Bailey D, Balion C, Cembrowski G, Collier C, De Guire V, et al. Reference Interval Harmonization: Harnessing the Power of Big Data Analytics to Derive Common Reference Intervals across Populations and Testing Platforms. Clin Chem. 2023;69(9):991–1008. pmid:37478022
  42. Enli Y. Denizli’de yaşayan 18-40 yaş arası bireylerde farklı yöntemlerle referans aralıkların saptanması. Turk J Biochem. 2001;28(4):228–45.
  43. Płaczkowska S, Terpińska M, Piwowar A. Establishing laboratory-specific reference intervals for TSH and fT4 by use of the indirect Hoffman method. PLoS One. 2022;17(1):e0261715. pmid:34995316
  44. Mirjanić-Azarić B, Milinković N, Bogavac-Stanojević N, Avram S, Stojaković-Jelisavac T, Stojanović D. Indirect estimation of reference intervals for thyroid parameters using advia centaur XP analyzer. J Med Biochem. 2022;41(2):238–45. pmid:35510197
  45. Clerico A, Trenti T, Aloe R, Dittadi R, Rizzardi S, Migliardi M, et al. A multicenter study for the evaluation of the reference interval for TSH in Italy (ELAS TSH Italian Study). Clin Chem Lab Med. 2018;57(2):259–67. pmid:30016276
  46. Abbas R, Abbas HG, Shahid A, Chand S, Nawaz S. Reference intervals for free T3 and free T4 in Pakistani euthyroid patients: effect of age and gender on thyroid function. J Coll Physicians Surg Pak. 2014;24(11):806–9. pmid:25404437
  47. Serdar MA, Ozgurtas T, Ispir E, Kenar L, Senes M, Yücel D, et al. Comparison of relationships between FT4 and log TSH in Access DXI 800 Unicel, Modular E170 and ADVIA Centaur XP Analyzer. Clin Chem Lab Med. 2012;50(10):1849–52. pmid:23089718
  48. Westbye AB, Aas FE, Dahl SR, Zykova SN, Kelp O, Dahll LK, et al. Large method differences for free thyroid hormone assays in the hyperthyroid range can affect assessment of hyperthyroid status: Comparison of Abbott Alinity to Roche Cobas, Siemens Centaur and equilibrium dialysis LC-MS/MS. Clin Biochem. 2023;121–122:110676. pmid:37848158
  49. Barth JH, Luvai A, Jassam N, Mbagaya W, Kilpatrick ES, Narayanan D, et al. Comparison of method-related reference intervals for thyroid hormones: studies from a prospective reference population and a literature review. Ann Clin Biochem. 2018;55(1):107–12. pmid:28081637
  50. Yoo WS. Clinical Implications of Different Thyroid-Stimulating Hormone (TSH) Reference Intervals between TSH Kits for the Management of Subclinical Hypothyroidism. Endocrinol Metab (Seoul). 2024;39(1):188–9. pmid:38311827
  51. Thienpont LM, Van Uytfanghe K, De Grande LAC, Reynders D, Das B, Faix JD, et al. Harmonization of Serum Thyroid-Stimulating Hormone Measurements Paves the Way for the Adoption of a More Uniform Reference Interval. Clin Chem. 2017;63(7):1248–60. pmid:28522444