Abstract
This study aimed to verify the reference intervals (RIs) for routine biochemistry tests in an adult Turkish population using the reflimR method and to compare them with the manufacturer-provided RIs. The RIs of 19 routine biochemical parameters, analyzed using Beckman Coulter analyzers between February and October 2024, were evaluated using the reflimR algorithm. The RIs were estimated separately for females and males using five indirect approaches (reflimR, refineR, KOSMIC, Hoffmann, and Bhattacharya). A traffic light algorithm based on permissible uncertainty was used to interpret whether the RI limits calculated with reflimR were within the tolerance limits. Using reflimR, 40 of 76 RI limits were accepted, 21 required checking, and 15 were rejected for RI verification compared with the manufacturer-provided values. The comparison of reflimR with other indirect methods generally produced concordant results, except for the alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT), and total bilirubin tests. The reflimR algorithm may offer a swift and accessible method for calculating and verifying RIs. Verification failures may arise from fundamental differences, including ethnicity, sex, age demographics, and geographic factors, between the populations in the manufacturer’s studies and our analyzed population.
Citation: Deniz L, Demirelce O (2026) Verification of reference intervals for routine biochemical tests using the reflimR in Turkish adults. PLoS One 21(2): e0342530. https://doi.org/10.1371/journal.pone.0342530
Editor: Apeksha Niraula, Tribhuvan University Institute of Medicine, NEPAL
Received: October 6, 2025; Accepted: January 25, 2026; Published: February 11, 2026
Copyright: © 2026 Deniz, Demirelce. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files. All data were fully anonymized prior to submission, and no identifiable personal information is included in the dataset.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Reference intervals (RIs) are among the most common decision-support tools used to interpret numerical values in clinical laboratory reports. Laboratory results are interpreted by comparing them with these intervals; therefore, the quality of RIs can be as important as that of the results when making accurate clinical decisions [1]. The classical (direct) approach to determining RIs involves collecting samples from a reference population of at least 120 healthy individuals and calculating RIs as specified percentiles (e.g., the 2.5th and 97.5th percentiles). Many RIs require appropriate stratification because they depend on variables such as ethnicity, sex, age, geographic region of residence, diet, and exercise. Therefore, laboratories must generate their own RIs or validate existing ones [2]. An alternative is the indirect approach, in which the results of routine samples are used to determine RIs. Indirect methods have important advantages over direct methods, such as being faster and less expensive. Their main disadvantage is the possible influence of unhealthy (diseased) subpopulations on the derived range [3]. Consequently, any method used to determine RIs from a mixture of healthy and diseased individuals must separate the two distributions accurately and precisely. Several indirect techniques have been developed and are routinely used for RI estimation. Important approaches that have been applied in the past include the visual Hoffmann [4] and Bhattacharya [5] methods. Recently, more sophisticated indirect methods based on open-source software algorithms have been developed, such as the truncated maximum likelihood (TML) method developed by Arzideh et al. [6], the truncated minimum chi-square (TMC) method developed by Wosniok et al. [7], the Kolmogorov-Smirnov distance-based KOSMIC method (an updated version of the TML technique) by Zierk et al. [8], and, most recently, refineR by Ammer et al. [9].
An alternative R package called reflimR, developed by Hoffmann et al., has been proposed as a tool that can provide results in a shorter time and perform more accurate and precise calculations for results below 1000. To this end, an algorithm was added to the package that uses traffic light colors to show how well the estimated RIs match the predefined limits used in the laboratory. The reflimR algorithm facilitates RI verification by applying tolerance limits to the values specified by the manufacturer or reported in the literature [10].
This study aimed to verify the RIs for routine biochemistry tests in adult patients using the reflimR method and to compare them with the RIs transferred from the values provided by the manufacturer. Additionally, we investigated whether the RIs obtained using the reflimR method were consistent with those obtained using other indirect algorithms, including refineR, KOSMIC, Hoffmann, and Bhattacharya.
Materials and methods
Routine biochemistry test results were collected from the Biochemistry Department of Istanbul Training and Research Hospital via the laboratory information system (Alis, Ventura Software, Ankara, Türkiye) between 01/02/2024 and 31/10/2024. Data were accessed for research purposes on 01/02/2025. This study was conducted with the approval of the Clinical Research Ethics Committee of Istanbul Training and Research Hospital (date: 25.01.2025, number: 17) and in accordance with the principles of the Declaration of Helsinki. Since this study was designed as a retrospective analysis based on medical records, all data were fully anonymized prior to access. The ethics committee waived the requirement for informed consent owing to the retrospective nature of the study.
Instruments and reagents
Serum thyroid-stimulating hormone (TSH), free triiodothyronine (FT3), and free thyroxine (FT4) levels were measured using a chemiluminescence method with a DxI 800 analyzer (Beckman Coulter Inc., Brea, CA, US). Serum glucose, total protein, albumin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyl transferase (GGT), lactate dehydrogenase (LDH), total bilirubin, calcium, magnesium, phosphorus, urea, creatinine, sodium, potassium, and chloride levels were measured using an AU5800 analyzer (Beckman Coulter Inc., Brea, CA, US). The detailed assay methods for the analytes are provided in Table 1. The instruments remained stable throughout the study period. The analyzers were maintained according to the manufacturer’s instructions, and their analytical stability was evaluated over time. Calibration and maintenance were performed using the original products provided by Beckman Coulter®. Internal quality control and external quality assessment tests consistently met the desired analytical performance targets. The manufacturer-provided RIs supplied by Beckman Coulter® were originally established on populations from North America and Western Europe. These RIs were entered with local verification during the implementation of the analyzers.
Data cleaning
Data pre-processing was performed using Excel (Microsoft, Redmond, WA, US). The data obtained were filtered prior to analysis. During the study period, each analyte was extracted separately from the laboratory information system as an independent dataset. Therefore, the male and female sample sizes were analyte-specific and may not represent matched patient cohorts across different tests. Non-numeric or rejected sample results were excluded. For individuals with multiple laboratory test results, only the initial result was considered, based on the identification number, on the assumption that the need for multiple tests indicated a higher likelihood of a pathological condition; this approach was applied consistently across all analytes. Results from intensive care units, hematology, oncology, nephrology (hemodialysis), infectious diseases, gastroenterology, endocrinology, obesity, interventional radiology, nuclear medicine, and radiation oncology departments were excluded. Individuals aged <18 or >75 years were excluded from the study.
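The filtering steps described above can be sketched in a few lines of Python. This is only an illustrative outline, not the procedure actually run in Excel: the record keys (`patient_id`, `timestamp`, `department`, `age`, `value`) are hypothetical, and the order in which the "first result per patient" rule and the other exclusions are applied is an assumption.

```python
# Departments excluded in the text (lower-cased for matching)
EXCLUDED_DEPARTMENTS = {
    "intensive care", "hematology", "oncology", "nephrology",
    "infectious diseases", "gastroenterology", "endocrinology", "obesity",
    "interventional radiology", "nuclear medicine", "radiation oncology",
}


def clean(records):
    """Return the records that survive the exclusion rules.

    `records` is an iterable of dicts with hypothetical keys:
    'patient_id', 'timestamp', 'department', 'age', 'value'.
    """
    seen = set()
    kept = []
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        pid = rec["patient_id"]
        if pid in seen:
            continue  # only the first result per patient is considered
        seen.add(pid)
        try:
            value = float(rec["value"])
        except (TypeError, ValueError):
            continue  # non-numeric or rejected result
        if rec["department"].lower() in EXCLUDED_DEPARTMENTS:
            continue
        if not 18 <= rec["age"] <= 75:
            continue
        kept.append({**rec, "value": value})
    return kept


# Made-up example records
example = [
    {"patient_id": 1, "timestamp": "2024-02-01", "department": "Internal medicine", "age": 40, "value": "4.1"},
    {"patient_id": 1, "timestamp": "2024-03-01", "department": "Internal medicine", "age": 40, "value": "4.4"},
    {"patient_id": 2, "timestamp": "2024-02-05", "department": "Oncology", "age": 55, "value": "3.2"},
    {"patient_id": 3, "timestamp": "2024-02-07", "department": "Cardiology", "age": 80, "value": "4.0"},
    {"patient_id": 4, "timestamp": "2024-02-09", "department": "Cardiology", "age": 30, "value": "hemolyzed"},
    {"patient_id": 5, "timestamp": "2024-02-10", "department": "Cardiology", "age": 18, "value": "4.6"},
]
cleaned = clean(example)
```

In this sketch, a patient's first chronological record is selected before the other filters are applied; selecting it afterwards would be an equally plausible reading of the text.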
Statistical analysis
Statistical analyses were performed using R Studio (version 4.4.1) and SPSS (version 26; IBM Corp.). Bland-Altman analyses were performed using MedCalc® Statistical Software version 23.4.5 (MedCalc Software Ltd, Ostend, Belgium; https://www.medcalc.org; 2025). The normality of the data was evaluated using the Kolmogorov-Smirnov test. RIs were estimated separately for females and males using five indirect approaches (reflimR, refineR, KOSMIC, Hoffmann, and Bhattacharya). The refineR method was applied using the web-based interface at https://kc.uol.de/rifindr/. The KOSMIC method was applied using the web-based interface at https://KOSMIC.diz.uk-erlangen.de. Hoffmann analysis was conducted using the web application at https://gocrunch.shinyapps.io/HoffApp/, and Bhattacharya analysis using the web application at https://gocrunch.shinyapps.io/BhattApp/. The lower and upper reference limits (LRLs and URLs) of the RIs were calculated as the 2.5th and 97.5th percentiles, respectively. The reference limits obtained using the reflimR algorithm were compared with those derived from other indirect methods by calculating absolute or percentage differences. These differences were then evaluated against the allowable total error limits defined by the Clinical Laboratory Improvement Amendments (CLIA) (https://westgard.com/clia-a-quality/quality-requirements/2024-clia-requirements.html). Because the CLIA performance goals for calcium, sodium, and potassium are specified in absolute units, the differences for these analytes were calculated and interpreted as absolute values. For the remaining analytes, the differences were expressed as percentages.
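The comparison against allowable limits can be illustrated with a minimal Python sketch. The allowable limits below are the ones quoted elsewhere in this article (15% for ALT and GGT, 20% for total bilirubin, 0.30 mmol/L for potassium). The use of the comparison method as the denominator of the percentage difference is an assumption, and the reference limits in the example are hypothetical.

```python
# Allowable limits quoted in this article; other analytes would need
# to be taken from the CLIA table itself.
ALLOWABLE = {
    "ALT": ("percent", 15.0),
    "GGT": ("percent", 15.0),
    "total bilirubin": ("percent", 20.0),
    "potassium": ("absolute", 0.30),  # mmol/L
}


def difference(reflimr_limit, other_limit, kind):
    """Difference of the reflimR limit from the comparison limit.

    Assumed convention: the comparison method is the denominator
    of the percentage difference.
    """
    diff = reflimr_limit - other_limit
    if kind == "absolute":
        return diff
    return 100.0 * diff / other_limit


def within_allowable(analyte, reflimr_limit, other_limit):
    """Return (difference, True if |difference| <= allowable limit)."""
    kind, limit = ALLOWABLE[analyte]
    d = difference(reflimr_limit, other_limit, kind)
    return d, abs(d) <= limit


# Hypothetical limits, for illustration only
alt_diff, alt_ok = within_allowable("ALT", 50.5, 43.2)      # U/L
k_diff, k_ok = within_allowable("potassium", 3.6, 3.5)      # mmol/L
print(f"ALT: {alt_diff:.1f}% ({'within' if alt_ok else 'exceeds'} limit)")
print(f"Potassium: {k_diff:.2f} mmol/L ({'within' if k_ok else 'exceeds'} limit)")
```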
reflimR
The reflimR method [10], known as the “modified Hoffmann” approach, is valued because of its simplicity and low computational requirements. It builds on the original probability paper method by incorporating Bowley’s quartile skewness to determine whether a normal or lognormal distribution is appropriate, applying log transformation if necessary, truncating the central 95% of non-pathological values using the iBoxplot95 algorithm, and calculating the reference limits from a normal quantile-quantile plot. It can be downloaded from GitHub and installed in the R environment following the instructions provided on the website: https://github.com/SandraKla/reflimR_Shiny (accessed on 10/02/2025). The reflimR package includes several functions, with reflim(x) serving as the main function. It calls the other important functions, which can be arranged in three groups. Group 1 includes the main functions that provide the user with the final results of reflim() and help to interpret them: ri_hist() creates a graphical output (histogram with RI and density curves), permissible_uncertainty() calculates the medical tolerance limits of the results, and interpretation() assesses the medical significance of deviations from given target values. Group 2 performs the three underlying statistical operations of modelling, truncation, and calculation, and Group 3 contains auxiliary functions for miscellaneous tasks. The reflim function can be executed by simply calling reflim(x), where x is a vector of numeric data to be analyzed. As shown in Fig 1, the main result of the reflim function is an illustrative graphic showing a histogram of the original data with a dotted overall density curve. The estimated reference limits are displayed as dashed vertical lines and the respective medical tolerance limits as gray bars.
The blue solid density curve represents the theoretical distribution of the assumed reference population, and the density curves of potential pathological outliers are shown in red. For each analyte, the RI values derived from the reflimR algorithm were compared with those obtained from the permissible uncertainty (tolerance limits) algorithm, values reported in the instructions for use (IFU) or the literature, and RIs generated using other data mining algorithms. The green bars indicate that the predicted target values were within the tolerance ranges determined by reflimR. Yellow bars signify that the target values are outside the tolerance range but still have some overlap, whereas red bars indicate that there is no overlap in the tolerance ranges. Accordingly, the outputs from the reflimR function are categorized as “within tolerance” (green), “slightly increased/decreased” (yellow), and “markedly increased/decreased” (red). In the reflimR package, 95% confidence intervals were calculated using the conf_int95 function, which estimates the interval through 100,000 Monte Carlo simulations [10]. As the manufacturer-provided RIs for ALT, AST, GGT, and LDH did not include LRLs, a comparison with the reflimR results was not feasible; therefore, reference limits from the literature were used [11].
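The three color categories described above can be expressed as a small decision rule. The Python sketch below is only an illustration of that rule as worded in this section; it is not the reflimR implementation, it takes the tolerance intervals as given rather than computing the permissible uncertainty, and all numbers are hypothetical.

```python
def traffic_light(target, target_tol, estimate_tol):
    """Classify agreement between a target limit and an estimated limit.

    target       -- target reference limit (e.g., from the IFU)
    target_tol   -- (low, high) tolerance interval around the target
    estimate_tol -- (low, high) tolerance interval around the estimate
    """
    tgt_low, tgt_high = target_tol
    est_low, est_high = estimate_tol
    if est_low <= target <= est_high:
        return "green"   # target lies inside the estimate's tolerance range
    if tgt_low <= est_high and est_low <= tgt_high:
        return "yellow"  # target outside, but the two ranges still overlap
    return "red"         # the two tolerance ranges do not overlap


# Hypothetical lower limit of 35 with tolerance range 33.5 to 36.5
green = traffic_light(35.0, (33.5, 36.5), (34.0, 37.0))
yellow = traffic_light(35.0, (33.5, 36.5), (36.0, 39.0))
red = traffic_light(35.0, (33.5, 36.5), (37.4, 40.6))
print(green, yellow, red)  # green yellow red
```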
Fig 1. The upper panel illustrates the albumin results for male participants and the lower panel presents the corresponding albumin results for female participants. The figure displays the estimated reference limits (dashed lines), medical tolerance limits (gray bars), theoretical reference distribution (blue curve), and pathological outlier distributions (red curves). Accordingly, reflimR outputs are classified as ‘within tolerance’ (green), ‘slightly increased/decreased’ (yellow), and ‘markedly increased/decreased’ (red).
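The modelling and calculation ideas named in this section (Bowley's quartile skewness and reference limits read from a normal quantile-quantile plot) can be sketched in Python as follows. This is a simplified illustration on simulated data, not the reflimR code: the `central` fraction is a hypothetical stand-in for the truncation step and does not reproduce the iBoxplot95 algorithm, and the package's own rule for choosing between a normal and lognormal model is not shown.

```python
import math
import random
from statistics import NormalDist, quantiles


def bowley_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def qq_limits(values, central=0.5):
    """Fit a line to the central part of a normal Q-Q plot and
    extrapolate it to the 2.5th and 97.5th percentiles."""
    xs = sorted(values)
    n = len(xs)
    start = int(n * (1 - central) / 2)
    stop = n - start
    std_normal = NormalDist()
    # Theoretical normal quantiles for the retained order statistics
    z = [std_normal.inv_cdf((i + 0.5) / n) for i in range(start, stop)]
    y = xs[start:stop]
    z_mean = sum(z) / len(z)
    y_mean = sum(y) / len(y)
    # Ordinary least-squares slope and intercept
    slope = sum((a - z_mean) * (b - y_mean) for a, b in zip(z, y)) / sum(
        (a - z_mean) ** 2 for a in z
    )
    intercept = y_mean - slope * z_mean
    z975 = std_normal.inv_cdf(0.975)
    return intercept - z975 * slope, intercept + z975 * slope


# Simulated right-skewed (lognormal) data, for illustration only
random.seed(1)
data = [random.lognormvariate(3.0, 0.4) for _ in range(20000)]
log_data = [math.log(v) for v in data]

raw_skew = bowley_skewness(data)
log_skew = bowley_skewness(log_data)

# Skewness shrinks after the log transform, so fit on the log scale
log_lower, log_upper = qq_limits(log_data)
lower, upper = math.exp(log_lower), math.exp(log_upper)
print(f"skewness raw={raw_skew:.3f}, log={log_skew:.3f}")
print(f"estimated limits: {lower:.1f} to {upper:.1f}")
```

Fitting only the central part of the Q-Q plot is what makes this family of methods tolerant of pathological values in the tails, since those values do not enter the regression.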
Results
The detailed age distribution and the sample size per analyte and sex are provided in Table 2. The total number of patient measurements analyzed for all 19 biochemical parameters was 694,869. Across the 19 analytes and two sex-specific subgroups (38 subgroups in total), the median sample size was 15,600, with a range of 3,772–63,266 measurements. The total sample sizes across the 19 analytes showed a median of 26,355 measurements, with a range of 10,195–93,561. Among the sex-specific subgroups, the female sample sizes ranged from 6,181 to 63,266 (median = 16,715), whereas the male sample sizes ranged from 3,772 to 30,295 (median = 9,640). The study cohort reflects the broader adult demographic residing in Istanbul, an ethnically heterogeneous metropolitan area primarily composed of individuals of Turkish origin.
Table 2 presents the verification results of manufacturer-recommended RIs for 19 routine biochemical tests using the reflimR algorithm in women and men aged 18–75 years. In the male population, the URLs for the glucose, AST, total bilirubin, and urea tests and the LRLs for the albumin, potassium, and TSH tests markedly exceeded the established tolerance limits based on the reflimR algorithm. In women, the URLs for AST, LDH, total bilirubin, and urea tests and the LRLs for albumin, AST, potassium, and TSH also markedly exceeded the tolerance limits based on the reflimR algorithm. Fifteen of the 76 RI limits were rejected (“red = markedly increased/decreased”) for RI verification. In the male population, the URLs for ALT, LDH, and FT3, as well as the LRLs for glucose, total protein, LDH, calcium, urea, and chloride tests, were found to slightly exceed the established tolerance limits based on the reflimR algorithm. In women, the URLs for glucose, albumin, ALT, and TSH and the LRLs for total protein, GGT, LDH, total bilirubin, calcium, phosphorus, urea, and chloride slightly exceeded the tolerance limits based on the reflimR algorithm. Twenty-one of the 76 RI limits required further evaluation (“yellow = slightly increased/decreased”) for RI verification. Forty of the 76 RI limits were accepted (“green = within tolerance”) for RI verification. For both sexes, the URLs and LRLs for magnesium, creatinine, sodium, and FT4 were within tolerance limits (Table 2).
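The counts in this paragraph can be cross-checked with a short tally. The Python sketch below simply transcribes the analytes listed above as rejected (red) or requiring further evaluation (yellow) and derives the accepted (green) count as the remainder of 19 analytes × 2 sexes × 2 limits.

```python
N_ANALYTES = 19
SEXES = ("male", "female")
LIMITS = ("LRL", "URL")

# Limits reported as markedly outside tolerance (red)
red = {
    ("male", "URL"): ["glucose", "AST", "total bilirubin", "urea"],
    ("male", "LRL"): ["albumin", "potassium", "TSH"],
    ("female", "URL"): ["AST", "LDH", "total bilirubin", "urea"],
    ("female", "LRL"): ["albumin", "AST", "potassium", "TSH"],
}

# Limits reported as slightly outside tolerance (yellow)
yellow = {
    ("male", "URL"): ["ALT", "LDH", "FT3"],
    ("male", "LRL"): ["glucose", "total protein", "LDH", "calcium",
                      "urea", "chloride"],
    ("female", "URL"): ["glucose", "albumin", "ALT", "TSH"],
    ("female", "LRL"): ["total protein", "GGT", "LDH", "total bilirubin",
                        "calcium", "phosphorus", "urea", "chloride"],
}

total = N_ANALYTES * len(SEXES) * len(LIMITS)
n_red = sum(len(v) for v in red.values())
n_yellow = sum(len(v) for v in yellow.values())
n_green = total - n_red - n_yellow
print(total, n_green, n_yellow, n_red)  # 76 40 21 15
```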
Table 3 presents the RIs calculated using the refineR, KOSMIC, Hoffmann, and Bhattacharya algorithms, as well as reflimR, for sex-specific subgroups of 19 parameters. The percentage or absolute differences between the reference limits estimated by reflimR and those derived from other indirect methods are presented in Table 4. Compared with the refineR algorithm, the URLs calculated using the reflimR algorithm were slightly higher for ALT (8.57%), AST (7.17%), total bilirubin (10.9%), and FT4 (5.50%) in females. Compared with the refineR algorithm, the LRL calculated by the reflimR algorithm was slightly lower for FT3 (−5.02%) in males and slightly higher for magnesium (4.29%) in females. The URL calculated using the reflimR algorithm for ALT (16.9%) in males was markedly higher than that calculated using the refineR algorithm. Compared with the KOSMIC algorithm, the URLs calculated using the reflimR algorithm were slightly higher for AST (7.10% in males, 8.28% in females), TSH (14.1% in males, 8.95% in females), and FT4 (6.54% in males, 6.48% in females) in both sexes; for ALT (16.2%), calcium (0.30 mg/dL), and chloride (1.89%) in males; and for total bilirubin (10.9%) and creatinine (5.62%) in females. Compared with the KOSMIC algorithm, the LRL calculated by the reflimR algorithm was slightly lower for chloride (−2.00%) in males, whereas it was slightly higher for total bilirubin (13.0%) in females and FT4 (5.08%) in males. The URLs calculated using the reflimR algorithm were markedly higher for GGT (18.5%) and total bilirubin (20.7%) in males than those calculated using the KOSMIC algorithm. Compared with the Hoffmann algorithm, the LRL calculated by the reflimR algorithm was slightly higher for total bilirubin (10.7%) in males and magnesium (4.94%) in females. The URL calculated using the reflimR algorithm was slightly higher for total bilirubin (13.2%) in males than that calculated using the Hoffmann algorithm.
Compared with the Bhattacharya method, the LRLs calculated using the reflimR algorithm were slightly higher for magnesium (6.83%) in males and for phosphorus (6.23%) and magnesium (7.59%) in females, whereas the LRL for FT3 (−5.02%) in males was slightly lower (Tables 3 and 4). In the Bland–Altman analysis, the mean differences were −0.47% and 2.02% for the LRLs and URLs, respectively, in the comparison between reflimR and refineR, and −0.22% and 4.26% for the corresponding comparison between reflimR and KOSMIC (Fig 2). Similarly, the mean differences were 0.71% and −0.44% for the LRLs and URLs in the comparison between reflimR and the Hoffmann method, and 0.98% and −0.37% for the corresponding comparison between reflimR and the Bhattacharya method (Fig 3).
Fig 2. The solid horizontal line represents the mean difference, whereas the dashed lines indicate the 95% limits of agreement (mean ± 1.96 SD).
Fig 3. The solid horizontal line represents the mean difference, whereas the dashed lines indicate the 95% limits of agreement (mean ± 1.96 SD).
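The quantities plotted in a Bland–Altman analysis (mean difference and 95% limits of agreement, mean ± 1.96 SD) can be computed as in the following Python sketch. The percentage differences used here are hypothetical and are not taken from Table 4.

```python
from statistics import mean, stdev


def bland_altman(differences):
    """Return (mean difference, lower limit, upper limit of agreement)."""
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical percentage differences between two methods
diffs = [1.2, -0.8, 3.5, 0.4, -2.1, 2.6, 0.9, -1.4]
bias, loa_low, loa_high = bland_altman(diffs)
print(f"mean difference {bias:.2f}%, limits of agreement "
      f"{loa_low:.2f}% to {loa_high:.2f}%")
```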
Discussion
In this study, we applied the reflimR algorithm, a state-of-the-art indirect method for the verification of RIs, to real-world data and compared it with other methods and IFU RIs. Although the CLSI EP28-A3c guideline considers verification successful when no more than two out of 20 healthy individuals fall outside the proposed RI [2], this approach may fail to identify RIs that are excessively wide. Therefore, the guideline-based approach does not eliminate the risk of accepting an excessively wide RI, and its representativeness and reproducibility are limited [12,13]. Unlike the guideline-based approach [2], reflimR applies a more stringent evaluation of the proposed RIs, thereby helping to prevent the erroneous acceptance of inappropriate limits [10].
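The weakness of the 20-sample rule can be made concrete with a small Python sketch. The rule encoded here is the one stated above (accept if no more than two of 20 results fall outside the proposed RI); the sodium values are made up, the interval 136–143 mmol/L is the one reported in this study, and the wide interval is deliberately exaggerated for illustration.

```python
def passes_20_sample_rule(results, lower, upper, max_outside=2):
    """Accept the proposed RI if at most `max_outside` results fall outside."""
    outside = sum(1 for x in results if x < lower or x > upper)
    return outside <= max_outside


# Hypothetical sodium results (mmol/L) from 20 healthy individuals
samples = [137, 138, 138, 139, 139, 139, 140, 140, 140, 140,
           140, 141, 141, 141, 141, 142, 142, 142, 143, 143]

plausible = passes_20_sample_rule(samples, 136, 143)   # realistic RI
too_wide = passes_20_sample_rule(samples, 120, 160)    # exaggerated RI
too_narrow = passes_20_sample_rule(samples, 139, 141)  # overly narrow RI
print(plausible, too_wide, too_narrow)  # True True False
```

The excessively wide interval passes just as easily as the realistic one, because the rule only penalizes results that fall outside the proposed limits.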
In reflimR, the modified Hoffmann method is fully automated in R, which allows calculations to be performed within milliseconds. The substantially higher speed of reflimR compared with refineR is an advantage for routine laboratory applications. Fast results are important, particularly when confidence intervals are calculated using simulations or bootstrap techniques. The reflimR package incorporates a color scheme (traffic light colors) to illustrate how well the estimated RIs align with the predefined limits of the laboratory. These colors (green, yellow, and red) are not subjective because they are determined based on the permissible uncertainty (tolerance limits) of laboratory results [10].
For the majority of the LRLs and URLs (40/76) of the analytes evaluated, the RIs predicted by reflimR were similar to and consistent with the IFU RIs. For some analytes, slight or marked differences in RIs were observed. In our study population, the URLs for glucose, AST, total bilirubin, and urea tests in men and AST, LDH, total bilirubin, and urea tests in women markedly exceeded the tolerance limits determined using the IFU or literature limits. Additionally, the LRLs for albumin, AST, potassium, and TSH in women and albumin, potassium, and TSH in men markedly exceeded the tolerance limits determined using the IFU limits (Table 2). In the study by Hoffmann et al., the URLs for ALT, AST, and creatinine in men and for albumin and bilirubin in women markedly exceeded the tolerance limits, whereas the LRLs that markedly exceeded the tolerance limits were those for AST, bilirubin, and creatinine in men and for AST and creatinine in women [10]. These differences may be due to fundamental differences (e.g., ethnicity, sex and age distribution, time period, and measurement location) between the IFU studies and our population analyzed using the indirect method.
The RIs for biochemical tests performed using the AU480 analyzer in Ghanaian adults were calculated using a parametric method [14]. The RIs calculated in our study for total protein (65−84 g/L), creatinine (male: 0.66–1.23 mg/dL, female: 0.45–0.93 mg/dL), sodium (136−143 mmol/L), and chloride (99−108 mmol/L) were highly consistent with the limits obtained in that study [14]. Although the LRLs (percentage difference: 11.7% in males and 11.1% in females) calculated for albumin in our study could not be verified when compared with the IFU values, similar LRLs were obtained in studies conducted in Ghana (2.89% in males and 2.37% in females) [14], Kenya (−2.25% in males and 2.37% in females) [15], and Russia (0.26% in males and −0.26% in females) [16] using the same manufacturer’s assay, demonstrating substantial agreement with our findings. For glucose, the URL (12.3% in males) calculated in our study was markedly higher than that reported by the manufacturer. Compared with the URL reported in the Ghanaian study [14], our URLs were also higher (11.2% in males and 4.67% in females). Notably, in the Kenyan study [15], a higher glucose URL for males (9.85%) was observed, particularly in individuals aged > 45 years. Compared with the URLs reported in the Russian study, our URLs demonstrated better agreement, especially in participants older than 45 years, with percentage differences of 1.71% for males and 0.00% for females [16]. As our study population had median (quartile) ages of 52 (38–62) years for men and 49 (35–60) years for women and was analyzed without age stratification, the observed elevation may be attributed to age-related variations. The URLs for total bilirubin calculated in our study differed markedly from the manufacturer’s claimed values, being higher in males (21.7%) and lower in females (−15.0%).
Furthermore, studies conducted in Ghana (30.8% in males and 27.1% in females) [14] and Kenya (41.8% in males and 35.4% in females) [15] have reported markedly higher URL values than those found in our study. Consistent with our findings, these studies also demonstrated higher URLs in males than in females. The RIs for total bilirubin reported in the Kenyan study were nearly twice as high as those reported in studies conducted outside the African continent. This may be attributed to the higher prevalence of Gilbert syndrome, the most common genetic cause of asymptomatic unconjugated hyperbilirubinemia, in Kenya [15]. A study conducted in Russia also attributed this finding to the presence of Gilbert syndrome [16]. Total bilirubin levels exhibited sex-related differences, with males showing higher URLs. The prevalence of Gilbert syndrome in Europe is estimated to be approximately 3–6%, with a marked male predominance, reflected by a male-to-female ratio of 4:1 [17]. Genetic studies conducted in Türkiye have shown that Gilbert syndrome is largely associated with the A(TA)7TAA (UGT1A1*28) polymorphism in the promoter region of the UGT1A1 gene and that this variant is predominant in the population. Additionally, other rare UGT1A1 sequence alterations, such as c.1091C > T (p.Pro365Leu) and c.880_893delinsA, have been reported to affect bilirubin metabolism and contribute to this phenotype [18]. When the URLs were compared with previously published reference intervals from Türkiye, the percentage differences were 4.29% for men and −15.7% for women on the Roche platform [19] and 3.5% for men and 9.7% for women on the Abbott platform [11]. All URL percentage differences were within the allowable limits (20%) defined by the CLIA. For urea, the URLs (22.8% in males, 13.7% in females) calculated in our study were markedly higher than those reported by the manufacturer.
The URLs for urea calculated in our study were markedly higher (55.3% in males, 43.8% in females) than those reported in a Ghanaian study [14]. In contrast, the urea RIs estimated in the Kenyan study [15] were lower than those reported in Türkiye [11] and Saudi Arabia [20]. Urea RIs estimated in Türkiye (−22.2%) [11] and Saudi Arabia (−20.6%) [20] were markedly lower than those found in our study for males. However, in females, the URLs reported for individuals aged over 50 years in the Turkish study (2.30%) [11] and over 45 years in the Russian study (3.17%) [16] showed better agreement with the URLs established in our study. These differences may be attributable to variations in protein-rich dietary intake. In a Chinese study using the Beckman Coulter® AU5800, albumin, total bilirubin, ALT, and GGT levels were generally higher in males than in females, likely due to factors such as greater muscle mass, higher alcohol consumption, and increased obesity prevalence in men. These sex-related physiological and behavioral differences may account for the observed disparities in liver enzyme, bilirubin, and protein levels [21]. Although the potassium limits reported in studies from Ghana [14] and Russia [16] were similar to those of the IFU, the LRLs obtained in our study were higher (by 0.30 mmol/L in males and 0.32 mmol/L in females) than the limits reported in both countries [14,16]. When the LRLs were compared with previously published reference intervals from Türkiye, the absolute differences on the Roche platform were 0.10 mmol/L for men and 0.02 mmol/L for women [19], whereas on the Abbott platform they were 0.10 mmol/L for men and 0.12 mmol/L for women [11]. Notably, all absolute differences in LRLs remained within the allowable limit of 0.30 mmol/L. Although our LRL for chloride was slightly lower than the IFU value, our limits were nearly identical to those reported (1.01% in males, 0.00% in females) in the Ghanaian [14] and Russian [16] studies.
The Turkish study [11] yielded potassium and chloride LRL values that were higher and lower, respectively, than those of the IFU, which is in agreement with our findings. For ALT, the URL obtained for females was very close to those reported in the Kenyan [15] and Turkish [19] studies (1.33% for both studies), whereas the URL for males was lower than the Kenyan and Turkish values (−8.55% for both studies). Regarding AST, the URLs derived in our study were consistent with those reported in the Kenyan study (9.50% in males, 8.28% in females; acceptable limit: 15%) [15]. As the manufacturer-provided LDH RIs did not include LRLs, comparison with reflimR results was not possible and literature-based limits [11] were used instead. In females, the LDH URL was higher than those reported in the literature but showed good agreement (3.23% in males, 1.21% in females) with the IFU value.
Agaravatt et al. reported higher URLs for TSH using both the KOSMIC and refineR algorithms, whereas the Hoffmann method detected lower URLs. The LRLs of TSH and the URLs and LRLs for FT4 and FT3 showed good agreement with the IFU using the Hoffmann, KOSMIC, and refineR methods [22]. In our study, while the TSH LRLs were markedly higher than the IFU values in both sexes, we detected slightly higher values for the TSH URL only in women and slightly higher values for the FT3 URL only in men. In another study [12], similar to our findings, both the directly and indirectly estimated LRL values for the TSH test on the Beckman Coulter® DxI analyzer were found to be higher than the manufacturer’s claimed LRL of 0.38. The study generally indicated that manufacturer-provided RIs for TSH, particularly those of Abbott®, Roche®, and Beckman Coulter®, exhibited inappropriate URLs. However, it was noted that the Beckman Coulter® analyzer showed different and broader limits for TSH compared to the manufacturer’s claims [12]. Although TSH LRL verification was unsuccessful, the values obtained in the Kenyan study were nearly identical to the limits we calculated for males, indicating that the LRL values were higher than those reported in the IFU. In the same study, the FT3 and FT4 limits were consistent with the IFU, which is similar to our findings [15]. However, in our study, the URL values for females were higher than those for males, which may be attributed to the higher prevalence of subclinical hypothyroidism among women [23].
The URL calculated using the reflimR algorithm for ALT (absolute difference: 7.30 U/L, percentage difference: 16.9%; acceptable limit: 15.0%) in males was markedly higher than that calculated using the refineR algorithm. The URLs calculated using the reflimR algorithm were markedly higher for GGT (absolute difference: 10.1 U/L, percentage difference: 18.5%; acceptable limit: 15.0%) and total bilirubin (absolute difference: 0.25 mg/dL, percentage difference: 20.7%; acceptable limit: 20.0%) in males than those calculated using the KOSMIC algorithm. According to the permissible uncertainty algorithm of reflimR, these parameters differed between the indirect methods (Table 3). Moreover, when evaluated against CLIA target values, they exceeded the allowable limits, indicating clinically meaningful differences (Table 4). This observation may be attributable to the underlying mathematical assumptions of the indirect algorithms as well as the right-skewed distribution of the data. Indirect methods utilize real-world data generated continuously throughout the patient care process, encompassing both non-pathological (physiological) and pathological test results. For different analytes, discrepancies between indirect methods may be observed as the proportion of pathological data changes depending on the study population [9]. Reference limit estimation becomes most challenging when physiological and abnormal test results overlap substantially and abnormal distributions are centered near the true reference limits; deviations increase further when the proportion of abnormal values is ≥ 20%. In simulations and real-world data, it was found that the difference between the predicted reference limits and the actual values was more pronounced at the URLs than at the LRLs.
In simulations performed using KOSMIC, it has been observed that the URLs calculated for GGT are estimated to be much lower than those for other analytes, such as hemoglobin and TSH [8]. Parameters such as liver enzymes (ALT and GGT) and total bilirubin inherently exhibit right-skewed distributions owing to their biological characteristics and show substantial overlap between healthy and pathological populations. In such challenging datasets, modern indirect methods such as refineR and KOSMIC employ complex statistical models to identify the “pure” physiological distribution; refineR relies on regularized maximum likelihood optimization combined with Box–Cox transformations [9], whereas KOSMIC aims to minimize the Kolmogorov–Smirnov (KS) distance between empirical and theoretical distributions following Box–Cox transformation [8]. In contrast, reflimR follows a different methodological approach, combining modified boxplot-based iterative truncation with a regression strategy that focuses on the central linear region of the normal Q–Q plot [10]. For GGT, the high number of slightly elevated results that may occur because of alcohol use, overweight, and medication use may introduce bias into the indirect methods used to separate the pathological fraction from the healthy distribution [24,25], and these results may consequently be excluded in an overly aggressive manner (over-truncation). This may cause the model to constrain the healthy population into a narrower range, thereby leading to underestimation of the URLs compared to reflimR [8,9]. Considering all methods, our results suggest that it would be more appropriate to determine the RIs using analyte-specific approaches, especially for ALT, GGT, and total bilirubin tests. The use of personalized RIs together with population-based RIs may be useful in clearly distinguishing pathological from non-pathological boundaries [26]; this will need to be evaluated by the scientific community in future studies.
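The two building blocks mentioned above for KOSMIC, a Box–Cox transformation and the Kolmogorov–Smirnov distance to a theoretical distribution, can be illustrated with a minimal Python sketch. This is not the KOSMIC or refineR implementation: it omits the truncation and parameter search that a full indirect method requires, it simply fits a normal distribution by mean and standard deviation, and the data are simulated.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def box_cox(x, lam):
    """Box-Cox transform; lam = 0 corresponds to the natural logarithm."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def ks_distance(values, dist):
    """Largest gap between the empirical CDF and the CDF of `dist`."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = dist.cdf(x)
        # Check the empirical step just after and just before x
        d = max(d, (i + 1) / n - c, c - i / n)
    return d


def ks_after_box_cox(values, lam):
    """KS distance to a fitted normal after transforming with `lam`."""
    t = [box_cox(v, lam) for v in values]
    return ks_distance(t, NormalDist(mean(t), stdev(t)))


# Simulated right-skewed data, for illustration only
random.seed(0)
sample = [random.lognormvariate(3.0, 0.5) for _ in range(2000)]

ks_untransformed = ks_after_box_cox(sample, 1)  # lam = 1 leaves the shape unchanged
ks_log = ks_after_box_cox(sample, 0)            # lam = 0 is the log transform
print(f"KS distance: lam=1 -> {ks_untransformed:.3f}, lam=0 -> {ks_log:.3f}")
```

For right-skewed data such as these, the transformation that brings the sample closest to normality yields the smaller KS distance, which is the quantity a KS-based method seeks to minimize.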
To the best of our knowledge, this is the first external validation of reflimR using a large routine dataset obtained from a Turkish adult population, and it provides a new perspective on the applicability of the method in different demographic and epidemiological contexts. In addition, by systematically comparing five widely used indirect algorithms (reflimR, refineR, KOSMIC, Hoffmann, and Bhattacharya), we were able to evaluate the methodological concordance of reflimR.
This study has some limitations. First, it was a single-center study, which might have introduced selection bias and could limit the generalizability of the RIs. Second, the exclusion criteria may have been incomplete, which could have affected the results. Third, the study included only individuals aged 18–75 years, thereby excluding both the pediatric population and those over 75 years of age. Finally, the RIs were verified and calculated without age partitioning.
Conclusion
The reflimR algorithm may be helpful for RI verification because it offers intuitive color-coded interpretation of results and has minimal data requirements. Fifteen of the 76 RI limits were rejected during verification. This may be due to fundamental differences (e.g., ethnicity, sex and age distribution, time period, and measurement location) between the populations of the IFU studies and our population, which was analyzed using the indirect method. Comparison of reflimR with other indirect methods generally produced concordant results, except for ALT, GGT, and total bilirubin. The differences between the RIs obtained with the various indirect methods may be attributed to the data distribution and to variations in the computational approaches of the algorithms.
Supporting information
S1 File. Study dataset.
Anonymized dataset used for all statistical analyses in this study.
https://doi.org/10.1371/journal.pone.0342530.s001
(XLSX)
References
- 1. Jones G, Barker A. Reference intervals. Clin Biochem Rev. 2008;29(Suppl 1):S93–7.
- 2. CLSI EP28-A3C. Defining, establishing, and verifying reference intervals in the clinical laboratory, 3rd ed. Clinical & Laboratory Standards Institute; 2010.
- 3. Jones GRD, Haeckel R, Loh TP, Sikaris K, Streichert T, Katayev A, et al. Indirect methods for reference interval determination - review and recommendations. Clin Chem Lab Med. 2018;57(1):20–9. pmid:29672266
- 4. Hoffmann RG. Statistics in the Practice of Medicine. JAMA. 1963;185(11):864.
- 5. Bhattacharya CG. A simple method of resolution of a distribution into gaussian components. Biometrics. 1967;23(1):115–35. pmid:6050463
- 6. Arzideh F. Estimation of medical reference limits by truncated Gaussian and truncated power normal distributions [dissertation]. Fachbereich 3 (Mathematik und Informatik), Universität Bremen; 2008.
- 7. Wosniok W, Haeckel R. A new indirect estimation of reference intervals: truncated minimum chi-square (TMC) approach. Clin Chem Lab Med. 2019;57(12):1933–47. pmid:31271548
- 8. Zierk J, Arzideh F, Kapsner LA, Prokosch H-U, Metzler M, Rauh M. Reference Interval Estimation from Mixed Distributions using Truncation Points and the Kolmogorov-Smirnov Distance (kosmic). Sci Rep. 2020;10(1):1704. pmid:32015476
- 9. Ammer T, Schützenmeister A, Prokosch H-U, Rauh M, Rank CM, Zierk J. refineR: A Novel Algorithm for Reference Interval Estimation from Real-World Data. Sci Rep. 2021;11(1):16023. pmid:34362961
- 10. Hoffmann G, Klawitter S, Trulson I, Adler J, Holdenrieder S, Klawonn F. A Novel Tool for the Rapid and Transparent Verification of Reference Intervals in Clinical Laboratories. J Clin Med. 2024;13(15):4397. pmid:39124664
- 11. Ozarda Y, Ichihara K, Aslan D, Aybek H, Ari Z, Taneli F, et al. A multicenter nationwide reference intervals study for common biochemical analytes in Turkey using Abbott analyzers. Clin Chem Lab Med. 2014;52(12):1823–33. pmid:25153598
- 12. Dirks NF, den Elzen WPJ, Hillebrand JJ, Jansen HI, Boekel ET, Brinkman J, et al. Should we depend on reference intervals from manufacturer package inserts? Comparing TSH and FT4 reference intervals from four manufacturers with results from modern indirect methods and the direct method. Clin Chem Lab Med. 2024;62(7):1352–61. pmid:38205847
- 13. Ozarda Y, Higgins V, Adeli K. Verification of reference intervals in routine clinical laboratories: practical challenges and recommendations. Clin Chem Lab Med. 2018;57(1):30–7. pmid:29729142
- 14. Bawua SA, Ichihara K, Keatley R, Arko-Mensah J, Ayeh-Kumi PF, Erasmus R, et al. Derivation of sex and age-specific reference intervals for clinical chemistry analytes in healthy Ghanaian adults. Clin Chem Lab Med. 2022;60(9):1426–39. pmid:35786502
- 15. Omuse G, Ichihara K, Maina D, Hoffman M, Kagotho E, Kanyua A, et al. Determination of reference intervals for common chemistry and immunoassay tests for Kenyan adults based on an internationally harmonized protocol and up-to-date statistical methods. PLoS One. 2020;15(7):e0235234. pmid:32645006
- 16. Evgina S, Ichihara K, Ruzhanskaya A, Skibo I, Vybornova N, Vasiliev A, et al. Establishing reference intervals for major biochemical analytes for the Russian population: a research conducted as a part of the IFCC global study on reference values. Clin Biochem. 2020;81:47–58. pmid:32278594
- 17. Owens D, Evans J. Population studies on Gilbert’s syndrome. J Med Genet. 1975;12(2):152–6. pmid:1142378
- 18. Çağan Appak Y, Aksoy B, Özyılmaz B, Özdemir TR, Baran M. Gilbert Syndrome and Genetic Findings in Children: A Tertiary-Center Experience from Turkey. Turk Arch Pediatr. 2022;57(3):295–9. pmid:35781232
- 19. Bakan E, Polat H, Ozarda Y, Ozturk N, Baygutalp NK, Umudum FZ, et al. A reference interval study for common biochemical analytes in Eastern Turkey: a comparison of a reference population with laboratory data mining. Biochem Med (Zagreb). 2016;26(2):210–23. pmid:27346966
- 20. Borai A, Ichihara K, Al Masaud A, Tamimi W, Bahijri S, Armbuster D, et al. Establishment of reference intervals of clinical chemistry analytes for the adult population in Saudi Arabia: a study conducted as a part of the IFCC global study on reference values. Clin Chem Lab Med. 2016;54(5):843–55. pmid:26527074
- 21. Jia S, Wei L, Shi X, Sun D, Shi T, Lv H, et al. Reference intervals of biochemical analytes in healthy adults from northern China: A population-based cross-sectional study. Medicine (Baltimore). 2023;102(42):e35575. pmid:37861546
- 22. Agaravatt A, Kansara G, Khubchandani A, Sanghani H, Patel S, Parchwani D. Verification of Reference Interval of Thyroid Hormones With Manual and Automated Indirect Approaches: Comparison of Hoffman, KOSMIC and refineR Methods. Cureus. 2023;15(5):e39066. pmid:37323364
- 23. Shekarian A, Mazaheri-Tehrani S, Shekarian S, Pourbazargan M, Setudeh M, Abhari AP, et al. Prevalence of subclinical hypothyroidism in polycystic ovary syndrome and its impact on insulin resistance: a systematic review and meta-analysis. BMC Endocr Disord. 2025;25(1):75. pmid:40102852
- 24. Meyer A, Müller R, Hoffmann M, Skadberg Ø, Ladang A, Dieplinger B, et al. Comparison of three indirect methods for verification and validation of reference intervals at eight medical laboratories: a European multicenter study. J Lab Med. 2023;47(4):155–63.
- 25. Grundy SM. Gamma-glutamyl transferase: another biomarker for metabolic syndrome and cardiovascular risk. Arterioscler Thromb Vasc Biol. 2007;27(1):4–7. pmid:17185620
- 26. Coskun A, Sandberg S, Unsal I, Yavuz FG, Cavusoglu C, Serteser M, et al. Personalized reference intervals - statistical approaches and considerations. Clin Chem Lab Med. 2022;60(4):629–35.