Real-world study: Assessing the impact of hemolysis on 48 biochemical and immunological analytes through big data analysis and its feasibility validation

Chaochao Ma; Xiaoqi Li; Wei Luo; Lian Hou; Dandan Sun; Li Liu; Xin Liu; Ying Zhang; Jingrong Xu; Ling Qiu; Liangyu Xia

doi:10.1371/journal.pone.0340265

Abstract

Background

This research utilizes clinical laboratory real-world data to explore the influence of in vitro hemolysis on 48 biochemical and immunological analytes, aiming to propose and validate the methods and tools based on big data for analysis of the impact of hemolysis on laboratory analytes.

Methods

This research initially employs univariate analysis to display the levels and distribution of 48 analytes across different H-index groups. Subsequently, it utilizes quantile regression models to analyze the impact of hemolysis on laboratory analytes, adjusting for age, gender, patient type, and PVD, with the magnitude of impact described using β values and 95% CIs, visualized through error bar graphs. Finally, the study compares its results with those obtained from homogenized experimental research using the same testing platforms and hemolysis assessment methods, validating the feasibility of conducting research based on big data.

Results

After adjusting for gender, age, patient type, and PVD, hemolysis showed a significant positive interference on ALT, Alb, TBil, GGT, AST, CK, LD, K, P, Mg, and FFA (P < 0.001), and a significant negative interference on DBil, Na, Cl, TCO2, and Cr (P < 0.001). High hemolysis levels also negatively interfered with UA, PA, and GA. No consistent pattern of significance was observed for other analytes. Our multivariate analysis, when compared to experimental data, revealed a 93.0% concordance, with discrepancies noted in GGT, ALP, and RF.

Conclusions

The impact of hemolysis on laboratory analytes can be effectively evaluated through comprehensive big data analysis, demonstrating a level of consistency comparable to that of homogeneous experimental research.

Citation: Ma C, Li X, Luo W, Hou L, Sun D, Liu L, et al. (2026) Real-world study: Assessing the impact of hemolysis on 48 biochemical and immunological analytes through big data analysis and its feasibility validation. PLoS One 21(1): e0340265. https://doi.org/10.1371/journal.pone.0340265

Editor: Tomasz W. Kaminski, Versiti Blood Research Institute, UNITED STATES OF AMERICA

Received: August 20, 2025; Accepted: December 17, 2025; Published: January 23, 2026

Copyright: © 2026 Ma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting information files.

Funding: National Natural Science Foundation of China project (No. 7227041577). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: CIs, Confidence Intervals; PVD, Patient Visit Department; H-index, Hemolysis index; ALT, Alanine Aminotransferase; TP, Total Protein; Alb, Albumin; TBil, Total Bilirubin; DBil, Direct Bilirubin; GGT, Gamma-Glutamyl Transferase; ALP, Alkaline Phosphatase; AST, Aspartate Aminotransferase; TBA, Total Bile Acid; CK, Creatine Kinase; LD, Lactate Dehydrogenase; ChE, Cholinesterase; K, Potassium; Na, Sodium; Cl, Chloride; TCO2, Total Carbon Dioxide; Ca, Calcium; Urea, Urea Nitrogen; Glu, Glucose; UA, Uric Acid; P, Phosphorus; TC, Total Cholesterol; TG, Triglycerides; HDL_C, High-Density Lipoprotein Cholesterol; LDL_C, Low-Density Lipoprotein Cholesterol; ApoA1, Apolipoprotein A1; ApoB, Apolipoprotein B; Lp(a), Lipoprotein(a); hsCRP, High Sensitivity C-Reactive Protein; Mg, Magnesium; PA, Prealbumin; RF, Rheumatoid Factor; HCY, Homocysteine; IgG, Immunoglobulin G; IgA, Immunoglobulin A; IgM, Immunoglobulin M; ASO, Antistreptolysin O; CysC, Cystatin C; C3, Complement Component 3; C4, Complement Component 4; FFA, Free Fatty Acids; GA, Glycated Albumin; Cr(E), Creatinine; SI, Serum Iron; TRF, Transferrin; SF, Serum Ferritin; Sfa, Serum folic acid; VB12, Vitamin B12.

1. Introduction

Hemolysis, the rupture of erythrocyte and other blood cell membranes leading to the release of intracellular contents into serum or plasma, occurs both in vivo and in vitro [1,2]. In vivo hemolysis precedes blood collection, often resulting from pathological conditions warranting further investigation, such as infections by hemolytic Gram-positive bacteria or complications from artificial heart valves and hereditary erythrocyte disorders. In contrast, in vitro hemolysis, the most prevalent pre-analytical error accounting for 40–70% of global clinical chemistry sample rejections [3,4], arises from improper handling, transportation, and storage of samples.

Hemolysis can interfere with laboratory test results, leading to significant disruptions in clinical decision-making processes. The mechanisms of interference are multifaceted, including spectrophotometric interference from released hemoglobin, the interaction of intracellular components with analytes, sample dilution effects, and chemical interference from substances like free hemoglobin [5]. The extent of interference varies with the hemolysis level and may depend on the assay method, necessitating careful management of hemolyzed samples in laboratory measurements.

Previous research has shown that the impact of hemolysis on laboratory analytes significantly depends on the assay method and instrumentation used [6,7]. Thus, clinical laboratories must evaluate the influence of hemolysis on analytes based on their specific methodologies and equipment, including those assessing hemolysis levels, to develop appropriate management strategies for hemolyzed samples. However, experimental evaluation of hemolysis [8,9] effects is often laborious, time-consuming, and costly, requiring significant resources for extensive assessments across different platforms and methods. With advancements in computing power and the development of programming languages like R and Python, coupled with the training of data science talent in clinical laboratories, mining real-world data [10,11] for evidence to support clinical decisions [12,13] and laboratory management has become feasible. Accordingly, this study utilizes real-world big data from clinical laboratories, using R for data cleaning and annotation. We employed non-parametric regression models to analyze the impact and magnitude of in vitro hemolysis on 48 biochemical and immunological analytes. By integrating sampling error and model visualization concepts, this research aims to demonstrate the effects of in vitro hemolysis, comparing results from big data analysis with experimental outcomes to verify the feasibility of this approach. This study not only provides evidence on the extent of in vitro hemolysis’s impact on 48 laboratory analytes but also offers a real-world data analysis approach for assessing the effects of in vitro hemolysis on laboratory analytes, with open-source code serving as a valuable tool for laboratories conducting related analyses.

2. Method and materials

2.1. Study design and approach

This study was conducted based on real-world data, utilizing patient data from visits to Peking Union Medical College Hospital between September 1, 2016, and December 31, 2023. Data was accessed between January and February 2024. The research process is divided into six parts:

Obtaining a total data subset from the data repository according to our inclusion and exclusion criteria.
Developing a data cleaning scheme and executing data cleaning.
Describing the basic information of the dataset, including sample size, gender ratio, patient age level, and the proportion of patient sources.
Conducting univariate analyses to assess the impact of different degrees of hemolysis on 48 biochemical and immunological analytes.
Performing multivariate analyses to evaluate the impact of different degrees of hemolysis on 48 biochemical and immunological analytes, with adjustments made for influencing factors using models.
Comparing with similar clinical research results from our laboratory in the past to evaluate the feasibility of analyzing the impact of hemolysis on laboratory tests based on real-world data analysis.

2.2. Data inclusion and exclusion

Data were selected from the total data repository according to the following inclusion and exclusion criteria, establishing a data subset as illustrated in Fig 1.

Fig 1. Flowchart of the study process.

https://doi.org/10.1371/journal.pone.0340265.g001

2.2.1. Inclusion criteria.

Data with a measured hemolysis index and collected between September 2016 and December 2023 were included to ensure consistency in the testing system during this period.

2.2.2. Exclusion criteria.

Hemolysis levels were treated as ordinal categorical variables and divided into six grades (0–5). Records with a hemolysis level of 5, representing extremely severe hemolysis with very few samples, were excluded. In addition, data associated with diagnostic variables such as thrombotic thrombocytopenic purpura, paroxysmal nocturnal hemoglobinuria, autoimmune hemolytic anemia, thrombosis, or cold agglutinin disease were excluded. Samples with an unknown gender variable, data used exclusively for research purposes, and non-serum specimens such as drainage fluid, pleural effusion, and ascites were also excluded. Finally, any records containing errors in basic information variables were removed.

2.3. Data cleaning process

The data cleaning procedures and details for this research are outlined below:

Retention of First Result: Because severely hemolyzed samples are treated as unqualified, some patients have multiple test results, which can affect model construction. Therefore, only the first test result for each patient was retained to ensure consistency and reliability in the dataset.
Exclusion of Missing and Null Values: Entries with missing or null values in their test result variables were removed from the dataset to improve data quality and analysis accuracy.
Patient Sample Source Coding: Outpatient samples were coded as 0, whereas inpatient samples were coded as 1, facilitating the differentiation of patient origin in the analysis.
Gender Encoding: Male participants were encoded with a value of 1, and female participants were encoded with a value of 0, standardizing the representation of gender across the dataset.
Age Standardization: The process involved standardizing age data by eliminating units and uniformly converting age to years, discarding any other units. This step ensures that age data is consistent and comparable across the dataset.
Department Coding: The Patient Visit Department (PVD) variable was coded as 1 for the Pediatrics department, 2 for Endocrinology-related departments, 3 for Nephrology-related departments, 4 for the Emergency department, 5 for Hematology-related departments, 6 for Gynecology or Obstetrics-related departments, and 7 for all other departments.

These meticulous data cleaning steps were crucial to ensuring the dataset’s accuracy and usability, rendering it suitable for the intended analyses.
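The coding scheme described in this section can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the field names (`patient_id`, `result`, `source`, `sex`, `age`, `department`), the keyword matching used to assign departments, the age-string format, and the choice to drop records whose age is not expressed in years are all hypothetical.

```python
import re

# Hypothetical department keywords mapped to PVD codes 1-6; all others become 7.
PVD_KEYWORDS = [
    ("pediatric", 1),
    ("endocrin", 2),
    ("nephrol", 3),
    ("emergency", 4),
    ("hematol", 5),
    ("gynecol", 6),
    ("obstetric", 6),
]


def code_pvd(department):
    """Return the PVD code (1-7) for a free-text department name."""
    name = department.lower()
    for keyword, code in PVD_KEYWORDS:
        if keyword in name:
            return code
    return 7


def parse_age_years(raw):
    """Return age in years, or None if the string is not a plain number of years."""
    pattern = r"\s*(\d+(?:\.\d+)?)\s*(y|yr|yrs|year|years)?\s*"
    match = re.fullmatch(pattern, raw.lower())
    return float(match.group(1)) if match else None


def clean(records):
    """Keep each patient's first record, drop incomplete ones, apply the coding."""
    seen = set()
    cleaned = []
    for rec in records:  # assumed to be sorted chronologically
        if rec["patient_id"] in seen:
            continue  # only the first result per patient is retained
        seen.add(rec["patient_id"])
        if rec.get("result") is None:
            continue  # missing or null test result
        age = parse_age_years(rec["age"])
        if age is None:
            continue  # assumption: ages not given in years are discarded
        cleaned.append({
            "patient_id": rec["patient_id"],
            "result": rec["result"],
            "inpatient": 1 if rec["source"] == "inpatient" else 0,
            "male": 1 if rec["sex"] == "male" else 0,
            "age_years": age,
            "pvd": code_pvd(rec["department"]),
        })
    return cleaned
```

The sketch applies deduplication before the missing-value filter, following the order in which the steps are listed; whether a patient's later complete result should replace a missing first result is a design choice the text does not settle.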

2.4. Instruments and methods

This research employed the AU5800 automatic biochemical analyzer (Beckman Coulter, USA) for conducting 45 out of the 48 analytical tests. These tests include Alanine Aminotransferase (ALT), Total Protein (TP), Albumin (Alb), Total Bilirubin (TBil), Direct Bilirubin (DBil), Gamma-Glutamyl Transferase (GGT), Alkaline Phosphatase (ALP), Aspartate Aminotransferase (AST), Total Bile Acid (TBA), Creatine Kinase (CK), Lactate Dehydrogenase (LD), Cholinesterase (ChE), Potassium (K), Sodium (Na), Chloride (Cl), Total Carbon Dioxide (TCO2), Calcium (Ca), Urea Nitrogen (Urea), Glucose (Glu), Uric Acid (UA), Phosphorus (P), Total Cholesterol (TC), Triglycerides (TG), High-Density Lipoprotein Cholesterol (HDL_C), Low-Density Lipoprotein Cholesterol (LDL_C), Apolipoprotein A1 (ApoA1), Apolipoprotein B (ApoB), Lipoprotein(a) (Lp(a)), High Sensitivity C-Reactive Protein (hsCRP), Magnesium (Mg), Prealbumin (PA), Rheumatoid Factor (RF), Homocysteine (HCY), Immunoglobulin G (IgG), Immunoglobulin A (IgA), Immunoglobulin M (IgM), Antistreptolysin O (ASO), Cystatin C (CysC), Complement Component 3 (C3), Complement Component 4 (C4), Free Fatty Acids (FFA), Glycated Albumin (GA), Creatinine (Cr(E)), Serum Iron (SI) and Transferrin (TRF). The remaining three tests, Serum Ferritin (SF), Serum folic acid (Sfa), and Vitamin B12 (VB12), were carried out using the UniCel DxI 800 Access Immunoassay System (Beckman Coulter, USA). Blood samples for these analyses were collected in Vacuette blood collection tubes (Greiner Bio-One GmbH, Frickenhausen, Germany), which contain a coagulant to accelerate blood coagulation. Following serum separation, analyses were conducted as per the specified methods. Detailed information on the methods and units employed for these 48 tests is provided in S1 Table in S1 File, ensuring clear and comprehensive documentation of the analytical procedures.

Beckman Coulter instruments equipped with hemolysis index (H-Index) capabilities automatically assess these indices using spectrophotometric methods. The instruments measure the absorbance of light at specific wavelengths that correspond to the optical characteristics of hemoglobin. Based on these measurements, the instrument calculates the indices, which help laboratory personnel identify samples that may require pre-analytical treatment, dilution, or re-collection to ensure accurate test results. In this study, the hemolysis index is classified into six levels: 0, 1, 2, 3, 4, and 5, with level 5 indicating samples with very severe hemolysis. Given that samples with an H-index of 5 are rare and could negatively impact model fitting, they have been excluded from the analysis. The H-index classification used here follows the manufacturer’s LIH serum-index workflow described in the analyzer’s Instructions for Use: a continuous spectrophotometric H-index (derived from predefined hemoglobin-sensitive wavelengths and internally calibrated to hemolysis signal intensity) is converted to ordinal categories via analyzer-specific decision limits, i.e., instrument-defined cut-offs that correspond to increasing degrees of hemolysis and may be locally verified/adjusted by the laboratory to align with method performance and reporting policy.
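The conversion from a continuous spectrophotometric index to ordinal grades can be illustrated with a short Python sketch. The cut-off values below are purely hypothetical placeholders chosen for the example; the actual decision limits are analyzer-specific, are defined in the manufacturer's Instructions for Use, and are not reported in this section.

```python
from bisect import bisect_right

# HYPOTHETICAL cut-offs in arbitrary units, for illustration only.
# Real decision limits come from the analyzer's Instructions for Use
# and any local verification performed by the laboratory.
HYPOTHETICAL_CUTOFFS = [50, 100, 200, 300, 500]


def h_index_grade(continuous_value, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Map a continuous hemolysis signal to an ordinal grade from 0 to 5.

    A value below the first cut-off is grade 0; a value equal to a
    cut-off falls into the higher grade.
    """
    return bisect_right(cutoffs, continuous_value)


def usable_for_modeling(grade):
    """Grades 0-4 were analyzed; grade 5 was excluded because it was rare."""
    return 0 <= grade <= 4
```

With five cut-offs the function yields six grades, matching the 0 to 5 scale described above, and `usable_for_modeling` mirrors the exclusion of grade 5 from the analysis.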

2.5. Quality control

The quality assurance in this study revolves around two critical aspects: quality control during data generation, and quality control during data analysis and programming. During the data generation phase, all 48 tests regularly participated in the inter-laboratory quality assessment program organized by the National Clinical Laboratory Center. Additionally, daily quality control (QC) checks were conducted for these tests to ensure the accuracy and reliability of our test results. Throughout the period covered by the data, there were no changes to the analytical platforms for the analytes. Measurements of specimens were only conducted after passing quality control criteria.

Regarding data analysis and programming, a rigorous code review mechanism was implemented. This mechanism not only ensured that each piece of code was logically annotated but also required that every segment of code be reviewed, checked, and tested by two independent individuals. This dual-review process ensured robust and consistent analysis, thereby minimizing the potential for computational errors or oversights and ensuring that our research findings were based on accurate and reliable data.

2.6. Statistical analysis

Data were organized in Microsoft Excel 365 (Microsoft, Redmond, WA, USA) and analyzed using R (version 4.3.1) with relevant packages and scripts. The hemolysis index, categorized into five ordinal levels (0, 1, 2, 3, 4), was grouped according to the H-index output of the instrument, excluding level 5 due to its extremely low frequency and potential to destabilize statistical modeling. Within each group, the Anderson-Darling test was employed to assess the normality of the data. For descriptive statistics, if continuous variables conformed to normality, they were described using mean (standard deviation); otherwise, median (interquartile range) was used. Categorical variables with two categories were described using proportions.

In univariate analysis, if the data across groups were normally distributed, multivariate analysis of variance was utilized for inter-group difference testing. If the data did not meet the normality assumption, the Kruskal-Wallis H test was conducted to examine differences among the H-index groups. For the Kruskal-Wallis H test, the Benjamini-Hochberg (BH) method was applied to control the False Discovery Rate (FDR), thereby balancing the risk of false positives with statistical power. Furthermore, post-hoc pairwise comparisons were performed using the dunn.test package in R, and the Bonferroni correction was used for P-value adjustment to ensure the reliability of the results. Violin plots were used to display and compare data distributions, incorporating features of both box plots and kernel density plots to show statistical information such as median and quartiles, as well as density estimates. Because the univariate analysis does not adjust for the effects of other factors, a sensitivity analysis was performed by excluding data from individuals under 18 years old to observe whether the results changed.
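For readers unfamiliar with the two multiplicity corrections mentioned above, the following Python sketch shows how BH-adjusted and Bonferroni-adjusted P values are computed from a list of raw P values. It is a generic illustration of the standard procedures, not the authors' code, and the function names are chosen for this example.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni-adjusted P values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

BH scales each P value by the number of tests divided by its rank, so it is less conservative than Bonferroni, which multiplies every P value by the full number of tests. This is why BH suits the screening step across many analytes, while Bonferroni is the stricter choice for the pairwise post-hoc comparisons.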

For multivariate analysis, considering the impact of outliers on parametric model results, the study employed multivariate quantile regression for modeling, which offers better robustness to outliers. The response variable in the model was the result values of the laboratory tests, with H-index as an explanatory variable. Dummy variables were created for the H-index, using the data with H-index equal to 0 as the reference. The model also adjusted for sex, age, patient type, and PVD. The results were visualized to show changes in the median of the laboratory test results and their 95% confidence intervals with increasing degrees of hemolysis, after adjusting for various factors. If the model issued warnings due to insufficient sample size, a multivariate linear regression model was used under the assumption that model premises were met. To evaluate the dose-dependent impact of hemolysis, we assessed the monotonicity of the adjusted regression coefficients across hemolysis groups (indices 1–4) using a one-sided Spearman’s rank correlation test. To ensure robustness, a penalty correction was applied to the trend P-value based on the number of non-significant coefficients (P > 0.05) in the regression models.
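The role of the dummy-coded H-index in a median regression can be illustrated with a simplified Python sketch. It deliberately omits the covariates (sex, age, patient type, PVD) that the study's models adjust for: when the only predictors are the H-index dummies, minimizing the quantile (check) loss at the median reduces to taking each group's median, so each coefficient is the difference between a group's median and the reference group's median. The numbers in `example` are made up for illustration.

```python
from statistics import median


def check_loss(residual, tau=0.5):
    """Quantile-regression (pinball) loss for a single residual."""
    return residual * (tau - (1 if residual < 0 else 0))


def total_loss(values, fitted, tau=0.5):
    """Sum of check losses when every value is predicted by `fitted`."""
    return sum(check_loss(v - fitted, tau) for v in values)


def unadjusted_median_effects(values_by_grade, reference=0):
    """beta_k = median(grade k) - median(reference grade); covariates omitted."""
    ref_median = median(values_by_grade[reference])
    return {
        grade: median(values) - ref_median
        for grade, values in values_by_grade.items()
        if grade != reference
    }


# Made-up potassium-like values (mmol/L) for three H-index grades.
example = {
    0: [4.0, 4.1, 4.3],
    1: [4.2, 4.4, 4.5],
    2: [4.6, 4.9, 5.3],
}
effects = unadjusted_median_effects(example)
```

In the adjusted models the coefficients no longer equal raw median differences, because the covariates absorb part of the between-group variation; fitting those models requires a quantile regression routine such as the `rq` function named in the text.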

To ensure the robustness and validity of the multivariate regression models, a comprehensive model diagnostic framework was applied. For quantile regression models, Koenker’s goodness-of-fit measure (R^1, a pseudo-R^2 based on the relative reduction in the quantile loss compared with an intercept-only model) was calculated to quantify how much of the variation in the response was accounted for by the predictors. The Akaike Information Criterion (AIC) was computed to assess model parsimony. Furthermore, to evaluate the generalization performance and rule out overfitting, the Root Mean Square Error (RMSE) was estimated using a 5-fold cross-validation procedure. Finally, residual diagnostics (residuals vs. fitted values plots) were visually inspected for all models to assess linearity and to check for heteroscedasticity, the presence of which supports selecting quantile regression over ordinary least squares (OLS) regression for specific analytes.
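A minimal Python sketch of two of these diagnostics is given below, under simplifying assumptions: the "full" model uses only the H-index grade as predictor (so its fitted values are group medians), the null model is intercept-only (the overall median), and folds are assigned by position rather than at random. The helper names are hypothetical, and the study's actual models include additional covariates.

```python
from statistics import median


def check_loss(residual, tau=0.5):
    """Quantile-regression (pinball) loss for a single residual."""
    return residual * (tau - (1 if residual < 0 else 0))


def fit_group_medians(pairs):
    """Fit a grade-only median model: one median per H-index grade."""
    groups = {}
    for grade, value in pairs:
        groups.setdefault(grade, []).append(value)
    return {grade: median(values) for grade, values in groups.items()}


def pseudo_r1(pairs):
    """R^1 = 1 - V_full / V_null at the median, for (grade, value) pairs.

    Undefined (division by zero) if all values are identical.
    """
    values = [v for _, v in pairs]
    null_fit = median(values)
    v_null = sum(check_loss(v - null_fit) for v in values)
    group_fit = fit_group_medians(pairs)
    v_full = sum(check_loss(v - group_fit[g]) for g, v in pairs)
    return 1 - v_full / v_null


def cv_rmse(pairs, k=5):
    """k-fold cross-validated RMSE; fold i holds out every k-th pair from i."""
    squared_errors = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        fit = fit_group_medians(train)
        fallback = median(v for _, v in train)  # grade unseen in training
        for grade, value in held_out:
            squared_errors.append((value - fit.get(grade, fallback)) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5
```

An R^1 close to 1 means the predictors remove most of the absolute-deviation loss left by the intercept-only model, while a cross-validated RMSE that stays close to the in-sample error suggests the model is not overfitting. In practice the records would be shuffled before fold assignment.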

Data analysis and visualization were performed in R, iterating over the analytes in a for loop, with the ggplot2 package used for plotting and the rq function (quantreg package) used for modeling. The significance level was set at 0.05.

2.7. Ethics statements

Ethical clearance was obtained from the Ethics Committee of the Peking Union Medical College Hospital of the Chinese Academy of Medical Sciences, under approval number I-24PJ2489, prior to the commencement of the study. All procedures adhered strictly to the applicable guidelines and regulations. The data were analyzed in an anonymized manner to ensure the privacy and confidentiality of individuals in the database.

3. Results

3.1. Baseline characteristics

In our evaluation of normality across continuous variables, stratified by the five levels of hemolysis, significant deviations from a normal distribution were observed across a wide array of analyzed variables (S1 Table in S1 File). The baseline characteristics of the study population were stratified into five groups based on H-index values, ranging from 0 to 4. The distribution of sample sizes, gender ratios, ages, and patient type ratios varied across the H-index categories, reflecting the diversity within the population on these variables (Table 1).

Table 1. Baseline characteristics according to H-index categories.

https://doi.org/10.1371/journal.pone.0340265.t001

3.2. Univariate analysis of hemolysis on biochemical and immunological analytes

In our analysis, while TC, Lp(a), IgM, C3, and SI showed no statistically significant differences across different degrees of hemolysis, the impacts on other analytes were significant (Table 2). Specifically, AST, CK, LD, K, P, Mg, and FFA levels increased with the severity of hemolysis. In contrast, Cr and TCO2 levels decreased as hemolysis intensity escalated (Fig 2). These findings illustrate the variable impact of hemolysis on biochemical and immunological analytes, highlighting the importance of considering hemolysis degree in clinical evaluations. Detailed results of the post-hoc analysis are provided in S2 File.

Table 2. Univariate analysis of the impact of hemolysis on biochemical and immunological biomarkers.

https://doi.org/10.1371/journal.pone.0340265.t002

Fig 2. Distribution and levels of biochemical and immunological analytes across different hemolysis groups.

https://doi.org/10.1371/journal.pone.0340265.g002

3.3. Sensitivity analysis excluding participants under 18 years of age

In our sensitivity analysis, which excluded participants under the age of 18, the impact of hemolysis on biochemical and immunological analytes was reassessed using univariate analysis. This approach revealed that the effects of hemolysis on IgG, IgA, CysC, C4, and GA became statistically non-significant after removing data from underage participants. Conversely, the influence of hemolysis on TC shifted to become significant. The outcomes for other analytes remained unchanged (S1 Table in S1 File). These results underscore the importance of considering age as a potential confounder in the evaluation of hemolysis effects on specific analytes.

3.4. Multivariate analysis of hemolysis impact on biochemical and immunological analytes

After adjusting for gender, age, patient type, and PVD, with a hemolysis index of 0 as the reference point, hemolysis significantly positively influenced ALT, Alb, TBil, GGT, AST, CK, LD, K, P, Mg, and FFA. Conversely, the impact on DBil, Na, Cl, TCO2, and Cr was significantly negative. High levels of hemolysis negatively interfered with UA, PA, and GA. The effects on other analytes did not exhibit a consistent pattern of significance (Table 3). Among the significantly affected analytes, the positive impact of hemolysis on ALT, TBil, AST, CK, LD, K, P, Mg, and FFA intensified with an increase in the hemolysis index. Similarly, the negative effects on Na, TCO2, and Cr became more pronounced as the hemolysis index rose (Fig 3). Compared to the univariate analysis, the multivariate analysis, which adjusted for potential confounding factors, demonstrated that the effects of hemolysis on ALT, TBil, GGT, and Na were of trend-level statistical significance (P < 0.05). The negative interference of hemolysis on RF was not observed in the multivariate analysis. The estimated impact of hemolysis on analytes changed after adjusting for confounding factors in the multivariate analysis, which also provided 95% CI estimates for the effect sizes of hemolysis. This adjustment offers a more precise understanding of how hemolysis influences each analyte, underlining the importance of considering various patient and clinical characteristics when interpreting the effects of hemolysis on laboratory results. Detailed model diagnostic metrics (including Pseudo-R^2, AIC, and RMSE) are provided in S3 File, and the corresponding residual diagnostic plots are available in S4 File.

Table 3. Multivariate analysis of hemolysis impact on biochemical and immunological analytes adjusted for gender, age, patient type, and PVD.

https://doi.org/10.1371/journal.pone.0340265.t003

Fig 3. Visualization of the impact of hemolysis on biochemical and immunological analytes after adjusting for confounding factors in the multivariate analysis.

https://doi.org/10.1371/journal.pone.0340265.g003

3.5. Comparison with experimentally obtained results

The results in our study derived from big data analysis were compared with those [14] obtained through experimental methods on the same platform and hemolysis index assessment techniques (Table 4). The comparison revealed that 38 parameters, including K, Na, and LD, yielded consistent results across both studies. Notably, for key analytes such as K, the degree of hemolysis impact calculated through big data closely matched the experimental findings. However, discrepancies were observed in the results for GGT, ALP, and RF.

Table 4. Comparison with results from other studies.

https://doi.org/10.1371/journal.pone.0340265.t004

4. Discussion

In this study, we leveraged real-world clinical laboratory data to analyze the impact of in vitro hemolysis on 48 biochemical and immunological analytes using non-parametric regression and other methods. Extensive work was undertaken in data cleaning and the presentation of baseline information, highlighting data cleaning as a critical step in big data analysis [15]. The use of regular expressions for handling text variables ensured the correct extraction of hemolysis indices, while secondary database searches were employed to address missing variable information and verify data integrity. Importantly, the study excluded results for samples with a hemolysis index of 5 because there were very few of them, which could significantly affect model construction and parameter estimation. Future researchers are advised to exercise caution when dealing with groups of small sample sizes to prevent bias in parameter estimates. Additionally, during the data cleaning process, our study established stringent criteria for data inclusion and exclusion, excluding records with diagnoses associated with in vivo hemolysis and observations that could potentially affect the results, such as cold agglutinin disease. The aim was to minimize interference with the research findings, ensuring that the analysis accurately reflects the impact of in vitro hemolysis on laboratory analytes without confounding from unrelated pathological conditions.

In presenting baseline information, given the potential for false positives due to large sample sizes [16], only descriptive statistical results were presented. These results highlighted differences in gender composition, age, and patient types across different hemolysis index groups. The imbalance of these potential confounding factors across groups poses a risk to the study’s outcomes, underscoring the necessity of adjustment in modeling efforts.

The baseline information table highlights imbalances across different hemolysis groups, underscoring the necessity of conducting multivariate analysis. Given the small sample size for certain analytes with an H-index of 4, and to observe changes introduced by multivariate adjustments, we performed univariate analysis prior to multivariate modeling. This approach allowed us to objectively present the levels and distributions of various test parameters across different H-index groups before adjustment, providing a clear foundation for subsequent analyses.

Acknowledging the influence of age on various analytes, our study conducted an additional round of univariate analysis after excluding minors. This adjustment led to a shift in the conclusions for indicators such as IgG, IgA, CysC, C4, and GA. These steps are crucial not only for identifying potential confounding factors but also for ensuring the robustness and validity of the multivariate models developed later in the study.

In our study, we utilized clinical laboratory real-world data to analyze the impact of in vitro hemolysis on 48 biochemical and immunological analytes. Our findings corroborate those of Ji JZ et al. [17], who noted severe negative interference of hemolysis on PA. Additionally, our research identified a negative interference of hemolysis on Cr, diverging from the results presented by Mehmet Koseoglu [6], likely due to differing creatinine detection methodologies. The effect of hemolysis on TBil is a subject of debate [6,17,18]; our results indicate a positive interference, adding to the discourse. For analytes like ALT, GGT, AST, CK, LD, K, P, Mg, and FFA, a significant positive interference from hemolysis was observed, aligning with the conclusions of many previous studies [19,20]. For some of these analytes, such as ALT, the effect was statistically significant but of small clinical significance. The discrepancy in findings emphasizes the need for clinical laboratories to assess the impact of hemolysis based on their specific testing platforms and devise suitable strategies for managing hemolyzed samples. Further, through multivariate modeling adjusted for confounders, minor changes were noted in the impact magnitude of hemolysis on analytes like K, without altering the trend. For analytes significantly affected by confounders, such as TCO2, adjustments revealed a trend-level impact. Comparing our multivariate analysis results with experimental data showed a high agreement rate of 93.0%, except for discrepancies in GGT, ALP, and RF. The discrepancy in RF could be attributed to the small sample size for RF in our study compared with other analytes, potentially leading to biased estimates. The inconsistency in GGT results may relate to the limited sample size of Xia’s study [14], which reduces statistical power, while the discrepancy in ALP may be due to the extreme right skewness of ALP, which can bias trend estimates in small samples.

The high consistency between our big data analysis and experimental findings validates the rationale for multivariate adjustments. Importantly, it confirms the feasibility of conducting correlative analyses using big data. This convergence underscores the robustness of big data methodologies in replicating traditional experimental outcomes, offering a compelling case for their integration into clinical laboratory research and decision-making processes.

Crucially, our analysis of β values allows for a nuanced differentiation between statistical significance and clinical relevance, guiding precise laboratory interventions. While analytes such as ALT, TBil, P, TCO2, and Na exhibited statistically significant trends (P < 0.05) across hemolysis groups, the magnitude of these shifts (small β values) is likely negligible relative to clinical decision limits, suggesting limited clinical impact. In contrast, LD, CK, AST, and FFA demonstrated substantial positive interference (β values indicating large shifts) starting immediately at H-index 1, necessitating strict quality control even for mild hemolysis. Furthermore, we identified actionable thresholds for specific analytes where interference becomes clinically meaningful: K shows marked elevation at H-index ≥ 2, Mg at H-index ≥ 3, and Creatinine (Cr) shows significant negative bias at H-index ≥ 3. These findings directly inform laboratory protocols. For instance, we recommend that for samples with H-index ≥ 2, the laboratory should withhold the report and contact the clinician for a re-draw, rather than releasing a potentially erroneous result that could misguide clinical decision-making.
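As an illustration of how such thresholds might be encoded in a laboratory rule, the Python sketch below maps each analyte named in this paragraph to the lowest H-index grade at which interference is described as clinically meaningful. The table and function names are hypothetical, and the action attached to a flag (withholding the report, adding a comment, or requesting a re-draw) remains a matter of local laboratory policy.

```python
# Lowest H-index grade at which the paragraph above describes clinically
# meaningful interference. Analytes not listed are never flagged here.
FLAG_FROM_GRADE = {
    "LD": 1,
    "CK": 1,
    "AST": 1,
    "FFA": 1,
    "K": 2,
    "Mg": 3,
    "Cr": 3,
}


def analytes_to_flag(h_index_grade, requested_analytes):
    """Return the requested analytes whose threshold is met, sorted by name."""
    return sorted(
        analyte
        for analyte in requested_analytes
        if analyte in FLAG_FROM_GRADE
        and h_index_grade >= FLAG_FROM_GRADE[analyte]
    )
```

For example, a sample graded H-index 2 with K, Mg, ALT, and LD requested would have K and LD flagged, while Mg and ALT would pass under this rule.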

Given the extensive number of analytes analyzed in this study, strict control over false discovery rates was essential to ensure the reliability of our statistical inferences. By applying the Benjamini-Hochberg (BH) correction to the initial screening (Kruskal-Wallis tests) and the conservative Bonferroni correction to post-hoc pairwise comparisons, we effectively minimized the risk of Type I errors (false positives). While these rigorous statistical adjustments inevitably reduced the significance of marginal associations, the relationships that remained significant are robust and highly likely to represent true biological interference caused by hemolysis.
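For readers less familiar with the two corrections, the sketch below shows how Benjamini-Hochberg and Bonferroni adjusted P values are commonly computed. It is a plain-Python illustration with made-up P values, not the code in the supporting information; the function names are our own.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up P values for four tests
print(benjamini_hochberg(raw))
print(bonferroni(raw))
```

Bonferroni controls the family-wise error rate and is the stricter of the two, which is why it tends to be reserved for a small set of pairwise comparisons, while BH controls the false discovery rate across a large screening step.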

The model diagnostics further supported our methodological choices and offered additional biological insight. Visual inspection of residual plots revealed distinct heteroscedasticity (non-constant variance) in the relationship between hemolysis indices and several analyte concentrations, which justifies the use of quantile regression, a method that, unlike OLS, is robust to such distributional irregularities. The model fit statistics, specifically Koenker's pseudo-R^2, showed that the explanatory power of hemolysis varies across tests. High pseudo-R^2 values were observed for markers such as LD and K, indicating that hemolysis is the dominant driver of variance for these analytes; conversely, lower values for other analytes (such as RF) suggest that, although hemolysis exerts a statistically significant effect, biological noise contributes more substantially to the observed variability. Additionally, the consistency of the RMSE across 5-fold cross-validation indicates that the models generalize stably and are not overfitted.
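As an illustration of the fit statistic discussed above, the sketch below computes a quantile-regression pseudo-R in the style of Koenker and Machado: one minus the ratio of the check (pinball) loss of the fitted model to that of an intercept-only model. We assume this is the statistic meant by "Koenker's pseudo-R^2"; the function names and the toy numbers are illustrative.

```python
import math


def pinball_loss(y_true, y_pred, tau=0.5):
    """Sum of check-function losses at quantile level tau."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        r = y - q
        total += r * tau if r >= 0 else r * (tau - 1)
    return total


def sample_quantile(values, tau):
    """Order statistic at ceil(n * tau), which minimizes the pinball loss."""
    s = sorted(values)
    k = max(1, math.ceil(len(s) * tau))
    return s[k - 1]


def pseudo_r(y_true, y_pred, tau=0.5):
    """1 - (loss of fitted model) / (loss of intercept-only model)."""
    null_q = sample_quantile(y_true, tau)
    v_null = pinball_loss(y_true, [null_q] * len(y_true), tau)
    if v_null == 0:
        raise ValueError("response has no variation at this quantile")
    return 1 - pinball_loss(y_true, y_pred, tau) / v_null


y = [1, 2, 3, 4, 5]          # toy observed values
fitted = [1, 2, 3, 4, 6]     # toy fitted medians from some model
print(round(pseudo_r(y, fitted), 4))
```

A value near 1 means the covariates account for most of the check loss at that quantile, while a value near 0 means the fitted model does little better than the unconditional quantile.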

This study has limitations and strengths. One limitation is that it investigates only in vitro hemolysis, so the findings may not apply to in vivo hemolysis. In addition, severe hemolysis cases (H-index equal to 5) were excluded from the analysis because of the small sample size. The strengths include the introduction of a big data method for assessing the direction and magnitude of the impact of hemolysis on laboratory analytes, together with open-source code that provides a convenient analytical tool. Unlike previous studies that provided only point estimates, this study accounts for sampling error through modeling and offers interval estimates of the impact of hemolysis on analytes. This provides more comprehensive evidence to support laboratory managers in devising rational strategies for managing hemolyzed samples. Furthermore, the study uses appropriate visualizations to show the direction and degree of the influence of hemolysis on analytes. Lastly, the big data approach adopted here is also suitable for analyzing lipemia and jaundice, representing a more efficient and cost-effective analytical strategy.

5. Conclusion

Based on big data analysis, the impact of hemolysis on laboratory analytes can be effectively evaluated with a high level of consistency comparable to homogeneous experimental research.

Supporting information

S1 File. Clinical laboratory test index details, normality assessments, and hemolysis impact analyses of biomarkers.

https://doi.org/10.1371/journal.pone.0340265.s001

(DOCX)

S2 File. Post-hoc tests.

https://doi.org/10.1371/journal.pone.0340265.s002

(CSV)

S3 File. Model performance metrics.

https://doi.org/10.1371/journal.pone.0340265.s003

(CSV)

S4 File. Residual plots.

https://doi.org/10.1371/journal.pone.0340265.s004

(PDF)

S1 Data. Code for data analysis.

https://doi.org/10.1371/journal.pone.0340265.s005

(RMD)

S2 Data. Code for data cleaning.

https://doi.org/10.1371/journal.pone.0340265.s006

(R)

References

1. Guder WG. Haemolysis as an influence and interference factor in clinical chemistry. J Clin Chem Clin Biochem. 1986;24(2):125–6. pmid:3711796
2. Simundic A-M, Baird G, Cadamuro J, Costelloe SJ, Lippi G. Managing hemolyzed samples in clinical laboratories. Crit Rev Clin Lab Sci. 2020;57(1):1–21. pmid:31603708
3. Lippi G, Blanckaert N, Bonini P, Green S, Kitchen S, Palicka V, et al. Haemolysis: an overview of the leading cause of unsuitable specimens in clinical laboratories. Clin Chem Lab Med. 2008;46(6):764–72. pmid:18601596
4. Simundic A-M, Nikolac N, Vukasovic I, Vrkic N. The prevalence of preanalytical errors in a Croatian ISO 15189 accredited laboratory. Clin Chem Lab Med. 2010;48(7):1009–14. pmid:20441481
5. Šimundić AM, Gabaj NN, Guder WG. Preanalytical variation and preexamination processes. 2018.
6. Koseoglu M, Hur A, Atay A, Cuhadar S. Effects of hemolysis interferences on routine biochemistry parameters. Biochem Med (Zagreb). 2011;21(1):79–85. pmid:22141211
7. Fernandez P, Llopis MA, Perich C, Alsina MJ, Alvarez V, Biosca C, et al. Harmonization in hemolysis detection and prevention. A working group of the Catalonian Health Institute (ICS) experience. Clin Chem Lab Med. 2014;52(11):1557–68. pmid:24897397
8. Maimaiti M, Yang B, Xu T, Cui L, Yang S. Accurate correction model of blood potassium concentration in hemolytic specimens. Clin Chim Acta. 2024;554:117762. pmid:38211807
9. van Rossum HH. Demonstrating the feasibility of accurately and reliably correcting potassium results for mildly hemolytic samples using a new experimental design. Clin Chim Acta. 2021;522:83–7. pmid:34418365
10. Ma C, Li L, Wang X, Hou L, Xia L, Yin Y, et al. Establishment of reference interval and aging model of homocysteine using real-world data. Front Cardiovasc Med. 2022;9:846685. pmid:35433869
11. Ma C, Hou L, Zou Y, Ma X, Wang D, Hu Y, et al. An innovative approach based on real-world big data mining for calculating the sample size of the reference interval established using transformed parametric and non-parametric methods. BMC Med Res Methodol. 2022;22(1):275. pmid:36266618
12. Ma C, Yu Z, Qiu L. Development of next-generation reference interval models to establish reference intervals based on medical data: current status, algorithms and future consideration. Crit Rev Clin Lab Sci. 2024;61(4):298–316.
13. Ma C, Wang X, Wu J, Cheng X, Xia L, Xue F, et al. Real-world big-data studies in laboratory medicine: current status, application, and future considerations. Clin Biochem. 2020;84:21–30. pmid:32652094
14. Xia L, Xu E, Cao XB, Liu L, Liu Q, Cheng X, et al. The effect of hemolysis on 41 chemistry and immunology tests and determination of hemolysis alert index. Chin J Lab Med. 2017;40(1):947–52.
15. Chaochao M, Yicong Y, Li L, Qian L, Xin L, Ying Z, et al. Explore the reasons affecting the consistency of reference intervals established by two types of indirect methods for 34 biochemical analytes. Chin J Lab Med. 2023;46:1083–93.
16. Wei X. R language data mining method and application. Electronic Industry Press; 2016.
17. Ji JZ, Meng QH. Evaluation of the interference of hemoglobin, bilirubin, and lipids on Roche Cobas 6000 assays. Clin Chim Acta. 2011;412(17–18):1550–3. pmid:21575617
18. Lippi G, Salvagno GL, Montagnana M, Brocco G, Guidi GC. Influence of hemolysis on routine clinical chemistry testing. Clin Chem Lab Med. 2006;44(3):311–6. pmid:16519604
19. DiToro DF, Conrad MJ, Jarolim P. Hemolysis index and potassium reporting: contextual evidence-based reporting criteria. Am J Clin Pathol. 2022;157:809–13.
20. Yin T, Herskovits AZ. The impact of hemolysis-index thresholds on plasma and serum potassium measurements. J Appl Lab Med. 2022;7(3):788–93. pmid:35018422
