Performance and comparability of laboratory methods for measuring ferritin concentrations in human serum or plasma: A systematic review and meta-analysis

Background Different laboratory methods are used to quantify ferritin concentrations as a marker of iron status. A systematic review was undertaken to assess the accuracy and comparability of the most used methods for ferritin detection. Methods and findings National and regional databases were searched for prospective, retrospective, sectional, longitudinal and case-control studies containing the characteristics and performance of at least one method for serum/plasma ferritin determinations in humans published to date. The analysis included the comparison between at least 2 methods detailing: sensitivity, precision, accuracy, predictive values, inter-methods adjustment, and use of international reference materials. Pooled method performance was analyzed for each method and across methods. Outcomes Search strategy identified 11893 records. After de-duplication and screening 252 studies were assessed, including 187 studies in the qualitative analysis and 148 in the meta-analysis. The most used methods included radiometric, nonradiometric and agglutination assays. The overall within-run imprecision for the most reported ferritin methods was 6.2±3.4% (CI 5.69–6.70%; n = 171), between-run imprecision 8.9±8.7% (CI 7.44–10.35%; n = 136), and recovery rate 95.6% (CI 91.5–99.7%; n = 94). The pooled regression coefficient was 0.985 among all methods analyzed, and 0.984 when comparing nonradiometric and radiometric methods, without statistical differences in ferritin concentration ranging from 2.3 to 1454 μg/L. Conclusion The laboratory methods most used to determine ferritin concentrations have comparable accuracy and performance. Registered in PROSPERO CRD42016036222.


Introduction
Ferritin is an iron storage protein present in all cells of the organism. A small amount is found in plasma and serum, which is a reflection of iron stores in healthy individuals [1,2]. A low serum ferritin concentration is usually regarded as an indicator of iron depletion, although the interpretation of normal or high serum ferritin values is challenging in the presence of acute or chronic inflammatory processes [3,4], as ferritin is increased in iron overload states and inflammation [5,6,7].
Since ferritin concentration is widely used as a marker of iron stores and status, it is important to determine if all methods commonly used to assess ferritin concentrations are capable of detecting and discriminating all possible iron statuses (deficiency, repletion, and overload), and to assess the comparability of methods across measurement systems. The World Health Organization (WHO) Expert Committee on Biological Standardization has established international reference materials to develop tests or to evaluate inter-laboratory performance. These reference materials for ferritin have been developed for calibrating working/secondary standards in the routine ongoing assays performed in laboratories and also for evaluating and standardizing new assays for ferritin quantification. At least three international reference materials have been developed: 1st (liver), 2nd (spleen) and 3rd (recombinant) [8,9,10].
WHO is updating its global guidelines on the use of serum and plasma ferritin thresholds for diagnosis of iron deficiency and risk of iron overload [11,12]. While WHO currently recognizes that ferritin is typically assessed in serum or plasma with enzyme immunoassays after venous blood collection, there is no specific recommendation on variability among analytical methods and commutability [13].
The aim of the present review was to analyze the performance and comparability of the most common laboratory methods used for serum or plasma ferritin concentration determinations to detect iron deficiency, repletion or overload, in order to inform decisions on the need for adjustments in the interpretation of serum or plasma ferritin using different methods. This was achieved by determining the sensitivity, specificity and predictive value between ferritin methods; assessing the variability of serum or plasma ferritin concentrations using different laboratory methods of detection; and reviewing the use of international standard materials of ferritin for calibration purposes and in global public health surveillance.

Search strategy and selection criteria
A structured search was performed and updated on March 27, 2017. The search strategy for MEDLINE is shown in Table 1. The strategy was adapted to the following international and regional databases: Cochrane Central Register of Controlled Trials, MEDLINE, Embase, CINAHL, Science Citation Index, Conference Proceedings Citation Index-Science, BIOSIS Previews, ARIF Reviews Database, PROSPERO, Database of Abstracts of Reviews of Effects (DARE), Cochrane Database of Systematic Reviews, IBECS, Scielo, Global
Included studies contained data on participants of any age or gender that reported human serum or plasma ferritin concentrations in apparently healthy, iron deficient and/or iron overloaded populations. Studies from infection/inflammation settings, malaria areas, disaster or emergency areas, bacterial versus viral infections, acute versus chronic conditions, and data from diabetic, obese, overweight and/or insulin-resistant individuals were also included. Iron deficiency, iron repletion, and iron overload were defined by the trialists. If not specified, WHO cut-offs [13] were used.
Studies containing data on the use of international reference materials (especially any of the WHO reference materials from spleen, liver or recombinant) to calibrate commercial assays or automated equipment, or in the routine ongoing assays performed in laboratories, were extracted, subgrouped and recorded in order to register the frequency of use in the development of commercial assays and automated equipment, and in routine laboratory assays and surveys globally.
The primary outcomes were characteristics and performance indicators of ferritin methods: type (radiometric, nonradiometric, agglutination), sub-type (EIA, turbidimetric, chemiluminescent, RIA, IRMA), origin of the assay (homemade, commercial), detection equipment (single apparatus, automated multiple-analytes detection equipment), sensitivity, specificity, precision, limit of detection, recovery rate, within-run and between-run imprecision. Data comparing at least two methods for ferritin determination, with the extracted correlation coefficient, coefficient of agreement, Bland-Altman plot or regression coefficients, were also recorded. The use of international reference materials was also summarized, including the origin and whether they were used for calibration while developing the assay or on a routine basis.

Data collection and analysis
The data retrieved in each search were screened independently by two authors to assess eligibility. Data from selected records were independently extracted using data extraction forms tested and approved by all authors in order to enhance consistency amongst reviewers. Disagreements at any stage of the eligibility assessment process were resolved through discussion and consultation with a third author.
The information collected included: general information, sample size, baseline characteristics (iron status, age, sex, race, presence and severity of infection/inflammation), ferritin detection method(s) used, cutoff points used to define iron deficiency, repletion or overload, methodological details on ferritin quantification (method(s) used for ferritin determination(s)) and the characteristics of the test already mentioned as outcomes.
We included a PRISMA (preferred reporting items for systematic reviews and meta-analyses) flow-chart of study selection (Fig 1) [14].
The variability of ferritin determinations was assessed by: 1. intrinsic indicators, commonly used to assess the characteristics of a particular ferritin method (e.g. precision and accuracy); and 2. comparative indicators between two ferritin methods, such as correlation coefficient.
The intrinsic indicators chosen to characterize the different ferritin methods were within-run imprecision, between-run imprecision, recovery rate, minimal level of detection (μg/L) and assay linearity.
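As an illustration of two of these intrinsic indicators, the following minimal sketch computes imprecision as a coefficient of variation and recovery from a spiking experiment. It assumes the conventional definitions (CV = SD/mean × 100; recovery = recovered amount/added amount × 100), which the included studies may have applied with variations. The function names and all numeric values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(replicates):
    """Imprecision as CV (%): sample SD divided by the mean, times 100."""
    return 100.0 * stdev(replicates) / mean(replicates)

def recovery_rate(measured, baseline, added):
    """Recovery (%) of a known amount of ferritin added to a sample."""
    return 100.0 * (measured - baseline) / added

# Hypothetical replicate measurements (ug/L) of one sample within a single run
within_run = [48.0, 50.0, 52.0, 49.0, 51.0]
cv_within = coefficient_of_variation(within_run)

# Hypothetical spiking experiment: 50 ug/L sample with 100 ug/L ferritin added
recovery = recovery_rate(measured=146.0, baseline=50.0, added=100.0)

print(f"within-run CV = {cv_within:.2f}%, recovery = {recovery:.1f}%")
```

Between-run imprecision would use the same CV calculation applied to measurements of the same sample obtained in different runs.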
The comparative indicators between different ferritin methods included correlation coefficient (ρ), noise or disturbance term of the regression equation (intercept b) and slope of the regression equation (m). To assess comparability between methods we computed statistics of the corresponding linear regression equation parameters and a fixed-effects meta-analysis of the correlation. To estimate inter-methods adjustments, a search for laboratory assessment data was performed. The search included laboratory performance data or reports available on the internet, and some laboratories were contacted directly to ask about the possibility of data sharing. Bland-Altman plots and/or statistics were extracted if present in the selected studies.
The correlation coefficient between methods was pooled by standard fixed-effects meta-analysis of correlations. Because the variance depends strongly on the correlation, the latter was converted to Fisher's z scale, and all analyses were performed using the transformed values. All statistical analyses were performed using Stata (https://www.jmp.com/en_gb/), MetaXL (http://www.epigear.com/) and CMA (http://www.comprehensive.com) software.
Originally, covariate analysis, development of Summary Receiver Operating Curves (SROC) and accuracy estimates were planned for the following groups: a) test type or sub-type: radiometric, nonradiometric, agglutination, other; b) origin of the assay: homemade versus commercial; c) detection equipment/system: single apparatus versus automated detection equipment; d) assay performance: variability and reproducibility; e) sample matrix: serum, plasma or erythrocytes; f) age group: infants (less than one year of age), children (two to 11.9 years of age), adolescents (12 to 18.9 years of age), adults (19 years of age or older); g) gender; h) vulnerability to iron deficiency: infants, women of reproductive age, pregnancy; i) body mass index (BMI); j) infection/inflammation: malaria area, emergency settings, groups of patients with inflammatory conditions; k) capability of both-ends detection: iron deficiency, sufficiency and overload; l) correlation of methods by publication year; m) calibration to international reference materials. The analyses were performed for the methods and subgroups with enough available data and included: test type or sub-type (radiometric, nonradiometric, agglutination); origin of the assay (homemade versus commercial); detection equipment/system (single apparatus versus automated detection equipment); assay performance (variability and reproducibility); limit of detection; calibration to international reference materials.
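The pooling step can be sketched as follows. This is a minimal illustration of a standard fixed-effect meta-analysis of correlations on Fisher's z scale, not the code used in this review (which relied on the software packages named above). It assumes each study reports a correlation r and a sample size n, uses the usual approximation that the variance of z is 1/(n − 3), and back-transforms the pooled z to the correlation scale. The input values are hypothetical.

```python
import math

def pooled_correlation(correlations, sample_sizes):
    """Fixed-effect pooled correlation using Fisher's z transformation.

    Returns the pooled r and an approximate 95% confidence interval.
    Every sample size must be greater than 3.
    """
    # Inverse-variance weights: Var(z) is approximately 1 / (n - 3)
    weights = [n - 3 for n in sample_sizes]
    # Fisher's z transformation: z = atanh(r)
    z_values = [math.atanh(r) for r in correlations]
    z_pooled = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    # Back-transform the pooled estimate and its limits to the r scale
    ci = (math.tanh(z_pooled - 1.96 * se), math.tanh(z_pooled + 1.96 * se))
    return math.tanh(z_pooled), ci

# Hypothetical method-comparison studies (correlation, number of samples)
r_values = [0.97, 0.99, 0.98]
n_values = [40, 120, 60]
r_pooled, r_ci = pooled_correlation(r_values, n_values)
print(f"pooled r = {r_pooled:.3f}, 95% CI {r_ci[0]:.3f} to {r_ci[1]:.3f}")
```

Working on the z scale keeps the confidence limits inside the interval (−1, 1) after back-transformation, which matters when correlations are close to 1, as is typical of method-comparison studies.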

External laboratory quality assessment data
Data on ferritin measurement precision from laboratory quality assessment programs was searched and extracted to compare reported between-laboratory means and coefficients of variation.

Use of international reference materials
For assessing the use of international reference materials, we developed comparative tables for the most commonly used methods, subclassified as homemade, commercial and automated.
For all analyses, selection of indicators was dictated by data availability and consistency. At least three studies reporting on a specific indicator detected by the same method type or subtype were required to perform meta-analysis estimates. Differences between covariates were assessed by visual inspection; non-overlapping confidence intervals (CIs) suggested a statistically significant difference in treatment effect between the subsets.
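The confidence-interval comparison can be illustrated with a short sketch. It assumes a normal-approximation 95% CI (mean ± 1.96 × SD/√n), which may differ from the intervals produced by the statistical packages used in the review, and the helper names are hypothetical. The inputs are the limit-of-detection summaries reported in this review for nonradiometric (2.3 ± 4.5 μg/L; n = 72) and radiometric (7.3 ± 12.5 μg/L; n = 47) methods.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean."""
    half_width = z * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)

def intervals_overlap(a, b):
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Limit-of-detection summaries (mean, SD in ug/L, number of assays)
ci_nonradiometric = mean_ci(2.3, 4.5, 72)
ci_radiometric = mean_ci(7.3, 12.5, 47)

print("nonradiometric:", ci_nonradiometric)
print("radiometric:", ci_radiometric)
print("overlap:", intervals_overlap(ci_nonradiometric, ci_radiometric))
```

Non-overlap of two 95% CIs is a conservative criterion: intervals that overlap slightly can still correspond to a significant difference in a formal test.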

Within-run imprecision for method type or sub-type, origin of the assay and detection equipment
For the analysis of within-run imprecision, the most commonly used methods were 94 nonradiometric immunoassays (including 25 enzyme-linked immunosorbent assays (ELISA) and 25 chemiluminescent assays), 59 radiometric assays (27 radioimmunoassays (RIA) and 32 immunoradiometric assays (IRMA)), and 17 agglutination-based assays (including 13 turbidimetric and 3 nephelometric).

Between-run imprecision for detection method, origin of the assay and detection equipment
For the analysis of between-run imprecision, the most commonly used methods were 75 nonradiometric immunoassays (including 25 ELISA and 17 chemiluminescent assays), 47 radiometric assays (22 RIA and 25 IRMA), and 14 agglutination-based assays (including 10 turbidimetric and 3 nephelometric).

Limit of detection
The lowest average levels of detection were found in nonradiometric methods (2.3 ± 4.5 μg/L; n = 72) and agglutination methods (2.6 ± 2.5 μg/L; n = 14), compared to radiometric methods (7.3 ± 12.5 μg/L; n = 47). There was no significant difference in average lowest levels of detection between commercial and homemade kits (4.

Assay linearity for method type, origin of the assay and detection equipment
The most reported maximal ferritin concentrations that maintained assay linearity were 500 and 1000 μg/L (11 out of 56 assays). The highest concentration reported to maintain linearity of the curve was 6000 μg/L. The highest concentrations that maintained linearity (2000-6000 μg/L) were not related to a particular method or equipment; they were found with radiometric, nonradiometric or homemade assays as well as with agglutination or commercial assays. Likewise, the lowest linearity concentration reported was 100 μg/L, and it was not related to the origin of the assay (commercial or homemade), although the 3 studies that reported this linearity threshold were all nonradiometric.

Correlation between methods
Table 6 shows a correlation of 0.981 between all methods analyzed. The correlation coefficient was 0.984 when comparing nonradiometric and radiometric methods.

Intercept (b) and slope (m) of the regression equation
These comparative indicators were found in 85% of the selected articles (115/136), with 218 data entries. As seen in Table 7, intercept values were far from 0, although not significantly different from it, for the comparisons between nonradiometric methods or between nonradiometric and radiometric methods, showing a high variability with standard deviations between 8.85 and 20.83 μg/L. For the three comparisons, slopes were close to 1.
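For reference, the slope (m) and intercept (b) of a method-comparison regression can be obtained as in the following sketch. It assumes ordinary least squares with one method treated as the error-free predictor, which is the simplest case and not necessarily the procedure used in each included study. The paired values are hypothetical and were constructed so that the second method reads 0.98 times the first plus 5 μg/L.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares fit of y = m * x + b; returns (m, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical paired ferritin results (ug/L) from two methods
method_a = [10.0, 50.0, 100.0, 200.0, 400.0]
method_b = [14.8, 54.0, 103.0, 201.0, 397.0]

m, b = linear_fit(method_a, method_b)
print(f"slope m = {m:.3f}, intercept b = {b:.2f} ug/L")
```

A slope close to 1 with an intercept close to 0 indicates a proportional relationship without constant offset, but neither value by itself quantifies how far individual paired results differ.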

Agreement between laboratory methods
It is important to highlight that the correlation, intercept, and slope indicators used in our statistical analysis only indicate the strength of the relationship or the linearity between compared methods, not the agreement between them. The most used agreement indicator, Bland-Altman plots [200] and/or statistics, was only reported in seven studies [62,68,79,115,142,162,198].
In these seven studies, the authors stated that when ferritin concentrations were below a certain threshold (75 μg/L for three of the studies) [68,79,162], the agreement improved considerably. These results cannot be generalized because of the low number of studies reporting this indicator, although they illustrate the limitations in interchangeability, especially above certain ferritin concentrations.
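The distinction between correlation and agreement can be made concrete with a small sketch of the Bland-Altman statistics (mean difference, or bias, and 95% limits of agreement computed as bias ± 1.96 × SD of the differences). The data are hypothetical: the second method reads exactly 10% higher than the first, so the two are perfectly correlated, yet their absolute differences grow with concentration.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return the bias (mean difference) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired ferritin results (ug/L); method B reads 10% higher
method_a = [10.0, 50.0, 100.0, 200.0, 400.0]
method_b = [11.0, 55.0, 110.0, 220.0, 440.0]

bias, limits = bland_altman(method_a, method_b)
print(f"bias = {bias:.1f} ug/L, limits of agreement = "
      f"{limits[0]:.1f} to {limits[1]:.1f} ug/L")
```

In a case like this, restricting the comparison to low concentrations would narrow the limits of agreement, which is the kind of pattern the seven studies describe.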

Laboratory assessment data
Data on ferritin measurement precision were obtained from four laboratory quality assessment programs: two reporting on between-laboratory means and coefficients of variation [201,202], another [203] on reference range data (lowest limits for ferritin range) and the fourth on commutability and deviation from the mean [126]. Based on the highest number of participating laboratories, a report from the XXI Laboratory Quality Assessment Program of the Spanish Society of Clinical Biochemistry and Molecular Pathology in 2014 was selected for analysis of ferritin results [202].
One hundred and eighty-eight laboratories participated in this round, and most of them used automated equipment. The methods used for detection in this automated equipment were immunoenzymatic, with different antigen-antibody pairs and detection systems (chemiluminescence, agglutination). The confidence intervals and coefficient of variation calculated from this report are presented in S3 Table (supplementary material), showing a between-laboratory coefficient of variation from 7.52% to 13.10%, which is in agreement with our findings of a mean CV of 8.5% for EIA and 6.8% for chemiluminescence.

Use of international reference materials
As shown in S1 Table (supplementary material), some of the studies report calibrating the assays to WHO or other international reference materials, although calibration data are not presented in the articles [8,10,18,36,38,45,54,62,78,79,85,95,112,115,116,126,131,136,137,139,140,142,148,155,156,158,162,164,168,171,187,190]. There were no studies reporting the use of these materials on a routine laboratory basis. Only one study reported results from a serum pool 'spiked' with either the 1st or 2nd international standard for ferritin. Those sera were measured by 52 laboratories using five automated methods and the recovery of the target values was calculated. The recovery of the first international reference material by three of the methods was between 104 and 129%, and that of the second reference material was between 99 and 125%. The authors recommended that manufacturers calibrate their methods against the 3rd international standard (recombinant), and periodically assess their methods relative to this standard as a means of avoiding assay drift over time [36].

Discussion
Measurement of iron status in the general population is important to determine the prevalence and distribution of iron deficiency and overload, and thus to decide appropriate interventions, and to monitor and evaluate the impact and safety of implemented public health programmes.
The selection of a test that reflects real ferritin concentration and has been validated through the use of reference materials has important implications from the perspective of the individual and public health pathway. The comparability of results from patients for differential diagnosis of iron deficiency or risk for iron overload has important implications for clinical decisions and also the appropriate use of resources. It is important to determine whether a therapeutic decision and treatment is having the expected effect and is not causing harm, regardless of the laboratory method used. Likewise, in the planning and evaluation of public health interventions it is important to determine the iron status of the targeted population as well as the impact of a nutrition-specific or nutrition-sensitive intervention. If different laboratory methods give different results, this may lead to misinterpretation of the effects and lead to incorrect public health decisions. Additionally, it should be possible to compare data from different surveys performed years apart and performed by different methods.
There are various methods to determine serum, plasma or erythrocyte ferritin concentration. The vast majority of them are based on antigen-antibody reactions that are detected by different methods. The detection could be by radioactive counting or by color development using different enzyme-substrate pairs, measured in the visible range. Light scattering or turbidimetric measurements have also been developed, as well as fluorescence emission. The first reported methods for ferritin determination were radioactive, using radioisotopes to label the antigen (RIA) or the antibody (IRMA) [16,61]. Common colorimetric methods include peroxidase or glycosidase based color development [177,204,205]. Methods based on agglutination and light passage could be turbidimetric or nephelometric [35,206]. More recently, quantum dots, plasma-mass spectrometry, and micro array based technologies have been developed [207,208].
Another milestone in the development of methods for ferritin determination was automation. Radioactive methods started as homemade assays in a few laboratories, followed within a few years by colorimetric tests. Both methodologies were improved and commercialized, making them widely available. More recently, automated equipment that allows the determination of various metabolites, including ferritin, has been developed [34,157,209]. The detection method for this equipment varies, but is mainly based on turbidimetry and chemiluminescence.
The evolution of routine medical laboratories is reflected in the different techniques applied for ferritin determination: the principle of the method used for detection has changed, as well as the use of automated equipment. Our results show that the methods used in the included studies and the use of automated or single apparatus equipment for detection made no difference for ferritin determinations.
This analysis has limitations. It was not possible to identify a gold standard method for ferritin determinations. This precluded the possibility of comparing ferritin methods against a reference method or to develop 2x2 tables. Instead, individual method performance and comparisons between all available methods that contained consistent data were analyzed.
In most of the included studies there were no details on the characteristics of the human serum or plasma used to test or validate ferritin assays, and it was not possible to subclassify studies by iron status (deficiency, repletion or overload), physiological state, age, gender or inflammatory conditions of the samples used to describe or compare ferritin methods. In fact, many valuable studies do not report on the methods used to determine ferritin. A recent systematic review summarized the serum ferritin thresholds recommended by international professional organizations and associations worldwide to define iron deficiency in different population groups [210]. There was no mention of the preferred or recommended laboratory methods or the commutability between them to interpret the results.
Although it was not possible to perform comparisons or meta-analysis, available studies report the use of WHO or other international reference materials while developing and calibrating an assay, and also to improve comparability between methods and assay components [10,36,87,88,96,139,140,181]. Studies reporting on commercial, automated equipment refer to the use of international reference materials for calibration while developing an assay [96]. One study on the accuracy and precision of seven commercial kits for serum ferritin (3 RIAs and 4 IRMAs) demonstrated that the use of a reference ferritin standard improved the accuracy of serum ferritin determination, but did not eliminate the variability of the determinations, particularly for high ferritin concentrations [95].
The main challenges identified in the development of this review were related to the variety of available methods and the small practical differences between them, which made categorization of methods difficult. Another important issue was the quality of publications, especially for automated and commercial assays. In many cases they were only reports or abstracts presented at scientific meetings that did not result in a subsequent formal, complete publication. The use of a Diagnostic Test Accuracy (DTA) methods approach to develop this review was not possible due to the difficulty of unequivocally identifying a gold standard method to perform comparisons.
Regarding statistical analysis, it was not possible to perform meta-analysis of important indicators such as Bland-Altman statistics (mean differences and corresponding variance), repeatability, high dose hook effect and carry over due to the limited amount and quality of data. These limitations also come from the fact that the studies were not trying to compare accuracies of existing methods, but focused on similarities between two methods that were available to them. In fact, more than 98% of the included articles did not study methods agreement or use proper method comparison techniques such as Passing-Bablok regressions [112,142], Spearman correlations, and concordance methods [62].
The results from this review show that the methods used to determine ferritin are comparable and there is no preferred or recommended laboratory method, although the risk of radioactive contamination and the expensive equipment are important drawbacks of RIA and IRMA. For patient follow-up, public health surveys or evaluations of the impact of interventions it is recommended that, once selected, the same ferritin method should be used during the different stages of the intervention.
International reference materials (e.g. WHO standard) should be used for calibration of all commercial assays and probably in regular laboratory practice for periodical assessment of the reagents and equipment of the routine method used, provided that the reference material has been shown to be commutable for a particular assay. It is important that reference materials are commutable so that the results from samples are equivalent among all procedures, in order to obtain results traceable to the reference system and without calibration bias among procedures. Laboratories performing ferritin determinations for patient care or for public health assessments should participate in national or regional quality control surveys.

Table 7. Mean and variance of the regression intercept and slope between nonradiometric and radiometric assays.
Columns: Type of assay; Mean intercept ± SD; 95% confidence interval; Number of data entries.
https://doi.org/10.1371/journal.pone.0196576.t007