Determination of reference intervals for common chemistry and immunoassay tests for Kenyan adults based on an internationally harmonized protocol and up-to-date statistical methods

Background Due to a lack of reliable reference intervals (RIs) for Kenya, we set out to determine RIs for 40 common chemistry and immunoassay tests as part of the IFCC global RI project. Methods Apparently healthy adults aged 18–65 years were recruited according to a harmonized protocol and samples were analyzed using Beckman-Coulter analyzers. Value-assigned serum panels were measured to standardize chemistry results. The need for partitioning reference values by sex and age was based on between-subgroup differences expressed as standard deviation ratio (SDR) or bias in lower or upper limits (LLs and ULs) of the RI. RIs were derived using a parametric method with/without latent abnormal value exclusion (LAVE). Results Sex-specific RIs were required for uric acid, creatinine, total bilirubin (TBil), total cholesterol (TC), ALT, AST, CK, GGT, transferrin, transferrin saturation (TfSat) and immunoglobulin-M. Age-specific RIs were required for glucose and triglyceride for both sexes, and for urea, magnesium, TC, HDL-cholesterol ratio, ALP, and ferritin for females. LAVE was effective in optimizing RIs for AST, ALT, GGT, iron markers and CRP by reducing the influence of latent anemia and metabolic diseases. Thyroid profile RIs were derived after excluding volunteers with anti-thyroid antibodies. Kenyan RIs were comparable to those of other countries participating in the global study with a few exceptions such as higher ULs for TBil and CRP. Conclusions Kenyan RIs for major analytes were established using a harmonized protocol from well-defined reference individuals. Standardized RIs for chemistry analytes can be shared across sub-Saharan African laboratories with similar ethnic and lifestyle profiles.


Introduction
Reference intervals (RIs) are an integral part of laboratory reports as they assist clinicians in the interpretation of results. RIs should be population specific to ensure appropriate interpretation. Unfortunately, many clinical laboratories in sub-Saharan Africa (SSA) adopt RIs provided by manufacturers of laboratory reagents/equipment without verifying them as recommended by the Clinical and Laboratory Standards Institute (CLSI) [1]. This could result in inaccurate interpretation of quantitative laboratory results, leading to medical errors. Saathoff et al carried out a study in the Mbeya region of south-western Tanzania and found marked differences between RIs from the United States (US), Tanzania and other SSA countries. Overall, only 80.9% of reference values (RVs) for clinical chemistry tests from healthy individuals in Tanzania would have been classified as normal according to the US RIs published by Kratz et al [2].
The International Federation of Clinical Chemistry (IFCC) under its Committee on Reference Intervals and Decision Limits (C-RIDL) has been carrying out a global RI study using a protocol that harmonizes the pre-analytical, analytical and post-analytical study processes to ensure ease of comparison of derived RIs across different regions, countries and ethnicities [3].
An interim report of the global RI study comprising data from 12 countries identified between-ethnic-group differences in both males and females for serum total protein (TP), albumin (Alb), total bilirubin (TBil), high density lipoprotein cholesterol (HDL-C), magnesium (Mg), C-reactive protein (CRP), IgG, complement 3 (C3), vitamin B12, and folate. Females were found to generally have more pronounced age-related changes in RVs. Ethnic differences in BMI-related changes were also demonstrated. The only African country whose data were included in the interim report was South Africa, where comparisons between black South Africans and Caucasian/mixed-race South Africans showed much higher levels of CRP in black South Africans [4].
The Kenyan study was undertaken to explore sources of variation of RVs, to derive country specific RIs and to standardize the RIs by use of a value-assigned panel of sera [4] intended for nationwide use and international comparison.

Materials and methods
The methodology used in recruiting participants for the study, sample collection, handling and analysis has previously been published [5]. The study was approved by the Aga Khan University Hospital Nairobi (2014/REC-46) and Stellenbosch University (S16/10/219) Health Research Ethics Committees. The study was conducted in conformity with the Declaration of Helsinki.

Study population
Recruitment of study participants in Kenya was carried out between January and October 2015 in several counties. The majority were urban dwellers from the capital city Nairobi, Kiambu

PLOS ONE | https://doi.org/10.1371/journal.pone.0235234 | July 9, 2020

coordination, sample collection, processing, quality assurance and shipping to the PathCare reference laboratory in Cape Town, South Africa. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. PathCare Kenya Ltd provided support in the form of salaries for authors JM and CW, who are its employees, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section.

Blood collection and handling
Blood samples were collected by trained phlebotomists into a serum separator tube for all analytes tested in serum, a lithium heparin tube for troponin I, and a sodium fluoride tube for plasma glucose. Serum and plasma samples requiring centrifugation were spun 2–4 hours after collection and stored at −80˚C at the Aga Khan University Hospital, Nairobi (AKUHN). Centrifugation was done at 2000 g for 10 min in a non-refrigerated centrifuge (Beckman Coulter Allegra X-30, Brea, California, US). The samples were subsequently shipped frozen to the PathCare reference laboratory in Cape Town, South Africa for analysis. We also drew 2 mL of whole blood for testing hematology parameters on a Beckman Coulter ACT5-DIFF-CP analyser (Brea, California, US) at PathCare Nairobi. The test results were primarily used for establishing RIs for hematology parameters [5], but they were referred to in this study for secondary exclusion of individuals with latent anemia or inflammation.

Measurements
The analysis of all serum specimens was performed in batches on the Beckman Coulter AU 5800 (Brea, California, US) for chemistry assays and DXI (Brea, California, US) for immunoassays as summarized in Table 1.

Quality control
The PathCare reference laboratory is accredited by the South African National Accreditation System. For purposes of the global RI study, all participating laboratories received a panel of sera produced by the C-RIDL in 2014 that had assigned values [4]. This panel was measured by participating laboratories to enable recalibration of RVs using linear regression analysis. It also allowed for alignment of RVs across different countries by all-pairwise comparison of test results.

Statistical analysis
In order to assess sex, age and BMI as sources of variation, we adopted the standard deviation ratio (SDR), which represents the ratio of the between-subgroup SD (variation of the subgroup means from the grand mean) to the between-individual SD (approximately 0.25 of the width of the RI). For calculating SDRs, we first performed 2-level nested ANOVA to compute the between-sex SD and the between-age SD after partitioning age at 30, 40, and 50 years. With the results, the SDRs for between-sex SD (SDRsex) and between-age SD (SDRage) were calculated as ratios to the residual SD, which corresponds roughly to the between-individual SD, or the SD comprising the RI (SD_RI). Since between-age variation changes by sex, we also computed SDRage for each sex by one-way ANOVA. We considered SDR ≥ 0.40 as the primary criterion for judging the need for partitioning RVs by sex and/or age [5].

However, SDR represents the between-subgroup difference at the center of the RV distribution, which may not reflect the difference (bias) at the LL or UL (ΔLL or ΔUL). Therefore, we also evaluated ΔLL or ΔUL as its ratio to the SD comprising the RI (SD_RI = |UL − LL|/3.92) and expressed it as the bias ratio (BR) at the LL or UL (BR_LL or BR_UL). For example, the formulas for determining BR for sex were:

$$BR_{LL} = \frac{|LL_M - LL_F|}{(UL_{MF} - LL_{MF})/3.92}, \qquad BR_{UL} = \frac{|UL_M - UL_F|}{(UL_{MF} - LL_{MF})/3.92}$$

where the subscripts MF, M, and F attached to LL or UL indicate the RI without partition by sex (male+female) and the RIs after partition into male and female, respectively [6]. The same calculation was done for judging the need for age-specific RIs by setting LL and UL with/without partitioning by age. The distinction between the concepts of SDR and BR is illustrated in Fig 1.

Table 1 note: As calculated parameters, globulin (Glb) was computed as TP−Alb; non-HDL-C as TC−HDL-C; and HDL-C ratio (HDLrat) as TC/HDL-C. https://doi.org/10.1371/journal.pone.0235234.t001
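The SDR described above can be illustrated with a short Python sketch for the one-way case (for example, SDRage within one sex). The text does not spell out how the between-subgroup SD is extracted from the ANOVA table, so the variance-component formula used here (the usual random-effects estimate, truncated at zero) is an assumption, and the function name `sdr_one_way` is hypothetical.

```python
import numpy as np

def sdr_one_way(groups):
    """SDR = between-subgroup SD / residual (within-subgroup) SD.

    Assumption: the between-subgroup variance is estimated as the one-way
    random-effects variance component, (MS_between - MS_within) / n0.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    sizes = np.array([g.size for g in groups])
    n_total = sizes.sum()
    grand_mean = np.concatenate(groups).mean()

    # Classical one-way ANOVA sums of squares and mean squares.
    ss_between = sum(n * (g.mean() - grand_mean) ** 2 for n, g in zip(sizes, groups))
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective subgroup size, which handles unequal subgroup sizes.
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)
    return float(np.sqrt(var_between / ms_within))
```

Called with one array of reference values per subgroup (for example, one per age band), the function returns a ratio that can be compared with the 0.40 criterion. The 2-level nested ANOVA used for the combined sex and age analysis is not reproduced here.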
In setting the threshold for the bias ratios, we followed the convention of allowable limits of bias in measurements:

$$\text{allowable bias} = 0.375\,\sqrt{SD_I^2 + SD_G^2}$$

where SD_I and SD_G represent the within-individual and between-individual SD [7]. Since SD_RI, the denominator of BR, is composed of both SD_I and SD_G and is equal to $\sqrt{SD_I^2 + SD_G^2}$, we set 0.375 as the threshold for BR. This scheme was adopted in recent papers [8,9]. On the other hand, both SDR and BR depend on their common denominator, SD_RI. For example, when the RI is narrow, both ratios can be inflated. Conversely, when the RI is wide, both ratios are suppressed. In order to cope with such situations, we set a pragmatic third criterion that the between-subgroup bias at the LL or UL (ΔLL or ΔUL) should be equal to or more than 3 times the "reporting unit (RU)" to allow partitioning of RVs. RU represents the unit of value for reporting test results. If the number of digits below the decimal point in reporting test results is 2, 1, or 0, RU is 0.01, 0.1, or 1, respectively. The flow of logic in deciding the need for partitioning by sex or age is shown in Fig 2.

Fig 1. SD ratio (SDR) vs. bias ratio (BR) as a measure of between-subgroup differences. SDR represents between-subgroup differences at the center of distributions, while BR represents between-subgroup differences at the periphery (LL or UL) of the distributions. The numerator of SDR is the between-subgroup SD (or SDsex, if sub-grouped by sex), while that of BR is a difference of LLs or ULs. https://doi.org/10.1371/journal.pone.0235234.g001

RIs were determined using both parametric and non-parametric methods before and after applying the latent abnormal values exclusion (LAVE) method [10,11]. For the non-parametric method, the RVs coinciding with the 2.5th and 97.5th percentiles after sorting the data in ascending order were used as the lower and upper limits (LL, UL) of the RI. For the parametric method, the RVs were transformed into a Gaussian form by the Box-Cox power transformation formula, and then mean ± 1.96 SD was computed as the central 95% limits (LL−UL) on the transformed scale, which were then reverse transformed to get the LL and UL on the original scale [11].
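The two derivation methods just described can be sketched in Python as follows. This is a minimal illustration and not the software used in the study: it assumes SciPy's standard one-parameter Box-Cox transformation with a maximum-likelihood power, whereas the exact Box-Cox variant and fitting procedure of the study are not given in this text. The function names are hypothetical, and the input values must be strictly positive.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

def nonparametric_ri(values):
    """LL and UL as the 2.5th and 97.5th percentiles of the reference values."""
    lo, hi = np.percentile(np.asarray(values, dtype=float), [2.5, 97.5])
    return float(lo), float(hi)

def parametric_ri(values):
    """LL and UL as mean +/- 1.96 SD on the Box-Cox scale, back-transformed."""
    values = np.asarray(values, dtype=float)
    # Assumption: the power (lambda) is chosen by maximum likelihood.
    transformed, lam = stats.boxcox(values)
    mean, sd = transformed.mean(), transformed.std(ddof=1)
    limits = np.array([mean - 1.96 * sd, mean + 1.96 * sd])
    lo, hi = inv_boxcox(limits, lam)
    return float(lo), float(hi)
```

For right-skewed analytes the two functions usually give similar limits on large samples; on smaller samples the percentile-based limits depend on a handful of extreme observations, which is one reason their confidence intervals tend to be wider.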
As a measure for secondary exclusion of abnormal values, we applied LAVE in deriving the RI by both the parametric and nonparametric methods [11]. For LAVE, we primarily used the following set of reference tests: Alb, Glb, UA, Glu, non-HDL, TG, ALT, AST, LDH, CK, GGT, and CRP, which were meant for excluding individuals with inappropriate values in the nutritional, inflammatory, or muscular damage markers. As an exception, for iron related analytes (Fe, Ferr, Tf, and TfSat), we set hemoglobin (Hb), hematocrit (Hct), and mean corpuscular volume (MCV) in addition to the four iron markers as the reference tests. For proteins (TP, Alb, Glb, IgG/A/M, CRP), we set all seven tests plus white blood cell and platelet counts as the reference tests.

Fig 2. Scheme for partitioning RVs by sex and age. We adopted this flow-chart in judging the need for partitioning RVs by sex and age. We defined between-sex (or -age) subgroup bias in reference to three points: 1) SDR>0.4, which represents the between-subgroup bias at the center of the RV distribution, 2) BR>0.375, which represents the between-subgroup bias at the limits (LL or UL) of the RV distribution, and 3) the actual difference (bias) ≥ three reporting units (RU). There were eight possible choices for the partitioning.
On the other hand, for determination of RIs for the thyroid panel, the LAVE procedure was not applied because reference tests associated with the thyroid panel were not available. Rather, we first estimated the cutoff values for anti-thyroglobulin antibody (TgAb) and anti-thyroid peroxidase antibody (TPOAb) from the probability paper plot drawn with the x-axis on a logarithmic scale, as the intersection of the central linear part with the horizontal line at the 97.5th percentile, as shown in Fig 3. For judging the need for adopting LAVE, we computed BR_LL and BR_UL from the LLs and ULs of the RIs with/without LAVE; i.e., SD_RI in the denominator was set to that of the RI derived by the LAVE procedure. The 90% confidence intervals (CIs) of the LL and UL were derived by the bootstrap method with resampling of the final set of RVs and repeated computation of the LL and UL for 50 iterations. Accordingly, final RIs were determined as the averages of the LLs and ULs thus derived.

The median age was 39 years with a range of 18–65 years. The participant characteristics are summarized in Table 2.

Sources of variation
Sex, age and BMI as sources of variation were explored, with SDR ≥ 0.4 regarded as significant, as shown in Table 3.
Between-sex differences exceeded that level in Alb, UA, Cre, TBil, Cl, ALT, AST, CK, GGT, Ferr, Tf, TfSat, and IgM. Similarly, between-age subgroup differences were significant for Alb, Cl, Glu, and TG in males, and for Alb, urea, Na, Mg, Glu, TC, TG, HDLrat, LDL-C, ALT, ALP, and Ferr in females. BMI was an independent source of variation for UA, Glu, TC, HDLrat, nonHDL-C, LDL-C, ALT, AST, LDH, and GGT in males, and for CRP only in females. Graphical representations of reference value distribution sub-grouped by sex and age are shown for 12 representative analytes in Fig 5 and for all analytes in S1 Fig.

Reference intervals
Generally, the parametric method resulted in similar or lower RI ULs, and narrower 90% CIs for the LL and UL of the RIs, compared to the non-parametric method, as shown in S2 Fig. Moreover, the accuracy of the Gaussian transformation by the parametric method was confirmed as shown in S3 Fig. Therefore, we adopted the RIs derived by the parametric method exclusively. We could not calculate an RI for Trop I because 94.7% of RVs were below the detection limit. For TgAb and TPOAb, we determined cutoff values by use of the probability paper plot as 2.5 IU/mL and 5.0 IU/mL respectively, as shown in Fig 3. We regarded individuals who had antibody values exceeding either of the cutoffs (7.7% of males and 13.6% of females) as having possible autoimmune thyroiditis (AIT), and excluded them when calculating RIs for thyroid function tests.
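The 90% CIs compared above come from the bootstrap procedure described under Statistical analysis: the final set of RVs is resampled, the LL and UL are recomputed 50 times, and the averages are taken as the final limits. The Python sketch below follows that outline. How the CI is read off the 50 replicates is not stated in the text, so taking the 5th and 95th percentiles of the replicate limits is an assumption, and the function and parameter names are hypothetical.

```python
import numpy as np

def percentile_ri(sample):
    """Example RI estimator: nonparametric 2.5th and 97.5th percentiles."""
    lo, hi = np.percentile(sample, [2.5, 97.5])
    return lo, hi

def bootstrap_ri(values, ri_func=percentile_ri, n_iter=50, seed=0):
    """Average LL/UL over bootstrap resamples and report 90% CIs of each limit."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each replicate resamples the reference values with replacement.
    limits = np.array([
        ri_func(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_iter)
    ])
    ll, ul = limits[:, 0], limits[:, 1]
    return {
        "LL": float(ll.mean()),
        "UL": float(ul.mean()),
        # Assumption: percentile-type CI over the replicate limits.
        "LL_90CI": tuple(float(v) for v in np.percentile(ll, [5, 95])),
        "UL_90CI": tuple(float(v) for v in np.percentile(ul, [5, 95])),
    }
```

Passing a different `ri_func`, such as a Box-Cox based estimator, allows the CI widths of the two methods to be compared on the same data.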
The LAVE method resulted in significant differences in the LL or UL (BR_UL or BR_LL > 0.375) for some analytes, as shown in S1 Table. For example, the UL was lowered for AST, ALT, and CRP, and the LL was raised for Fe.
Based on the decision flow chart shown in Fig 2, RIs for TP, Alb, Glb, Na, K, Cl, Ca, IP, Glu, TG, HDL-C, nonHDL, Lip, AMY, LDH, CRP, Fe, IgG and IgA were not partitioned by sex, as shown in Table 4. For age partitions, although the mean age of menopause for females is about 50 years, we chose 45 years as the borderline because there was a limited number of subjects above the age of 50.
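The three criteria behind this decision (SDR, bias ratio, and the reporting unit) can be combined in a few lines of Python. The sketch below is only one plausible reading: the exact branching of the Fig 2 flow chart is not recoverable from the text, so the rule used here (either ratio criterion is met, and the actual difference at the LL or UL is at least 3 RU) is an assumption, as are the function names and the example numbers.

```python
def bias_ratios(ll_a, ul_a, ll_b, ul_b, ll_all, ul_all):
    """BR_LL and BR_UL between subgroups a and b, scaled by the unpartitioned SD_RI."""
    sd_ri = (ul_all - ll_all) / 3.92
    return abs(ll_a - ll_b) / sd_ri, abs(ul_a - ul_b) / sd_ri

def needs_partition(sdr, ll_a, ul_a, ll_b, ul_b, ll_all, ul_all,
                    reporting_unit, sdr_cut=0.4, br_cut=0.375, ru_multiple=3):
    """Assumed rule: (SDR or BR criterion met) and bias at LL or UL >= 3 RU."""
    br_ll, br_ul = bias_ratios(ll_a, ul_a, ll_b, ul_b, ll_all, ul_all)
    ratio_flag = sdr >= sdr_cut or br_ll > br_cut or br_ul > br_cut
    largest_bias = max(abs(ll_a - ll_b), abs(ul_a - ul_b))
    return ratio_flag and largest_bias >= ru_multiple * reporting_unit

# Hypothetical limits: subgroup a 10-50, subgroup b 8-40, unpartitioned 9-46.
print(bias_ratios(10, 50, 8, 40, 9, 46))
print(needs_partition(0.3, 10, 50, 8, 40, 9, 46, reporting_unit=1))
```

In this made-up case the SDR is below 0.4, but the bias ratio at the UL exceeds 0.375 and the 10-unit difference at the UL is more than 3 RU, so the sketch would favor partitioned RIs.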
A comparison of our RIs with those recommended by Beckman Coulter and those derived from IFCC studies conducted in India, Saudi Arabia and Turkey found much higher RIs for TBil in our study, as shown in Table 5. A similar comparison that also includes studies carried out in other African countries is shown in S2 Table.

Standardization of the RIs
Since this study utilized a serum panel provided by C-RIDL, which comprised 50 samples with values assigned for 25 chemistry analytes, we confirmed the standardized status of the assays as shown in S4 Fig. For the method comparison, major-axis regression was used to express the structural relationship between our test results and the panel-assigned values. The need for recalibration was judged by calculating BR_LL or BR_UL from the difference in the LL or UL before and after recalibration with the regression coefficients. As a result, we found it necessary to recalibrate our RIs for HDL-C and LDH. For Na, we could not obtain a good linear relationship because of the poor precision of the assay relative to the narrow reference interval.
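The recalibration step can be sketched in Python as below. This is an illustration under stated assumptions rather than the study's actual computation: the major-axis slope is taken from the standard closed-form expression based on the sample variances and covariance, our measured panel results are placed on the x-axis and the assigned values on the y-axis, and the function names are hypothetical. The covariance is assumed to be non-zero.

```python
import numpy as np

def major_axis_fit(measured, assigned):
    """Major-axis regression of assigned values (y) on measured values (x)."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(assigned, dtype=float)
    cov = np.cov(x, y)  # 2x2 sample covariance matrix
    sxx, syy, sxy = cov[0, 0], cov[1, 1], cov[0, 1]
    # Slope of the first principal axis of the (x, y) scatter.
    slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return float(slope), float(intercept)

def recalibrate(value, slope, intercept):
    """Map a measured value (for example, an LL or UL) onto the assigned-value scale."""
    return intercept + slope * value
```

Applying `recalibrate` to the LL and UL of an RI and comparing the shift with SD_RI gives the before/after bias ratio that the text uses to decide whether recalibration is needed.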

Discussion
There have been controversies over the statistical methods used in deriving RIs. They include the choice between parametric and nonparametric methods, how and when to exclude RVs secondarily, and when to partition RVs into subgroups by sex and age. In this study we sought the optimal option for each. We found the nonparametric method to be of limited use because of its wider 90% CIs for RIs and frequent bias in the UL (S2 Fig), while the parametric method was more reliable after successful Gaussian transformation (S3 Fig). For secondary exclusion, we found the LAVE procedure effective in reducing the influence of overnutrition on analytes strongly associated with BMI, such as TG, ALT, AST, and CRP. It was also effective in reducing the influence of latent inflammation and anemia on Fe, Tf and TfSat. To decide the need for partitioning RVs into subgroups by sex or age, we primarily used SDR, but it tended to give an inflated value when the RI was narrow. Another problem with SDR is that it reflects the between-subgroup difference at the center of the distribution, which may not reflect the between-subgroup bias at the LL or UL, hence the use of BR_LL or BR_UL. Furthermore, we found it necessary to confirm the appropriateness of BR_LL or BR_UL by quantifying the actual difference at the LL or UL in terms of the reporting unit (RU). We chose to adopt partitioned RIs only when the difference at the LL or UL was ≥3 RU. We believe this three-way consideration ensured optimal judgement in partitioning RIs by sex and age, as shown in S1 Table.

RIs can vary appreciably across different populations, as demonstrated in the interim report of the global study analyzing RVs of 12 countries [4]. That report identified ethnic differences in many analytes such as Alb, urea, TBil, HDL-C, CRP, IgG, C3, and PTH. In reference to the RV profiles, we noted certain unique features of Kenyan RIs. For instance, our urea RIs are lower than those from Turkey and Saudi Arabia.
Although most of our RIs for liver function tests were similar to published reports from Saudi Arabia and Turkey [12,13], our TBil RI of 6–43 μmol/L for males and 5–27 μmol/L for females is almost double what has been reported in published studies from outside the African continent, as shown in S2 Table [2,12–15]. We hypothesize that Gilbert's syndrome, the commonest genetic cause of asymptomatic unconjugated hyperbilirubinaemia, could be quite prevalent in our population. Genetic studies would thus be useful in elucidating the cause of the hyperbilirubinaemia observed in our study. Our electrolyte RIs did not differ much from those of other published IFCC studies, except that Mg levels increased with age in females. It has been documented that reduced levels of oestrogen are associated with increased Mg levels in post-menopausal women as well as in the follicular phase of the menstrual cycle in women of reproductive age [16,17].
Our TC and TG RIs were higher than those reported by Kibaya et al, who carried out a similar study in a rural Kenyan population [18]. Our study population was primarily composed of urban dwellers, of whom 25.6% had metabolic syndrome [19]. According to the third report of the National Cholesterol Education Program (NCEP), the desirable LDL-C level for low risk adults is <4.1 mmol/L [20]. Our ULs for LDL-C in males (4.8 mmol/L) and in females (4.2 and 4.9 mmol/L) are higher than the NCEP targets. However, the NCEP clinical decision limits (CDLs) are used for diagnosis of hyperlipidaemia and serve as treatment targets for reducing cardiovascular risk. Aside from the risk of over-nutrition, the RI ULs for TC and LDL-C are important in diagnosing cholestatic conditions and hypothyroidism. These are relatively short-term conditions unrelated to the occurrence of atherosclerosis, hence the RIs are more relevant than CDLs in their diagnosis. On the other hand, determination of the LLs of the TC and LDL-C RIs is essential for diagnosing malnutrition or thyrotoxicosis. Therefore, we are not replacing the derived RIs with the CDLs, but rather providing the CDLs in a footnote on the test result report sheet.
We obtained fasting plasma glucose (FPG) RIs of 3.9−5.8 and 4.4−7.3 mmol/L for those aged <45 and ≥45 years respectively. The former RI is comparable to what was obtained in Turkey (3.96−5.88 mmol/L) and Saudi Arabia (4.0−5.9 mmol/L) [12,13]. The American Diabetes Association (ADA) uses FPG ≥ 5.6 and ≥ 7.0 mmol/L to define pre-diabetes and diabetes respectively. Based on the ADA criteria, a total of 63 out of 528 participants would have been classified as having elevated values, compared to 20 if the ULs of the derived RIs were used.
Our RIs for immunoglobulins are higher than those derived in the US but very similar to those derived in India [2,15]. Karita et al also found high levels of IgG in several SSA countries [21]. We hypothesize that this may be due to inflammation caused by increased exposure to either infectious disease agents or environmental allergens. The increased inflammation is further evidenced by the very high RI UL for CRP of 14.7 mg/L compared to 1 mg/L recommended by Beckman Coulter [22]. Ichihara et al, in a similar study carried out in Asia, found that the closer the country or region was to the equator, the higher the serum concentrations of positive inflammatory markers (IgG, C3, CRP), a phenomenon they ascribed to increased exposure to infectious agents [23]. Similar to the IFCC study in India, IgM was significantly higher in female participants, requiring the determination of sex-specific RIs [15]. We hypothesize that this could be linked to the role that estrogen plays in enhancing humoral immunity [24].

Ichihara et al have demonstrated that BMI is a major source of variation in RVs for many analytes and that the magnitude of this association varies across different populations [25]. For example, a similar change in BMI resulted in a greater decline in HDL cholesterol amongst the Japanese (r = −0.39) compared to people from Pakistan (r = −0.05). On the other hand, an increase in BMI was associated with a greater increase in ALT values (r = 0.48) amongst nonblack South Africans compared to black South Africans (r = 0.02) [25]. In our study, BMI was a source of variation for several analytes in males, especially those known to be associated with the presence of metabolic syndrome such as Glu, LDL-C, and ALT. We did not partition our RIs by BMI because the influence of BMI was suppressed by use of the LAVE method.
For iron markers, we also applied LAVE to reduce the influence of latent anemia. Although it was very effective in raising their LLs, the female LL for ferritin was still lower than the WHO cutoff value of <15 μg/L for iron deficiency [26].
The strengths of our study include the deliberate recruitment of healthy individuals using a harmonized protocol to ensure pre-analytical confounders were minimized, centralized analysis of samples in an accredited laboratory with excellent internal quality control, standardization of the RIs by use of a value-assigned panel of sera and use of the LAVE procedure to reduce the influence of sub-clinical disease on the derived RIs.
The weaknesses include the failure to perform infectious serology to rule out chronic infections such as HIV, HBV or HCV, and the over-representation of an urban population, which limits the generalizability of our findings to rural populations.

Conclusion
Following the harmonized IFCC C-RIDL protocol, we established RIs for 40 major chemistry and immunoassay tests from well-defined healthy Kenyan volunteers by use of up-to-date statistical methods for the first time in Africa. The LAVE method was effective in reducing the influence of latent anemia and metabolic disorders on the RIs. Based on the SD ratio and bias ratios, we developed a flow chart to judge the need for partitioning RVs by sex and age subgroups, which we believe will be helpful for future RI studies. RIs for chemistry analytes were standardized by use of a value-assigned serum panel, and thus could be shared across sub-Saharan African laboratories with similar ethnic and lifestyle profiles. As a whole, Kenyan RIs were comparable to those of other countries participating in the global study, with a few exceptions such as higher ULs for TBil and CRP.