Variability of clinical chemical and hematological parameters, immunological parameters, and behavioral tests in data sets of the Mouse Phenome Database

The use of mice as animal models in biomedical research allows the standardization of genetic background, housing conditions and experimental protocols, all of which affect phenotypic variability. The phenotypic variability within the experimental unit determines the group size that is necessary for achieving valid and reproducible results. In this study, the variability of clinical chemical and hematological parameters, which represent a comprehensive blood screen of laboratory mice, as well as of immunological parameters and behavioral tests, was analyzed in data sets that have been submitted to the Mouse Phenome Database for mouse strains predominantly used in biomedical research. Most of the clinical chemical and hematological parameters, except for some parameters known for their high variability, showed an average coefficient of variation (CV = standard deviation / mean) below 0.25. Most immunological parameters measured in blood samples had a CV between 0.2 and 0.4. The behavioral tests showed a CV between 0.4 and 0.6, or higher. In addition, a large range of the CV was found for most parameters/tests between and within the selected projects. This clearly demonstrates the appearance of unpredictable major interactions between genotype, environment and experiment regarding the variability of the parameters and tests analyzed.


Introduction
When carrying out biomedical research with animal models, the extent of the phenotypic variability affects the group size that has to be used in the experiment to achieve valid and reproducible results. Therefore, one or a few main parameters of the animal study are defined and used for the statistical determination of the sample size. The variability of the main parameter(s) within the animals used is predicted from the results of the investigators' own previous studies and/or published data. If no reliable data exist, group sizes of n = 5-8 animals are often used in pilot studies ([1,2] and refs. therein).

PLOS ONE | https://doi.org/10.1371/journal.pone.0288209 | July 12, 2023
A comprehensive blood screen of quantitative clinical chemical and hematological parameters covers a wide range of biomedical traits. Therefore, these parameters were used for the comparative assessment of the variability of additional phenotypic parameters in laboratory mice. First, immunological parameters, which are also measured in blood samples, were chosen for this comparative analysis. Second, behavioral tests were included as a field in which the outcome is usually influenced by a relatively high interaction of animals, environment and experimental procedures ([3] and refs. therein). The variability of parameters in these traits was analyzed in data sets of laboratory mouse strains that have been submitted to the Mouse Phenome Database (https://phenome.jax.org). Parameters with sufficient data submitted to the database were selected for this study (see "Materials and Methods" section); the availability of sufficient data may indicate the frequent use of the respective parameters in the phenotypic analysis of mice for the chosen traits. The group sizes used for the data analysis shown in the Mouse Phenome Database varied, but were often similar to the group sizes normally used in biomedical research with mice.

Materials and methods
The following ontology terms ( For the hematological parameters, the ontology terms were as follows: hemoglobin, VT:0001588; mean corpuscular volume (MCV), no VT applied; red blood cell count (RBC), VT:0001586; white blood cell count (WBC), VT:0000217; platelets, VT:0003179. The parameters hematocrit, mean corpuscular hemoglobin (MCH), and mean corpuscular hemoglobin concentration (MCHC) were not included in the study as they are subsequently calculated by using parameters which are directly measured.
The behavioral tests listed under the category "behavior" of the compilation "phenotyping procedures, protocols" in the Mouse Phenome Database (https://phenome.jax.org) were used for the selection of the respective data sets. In addition, the tests "acoustic startle test", "gait analysis", and "grip strength" listed under the category "physiology, anatomy" were used in this study.
Each data set chosen for this study represents the coefficient of variation (CV = standard deviation / mean) of an animal group (n ≥ 5 mice) with the same sex of one specific mouse strain. Female and male data of the same mouse strain were analyzed and counted separately.
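The definition of a data set given above can be illustrated with a minimal Python sketch. The measurement values are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which estimator underlies the values in the database.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return CV = standard deviation / mean for one animal group."""
    if len(values) < 5:
        raise ValueError("a data set requires a group size of n >= 5")
    m = mean(values)
    if m == 0:
        raise ValueError("the CV is undefined for a mean of 0")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return stdev(values) / m

# Hypothetical values of one parameter for one strain and one sex.
group = [150.0, 160.0, 140.0, 155.0, 145.0]
cv = coefficient_of_variation(group)
print(round(cv, 3))
```

Female and male groups of the same strain would each be passed to the function separately, so that every strain contributes up to two CVs per parameter.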
For the traits "clinical chemistry", "hematology" and "immunology", a "project" was carried out by a research group (giving the project its name) with mice of a defined age. In a project, a number of different mouse strains was often analyzed for more than one parameter of the trait but not always with identical group sizes. For the trait "behavior", a "project" was carried out by a research group (giving the project its name) with mice of a defined age. For a given test, the research group ("project") analyzed a number of different mouse strains in almost all cases for more than one parameter but not always with exactly identical group sizes. Thus, data sets of 1-27 parameters were included for a given test (see fifth column of Table 1). Some project names delivered results for more than one trait. For each parameter analyzed, the projects selected for this study are listed in S1 Table. Parameters/tests with data from at least three different projects available in the Mouse Phenome Database were included in this study. Additionally, parameters/tests with data from only two different projects but relatively high numbers of total project data sets and/or mice were analyzed for the traits "immunology" and "behavior".
For each parameter/test, strain data sets were chosen for the study by using the following selection criteria: inbred strains (including those derived from the Collaborative Cross (CC)), F1 hybrids; no lines with newly generated alleles; no treatment; age of the mice examined: 7-26 weeks; group size: n ≥ 5 of a given sex. The numbers of different strains analyzed in the selected projects (S1 Table) were as follows: n = 1-69 for the trait "clinical chemistry", n = 2-68 for the trait "hematology", n = 2-72 for the trait "immunology", and n = 1-62 for the trait "behavior".
Data sets with a CV ≥ 3 were assessed as technical outliers and/or non-reproducible results and were therefore excluded from the study; this occurred particularly in data sets of the trait "behavior" (0.3% of the selected data sets). CVs ≥ 2 occurred in 0.04%, 0%, 0.15% and 1% of the selected data sets for the traits "clinical chemistry", "hematology", "immunology" and "behavior", respectively. In the trait "behavior", data sets also occurred with the following characteristics: negative CV; many results with mean and/or standard deviation (SD) = 0; apparently high inter-individual variations for parameters (e.g. fecal boli count); use of semi-quantitative scores / categorical results instead of precise quantitative measurements. These data sets were excluded from this study. In addition, repeated data sets from the identical mouse group for the same test of the trait "behavior" were excluded from this study.
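The numerical selection and exclusion rules (group size, age, CV cutoff, negative CV, mean or SD of 0) can be sketched as a simple filter. The record layout and all example values are hypothetical, and the cutoff is assumed to exclude a CV of exactly 3; the criteria that require inspection of the protocol (strain type, treatment, scoring method) are not covered.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    strain: str
    sex: str          # "f" or "m"
    n: int            # number of animals in the group
    age_weeks: float
    mean: float
    sd: float

def keep(ds, cv_cutoff=3.0):
    """Return True if the data set passes the numerical criteria in the text."""
    if ds.n < 5 or not (7 <= ds.age_weeks <= 26):
        return False
    if ds.mean == 0 or ds.sd == 0:
        return False
    cv = ds.sd / ds.mean
    # Negative CVs and CVs at or above the cutoff are excluded.
    return 0 < cv < cv_cutoff

# Hypothetical records; only the first one satisfies every criterion.
records = [
    DataSet("C57BL/6J", "f", 8, 10, 12.0, 1.5),
    DataSet("C57BL/6J", "m", 4, 10, 12.0, 1.5),   # group too small
    DataSet("A/J", "f", 10, 30, 12.0, 1.5),       # outside the age range
    DataSet("A/J", "m", 10, 12, 2.0, 7.0),        # CV = 3.5
    DataSet("DBA/2J", "f", 10, 12, -3.0, 1.0),    # negative CV
    DataSet("DBA/2J", "m", 10, 12, 0.0, 0.0),     # mean and SD = 0
]
kept = [ds for ds in records if keep(ds)]
print(len(kept))
```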
Data analysis was carried out using the software program Microsoft Excel 2016 (Microsoft Corp., Redmond, WA). For sample size calculations the software R 4.0.5 was used (https://www.r-project.org).

Results
The comparison of the variability of clinical chemical and hematological parameters, immunological parameters, behavioral tests, and two anatomical control parameters was carried out by using data sets of the Mouse Phenome Database (https://phenome.jax.org) which each represent the coefficient of variation (CV = standard deviation / mean) of an animal group (n ≥ 5) with the same sex of one specific mouse strain. Female and male data of the same mouse strain were analyzed and counted separately. The variability of a parameter/test was analyzed by determining the average CV from all data sets selected for this parameter/test (Table 1).
Table 1 (column notes).
"Parameter (clinical chemistry, hematology, immunology) / test (behavior)": The parameters of the traits "clinical chemistry", "hematology" and "immunology" are listed within the trait according to their average CV of the mouse strain data sets (third column). For the trait "behavior", a given test includes data of the same mice from one up to 27 parameters (see fifth column), therefore, the tests of the trait "behavior" are listed according to their main average CV of the projects (eighth column).
"CV of mouse strain data sets, f + m": The female and male data sets (n ≥ 5 mice) were analyzed and counted separately. For the analysis of the average CV in this column, all selected data sets were included independently in the analysis.
"n projects": Parameters/tests with data from at least three different projects available were included. Additionally, parameters/tests with data available from only two different projects but relatively high numbers of data sets and/or mice are listed at the end of the respective traits "immunology" and "behavior".
"n project data sets, f + m (beh.: n parameters measured)": Female and male data sets (n ≥ 5 mice) of the same mouse strain were analyzed and counted separately. For the trait "behavior" ("beh."), in most cases more than one parameter (n = 1-27) was analyzed in the same mouse groups for a given test.
"CV of projects, f + m": In the first step the average CV was calculated for a given project by including all data sets which were analyzed in this project. For the trait "behavior", this was done by calculating the average CV of all parameters of a test separately and then determining the project average CV. In the second step, the main average CV in this column was calculated with the overall project CVs calculated in the first step.
Some projects analyzed only one sex, and/or not the identical strains and/or not the identical group sizes for each sex. Therefore, first the analysis of a putative sex-specific variability for a given parameter/test was carried out by including only projects in the analysis which examined both female and male mouse groups for the given parameter/test (but not in every case exactly the identical strains and/or the identical group sizes). Within all four traits "clinical chemistry", "hematology", "immunology" and "behavior", no obvious correlation of CV and sex was observed ("clinical chemistry": 12 and 5 of 17 parameters showed higher CV values for females and males, respectively; "hematology": 1 and 4 of 5 parameters showed higher CV values for females and males, respectively; "immunology": 11 and 10 of 21 parameters showed higher CV values for females and males, respectively; "behavior": 7 and 8 of 17 tests showed higher CV values for females and males, respectively, and in two tests no project examined both sexes; chi-squared test: p > 0.05). The ranges of the differences of female and male CV of a given parameter/test (= CV ratio = CV female / (CV female + CV male)) were between 0.47 and 0.54 with an average CV ratio of 0.51 in the trait "clinical chemistry", between 0.48 and 0.52 with an average CV ratio of 0.49 in the trait "hematology", between 0.45 and 0.56 with an average CV ratio of 0.50 in the trait "immunology", and between 0.44 and 0.55 with an average CV ratio of 0.50 in the trait "behavior" (S2 Table). Therefore, all female and male data sets available were included in the subsequent study.
In the first analysis, the CVs of all selected data sets were included independently in the analysis to calculate the average CV (Table 1, column "CV of mouse strain data sets, f + m"). In the second analysis, in the first step the average CV was calculated for a given project by including all data sets which were analyzed in this project. For the trait "behavior", this was done by calculating the average CV of all parameters of a test separately and then determining the project average CV. In the second step, the main average CV was calculated with the overall project CVs calculated in the first step (Table 1, column "CV of projects, f + m").
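The difference between the two averaging schemes can be made concrete with a small Python sketch. The project names and CV values are hypothetical and were chosen only to show that a project with few but highly variable data sets has more weight in the project-level average than in the data-set-level average.

```python
from statistics import mean

# Hypothetical CVs of strain data sets for one parameter, grouped by project.
project_cvs = {
    "Project A": [0.10, 0.12, 0.08, 0.10],
    "Project B": [0.30, 0.50],
}

# First analysis: every data set enters the average independently.
all_cvs = [cv for cvs in project_cvs.values() for cv in cvs]
data_set_average = mean(all_cvs)

# Second analysis, step 1: one average CV per project.
project_averages = [mean(cvs) for cvs in project_cvs.values()]
# Second analysis, step 2: average of the project averages.
project_level_average = mean(project_averages)

print(round(data_set_average, 3), round(project_level_average, 3))
```

For the trait "behavior", the first step of the second analysis would contain one more level, because the average CV of each parameter of a test is calculated before the project average is determined.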
For the trait "behavior", a third analysis was carried out by first calculating the average CV of all parameters of a test separately (but without calculating a project CV as in the second analysis), and subsequently calculating the main average CV with the CVs calculated in the first step (Fig 1, column 9). All three different analyses delivered analogous results, with only few obvious exceptions which can be explained by the respective data sets (see Discussion section). Therefore, the parameters of the traits "clinical chemistry", "hematology" and "immunology" are listed in ascending order within the trait according to their CV of the mouse strain data sets (Table 1, third column), whereas the tests of the trait "behavior" are listed according to their CV of the projects (Table 1, eighth column).
For the traits "clinical chemistry" and "hematology", most parameters showed a CV ≤ 0.25, with the exception of uric acid, the enzyme activities alanine aminotransferase (ALT) and creatine kinase (CK), and creatinine. These parameters are known not to be highly regulated and/or to be able to vary to a high extent. In addition, the white blood cell count (WBC) showed a CV = 0.26. For the trait "immunology", except for the parameters "lymphocytes %" (CV = 0.07) and "B cells %" (CV = 0.12), all other parameters showed a CV between 0.2 and 0.4 or higher. For the trait "behavior", all tests showed a CV between 0.4 and 0.6 or higher, except for the two tests "grip strength" (CV = 0.12) and "gait analysis" (CV = 0.24; data from only two projects available), which are listed in the Mouse Phenome Database under the category "physiology, anatomy". As controls, two anatomical parameters known to be highly reproducible were used: "body length" showed a CV of 0.03, and "relative organ weight" a CV of 0.11 (Table 1).
The results described above were controlled by additional subsequent analyses: First, within all four traits "clinical chemistry", "hematology", "immunology" and "behavior", all selected data sets were used independently (as in the first analysis described above), and the average CV of each parameter/test was determined for the 95% and 90% range of the data sets (Fig 1, columns 10-13 for the 95% range, columns 14-17 for the 90% range). This was done by first determining the respective range within both sex groups separately (i.e. the 2.5% and 5%, respectively, of the data sets with the lowest CVs and the highest CVs were deleted for each sex group separately), and then the average CV was calculated from the remaining data sets. Secondly, within all four traits "clinical chemistry", "hematology", "immunology" and "behavior", all selected data sets were used independently, and the average CV of each parameter/test was determined by using only data sets with animal numbers of n ≥ 10 or n ≥ 15 (Fig 1, columns 18-21 for data sets with n ≥ 10, columns 22-25 for data sets with n ≥ 15). Both control analyses delivered analogous results compared to that shown in Table 1 (Fig 1). Overall, a large range (minimum-maximum) of the CV was found for most parameters/tests among the respective strain data sets (Table 1, last four columns). The minimal CV and maximal CV of the 95% and 90% data range listed in Table 1 are the lowest CV and the highest CV of the remaining data sets for the parameter/test after the 2.5% and 5%, respectively, of the data sets with the lowest CVs and the highest CVs were deleted for each sex group separately. The large range was often caused by high CV values in one or few projects (Fig 2).
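The trimming used for the 95% and 90% range can be sketched in Python as follows. The CV values are hypothetical, and rounding the number of deleted data sets down to a whole number is an assumption, since the text does not state how fractional counts were handled.

```python
from statistics import mean

def trimmed(cvs, tail_percent):
    """Delete `tail_percent` of the data sets at each end of the sorted CVs."""
    ordered = sorted(cvs)
    # Assumption: the number of deleted data sets per tail is rounded down.
    k = int(len(ordered) * tail_percent / 100)
    return ordered[k:len(ordered) - k]

def range_summary(female_cvs, male_cvs, tail_percent):
    """Trim each sex group separately, then pool the remaining data sets."""
    kept = trimmed(female_cvs, tail_percent) + trimmed(male_cvs, tail_percent)
    return mean(kept), min(kept), max(kept)

# Hypothetical CVs: 40 female and 20 male data sets with one extreme value per tail.
female = [0.05] + [0.20] * 38 + [1.50]
male = [0.10] + [0.20] * 18 + [0.90]

avg95, min95, max95 = range_summary(female, male, 2.5)  # 95% range
avg90, min90, max90 = range_summary(female, male, 5)    # 90% range
print(round(avg95, 3), min95, max95)
print(round(avg90, 3), min90, max90)
```

In this example the 95% range removes nothing from the male group, because 2.5% of 20 data sets is less than one data set, so small groups of data sets can retain their extreme values under the rounding rule assumed here.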
Fig 1. Comparative depiction of the average coefficient of variations (CV = standard deviation / mean) for the parameters of the phenotypic traits "clinical chemistry" (CC), "hematology" (Hem), "immunology" (Imm) and "behavior" (Beh; the dot with the lowest CV always represents the test "grip strength" which is listed under the category "physiology, anatomy" in the Mouse Phenome Database). Each dot represents the average CV of a phenotypic parameter/test. Grey dots show parameters/tests where data from less than three projects were available for the analysis. Columns 1-8 represent the data which are given in detail in column 3 ("data sets") and column 8 ("projects") of Table 1. Column 9 ("Beh param") represents the analysis of the behavioral data by calculating the main average CV of a given test with the average CVs calculated for all parameters (from one up to 27) separately (not shown in Table 1). For the subsequent columns, all selected data sets were used independently within the four traits, and the average CV of each parameter/test was determined for the 95% and 90% range of the data sets (columns 10-13 and 14-17, respectively) as well as by using only data sets with animal numbers of n ≥ 10 or n ≥ 15 (columns 18-21 and 22-25, respectively). For the 95% and 90% range of each parameter/test, the 2.5% and 5%, respectively, of the data sets with the lowest CVs and the highest CVs were deleted for each sex group separately, and then the average CV was calculated from the remaining data sets.
In addition, the Mouse Phenome Database (https://phenome.jax.org) also includes phenotypic data sets of one project (JAX KOMP Phenotyping Center) where a high number (a few hundred up to few thousand animals per parameter) of 7-12 week-old C57BL/6NJ inbred mice were analyzed over a period of time. Analysis of the CVs for the chosen parameters of the four traits "clinical chemistry", "hematology", "immunology" and "behavior" from this project also delivered analogous results compared to that shown in Table 1 (S1 Fig).
In comparative studies, the group size of the test and control groups is calculated by setting the α and β values (type I and type II errors) for the analysis as well as determining the magnitude of the variation and the biological effect size of the main experimental parameter(s). The sample size calculations for a two-sided two-sample t-test for two groups (assuming normal distributions in both groups) are based on the following considerations: The effect size is Δ = (μ₁ - μ₀) / σ, where the standard deviation σ is either the pooled standard deviation or one assumes equal standard deviations in both groups. Defining the mean in the test group, μ₁, as a multiple k of the mean in the control group, μ₀, i.e. μ₁ = k μ₀, we can write Δ = (k - 1) μ₀ / σ = (k - 1) / CV with CV = σ / μ₀. Therefore, the effect size Δ is now independent of the concrete values for the means and can be expressed solely in terms of the factor k and the coefficient of variation CV. Standard programs for sample size planning can then be used (https://www.r-project.org). Note that the absolute value of the effect size is equal for (k - 1) and -(k - 1) = 1 - k, i.e. one gets the same sample size whether e.g. k = 1.2 or k = 0.8 (Table 2). Note that we have explicitly assumed that the CV is based on the control group. If we assume equal standard deviations in both groups, the (assumed) CV of the test group is given by CV/k.
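The relation Δ = (k - 1) / CV can be turned into a rough sample size estimate with a few lines of Python. The sketch below uses the normal approximation n = 2 (z(1 - α/2) + z(1 - β))² / Δ² per group, which is an assumption made here for illustration; programs based on the noncentral t distribution, such as those available in R, typically return slightly larger numbers, especially for small groups. The default values α = 0.05 and power = 0.8 as well as the example CV are also assumptions.

```python
from math import ceil
from statistics import NormalDist

def animals_per_group(cv, k, alpha=0.05, power=0.8):
    """Approximate group size for a two-sided two-sample comparison.

    cv: coefficient of variation of the control group
    k:  mean of the test group as a multiple of the control mean
    """
    if k == 1:
        raise ValueError("k = 1 means no effect; the sample size is undefined")
    delta = (k - 1) / cv                       # effect size from the text
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Normal approximation; not the exact t-based calculation.
    return ceil(2 * (z_alpha + z_beta) ** 2 / delta ** 2)

# Hypothetical example: CV = 0.25 and a 20% increase or decrease of the mean.
n_up = animals_per_group(0.25, 1.2)
n_down = animals_per_group(0.25, 0.8)
print(n_up, n_down)
```

Because only the square of Δ enters the formula, k = 1.2 and k = 0.8 give the same group size, in line with the symmetry noted in the text.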
In cases where the calculated sample size is only two animals per group, the use of at least 3-5 animals per group is indicated because of the putative extent of the variation in the results shown above, even within a given inbred strain. On the other hand, the calculation of high animal numbers per group may require adaptation of the research strategy.

Discussion
The Mouse Phenome Database provides reference values of a high number of phenotypic parameters mostly for the inbred strains which are predominantly used in biomedical research. It is assumed that the projects providing the strain data sets have been carried out by using standardized protocols, but not especially for the comparison of the phenotypic variability between different traits. Usually group sizes up to 20 mice were analyzed thereby reflecting the group sizes which are normally used at least in fundamental biomedical research.
The use of both sexes in experiments is strongly recommended because of possible differences in the outcome [4,5]. As some projects analyzed only one sex, a putative sex-specific variability for a given parameter/test was analyzed first. When only the projects that examined both female and male mouse groups for a given parameter/test of the four traits "clinical chemistry", "hematology", "immunology" and "behavior" were included, 31 and 27 of 58 parameters/tests showed higher CV values for females and males, respectively (chi-squared test: p > 0.05). Analogous results with no evidence for a substantial general sex-specific variability have been observed in our own recent studies for clinical chemical and hematological parameters with data from a high number of mice of one specific inbred strain [6] and for results published in the Mouse Phenome Database [7], as well as in meta-analyses concerning this topic with published mouse data for various phenotypic parameters [8-10].
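The sex comparison can be reproduced approximately from the counts given in the Results. The Python sketch below sums the per-trait counts and applies a chi-squared goodness-of-fit test against an equal split between the sexes; the exact design of the chi-squared test used in the study is not described, so this choice is an assumption. The helper cv_ratio implements the CV ratio defined in the Results, and the values passed to it are hypothetical.

```python
from math import erfc, sqrt

def cv_ratio(cv_female, cv_male):
    """CV ratio = CV female / (CV female + CV male); 0.5 means equal CVs."""
    return cv_female / (cv_female + cv_male)

# Parameters/tests with a higher CV in (females, males), as given in the Results.
higher_cv = {
    "clinical chemistry": (12, 5),
    "hematology": (1, 4),
    "immunology": (11, 10),
    "behavior": (7, 8),
}
females = sum(f for f, _ in higher_cv.values())
males = sum(m for _, m in higher_cv.values())

# Assumption: goodness-of-fit test against a 50:50 split (1 degree of freedom).
expected = (females + males) / 2
chi2 = ((females - expected) ** 2 + (males - expected) ** 2) / expected
# Upper tail probability of a chi-squared distribution with 1 degree of freedom.
p_value = erfc(sqrt(chi2 / 2))

print(females, males, round(chi2, 3), p_value > 0.05)
print(round(cv_ratio(0.21, 0.20), 2))  # hypothetical female and male CVs
```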
It is assumed that the data sets have been achieved by using the housing method which is usually carried out when working with mice in biomedical research, i.e. both sexes are group housed, and females are used without regard to the stage of the estrous cycle. A meta-analysis revealed that group housing of mice increased the variability in both males and females by 37% [10]. Therefore, no sex is expected to take advantage of this housing method in respect to the extent of the variability compared to the other sex. The similar increase of the variability in group housed males and females is also expected to cover the consequences of the Lee Boot effect which leads to the suppression of the estrous cycle in group housed female mice [11]. In our study, the average CV ratio of female and male CV (CV female / (CV female + CV male))  Table 1 which also shows the number of selected projects for each parameter (see fourth column).
Analysis of the data sets by different methods delivered analogous results for the average CV of a given parameter/test shown in column 3 and column 8 of Table 1. Only a few obvious exceptions with highly different results in column 3 vs. column 8 of Table 1 appeared: the parameters α-amylase and aspartate aminotransferase (AST) in the trait "clinical chemistry" as well as the tests "conditioned place preference" and "light-dark box" in the trait "behavior", which all showed higher CVs in column 8 than in column 3. This was caused by large differences of the average CVs between the selected projects compared to the CVs of the individual data sets selected for the analysis.

Conclusions
Overall, the large range of the CV for most parameters/tests was often caused by high CV values in one or few projects. This clearly demonstrates the appearance of unpredictable major interactions between genotype, environment and experiment regarding the variability of the parameters and tests analyzed. This has to be taken into account for the prospective calculation of the experimental group sizes in animal studies.

Supporting information
S1 Fig. Variability of clinical chemical and hematological parameters, immunological parameters, and behavioral tests of the project "JAX KOMP Phenotyping Center". A few hundred up to few thousand animals per parameter of 7-12 week-old C57BL/6NJ inbred mice were analyzed over a period of time. The CV for each parameter was determined for the 95% and 90% range of the data sets of the female mice. The chosen parameters are as follows: "clinical chemistry" (CC, n = 14): cholesterol, creatinine, glucose, total protein, triglycerides, urea, calcium, chloride, phosphorus, potassium, sodium, AP, ALT, AST; "hematology" (Hem, n = 5): hemoglobin, MCV, RBC, WBC, platelets; "immunology" (Imm, n = 10): lymphocytes, monocytes, basophils, eosinophils, neutrophils, lymphocyte count, monocyte count, basophil count, eosinophil count, neutrophil count; "behavior" (Beh, n = 7): grip strength, hole board, light-dark box, open field, prepulse inhibition, rotarod, tail suspension. Beh: the dot with the lowest CV in both columns represents the test "grip strength" which is listed under the category "physiology, anatomy" in the Mouse Phenome Database. (TIF)

Author Contributions