Repeatability analysis improves the reliability of behavioral data

Reliability of data has become a major concern in the course of the reproducibility crisis. Especially when studying animal behavior, confounding factors such as novelty of the test apparatus can lead to a wide variability of data which may mask treatment effects and consequently lead to misinterpretation. Habituation to the test situation is a common practice to circumvent novelty induced increases in variance and to improve the reliability of the respective measurements. However, there is a lack of published empirical knowledge regarding reasonable habituation procedures and a method validation seems to be overdue. This study aimed at setting up a simple strategy to increase reliability of behavioral data measured in a familiar test apparatus. Therefore, exemplary data from mice tested in an Open Field (OF) arena were used to elucidate the potential of habituation and how reliability of measures can be confirmed by means of a repeatability analysis using the software R. On seven consecutive days, male C57BL/6J, BALB/cJ and 129S1/SvImJ mice were tested in an OF arena once daily and individual mouse behavior was recorded. A repeatability analysis was conducted with regard to repeated trials of habituation. Our data analysis revealed that monitoring animal behavior during habituation is important to determine when individual differences of the measurements are stable. Repeatability values from distance travelled and average activity increased over the habituation period, revealing that around 60% of the variance of the data can be explained by individual differences between mice. The first day of habituation was significantly different from the following 6 days. A three-day habituation period appeared to be sufficient in this study. Overall, these results emphasize the importance of habituation and in depth analysis of habituation data to define the correct starting point of the experiment for improving the reliability and reproducibility of experimental data.


Introduction
Animal experimentation always requires an ethical evaluation, balancing the scientific benefit and possible constraints to animal welfare. Without a doubt, the generation of high-quality research data is crucial for a better translation of experimental data to humans. Regarding the welfare, stress in laboratory animals should be limited to a minimum. Fortunately, these two goals are often working perfectly together because it is well known that data derived from nonstressed animals is of higher quality expressed, for example, in more robust and/ or more consistent scientific data [1,2].
One well known possibility to minimize stress for the laboratory animal is the prior habituation to unknown experimental equipment and/or handling procedures [1,3]. The term habituation originates from behavioral biology and describes the diminution of a response induced by constant or repeated exposure to a novel stimulus and displays a simple form of non-associative learning [4]. A distinction is made between habituation over time (intrasession) and over repeated exposures (intersession) [4]. In this original field of behavioral biology, the habituation process is the primary research objective and serves as a model for non-associative learning and memory mechanisms as well as for pharmacological studies [4][5][6]. One of the most frequently assessed parameters of habituation is a decrease in exploratory behavior measured in the Open Field (OF) test as a change in distance travelled or activity. Once the animal is familiar with the new environment, the explorative behavior is reduced and it is considered that the laboratory animal has habituated and the learning and memory process is completed [4].
Knowledge about habituation can be used to introduce a wide variety of unknown behavioral tests to the animal and to increase the intended performance in the actual test situation which then is familiar to the animal [1,3,7,8]. For example, in rodents, physical activity can be measured under voluntary or under forced condition. On the one hand, voluntary exercise has the disadvantage of very variable volume and intensity differing from animal to animal. On the other hand, rodents in forced running wheel tests show low level of performance. Habituation and training with the running wheel have been shown to improve locomotor performance in the forced running wheel system in rats and consequently increased the validity of the scientific data [3]. Although, such effects of habituation are frequently applied, the documentation in the literature is heterogeneous. In case of the OF, there are research groups which use this test without documented habituation (e.g., [9][10][11][12][13]). Some of them stated that they want to test for novelty induced activity [13]. Other researchers establish only one day of habituation to either reduce novelty induced activity or to measure the effect of habituation from trial one to trial two [14][15][16]. Frequently, experimenters place the animals 30 min prior to testing in the corresponding room or arena and define this as habituation [17][18][19]. Rarely, a previous habituation is mentioned in the literature which is carried out for more than two days [20][21][22]. If the specific research question requires testing in a known environment, there is usually no well-founded derivation of the habituation period which is stated in the literature. In fact, a strategy how to determine an effective habituation period for a certain experiment would be favorable.
The repeatability analysis is one possible method to assess the accuracy of measurements which can be normally or not normally distributed [23] and it allows identifying an explanation for the occurring variances. Repeatability describes the proportion of the total variation that is reproducible among the repeated measurements of the same group [23][24][25]. Nakagawa and Schielzeth published a practical guide for biologists to bring forward the method of repeatability analysis in the scientific community [23]. This method of data analysis is commonly employed in the field of ecology and evolutionary biology where the repeatability of morphological, physiological and behavioral traits is of special interest [26]. Some examples include the wing morphology of drosophila [27], cancer evolution [28], or mate choice for reproduction [29][30][31]. More recently, the concept of animal personality that focuses on consistent between-individual differences in behavior was investigated by applying repeatability analysis [32][33][34][35][36][37]. In this context, the repeatability between several trials on one day or successive days as well as between several independent experiments of different research groups is meant. Bell et al. showed in a meta-analysis of repeatability of independent experiments from different research groups for animal behavior that many behavioral types are more consistent within individuals than previously assumed [2]. In addition, Mazzamuto et al. underline the importance of method validation to ensure that the intended parameter is really captured [38]. Given that there is growing concern regarding reproducibility of experimental data in biomedical science [39], we are convinced that a documented validation of the habituation data is a beneficial approach in all fields of animal behavioral science to increase the reliability and reproducibility of experimental data. In the context of this manuscript, we define experimental data as reliable if the method by which it was collected is scientifically valid and robust. This means that the method should measure what was intended minimizing any disturbing influences. A mathematically justified and comprehensible explanation for the choice of a certain habituation period would strengthen the scientific outcome, as it is shown that the intended parameter can actually be captured. In addition, valuable information can be obtained from the habituation data e.g., the source of variance, which might be helpful for the interpretation of the primary experimental results.
Therefore, the aim of our study was to elucidate the putative hidden information of habituation data in combination with the repeatability analysis for research questions which requires a familiar environment as test situation (for example wheel running, nest building, burrowing behavior or baseline locomotion [3,22,40]). As an example, we investigated retrospectively the effectiveness of intersession habituation of male mice of C57BL/6J, BALB/cJ and 129S1/ SvImJ inbred strains to an OF arena as familiar environment within two different experiments carried out at the same institute (one in winter and one in summer time) [41]. For this purpose, we analyzed video data to assess locomotor parameters in the OF (distance travelled, average activity). We conducted a repeatability analysis to check the reliability of our data from trial to trial and to evaluate if the chosen habituation period was sufficient or could be reduced in future studies. In order to identify the primary source of variance in locomotion data, we examined the factors experiment/ batch, strain, and the individual animal (animal ID). All these factors are known from the literature to affect locomotion [2,42,43].

Ethics statement
The data for this study is based on two experiments which were approved by the Berlin state authority, Landesamt für Gesundheit und Soziales, under license No. G 0309/15 and G 0194/16 and were conducted in accordance with the German Animal Protection Law (TierSchG, TierSchVersV). The main purpose of these two studies was the evaluation of the analgesic effect of buprenorphine in three different mouse strains [41]. All decisions regarding strain, sex and age of the animals were made with regard to this research question. At the end of this main study, animals were sacrificed by gradual CO 2 filling according to Annex IV of Directive 2010/63/EU. Here, we exclusively present the data of the habituation period of the animals.
(n = 15) from Jackson Laboratory (Bar Harbor, USA, imported via Charles River) were obtained after weaning (three to four weeks old), to familiarize them with the housing conditions. The mice were free of all viral, bacterial, and parasitic pathogens listed in the Federation of European Laboratory Animal Science Associations (FELASA) guidelines. During familiarization time, mice were housed in groups of 5-8 animals per cage in Eurostandard type III polycarbonate cages with filter tops (Tecniplast, Hohenpeissenberg, Germany), autoclaved bedding and nesting material (LASbedding TM PG2, LASvendi, Soest, Germany), a mouse house consisting of cardboard (LBS Biotechnology, United Kingdom) and gnawing material as environmental enrichment (J. Rettenmaier & Söhne GmbH + Co KG, Rosenberg, Germany). The animals had free access to autoclaved food pellets (LASQCdiet TM Rod16, LASvendi, Soest, Germany) and water acidified with HCL (pH 2.5-3.0 to prevent growth of algae and pathogens). The room temperature was maintained at 21 ± 1˚C, with a relative humidity of 55 ± 10%. The light/ dark cycle in the room consisted of 12/ 12 h artificial light with lights on from 5.00 am to 5.00 pm in winter time and 6.00 am to 6.00 pm in summer time. All animals were handled by cupping the mouse in open hands.

Habituation
After two weeks of familiarization to the housing conditions, all mice were single housed to prevent aggressive behavior. With the start of the habituation period, mice were six to seven weeks old. On seven consecutive days, each mouse was habituated once a day to the following experimental conditions in a randomized order. Each mouse was fixated by grip in the neck to mimic the stress of a subcutaneous (s.c.) injection and was subsequently placed in a round OF arena (ø 30 cm, 40 cm height, 109 lux) for 5 min, monitored by a video camera from above. Afterwards, the mouse was placed on the warm (at 35˚C) Incremental Hot Plate (IHP, IITC Inc. Life Science, Woodland Hills, USA) for 4 min, monitored by two video cameras (frontal and lateral). Habituation and testing took place between 9.00 am to 11.00 am in winter time and 10.00 am to noon in summer time to avoid influence by circadian rhythm. On day one and six, body weight of each mouse was measured within the routine animal check as a general welfare indicator (S1 Fig). Since there are hints in literature that the sex of the experimenter might have an influence [44], the whole habituation and experiments were performed by the same female experimenter. The results from pain assessment on the IHP will not be part of this publication.

Data collection and analysis
Locomotor activity, monitored as distance travelled (cm), and average activity (%, threshold 0.1 cm/s) were assessed for every mouse for 3 min in the round OF arena and analyzed with the Viewer Software (version 4, Biobserve GmbH, Bonn, Germany). We have chosen the mean of the last three minutes of observation for the evaluation to limit the influence of possible initial stress behavior.
Statistical analysis and plotting of the graphs were conducted with the freely available software R, version 3.5.1 and R-studio, version 1.1.383 (www.r-project.org) and with the software Graph Pad Prism (version 6, Graph Pad Software, San Diego, USA). To get an overview of the data set, all data points of distance travelled and average activity were plotted with R and statistically analyzed with Graph Pad Prism using the Friedman test with Dunn's multiple comparison test compared to day one. To test the repeatability of the data and to identify the source of the variance within the data, a repeatability analysis was performed using the "rptR" library to quantify the constancy of phenotypes [23,45]. First, data sets were tested for normal distribution by Q-Q-norm-plot (S2 Fig). If data did not show substantial deviation from normal distribution the repeatability value R was calculated using a linear mixed-effect model (LMM) based on Gaussian distribution. Animal ID, strain or experiment/ batch were included as random factors, whereas, distance travelled (cm) or average activity (%) served as fixed factors. The confidence interval (CI) [2.5%, 97.5%] is based on 500 bootstrapping runs and 100 permutations. To test for statistical significance of the repeatability of distance travelled or average activity between the selected numbers of trials, the likelihood ratio test was performed. To identify the most important random factor, R was first calculated over the seven-day habituation period for all fixed factors including animal ID, strain and experiment as multiple random grouping factors. To observe the progress of repeatability over the whole time, R was calculated additionally over three adjacent measurements (days), including the random grouping factors animal ID or strain. In a second step, the identified effect covariate strain that explain only a part of the variances in the data was included to adjust the repeatability for the predominating random factor animal ID. Data and basic steps to perform the analysis are available in the supplement (S1 Text and S1 Table) in order to ensure the comprehensibility of our analysis.

Effects of habituation on OF behavior of single mice
In the initial step, the raw data of each animal were examined.
From the second day of habituation onwards, a considerable reduction of distance travelled and average activity occurred compared to day one (Fig 1). In addition, the widths of confidence intervals of distance travelled and average activity decreased from day one until reaching stable plateaus at day five (Table 1).

Repeatability analysis of explorative behavior
In a second step, a repeatability analysis was carried out in order to obtain reliable evidence for the successful habituation and additional information on the variance of the data.
In order to identify the main factor by which the observed variance in locomotion data can be explained, a repeatability analysis with multiple grouping factors was conducted over the seven-day habituation period ( Table 2). The distance travelled as well as the average activity served as fixed factors and animal ID, strain, and experiment/ batch functioned as random multiple grouping factors. The variances observed for distance travelled and average activity is primarily explained by animal ID and strain. The fact that we included data from two different experiments did not contribute significantly to the observed variances of distance travelled or average activity (Table 2).
For this reason, repeatability values for distance travelled and average activity were calculated including both, animal ID and strain as a random factor. To record changes in repeatability over the whole period of habituation, each repeatability value R was calculated over three adjacent days resulting in five groupings (S2 and S3 Tables). Within the first grouping (day 1-3) repeatability values of 0.027 (distance travelled, random factor: ID) and 0.177 (distance travelled, random factor: strain) as well as 0.199 (activity, random factor: ID) and 0.321 (activity, random factor: strain) were calculated for distance travelled and average activity, respectively (Fig 2). This indicates that only 2.7% of the variance of distance travelled data and 19.9% of the variance of the average activity data can be explained by the factor animal ID. 17.7% of the variance of distance travelled data and 32.1% of the average activity data can be explained by the factor strain. The repeatability of the results within this period is considered unlikely. In addition, most of the variance cannot be explained by definable factors and is therefore considered random noise likely induced by other factors, e.g., stress or anxiety-related behavior. From the second (day 2-4) to the fifth grouping (day 5-7) stable repeatability values of 0.556, 0.617, 0.505, and 0.498 for distance travelled (random factor: ID) and 0.634, 0.652, 0.627, and 0.604 for average activity (random factor: ID) were calculated (Fig 2A and 2B). This means that roughly 50 to 60% of the variance can be explained by individual differences between the mice (S2 and S3 Tables). In contrast, only 27.6, 22.9, 18.2, 19.0% and 40.1, 36.3, 29.7, 30.7% of the observed variance in distance travelled and average activity data, respectively, can be traced back to strain (Fig 2C and 2D, S2 and S3 Tables).
From this repeatability analysis it can be concluded that strain is an important factor, but the influence of the individually stable behavior of each mouse predominates. Hence, estimations of repeatability over the whole time were calculated again including only animal ID as random factor with adjustment for strain (Fig 3, S2 Table). The resulting adjusted repeatability values are comparable to animal ID repeatability values for distance travelled and average activity showing only slightly reduced (~15%) repeatability values.

Discussion
Measured data of each individual mouse as well as adjusted repeatability values showed a stable pattern after three days of habituation for distance travelled and average activity. Moreover, additional habituation days did not lead to a further reduction of explorative behavior or increased repeatability of investigated parameters and reached a plateau in the second grouping from day two to four. Hence, a habituation time of three days seemed to be sufficient for this experimental set up. Furthermore, the individual personality of the mouse has a greater influence on the variance of the data than the strain to which it belongs. On the basis of these results a simple strategy can be established to introduce habituation as a preliminary step in behavioral experiments which requires a known environment (Fig 4). First, plan and set up the experiment and habituate the animals to the testing situation. The simultaneous or retrospective checking of the measured data with repeatability analysis allows the determination of the optimal starting point of the experiment.  In this study, we used locomotion data measured in a familiar OF arena instead of home cage based data. In recent years, home cage RFID-based monitoring has become a promising tool to detect unbiased behavior of rodents in social groups over long periods [46,47]. In fact, it has been shown that home cage activity is repeatable over long periods of time and indicating the emergence of stable individual differences in activity [32]. After habituation to the test arena activity data in the OF also became repeatable and it would be tempting to speculate that individual differences indeed correlate between home cage and habituated OF arena. Our initial research purpose, which is not the main focus of this manuscript, was to investigate the analgesic effect of buprenorphine in three different mouse strains [41]. In this context, we aimed at the comparison of baseline locomotion with the locomotion after buprenorphine administration. Individual tracking in separate cage systems or arenas is a common practice e.g., to study the influence of different xenobiotics [20][21][22]. In this study, we analyzed a relative short time frame of three minutes due to the restricted time frame to prevent putative influences of the circadian rhythm. To ensure that we really detected baseline locomotion in the OF arena we introduced the longer habituation period of seven consecutive days and analyzed it  with the repeatability analysis. However, data from home cage based observations would have been equally suitable for this kind of analysis. In addition, longer observation periods would have been proficient for the analysis of different relevant time frames.
In this study, a direct measurement of stress-related behavior or stress hormones was not performed. As reduction in body weight can give a hint of the general well-being of the animal, we measured the body weight on day one and six of habituation as part of the routine checkup. No loss of body weights was detected and, hence, we conclude that multiple habituation days to the least did not influence this general welfare parameter. Comparable to this result, Bodden et al. investigated the impact of repeated versus single open-field testing on welfare in C57BL/6J mice and also found no body weight changes or other signs of compromised welfare [48]. The anxiety of an animal is usually connected to time spent in the center and time spent  nearby the wall of an OF [4]. We have not assessed these parameters because of the small size of the OF used and for this reason cannot comment on the anxiety behavior of the animals. In our study, we were able to demonstrate that the habituation approach was successful in decreasing the distance travelled and average activity.
Furthermore, we could show that repeatability analysis is a powerful tool to assess reproducible animal behavior and to identify the source of variance within the measured data [23]. Even if the variance of the data cannot be reduced completely, knowing the sources of variability allows a much better interpretation of behavioral data. In connection with the reproducibility crisis, Baker reported that more than half of the researchers participating in a Nature's survey have failed to reproduce their own experiments [39]. The two experiments used here as examples were carried out with an interval of nine month. Our analysis did not reveal any significant effect of time of experimentation, indicating that our approach yielded reproducible results. Even if the variance of locomotion data can be only reduced to a certain extent, the variance becomes explainable from the second day on. Based on the results of the repeatability analysis, the influence of data from two different experiments could be excluded as the source of variation within this data set. For this reason, we were able to pool the data and concentrated on the other two scientifically relevant factors, namely strain and the individual animal.
Interestingly, within the first grouping (day one to three) the variance of the data could not be traced back to either strain or the individual animal and is therefore more likely induced by unknown other factors, e.g., stress or anxiety. Only after the second grouping (day two to four) reliable parameters have been identified by the repeatability analysis as the source of variance. From this we conclude that, at the minimum, one day, better three days of habituation would be advisable for this or comparable set ups to be able to observe reproducible behavior. Dealing with reproducible behavior, on the other hand, supports a reliable interpretation of the variance of the data with regard to contribution of strain or individual animal personality.
Mouse strain comparisons are commonly implemented in behavioral and genetic studies with widely use of C57BL/6 and BALB/c mice [49]. However, previous reports on explorative behavior of C57BL/6J and BALB/c mice are inconsistent [49][50][51][52]. C57BL/6J mice were found to be engaged in more exploratory activity than BALB/cJ mice [50,51]. In contrast, other studies reported higher anxiety levels associated with lower activity in C57BL/6 than BALB/c mice [53]. An and colleagues reported no differences in locomotor activity between male C57BL/6J and BALB/cJ mice in a novel cage observation test [50]. In our data set only a small percentage of the variance in explorative data could be traced back to strain. From this we concluded that strain differences were not the primary source of variance using these three strains even though they derived from two different breeding facilities which could have added additional variation. The large confidence interval might be explained by the fact that only three different strains have been used.
Another possible explanation is that there is a wide variation in the individual personality of mice, regardless of the strain. In recent years, behavioral studies focused more on consistent between-individual differences giving evidence that the animal personality in addition to strain is a crucial factor to explain variance of measurements [32][33][34][35][36]54]. This assumption is supported by a meta-analysis for animal behavior of Bell and colleagues. They demonstrated that many behavioral types are more consistent within individuals than previously assumed [2]. Interestingly, such individual differences are also found in genetically homogeneous inbred mice housed under highly standardized conditions. This indicates that individual differences tend to escape standardization approaches and might reflect an evolutionary intrinsic value that perpetuates inter-individual differentiation [54]. In addition, Brust and colleagues showed that the personality of a mouse stabilizes with age and is highly repeatable over their lifespan [32]. In our study, we observed highly consistent between-individual differences for distance travelled and average activity from the second grouping onwards. Interestingly, the individual differences between the mice, i.e., their "animal personality" [54] was identified as the main contributor to variability in explorative behavior in the repeatedly conducted OF test. However, if only a single OF test was conducted, the variability in the data could not have been fully explained as the results of the OF test were only reasonably repeatable after habituation. Consequently, testing without habituation might have had a considerable effect on the recently found lack of reproducibility, known as the reproducibility crisis. In accordance to these findings, Chappell and colleagues observed variable locomotion behavior but highly repeatable between-individual differences in mice [55]. Integrating the concept of animal personality not only offers the possibility to improve the interpretation and thus the quality of scientific data, but also to enhance animal welfare science by solving welfare problems on an individual level [56].
Beside this investigated influencing factors, i.e., two different experiments, three different strains, and the personality of the individual animal, other factors are also important to consider in such kind of analysis. Since this study was a side project of a main explorative research study dealing with the analgesic effect of buprenorphine in different mouse strains we primarily focused on male mice [41]. However, scientific data from literature show that behavioral patterns can differ tremendously between male and female mice [57]. Hence, investigating the influence of different sexes on specific parameters compared to the individual mouse behavior would be of special interest. In addition to include more influencing factors, the use of this repeatability analysis to investigate other behavioral parameters within different behavioral set ups is promising. To keep this study as comprehensible as possible, we used locomotion data, e.g., distance travelled and average activity, of an OF test as an example. However, we also tested this method on the following parameters and provided them as supporting information: number of ambulations and average velocity in the OF, rearing and stretched sniffing behavior on the IHP (S3 and S4 Figs, S4-S6 Tables).

Conclusion
We demonstrated a simple strategy to improve the data quality of those behavioral experiments requiring a familiar environment, such as wheel running, nest building, burrowing behavior or baseline locomotion. This was realized by implementing the repeatability analysis method to prove successful habituation as well as explaining underlying factors for variability in the data. While the overall variability is not necessarily reduced, detailed analysis of habituation data reveals the sources of variability. Furthermore, analyzing the habituation data by means of a repeatability analysis helps to determine the optimal start point of an experiment. In our data, the variances observed in the behavior patterns were predominantly due to the individual personality of the mice rather than a strain dependent effect. Behavioral data was repeatable after successful habituation and calculated repeatability values reached a plateau after only a few habituation trials. Hence, as a refinement strategy, the habituation period in comparable experimental designs can be reduced to three days in future experiments. Taken together, this exemplary study underlines the importance of publishing and evaluating habituation data for all kind of behavioral experiments. Importantly, declaration of the effectiveness of habituation procedures could furthermore enhance the validity of experimental data and fortify reproducibility.  Table. Repeatability values for animal ID as random factor with and without adjustment for the factor strain for distance travelled and average activity. Each repeatability value (R) was calculated over three adjacent days resulting in five groupings. For every factor R, the [2.5%, 97.5%] confidence intervals (CI) and p-values calculated by likelihood ratio test were displayed (n = 38 C57BL/6J, n = 15 BALB/cJ and n = 15 129S1/SvImJ male mice). Estimation of repeatability was conducted with a linear mixed-effect model based on Gaussian distribution. The CI resulted from 500 bootstrapping runs and 100 permutations. (PDF) S3 Table. Repeatability values for strain as random factor for distance travelled and average activity. Each repeatability value (R) was calculated over three adjacent days resulting in five groupings. For every factor R, the [2.5%, 97.5%] confidence intervals (CI) and p-values calculated by likelihood ratio test were displayed (n = 38 C57BL/6J, n = 15 BALB/cJ and n = 15 129S1/SvImJ male mice). Estimation of repeatability was conducted with a linear mixed-effect model based on Gaussian distribution. The CI resulted from 500 bootstrapping runs and 100 permutations. (PDF) S4 Table. Repeatability values with multiple grouping factors: animal ID, strain and experiment for average velocity, ambulations as well as rearing and sniffing behavior. For every factor, the repeatability R, the [2.5%, 97.5%] confidence intervals (CI) and the p-values, calculated by likelihood ratio test, were displayed over the seven-day habituation period for average velocity and number of ambulations and over a six-day habituation period for rearing and sniffing behavior (n = 38 C57BL/6J, n = 15 BALB/cJ and n = 15 129S1/SvImJ male mice). Estimation of repeatability was conducted with a linear mixed-effect model based on Gaussian distribution for average velocity and number of ambulations and with a generalized linear mixedeffect model based on Poisson distribution for rearing and sniffing behavior. The CI resulted from 500 bootstrapping runs and 100 permutations. (PDF) S5 Table. Repeatability values for animal ID as random factor for average velocity, ambulations as well as rearing and sniffing behavior. Each repeatability value (R) was calculated over three adjacent days resulting in five groupings for average velocity and number of ambulations and in four groupings for rearing and sniffing behavior. For every factor R, the [2.5%, 97.5%] confidence intervals (CI) and p-values calculated by likelihood ratio test were displayed (n = 38 C57BL/6J, n = 15 BALB/cJ and n = 15 129S1/SvImJ male mice). Estimation of repeatability was conducted with a linear mixed-effect model based on Gaussian distribution for average velocity and ambulations and with a generalized linear mixed-effect model based on Poisson distribution for rearing and sniffing behavior. The CI resulted from 500 bootstrapping runs and 100 permutations. (PDF) S6 Table. Repeatability values for strain as random factor for average velocity, ambulations as well as rearing and sniffing behavior. Each repeatability value (R) was calculated over three adjacent days resulting in five groupings for average velocity and number of ambulations and in four groupings for rearing and sniffing behavior. For every factor R, the [2.5%, 97.5%] confidence intervals (CI) and p-values calculated by likelihood ratio test were displayed (n = 38 C57BL/6J, n = 15 BALB/cJ and n = 15 129S1/SvImJ male mice). Estimation of repeatability was conducted with a linear mixed-effect model based on Gaussian distribution for average velocity and ambulations and with a generalized linear mixed-effect model based on Poisson distribution for rearing and sniffing behavior. The CI resulted from 500 bootstrapping runs and 100 permutations. (PDF) S1 Text. R-script for the repeatability analysis.