Effect of Cage-Induced Stereotypies on Measures of Affective State and Recurrent Perseveration in CD-1 and C57BL/6 Mice

Stereotypies are abnormal repetitive behaviour patterns that are highly prevalent in laboratory mice and are thought to reflect impaired welfare. They have been associated with impaired behavioural inhibition and may also reflect negative affective states. However, in mice the relationship between stereotypies and behavioural inhibition is inconclusive, and reliable measures of affective valence are lacking. Here we used an exploration based task to assess cognitive bias as a measure of affective valence and a two-choice guessing task to assess recurrent perseveration as a measure of impaired behavioural inhibition to test mice with different forms and expression levels of stereotypic behaviour. We trained 44 CD-1 and 40 C57BL/6 female mice to discriminate between positively and negatively cued arms in a radial maze and tested their responses to previously inaccessible ambiguous arms. In CD-1 mice (i) mice with higher stereotypy levels displayed a negative cognitive bias and this was influenced by the form of stereotypy performed, (ii) negative cognitive bias was evident in back-flipping mice, and (iii) no such effect was found in mice displaying bar-mouthing or cage-top twirling. In C57BL/6 mice neither route-tracing nor bar-mouthing was associated with cognitive bias, indicating that in this strain these stereotypies may not reflect negative affective states. Conversely, while we found no relation of stereotypy to recurrent perseveration in CD-1 mice, C57BL/6 mice with higher levels of route-tracing, but not bar-mouthing, made more repetitive responses in the guessing task. Our findings confirm previous research indicating that the implications of stereotypies for animal welfare may strongly depend on the species and strain of animal as well as on the form and expression level of the stereotypy. Furthermore, they indicate that variation in stereotypic behaviour may represent an important source of variation in many animal experiments.


Introduction
Stereotypies are commonly defined as repetitive and invariant behaviour patterns without apparent goal or function [1,2]. They are prevalent in many captive species, including laboratory rodents [1][2][3]. Stereotypies are thought to reflect impaired welfare [2], as they usually develop in barren housing conditions [4][5][6]. Various (not necessarily mutually exclusive) mechanisms have been invoked to explain their development, including a lack of sensory and motor stimulation [2], chronic thwarting of highly motivated behaviour [2,3,7,8], attempts to cope with adverse environments [9], and central nervous system dysfunction [10,11]. However, attempts to link stereotypic behaviour with physiological or behavioural indicators of impaired welfare have produced mixed results. Jumping and bar-mouthing in laboratory mice, for example, have been suggested to develop from attempts to escape the cage [3,7,12]. Furthermore, bar-mouthing level has been found to correlate positively with corticosterone levels at weaning [13], a physiological measure of stress. However, in another study [7], no correlation with corticosterone levels was observed.
Most research so far has focused on behavioural and physiological measures of welfare. These are, however, difficult to interpret in terms of affective state, as they can be confounded by arousal and are therefore not reliable measures of affective valence (whether the animal is experiencing a positive or negative affective state) [14,15]. Studies in human psychology have shown that the valence of affective states influences cognitive processes, such as judgement, expectation, memory and attention [16][17][18]. For example, people in negative affective states tend to interpret ambiguous stimuli in a negative way, displaying a negative cognitive bias, while people in positive affective states display a positive cognitive bias [17,18].
Cognitive biases are sensitive to both short and long term changes in affect [16], and while various types of cognitive biases (judgement, memory and attention bias) have been investigated in humans, the most common type studied in non-human animals has been judgement bias (for review see [14]).
Judgement bias tasks have been implemented and validated across a number of animal species using short term experimental manipulations to alter affective state [19][20][21][22]. Based on this research, increasing evidence suggests that judgement bias can be used as a proxy measure of affective valence. However, the few studies employing judgement bias tasks to investigate the relation between the expression of stereotypic behaviour and affective states have produced conflicting results. For example, back-flipping in starlings has been associated with a negative cognitive bias [23]. In contrast, grizzly bears with higher levels of pacing displayed positive cognitive bias [24]. Furthermore, in capuchin monkeys only some forms of stereotypic behaviour (e.g., head twirls) were correlated with negative cognitive bias [22]. Similar inconsistent results have been found within species, such as in the common laboratory mouse, Mus musculus. While mice with higher overall stereotypy levels displayed a positive cognitive bias, this relation seems to depend on stereotypy form as, for example, no such relation was found with stereotypic bar-mouthing [25]. On the other hand, a spatial exploration task to assess judgement bias recently found that CD-1 mice with higher stereotypy levels displayed a negative judgement bias, but this result may have been confounded by the form of stereotypy performed [26].
Two possible reasons may explain these discrepancies between studies: differences in the types of tasks used and/or differences in the reinforcer value [14]. For example, in the exploration based task [26], a more aversive negative outcome (light on, white noise) was used compared to non-exploration based tasks using food-based positive and negative reinforcers [22][23][24][25], potentially increasing the likelihood of observing a more negative bias. However, conflicting results in non-exploration based studies [22][23][24][25] suggest that effects of stereotypy levels on cognitive biases may also depend on the form of stereotypy performed.
Studies relating the expression of stereotypic behaviour to measures of brain function have also yielded conflicting results. Stereotypies have been related to impaired inhibitory control of behaviour [27,28], resulting in poor extinction learning [27,29,30,31], reversal learning [32,33], and other forms of perseverative responding [34,35]. These changes have been linked to an imbalance in the modulation of the direct and indirect pathways in the basal ganglia by dopamine [36], and may be a consequence of barren housing [37]. Current evidence indicates that stereotypies reflect a form of behavioural disinhibition termed "recurrent perseveration" [11,28,38]. Recurrent perseveration refers to the inappropriate repetition of a response to a stimulus [39,40], and positive correlations between stereotypy levels and recurrent perseveration have been found in a number of species, including blue tits [27], parrots [34], Malayan sun bears and Asiatic black bears [29,41], horses [30], bank voles [11], and mice [42]. Overall, animals with high levels of stereotypy show a strong tendency to repeat behavioural responses, and the fact that this relationship appears across a wide range of species is consistent with, but does not establish, a common underlying neural mechanism.
However, the evidence linking expression levels of stereotypic behaviour with perseverative responding is ambiguous. In laboratory mice for example, overall stereotypy level was positively correlated with measures of recurrent perseveration in C57BL/6 [42], but not CD-1 mice [4,5,43]. Similarly, studies in birds found that route-tracing and oral stereotypies in songbirds [27] and parrots [34], but not back-flipping and route-tracing in starlings [44], reflect recurrent perseveration. Furthermore, in mink [35,45], deer mice [32] and non-human primates [31,33] only some stereotypies, but not others, were found to correlate positively with recurrent perseveration. Such inconsistencies could be due to different tasks used in the measurement of recurrent perseveration. Many studies have used a two-choice guessing task, which requires a simple response to a stimulus [5,27,35,42,44], while other studies have implemented extinction learning [4,27,31,43], which may be affected by other processes (e.g., learning, stuck-in-set perseveration [11,27]). However, positive correlations have been found using both of the above mentioned tasks, which indicates that the relation (or lack thereof) between stereotypy and perseveration may depend on the form of stereotypy.
Animals reared under barren housing conditions tend to display elevated levels of perseverative behaviour [4,32,45] (but see [43] and [35]) compared to animals reared under enriched conditions. Similarly, wild caught striped mice are less perseverative than captive reared mice [46], supporting the hypothesis that higher levels of perseveration may be linked to poor laboratory housing conditions. However, few studies have investigated the relation between perseverative behaviour and affective states. Impaired decision making and impaired behavioural control are sources of frustration in humans [39,47] and could possibly be sources of frustration in animals as well. For example, the level of head twirling in non-human primates was not only correlated with recurrent perseveration [31,33] but was also linked to negative cognitive bias [22]. Conversely, while back-flipping starlings displayed a negative cognitive bias [23], this stereotypy was not associated with recurrent perseveration [44]. Similarly, studies comparing recurrent perseveration with indicators of frustration (motivation to gain access to enrichments and corticosterone levels) in CD-1 mice found no link between these two measures [43].
Taken together, current evidence linking the expression of stereotypic behaviour to the valence of affective states is ambiguous. The same is true for evidence linking the expression of stereotypies to measures of impaired behavioural control. These inconsistencies may at least in part be explained by different forms of stereotypy, which may involve different motivational factors relating to different underlying mechanisms and may thus also have different welfare implications. Several authors have recognised the importance of differentiating between different stereotypy forms when examining the underlying mechanisms [2,11,31,32,45], and as discussed above, different conclusions have been reached when different forms of stereotypy were considered.
Most studies in rodents have treated stereotypies as a homogeneous group of abnormal behaviours when exploring their relation with measures of impaired behavioural inhibition and affective state [4,5,26,42,43]. In the present study we therefore evaluated, in two strains of mice, how the form and level of stereotypic behaviour relate to variation in measures of cognitive bias and recurrent perseveration. To measure cognitive bias, we used an exploration based cognitive bias task, previously used in rats [48] and mice [26]. Mice were trained on a spatial discrimination task, where two arms in a radial maze predicted a positive outcome, and the two opposite arms predicted a negative outcome. After the training sessions, mice were given access to the previously unavailable intermediate arms. We hypothesised that if some stereotypies reflect negative affective states, mice with higher levels of those stereotypies should display a more negative cognitive bias by avoiding ambiguous arms. To measure recurrent perseveration, the mice were tested in a two-choice guessing task [34]. Similarly, we hypothesised that if some stereotypies reflect recurrent perseveration, mice with higher levels of those stereotypies should generate a more perseverative pattern of responding.

Material and Methods
Animals and husbandry
Eighty female CD-1 and 80 female C57BL/6 mice were purchased in two replicates (40 mice of each strain) two months apart from Harlan Laboratories (Netherlands), at three weeks of age. They were randomly assigned to Type II cages (22 x 16 x 14 cm, Tecniplast) in pairs, with wood chips (Lignocel select, Rettenmaier & Söhne GmbH, Germany) as bedding but no nesting material, as nesting material considerably attenuates stereotypic behaviour [6,49,50]. Food (Kliba Nafag #3430, Provimi Kliba AG, Switzerland) and water were provided ad libitum, and animals were kept on a reversed 12:12 hour dark:light cycle, with lights off at 09:00 h.

Experimental design
Previous studies using different tasks found that stereotypy level was positively correlated with measures of recurrent perseveration in C57BL/6 mice [42] but not in CD-1 mice [5,43]; these two strains were therefore used in the present study. For unknown reasons, one CD-1 and one C57BL/6 mouse died before the onset of data recording, so those cages were excluded from the experiment. The remaining 154 mice were screened for the expression of different forms of stereotypic behaviour at week 26 of age. All cages were recorded for two consecutive days and videos were screened using continuous observations to assess the form (but not level) of stereotypic behaviour performed, based on our previously validated ethogram [25] (Table 1).
Based on the stereotypy forms observed at the screening phase, 60 mice per strain were chosen for testing, so that all stereotypy forms observed were included in the sample at the onset of testing. Mice were tested for recurrent perseveration in a two-choice guessing task from week 30-33 of age, followed by a cognitive bias task on a radial arm maze at 34 weeks of age. After testing, home cage behaviour was recorded again (at 35 weeks of age), to assess individual levels of expression of the different forms of stereotypic behaviour (Fig 1).
The numbers of mice displaying each stereotypy form recorded at screening and after testing are listed in S1 Table. Since the numbers of mice performing cage-top twirling with bar-mouthing (CT-BM) and back-flipping (BF) at the time of screening were low, all individuals performing those stereotypies were chosen for testing. Of the mice performing no stereotypy (NS) and bar-mouthing (BM), 18 and 20 mice, respectively, were chosen randomly (using a computer-generated random sequence). Mice remained housed with the same cage-mates throughout the study. Therefore, in some cases both mice from the same cage were tested and in other cases only one mouse from the cage was tested.
Some mice that displayed BM or CT-BM during the screening phase displayed NS at the time of testing and vice versa (which only became apparent after testing, at the time of stereotypy recording), resulting in an unequal number of mice in those two groups. In C57BL/6 mice, 20 mice each from NS, route-tracing (RT) and route-tracing with bar-mouthing (RT-BM) were chosen randomly for testing, but most mice performed RT-BM at the time of testing.

Home cage behavioural observations
Home cage behaviour was recorded using IR cameras (VC Videocomponents GmbH, Germany). For individual recognition, one mouse per cage was marked one day before the start of home cage recording, using a permanent marker (Edding 500) while the cage-mate was sham marked. From both days of video recording, the mice were observed for the first 15 min of the 2nd, 3rd, 4th and 5th hour of the dark phase. Behaviour was sampled using one-zero sampling with 15 s intervals [5,43,51], yielding 480 data points per mouse across the two observation days. The ethogram used for behavioural recording is presented in Table 1. The level of each form of stereotypic behaviour was assessed as a proportion of active time (where active time was calculated as proportion of observed time).
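The sampling arithmetic and the level calculation described above can be sketched as follows. This is a minimal illustration, not the scoring procedure actually used: it assumes that each 15 s interval receives a single label, that "active time" covers every interval not scored as inactive (stereotypic intervals included), and the label strings and function name are hypothetical.

```python
from collections import Counter

# 2 days x 4 hourly blocks x (15 min / 15 s) intervals per block
INTERVALS_PER_MOUSE = 2 * 4 * (15 * 60 // 15)


def stereotypy_levels(labels):
    """Return (active fraction of observed time, level of each stereotypy form).

    `labels` holds one label per 15 s interval: "inactive", "active", or a
    stereotypy code such as "BM" (hypothetical codes, single label assumed).
    Levels are expressed as a proportion of active time.
    """
    counts = Counter(labels)
    n_observed = len(labels)
    # Assumption: active time = all intervals that were not scored inactive.
    n_active = n_observed - counts["inactive"]
    active_fraction = n_active / n_observed if n_observed else 0.0
    levels = {}
    if n_active:
        for form, n in counts.items():
            if form not in ("inactive", "active"):
                levels[form] = n / n_active
    return active_fraction, levels
```

For example, a mouse scored inactive in 240 intervals, active in 180 and bar-mouthing in 60 would have an active fraction of 0.5 and a bar-mouthing level of 0.25 under these assumptions.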

Guessing task
Table 1. Ethogram for the recording of home cage behaviour.
Category: General activity
Inactive: Sitting or lying motionless throughout the 15 s interval, occasionally interrupted by brief single twitches lasting no longer than 5 s.
Active: All activities except the stereotypic activities listed below.
Category: Stereotypic behaviour
Bar-mouthing: Chewing on a bar with the bar held in the gap between incisors and molars (diastema) while hanging on the cage lid (with all four paws or the forepaws only) or standing on the hind legs. Bar-mouthing may be performed on the spot or by moving along the bar while chewing.
Circling: Running around the cage in circles.
Cage-top twirling: Spinning around the longitudinal body axis while hanging on the cage lid with the forepaws.
Back-flipping: Backward flip from one cage wall or the food rack towards the opposite cage wall, with or without touching the cage lid and/or the opposite cage wall during the flip.
Route-tracing on the cage-lid: Moving along the same route on the cage lid with all four legs.
Behaviour patterns were considered as stereotypic if the same movement sequence was repeated continuously for at least 3 s (bar-mouthing) or at least three times in a row without pauses longer than 3 s between bouts (cage-top twirling, back-flipping, route-tracing, circling).
doi:10.1371/journal.pone.0153203.t001

We used the same experimental protocol as described by Garner et al. [42] and Gross et al. [4] to measure recurrent perseveration. In contrast to the above studies, which used 100 response sequences, we only used 80, as previous pilot studies confirmed that this was sufficient for analysis. A previous study found that CD-1 mice tended to display high rates of spontaneous alternation (LRLR and RLRL) when tested on a guessing task in a T-maze [4], possibly due to a natural tendency of mice to spontaneously alternate in spatially oriented tasks [52,53]. To avoid a possible confound of alternation on measures of perseveration, we used an apparatus with two adjacent goal compartments (Fig 2) that has been shown to eliminate spontaneous alternation in both CD-1 [25] and C57BL/6 strains [42].
Apparatus. The apparatus consisted of a box made of black plastic measuring 20 x 50 cm (height: 15 cm), which contained a start box (10 x 10 cm) and two goal compartments (10 x 20 cm), each containing a goalpot (Fig 2).
Test procedure. Mice were food restricted for the duration of the task. Starting three days before the onset of testing, mice were fed a reduced amount of food once a day (3-4 g of food per cage per day). Subjects were weighed daily to ensure that their body weight was maintained at about 90% of their body weight when fed ad libitum. The task was conducted under red light, between 10:00 h and 14:00 h. The test order of cages was randomized daily using a computer-generated random sequence, and the two mice from the same cage (where applicable) were tested at the same time and in the same room, each by one experimenter. If a mouse did not perform the task, it was put back in the home cage and tested at the end of the session. If an animal's weight dropped below 85%, it was put in a separate cage and fed ad libitum for 30 minutes. Rewards used in the task were 20 mg chocolate flavoured pellets (Dustless Precision Pellets, Bio-Serv™). In all trials, both goalpots contained an inaccessible pellet at the bottom which was covered with wire mesh and served as control for odour cues. Between mice, but not between trials, the apparatus was cleaned with a 70% ethanol solution.
Habituation to reward: One week prior to testing, mice were given chocolate flavoured pellets in their home cage daily (four pellets per cage), to reduce neophobia and to habituate them to the food reward.
Habituation to apparatus: On day one, each mouse was placed in the apparatus for ten minutes in a pre-specified random order. Both goalpots were present and both contained one chocolate pellet. Chocolate pellets were also scattered throughout the apparatus (except in the two goal compartments).
Shaping: On day two each mouse received 12 training trials in which both goalpots were baited. As soon as the mouse entered one compartment, access to the other compartment was blocked by closing the guillotine door. If the mouse chose the same side three times in succession, that side was closed in the following trial to avoid shaping the mouse to one side. A trial was completed when the animal's head (nose) was above the goalpot, after which the animal was left to eat the reward. The mouse was then returned to the start box by the experimenter and the next trial began.
Testing: The test phase consisted of 80 trials conducted over a maximum of three sessions. For each trial, the start box door was opened and once the animal had made a choice, the other compartment was closed. The animal was left to eat the pellet and then returned to the start box. Each session was terminated after 30 minutes or as soon as the mouse started showing off-task behaviour (cf. [11]). On each trial, only one compartment was baited, with a probability equalling the proportion of responses to the other side in the previous twenty trials. In trials 1 to 19, the side bias was calculated from all previous trials (side bias was measured as probability of choosing the right goalpot). This randomization procedure was used to eliminate side biases which may confound the experimental paradigm and was determined by a custom-written computer program [27]. Although reward side is unpredictable, choosing each side equally often will maximize the number of rewards. The mouse can do so by producing either a random or patterned sequence of responses. Patterned sequences (which show high sequential dependence) can be apparent as either a series of repetitions or alternations, and indicate recurrent perseveration [38,42,54].
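The baiting rule described above can be illustrated with a short sketch. It is not the custom program cited in the text; the function names are hypothetical, and the 0.5 probability used on the very first trial (when no choices exist yet) is an assumption, since the source does not state how that trial was handled.

```python
import random


def bait_right_probability(choices, window=20):
    """Probability that the right goalpot is baited on the next trial.

    `choices` is the list of sides chosen so far ("L" or "R"). The right side
    is baited with a probability equal to the proportion of LEFT responses in
    the last `window` trials (all previous trials if fewer are available).
    """
    recent = choices[-window:]
    if not recent:
        return 0.5  # assumption: no history on the first trial
    side_bias = recent.count("R") / len(recent)  # P(choosing right)
    return 1.0 - side_bias


def choose_baited_side(choices, rng=random):
    """Draw the single baited side for the next trial."""
    return "R" if rng.random() < bait_right_probability(choices) else "L"
```

Under this rule a mouse that always chooses the right goalpot is never rewarded there, which is why responding to both sides equally often maximises the number of rewards.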
Outcome measures. Perseveration score (logit[P]) was used as the primary outcome measure of recurrent perseveration. The score is calculated using third-order Markov chain analysis [34], which describes the probability of a behaviour occurring as a function of previous behaviour (where the third order considers the three previous behavioural responses) and provides a way to assess sequential independence. These analyses were performed by a custom-written computer program which calculated the observed and expected probabilities of each choice. The sum chi-square was then calculated from the observed and expected values. The probability associated with each sum chi-square (p) indicates the probability of sequential independence of the observed sequence. Therefore, recurrent perseveration was calculated as P = 1 − p, where 1 represents a completely perseverant sequence, and the data were logit transformed (logit[P]).
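As a small worked illustration of the final transformation only, the sketch below converts a probability of sequential independence p into logit[P] with P = 1 − p. The Markov chain and chi-square steps of the custom program are not reproduced here, and the function name and the handling of p values of exactly 0 or 1 (rejected, because the logit is undefined there) are assumptions.

```python
import math


def perseveration_score(p_independence):
    """Return logit(P) where P = 1 - p and p is the probability of
    sequential independence obtained from the sum chi-square."""
    if not 0.0 < p_independence < 1.0:
        # Assumption: boundary values are rejected rather than clipped.
        raise ValueError("p must lie strictly between 0 and 1")
    # logit(P) = ln(P / (1 - P)) = ln((1 - p) / p)
    return math.log((1.0 - p_independence) / p_independence)
```

A sequence with p = 0.5 gives a score of 0, and smaller p values (stronger evidence against sequential independence) give larger positive scores.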
Numbers of pure repetitions (RRRR, LLLL) and pure alternations (RLRL, LRLR) were considered as secondary outcome measures of a non-random search strategy. Distribution of tetragrams (sequences of four trials) was examined by dividing each response sequence consisting of 80 trials per mouse into 77 overlapping tetragrams. 16 configurations of tetragrams were possible, of which two were pure repetitions and two were pure alternations. A random search strategy would be characterized by an equal distribution of all possible configurations (77/16 = 4.8), whereas perseverative behaviour should result in sequences characterized by higher rates of alternations or repetitions [4]. Thus, the frequencies of repetitions and alternations were counted for each subject and compared between mice with different forms and levels of stereotypy.
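The tetragram counts described above can be sketched as follows; the function names are hypothetical and the code only illustrates the counting step, not the subsequent statistical comparison.

```python
from collections import Counter


def tetragram_counts(sequence):
    """Count overlapping tetragrams in a sequence of 'L'/'R' choices.

    A sequence of n trials yields n - 3 overlapping tetragrams
    (77 for the 80 trials used here).
    """
    seq = "".join(sequence)
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))


def repetitions_and_alternations(sequence):
    """Return (pure repetitions, pure alternations) for one mouse."""
    counts = tetragram_counts(sequence)
    repetitions = counts["RRRR"] + counts["LLLL"]
    alternations = counts["RLRL"] + counts["LRLR"]
    return repetitions, alternations


# Expected count per configuration under a random search strategy.
EXPECTED_PER_CONFIGURATION = 77 / 16
```

A strictly alternating mouse would contribute 77 alternations and no repetitions, compared with roughly 4.8 occurrences of each of the 16 configurations under a random strategy.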

Cognitive bias task
Apparatus. The cognitive bias task was implemented using an eight arm radial maze (Med-Associates Inc.; Fig 3). Each arm was 46 cm long and 9 cm wide and the central arena was 28 cm in diameter. The bottom of the maze was backlit with infrared light, which eliminated tracking errors associated with automated tracking [55]. A computer with EthoVision XT software (Noldus, Version 9) recorded the animal's movement in the maze via a video camera equipped with an infrared pass filter, and automatically activated contingencies when the animal entered an arm or the end of an arm. The detection settings for EthoVision XT were selected so that both the percentage of samples in which the subject was not found and the percentage of samples skipped were less than 1% per trial. For both training and testing, the time spent in each arm and the number of arm entries were automatically recorded.
Test procedure. Training: Mice were trained for one ten minute session each day for five consecutive days starting at 10:00 h. The order of cages was randomised daily and the order of training of the two mice per cage was reversed on alternate days. During training, only four reference arms (two pairs of adjacent arms opposite to each other) were open: two positive arms and two negative arms (Fig 3). The remaining four arms were closed during the training sessions. Each session started with the overhead light on (400 lux). Reaching the ends of the positive arms turned the overhead light off either until the mouse entered a negative arm or until it exited a positive arm and stayed in the central arena for 20 s. Reaching the end of the positive arms also activated a pellet dispenser, dispensing a 20 mg chocolate flavoured pellet (the same type as used in the guessing task). When the animal entered a negative arm, the overhead light came on and stayed on until the animal entered the end of a positive arm. Entering the end of negative arms also triggered a burst of white noise which remained on until the animal exited the negative arm.
Testing: After five days of training, the mice were tested for responses to ambiguous arms. The test session was identical to the training sessions, with the exception that all eight arms were now available for exploration. Contingencies for reference arms remained the same as during training, while entering and reaching the ends of the ambiguous arms activated no contingencies. Both during training and testing, number of arm entries and time spent in each arm were automatically recorded by EthoVision XT.
Outcome measures. Time spent in arms is presented relative to trial duration, and number of arms entered relative to the number of all arm entries. Since time spent in arms and number of arm entries were positively correlated for all arms, time spent in arms was used as the primary outcome measure of arm preference in the analysis. When an effect of stereotypy on time spent in arms was observed, we additionally looked at the number of arm entries. For the comparison of time spent in positive and negative reference arms, we calculated a "positive arm score" by dividing the difference between the time spent in positive arms and the time spent in negative arms by the time spent in all reference arms. To compare visits to reference arms and ambiguous arms, we calculated a "reference arm score" as the difference between the time spent in reference arms and the time spent in ambiguous arms, divided by the time spent in all arms. Furthermore, we calculated an "ambiguous arm score" as the difference between the times spent in near positive arms and near negative arms, divided by the time spent in all ambiguous arms. A higher reference arm score simply indicates that animals spent more time in reference arms and less time in ambiguous arms; it does not allow a clear interpretation of whether this difference was due to preference for reference arms or active avoidance of ambiguous arms. Therefore, our use of the term "avoidance" throughout the manuscript is equivalent, yet complementary, to the term "preference". Number of all arms entered was used as a measure of activity and overall exploration in the radial maze.
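The three scores defined above can be written out as a short sketch. The function and argument names are hypothetical, "time spent in all arms" is taken to mean reference plus ambiguous arms, and returning None when a denominator is zero is an assumption rather than part of the described method.

```python
def arm_scores(t_pos, t_neg, t_near_pos, t_near_neg):
    """Compute positive, reference and ambiguous arm scores from times
    (any consistent unit) spent in each class of arm.

    Each score ranges from -1 to 1; None marks an undefined score.
    """
    t_ref = t_pos + t_neg                # all reference arms
    t_amb = t_near_pos + t_near_neg      # all ambiguous arms
    t_all = t_ref + t_amb                # all arms

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "positive": ratio(t_pos - t_neg, t_ref),
        "reference": ratio(t_ref - t_amb, t_all),
        "ambiguous": ratio(t_near_pos - t_near_neg, t_amb),
    }
```

For instance, 300 s in positive arms, 100 s in negative arms, 60 s in near positive arms and 40 s in near negative arms would give a positive arm score of 0.5, a reference arm score of 0.6 and an ambiguous arm score of 0.2.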

Ethical statement
This study was carried out in strict accordance with the recommendations in the Animal Welfare Ordinance (TSchV 455.1) of the Swiss Federal Food Safety and Veterinary Office. It was approved by the Cantonal Veterinary Office in Bern, Switzerland (Permit Number: BE12/12).

Statistical analyses
All statistical analyses were performed with R (version 2.15.3) and RStudio (version 0.98.507). The function lmer in the R packages "lme4" and "lmerTest" was used to fit linear mixed effects models [56] and P-values below 0.05 were considered significant for all analyses. The assumptions of normally distributed errors and homogeneity of variance were examined graphically with the use of the Normal plot and the Tukey-Anscombe plot. To satisfy these assumptions, level of stereotypic behaviour was square-root transformed. Stereotypy level is reported as proportion of active time. Results shown are untransformed means ± SEM. Data for each strain were analysed separately.
For the radial maze data, training and test data were analysed for each of the previously mentioned outcome variables. For test data, stereotypy form, stereotypy level, and the interaction between stereotypy form and level were included in the model as predictors with individual nested within cage as a random effect. Additionally, for training data in the radial arm maze, session and the interaction between session and stereotypy form, stereotypy level and stereotypy form x level were included as predictors in the model. In all analyses non-significant predictors were excluded stepwise (starting with interactions then non-significant main effects) to produce a final model. Bonferroni corrected post hoc tests were used to probe significant main effects and interactions.

Missing data
For unknown reasons, two C57BL/6 mice died in the course of the study. In the guessing task, three CD-1 and six C57BL/6 mice showed off task behaviour in the shaping period and did not complete the 12 shaping trials. They were excluded from the analysis. Furthermore, due to a technical failure, guessing task data from eight CD-1 and eight C57BL/6 mice were lost. In the cognitive bias task, four CD-1 mice performed circling behaviour in the radial maze and never performed the task. Mice with missing data for either test were excluded from the analysis. Finally, two C57BL/6 mice were excluded after recording the expression of stereotypic behaviour as they only performed RT compared to all other mice which performed RT-BM, resulting in a final sample of 44 CD-1 mice and 40 C57BL/6 mice.

Expression of stereotypic behaviour
Of the 44 CD-1 mice, eight mice performed NS, 20 mice performed BM, nine mice performed BF, and seven mice performed both CT and BM (CT-BM) (Fig 4). Among these mice, levels of CT and BM were not correlated (r = 0.10, P > 0.05, df = 6, controlling for cage and replicate). To analyse the effect of form of stereotypy on outcome measures, CD-1 mice were therefore split into the four groups NS, BM, BF, and CT-BM. Total level of stereotypy was affected by the form of stereotypy (F (2,41) = 4.52, P < 0.05), with levels of BM being significantly lower than levels of BF. All 40 C57BL/6 mice performed both RT and BM (RT-BM) (Fig 4). Levels of RT and BM were positively correlated (r = 0.52, P < 0.05, df = 39, controlling for cage and replicate).
Effect of stereotypy expression on measures of recurrent perseveration. Measures of recurrent perseveration for the two strains of mice are listed in S2 Table, and the effects of form and level of stereotypy expression on perseveration measures are listed in Table 2.
In CD-1 mice, we found no effect of stereotypy level or form on the primary or secondary perseveration measures. Stereotypy form affected the probability of displaying a side bias (F (3,41) = 2.95, P < 0.05), with NS and BM mice displaying stronger side biases (NS displaying 58 ± 4% and BM displaying 43 ± 3% choices to the right side), compared to CT-BM (48 ± 4%) or BF mice (48 ± 3%).

Cognitive bias task
All measures of exploration from the training and test sessions are listed in S3 Table. The effects of form and level of stereotypy expression on these measures are listed in Tables 3 and 4.
In the CD-1 strain, mice with higher levels of stereotypy showed a stronger preference for positive arms during training (F (1,43) = 7.87, P < 0.05; Fig 7). During testing, when all eight arms were open, stereotypy level also affected exploration; however, this effect depended on the form of stereotypy performed (F (3,41) = 3.59, P < 0.05). BF mice increased their time in reference arms and avoided the ambiguous arms as the level of stereotypy increased (β = 66.75), spending up to 75% of their time in the reference arms. CT-BM and BM mice decreased their time in reference arms and spent more time in ambiguous arms as stereotypy level increased (β = -38.00 and β = -7.88, respectively; Fig 8). However, both CT-BM and BM mice still spent almost half of their time in the reference arms and the other half in the ambiguous arms. This effect on time spent in arms was evident despite no difference in the number of arm entries to either reference or ambiguous arms (Table 3).
The effect of stereotypy on activity measures depended on both the level and the form of stereotypy performed. CD-1 mice decreased their activity across the training sessions from 61 ± 2 arms entered in the first session to 55 ± 3 in the last session. However, this decrease was greater for mice with lower levels of stereotypy (F (4,172) = 5.18, P < 0.05). BF mice were also more active during training (F (3,41) = 8.41, P < 0.05), entering on average 74 ± 4 arms, compared to NS (53 ± 2 arms), BM (53 ± 1 arms) and CT-BM mice (51 ± 2 arms). During testing, a significant interaction between stereotypy level and form was observed: mice with higher levels of BF were more active and entered more arms (β = 129.42), while CT-BM and BM mice with higher levels of stereotypy entered fewer arms (β = -20.66 and β = -30.30, respectively; Fig 9). C57BL/6 mice also decreased activity across training sessions, from 61 ± 1 arms entered in the first session to 48 ± 2 arms entered in the last session (F (4,39) = 17.88, P < 0.05). Additionally, mice with higher stereotypy levels were more active (F (1,39) = 22.96, P < 0.05) and this effect was associated with the level of RT (F (1,39) = 6.51, P < 0.05), but not BM (F (1,39) = 2.85, P > 0.05). There was no effect of stereotypy level on radial maze exploration in the test phase.

Discussion
The main aim of this study was to explore the effects of cage-induced stereotypies in mice on measures of affective state and recurrent perseveration, and to assess how these effects vary depending on the specific form and expression level of stereotypic behaviour. Overall stereotypy level affected exploration in the cognitive bias test in CD-1 mice, and this effect was influenced by the form of stereotypy performed. With increasing levels of stereotypic behaviour, CD-1 mice displaying BF showed increasing avoidance of ambiguous arms, indicating a more negative cognitive bias. No such effect was observed in CD-1 mice displaying BM or CT-BM or in C57BL/6 mice with RT-BM. Furthermore, stereotypy level was not correlated with perseveration score in either strain; however, in C57BL/6 mice, the level of RT, but not BM, was positively correlated with the number of pure repetitions in the guessing task, a secondary measure of recurrent perseveration.

Expression of stereotypic behaviour
In C57BL/6 mice, stereotypic behaviour was more prevalent as only three non-stereotypic mice were observed in this strain compared to 14 in the CD-1 strain. The prevalence of forms of stereotypies differed across the two strains. In the C57BL/6 strain almost all mice performed both RT and BM combined, while there was greater intra-strain variability in the forms of stereotypy performed in CD-1 mice. The different forms of stereotypies observed are consistent with what other studies have reported [4,5,57,58], but stereotypy levels were slightly lower (15.34 ± 2.58%) than what was reported for CD-1 mice housed in barren cages in some studies (from 22.67 ± 6.6% to 40.33 ± 8.6%) [4,5], indicating some variability in stereotypy level between research institutions or breeding facilities.

Effects of stereotypy expression on recurrent perseveration
We found no association between the expression of stereotypies in CD-1 mice and our primary or secondary measures of recurrent perseveration in the guessing task. These results are consistent with results by Gross et al. [4,5] and Latham and Mason [43], indicating that stereotypies in this strain may not reflect recurrent perseveration. We also found no association between stereotypy level and perseveration score, the primary measure of recurrent perseveration in the guessing task, in C57BL/6 mice. However in this strain, the level of RT, but not BM, was positively correlated with the number of repetitions, indicating that in RT mice the pattern of responding was less random.
Some other studies have similarly found correlations only between stereotypic behaviour and repetitions in guessing tasks, which were interpreted as perseverative responses [4,45]. Since mice with higher numbers of repetitions also made fewer correct (i.e. rewarded) choices, they may have been more perseverative (i.e. inappropriately repeating a response). Although a previous study found a clear positive correlation between perseveration score, which measures higher order patterns of non-random responding, and stereotypy level in this strain [42], our results provide evidence that stereotypy level in C57BL/6 mice may reflect impaired behavioural inhibition. However, in contrast to Garner et al. [42], who found a correlation between overall stereotypy level and recurrent perseveration, the correlation with perseveration was restricted to RT in the present study. Clearly, more studies using other strains of mice and other test paradigms to assess recurrent perseveration are needed before we can draw firm conclusions about the relation between stereotypic behaviour and impaired inhibitory control of behaviour in laboratory mice. However, similar to studies in primates [31] and mink [45], it seems that at least in C57BL/6 mice, this relation may be dependent on the form of stereotypy.
The absence of a clear relation between recurrent perseveration and stereotypies in mice and other species [35,45] makes drawing conclusions difficult. However, a possible reason for the inconsistent results lies in the methods used to measure impaired behavioural control. Behavioural inflexibility and impaired inhibition may arise from different central nervous system regions [59,60] and can manifest in different forms of behaviour, such as reversal learning [33,61], impulsive responding [35,62] and set shifting [63], among others, which could be differentially related to different stereotypies.

Effects of stereotypy expression on cognitive bias
Responses to different arms in the cognitive bias task were similar to other spatial cognitive bias tasks [64,65], with mice spending more time in near positive arms compared to near negative arms. The clear discrimination between ambiguous arms indicates that mice associated near positive arms with a positive outcome and near negative arms with a negative outcome. It is possible that this difference may merely be a consequence of the close proximity to the positive arms. However, as mice spent approximately 30% of their time in the central arena and did not simply traverse from one arm to another, this explanation is unlikely. Future studies could systematically evaluate this hypothesis by rearranging ambiguous arms so that they do not mirror each other, thereby eliminating any effects of spatial proximity.
Stereotypy expression affected exploration of the maze in both strains. In CD-1 mice, increasing preference for the reference arms and avoidance of ambiguous arms was associated with increasing levels of BF. This pattern of exploration is consistent with the one previously reported for stereotyping CD-1 mice in the same task [26] and could reflect a general avoidance of novel unfamiliar arms (neophobia) [66,67]. However, in a radial maze paradigm similar to the one used here, rats which had experienced removal of housing enrichment spent more time in positive reference arms and avoided novel ambiguous arms suggestive of impaired welfare [48]. An alternative explanation for avoidance of ambiguous arms would be that BF mice were behaviourally less flexible and less responsive to the changed environment when all arms were available to explore. However, as stereotypy level selectively affected time spent in ambiguous arms, but not the number of entries to these arms, these data suggest that reduced time spent in ambiguous arms reflects an increased expectation of an aversive outcome.
Taken together, we conclude that avoidance of ambiguous arms indicates that mice with high levels of BF were displaying a negative cognitive bias. However, this is in contrast with a previous study [25], where high stereotyping mice displayed a more positive cognitive bias. Although both of these conflicting findings are preliminary until replicated, there would be two possible explanations for this difference if both findings were true: they are based on different forms of stereotypy and on different task paradigms. That different stereotypies may differ in their relationship with measures of cognitive bias has been reported in non-human primates [22]. It is also one of the findings of the present study, although here, no opposite effect of other forms of stereotypy on cognitive bias was found. However, mice with higher levels of BF were more active and their activity did not decrease across training sessions. Anxiety usually increases when mice are exposed to a novel environment, but over time they habituate as they explore the environment which is associated with a decrease in activity [68,69]. High stereotyping animals are generally more active [1,5,11], but the lack of decrease in activity during training in BF mice could indicate sensitization to the apparatus and possibly higher levels of stress and anxiety in these mice. Indeed, chronic stress has been shown to induce hyper-locomotion in exploration based tests, which can be triggered by acute stressors such as bright light [70]. Thus, the cognitive bias measured in BF mice could reflect increased anxiety levels.
An alternate explanation for these conflicting results is related to the task paradigm and the outcome measures used. Stereotyping animals are generally quicker to make a choice [11,45,71] and the positive bias found in the task based on differential food rewards reported by Novak et al. [25] may have been a result of more impulsive choices made to ambiguous cues and not related to affective valence. The relative difference between the positive and the negative outcome usually varies in cognitive bias tasks, and is likely a contributing factor to inconsistent results or difficulties in finding predicted cognitive biases [14,72,73]. Animals are less likely to anticipate the negative (or less positive) outcome in tasks using reward based outcomes [20,74,75]. Conversely, increasing the relative difference between the positive and the negative outcome, as used in the present study, may be better at detecting variation in expectations of a more negative outcome [14,73]. Although these differences require further investigation, they may be important in studying specific types of bias by choosing particular reinforcer values as certain types of cognitive bias may be more tightly linked to particular affective disorders. For example, differences in anticipation of positive events may be more relevant to depression-like states, whereas biases in anticipation of negative events have been linked to anxiety-like states (for review see Mendl et al. [14]).
Similar to BF mice in the CD-1 strain, C57BL/6 mice with higher stereotypy levels showed a lack of habituation and a higher preference for positive arms; however, unlike BF mice, C57BL/6 mice with higher levels of stereotypy did not exhibit a negative cognitive bias. Furthermore, regardless of whether performed alone or in combination with another stereotypy, BM had no effect on performance in the cognitive bias task. This parallels results reported by Novak et al. [25], using an active choice task to assess cognitive bias, where no evidence for cognitive bias was found in BM mice in either CD-1 or C57BL/6 mice. Development of BM in CD-1 mice has been linked to behavioural [3,7] and physiological [12] measures of stress; however, in adult mice it was found to vary strongly depending on circumstances [25,43]. It could therefore be linked to a more general arousal or motivation to explore the outside environment [3,7] and thus be dissociated from affective states.
One of the limitations of the present study is that it only looked at stereotypies at one time point. Stereotypies develop gradually from source behaviours, generally increasing in frequency and duration while becoming less variable with time [3,76]. When fully established, stereotypies may become emancipated from the initial causal factors [77]. In CD-1 mice, it was found that stereotypies may not yet be fully established at 11 months of age [5], and the developmental stage of different stereotypy forms (and in individuals displaying the same form) may have varied in the present study. For example, in deer mice, jumping develops earlier and faster compared to back-flipping [78], possibly due to the greater complexity of back-flipping. Understanding the etiology of stereotypic behaviour, both between and within forms, and the causal factors which contribute to their development remains an area where critical information is lacking.
Taken together, there was no consistent pattern of responses across strains of mice and forms of stereotypies that could explain the results of the present and previous studies on the relationship between stereotypic behaviour and cognitive bias in laboratory mice [25,26]. However, neither of the cognitive bias test paradigms used so far has been successfully validated. Therefore, further validation is needed before drawing firm conclusions about the relationship between stereotypic behaviour and cognitive biases in laboratory mice.

Conclusions
While the cognitive bias paradigm used in this study needs further validation, our results mirror findings in other species, where only some forms of stereotypies were linked to recurrent perseveration [31,45] or associated with cognitive bias, even if performed by the same individual [22]. Our findings therefore further support the idea that different forms of stereotypy should be considered separately in studies about their relationship with brain function, affective state or other measures of animal welfare. Besides the welfare implications this may have, our data also imply that the form and level of stereotypy performed may influence the outcome of behavioural studies using exploratory behaviour, novelty seeking and choice based tasks, and in turn may contribute to variability in the data and poor reproducibility between studies.
Supporting Information
S1 Table. Number of mice for each stereotypy at the time of screening, testing and used for analysis. 78 mice from each strain were screened for stereotypy forms, and 60 of each strain were chosen for testing. (PDF)
S2 Table. Measures of recurrent perseveration in the guessing task for each strain. Data are presented as mean ± SEM. (PDF)
S3 Table. Mean ± SEM of exploration measures in the radial maze during training and testing for each strain. (PDF)