Towards a fully automated surveillance of well-being status in laboratory mice using deep learning

Assessing the well-being of an animal is hindered by the limitations of efficient communication between humans and animals. Instead of direct communication, a variety of behavioral, biochemical, physiological, and physical parameters are employed to evaluate the well-being of an animal. Especially in the field of biomedical research, scientifically sound tools to assess pain, suffering, and distress in experimental animals are in high demand for ethical and legal reasons. For mice, the most commonly used laboratory animals, a valuable tool is the Mouse Grimace Scale (MGS), a coding system for facial expressions of pain in mice that has been shown to be accurate and reliable. Currently, MGS scoring is time- and effort-consuming, as it is performed manually by humans thoroughly trained in the method. Therefore, we aim to develop a fully automated system for the surveillance of well-being in mice. Our work introduces a semi-automated pipeline as a first step towards this goal. We use and provide a new data set of images of black-furred laboratory mice that were moving freely; the images therefore contain natural variation with regard to perspective and background. The analysis of this data set is consequently more challenging but reflects realistic conditions, as would be obtainable without human intervention. Images were obtained after anesthesia (with isoflurane or a ketamine/xylazine combination) and surgery (castration). We deploy two pre-trained state-of-the-art deep convolutional neural network (CNN) architectures (ResNet50 and InceptionV3) and compare them to a third CNN architecture without pre-training. Depending on the particular treatment, we achieve an accuracy of up to 99% for binary "pain"/"no pain" classification.

Author summary

In the field of animal research, it is crucial to assess the well-being of an animal. For mice, the most commonly used laboratory animals, there is a variety of indicators of well-being. Especially the facial expression of a mouse can give us important information on its well-being state. However, currently the surveillance of well-being can only be ensured if a human is present. Therefore, we developed a first approach towards a fully automated surveillance of the well-being status of a mouse. We trained neural networks on face images of black-furred mice, which were either untreated or underwent anesthesia or surgery, to distinguish between an impaired and an unimpaired well-being state. Our systems successfully learnt to assess whether the well-being of a mouse was impaired and, depending on the particular treatment, their decisions were correct in up to 99% of cases. A tool that visualizes the features used for the decision-making process indicated that the decision was mainly based on the facial expressions of the mouse.

During anesthesia, vital parameters (i.e., respiratory rate, heart rate, and oxygen saturation) were monitored. According to the design of our previous study, mice were anesthetized either once or six times at an interval of three to four days [29].

Injection anesthesia with the combination of ketamine and xylazine

For injection anesthesia, a stock solution with 160 µL Ketavet® 100 mg/mL (Zoetis Deutschland GmbH, Berlin, Germany), 160 µL Rompun® 2% (Bayer Vital GmbH, Leverkusen, Germany), and 1680 µL physiologic saline solution was prepared in a syringe. A dosage of 80 mg/kg ketamine and 16 mg/kg xylazine [32], warmed to body temperature, was administered intraperitoneally at a volume of 10 µL/g body weight using 27 G × ¾″ needles.

Then the testicles were removed and the skin was stitched with a single button suture.
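For illustration, the dosages delivered by the stock solution described above can be verified with a short calculation (a sketch based on the stated values; the variable names are ours):

```python
# Verify that the stock solution yields the stated dosages at 10 µL/g body weight.
ketamine_mg = 0.160 * 100   # 160 µL of Ketavet (100 mg/mL) -> 16 mg
xylazine_mg = 0.160 * 20    # 160 µL of Rompun 2% (20 mg/mL) -> 3.2 mg
total_ml = (160 + 160 + 1680) / 1000  # total stock volume: 2.0 mL

# Concentrations in the ready-to-use solution
ket_conc = ketamine_mg / total_ml   # 8.0 mg/mL
xyl_conc = xylazine_mg / total_ml   # 1.6 mg/mL

# An injection volume of 10 µL/g corresponds to 10 mL/kg body weight
print(ket_conc * 10)  # 80.0 mg/kg ketamine
print(xyl_conc * 10)  # 16.0 mg/kg xylazine
```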
Mice of the control groups in our previous studies received neither anesthesia nor surgery (i.e., no treatment). Since the well-being of these mice was not affected by any medical procedure, they were expected not to display any signs of pain and/or distress.

Dataset

The present study is based on a large data set of images of C57BL/6JRj mice. Images were obtained in previous studies, in which the impact of procedures frequently performed in animal experimentation on the well-being and stress levels of mice was systematically assessed by using the MGS, among other animal-based parameters [29,30]. According to the treatment of the mice, our data set was divided into three subsets: KXN (ketamine/xylazine anesthesia), IN (inhalation anesthesia with isoflurane), and C (castration).

Image acquisition

Images were generated as described previously in Hohlbaum et al. [34]. All images were taken in observation cages (22 × 29 × 39 cm) (Fig 1) with three white walls, to provide contrast with the black mice, and one clear wall. Cages were custom-made in our facility and varied slightly in appearance, e.g., walls were attached to each other with dark- or light-colored material. The bottom of the cage was covered with approximately 0.5 cm of bedding material, and soiled bedding was scattered on top in order to minimize stress caused by the novel environment. Food pellets normally supplied as diet and a water bowl were provided.

After the procedures were performed, the mouse was gently transferred into an observation cage and was allowed to habituate to the new environment for 30 min. Then a series of images was taken within a few minutes (approximately 1-2 minutes, but in some cases even longer) using a high definition camera (Canon EOS 350D, Canon Inc., Tokyo, Japan). Baseline images were acquired prior to the procedures. Post-procedure images were taken at various defined points in time after each procedure.

Images of untreated mice were generated at the same times as images of the corresponding treated groups. Since they were considered not to show any post-anesthetic or post-surgical distress, they were added to the baseline images in the present study.

The MGS comprises five facial action units (orbital tightening, nose bulge, cheek bulge, ear position, and whisker change), each scored on a three-point scale from 0 (not present) to 2 (obviously present), with high scores reflecting high intensity of a facial action unit (for further details see Langford et al. [5]). An accuracy of 72-97% for human scorers was reported [5]. Interestingly, the change in facial action units described in the MGS is also triggered by stimuli other than pain, such as post-anesthetic distress, situations associated with fear (i.e., whisker contact, social proximity, cat odor exposure, rat exposure), and aggression or subordination [29,30,35].

For MGS scoring, one image of high quality showing the mouse face from a frontal or lateral view was randomly selected per mouse and point in time [29,30,34], and the face of the mouse was manually cropped from the image so that, if possible, the body posture was not visible. As we could not predict the MGS score directly, we followed the approach of Tuttle et al. [36] and assigned one of two defined states, "pain" and "no pain", to all images in order to train a binary classifier on the whole data set. To do so, time points of image acquisition were defined as either "pain" or "no pain" based on the MGS scores obtained by humans and the statistical analysis performed in our previous studies [29,30,34]: if MGS scores were significantly higher when compared to untreated mice (Fig 2), all images of this point in time were considered to display a "pain face" and were assigned the label "pain" (Table 1).

Furthermore, 300 min post-castration was also defined as a "pain" state, although statistical significance was not reached at this time point. It should be kept in mind that increased MGS scores shortly after anesthesia may reflect post-anesthetic distress rather than pain [29,30].

Sotocina et al. [6] created the "Rodent Face Finder" for white rats, which was also used by Tuttle et al. [36]. It combines two detectors, one for ears and one for eyes. Groups of detections are filtered according to heuristic expectations of a typical face (e.g., the ears must be above the eyes). The "Rodent Face Finder" presumably has a similarly low detection rate on white mice, but for the application on video sequences, as in Sotocina et al. [6] and Tuttle et al. [36], this is not a hindrance, as a face needs to be successfully detected in only a fraction of the acquired frames for a correct classification. If a frame of the face is grabbed from a video every few seconds, the classification can be performed accurately. However, to achieve a larger resulting training set for facial expression recognition on our data set of still images, we decided to use a face detector with a higher detection rate. For the facial expression recognition itself, we deployed two pre-trained state-of-the-art CNN architectures, ResNet50 and InceptionV3. Additionally, we compared them to a simpler architecture without pre-training (Fig 3).
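As a minimal sketch of this transfer-learning setup (not the authors' exact code; input size, pooling, and optimizer are assumptions), a pre-trained backbone can be combined with a new two-neuron output layer in TensorFlow/Keras. The final lines anticipate the "pain"/"no pain" decision rule described below:

```python
import tensorflow as tf

def build_classifier(backbone_name="ResNet50", input_shape=(224, 224, 3)):
    """Pre-trained ImageNet backbone with a new two-neuron 'pain'/'no pain' head."""
    if backbone_name == "ResNet50":
        backbone = tf.keras.applications.ResNet50(
            include_top=False, weights="imagenet", input_shape=input_shape)
    else:
        backbone = tf.keras.applications.InceptionV3(
            include_top=False, weights="imagenet", input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    logits = tf.keras.layers.Dense(2)(x)  # one output neuron per class
    return tf.keras.Model(backbone.input, logits)

model = build_classifier("ResNet50")
model.compile(optimizer="adam",
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

# Decision rule as described in the text: the class whose output neuron has the
# larger activation wins; a softmax over the two activations can be read as the
# network's confidence (here we assume index 0 is the "pain" neuron).
images = tf.random.uniform((4, 224, 224, 3))  # placeholder batch
activations = model(images, training=False)
confidence = tf.nn.softmax(activations, axis=-1)
is_pain = activations[:, 0] > activations[:, 1]
```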

All networks were implemented and trained using TensorFlow [43]. The output of the top layer of the network consists of the activations of two output neurons, one for each class. We classify an image as "pain" if the activation of the "pain" neuron is greater than the activation of the "no pain" neuron, and vice versa. Furthermore, we interpret the resulting activations of the two output neurons as the confidence of the network for the two classes "pain" and "no pain".

Mice of the subset IN received inhalation anesthesia with isoflurane, which caused statistically significant changes in facial expression for a relatively short period only [29]. Therefore, images taken 30 min post-anesthesia were considered to display a "pain" face, and images generated at baseline or 150 min post-anesthesia were labeled with "no pain". Although our previous study did not reveal any statistically significant difference in the facial expressions according to the MGS between the time points baseline and 150 min post-anesthesia [29], MGS scores were still slightly increased in some animals at 150 min post-anesthesia. As a consequence, the binary classification led to a pool of "no pain" images with a high range of intensities (Fig 2, S3 Fig). The "no pain" and "pain" classes therefore overlap in the intensity of the facial expressions they contain.

Tuttle et al. [36] showed that results can be improved by using multiple frames from video recordings of each animal.

A clearer separation between the two classes was achieved in the KXN subset. Injection anesthesia is known to intensively impair the general condition of a mouse and to significantly affect its facial expression for a longer period (i.e., up to at least 150 min) [30]. Therefore, images generated 30 min as well as 150 min post-anesthesia were labeled with "pain", and images of the remaining time points were assigned the label "no pain". In contrast to the subset IN, the range of intensities of facial expressions in the "no pain" class is smaller in the subset KXN (Fig 2, S3 Fig). MGS scores of the "no pain" class (median: 0.00, interquartile range: 0.2) and the "pain" class (median: 0.60, interquartile range: 0.60) overlap to a lesser extent than seen for IN, which may have contributed to a clearer decision boundary between the two classes.
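The time-point labeling described in this and the preceding paragraphs can be summarized in a small helper (a sketch; the function and dictionary names are ours, and the full castration mapping follows Table 1, which is not reproduced here):

```python
# Binary labels per subset, derived from the MGS-based statistical analysis of
# the previous studies: IN and KXN as stated in the text; for C (castration),
# 300 min was defined as "pain" despite lacking significance (see above).
PAIN_TIMEPOINTS = {
    "IN": {30},        # isoflurane: significant MGS increase only shortly after
    "KXN": {30, 150},  # ketamine/xylazine: longer-lasting effect
    "C": {300},        # extend with the further "pain" time points from Table 1
}

def label(subset: str, minutes_post_procedure=None) -> str:
    """Return the binary class of an image; None denotes baseline/untreated."""
    if minutes_post_procedure is None:
        return "no pain"
    return "pain" if minutes_post_procedure in PAIN_TIMEPOINTS[subset] else "no pain"
```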

The performance of the algorithms in the binary classification of "pain" versus "no pain" images cannot be directly compared to human performance on our data set, because we used the MGS scores obtained from humans as ground truth. Langford et al. [5] reported an accuracy of 97% and 81% for experienced and inexperienced human scorers, respectively, when high resolution images (1,920 × 1,080 pixels) were used. A lower accuracy of 72% was found for inexperienced humans who scored low resolution images (640 × 480 pixels) [5]. This human performance is in the range of the accuracies achieved by our networks (Table 2d).

In Fig 4, Fig 5, and Fig 6, we present the network confidence values for the "pain" class at the individual points in time. The network confidence does not match the human-assigned MGS score in every single case. However, in general, the data suggest that the confidence for the "pain" class was higher for images with high MGS scores. This can be clearly seen for 30 min and 150 min post-anesthesia with the ketamine/xylazine combination (Fig 5) or 30 min post-castration (Fig 6). Conversely, a very low confidence for the "pain" class was found for images with low MGS scores, indicating a "no pain" state. In intermediate cases, however, while the network output still tends to follow the human-evaluated MGS score, the deviation between network confidence and MGS score increases (e.g., 150 min and 300 min post-castration).
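The agreement between network confidence and human MGS scores described here could be quantified, for instance, with a simple correlation (a sketch with hypothetical toy arrays; this is not part of the original analysis):

```python
import numpy as np

# Hypothetical per-image values: softmax confidence for the "pain" class
# and the corresponding human-assigned MGS scores.
pain_confidence = np.array([0.95, 0.88, 0.12, 0.07, 0.61, 0.55])
mgs_scores      = np.array([1.20, 0.90, 0.00, 0.20, 0.60, 0.40])

# Pearson correlation as one simple measure of agreement between
# network confidence and MGS score.
r = np.corrcoef(pain_confidence, mgs_scores)[0, 1]
print(f"correlation between 'pain' confidence and MGS score: {r:.2f}")
```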

Regarding inhalation anesthesia with isoflurane, the confidence for the "pain" class reflects the difficult decision boundary between the two classes "pain" and "no pain" in this data subset (Fig 4), as discussed above. In brief, both classes "pain" and "no pain" contain images within a wide range of facial expression intensities [29].
This wide range may have made it difficult for the networks to distinguish between the two classes. Moreover, impairment of well-being may outlast the changes captured by the MGS after inhalation anesthesia, which can also be assessed by behavioral parameters, for instance nest building [44]. Depending on the treatment, a mouse is exposed to different stimuli and experiences different states. Post-surgical pain accompanies castration, whereas anesthesia is unlikely to induce pain. However, the injection of the combination of ketamine and xylazine can damage the tissue at the injection site, which may be associated with a painful sensation [45], but we assume the degree of pain to be much lower when compared to post-surgical pain. Isoflurane, in turn, is known to activate nociceptive ion channels [47,48]. In humans, the inhalation of isoflurane causes coughing and subjective sensations of burning as well as irritation [49]. In addition, if 100% oxygen is used as carrier gas, the inhalation gas is very dry and can impair the function of the respiratory mucosa [50]. When anesthesia is induced, the distress of a mouse can additionally be increased by fixation and injection stress or by exposure to the (irritant) volatile anesthetic. In the latter case, aversion towards this inhalant agent increases with repeated exposure [51]. Distress a mouse experienced during the induction phase may influence its well-being after anesthesia as well.

When mice recover from anesthesia, they can suffer from post-anesthetic nausea [52]. Moreover, in humans, emergence delirium can occur during the recovery period, and hallucinogenic effects were reported for the use of ketamine [46,53]. We also have to consider different pharmacological effects of the anesthetics on the facial expressions of the mice, with longer-lasting effects following injection anesthesia due to the pharmacokinetic properties of ketamine and xylazine. In contrast to isoflurane, ketamine and xylazine are subject to intensive hepatic metabolism [54-57], which results in longer recovery periods. Ketamine increases the muscle tone, whereas the combination of ketamine and xylazine causes muscle relaxation [46]. All in all, inhalation anesthesia, injection anesthesia, and castration produce different affective states in a mouse. Against the background that the weight of the five facial action units varies between different states like illness and pain [5,58], our data suggest that the procedures we investigated in the present study induce different facial expressions. This may explain the reduction in performance when the algorithms were trained and tested on mouse images of different treatments and is a disadvantage for the pure binary classification of "pain" and "no pain".

A tool visualizing the features used by the networks indicated that the nose tip had a significant impact on the decision-making process, particularly in mouse images labeled with "no pain" (S1 Fig). To understand why the nose tip seems to play an important role in the "no pain" state, but less so in the "pain" state, the color of the nose and its position are of special interest. While the nose points forwards or is slightly elevated in the "no pain" state in general, the head is rather dropped and the nose tip points downwards after anesthesia or surgery. The nose tip is colored (pale) pink in mice in good general condition. When mice recover from anesthesia or surgery, circulation can be affected in the early post-anesthetic period; hence, the color of the nose may turn paler. Overall, the nose tip may play a more important role in the "no pain" state because it is clearly visible and its color is very prominent.

The long-term goal is to devise a smart-surveilled environment for laboratory mice. The proposed approach will be the foundation for a "smart mouse cage", i.e., an integrated system that monitors the well-being of the mice automatically, without requiring the presence of a human.