The composition and initial evaluation of a grimace scale in ferrets after surgical implantation of a telemetry probe

Reliable recognition of pain is difficult in ferrets as many currently available parameters are non-specific, inconsistent and/or impractical. Grimace scales have successfully been applied to assess pain in different animal species and might also be applicable to ferrets. To compose a Ferret Grimace Scale (FGS), we studied the facial musculature of ferrets and compared lateral photographs of 19 ferret faces at six time points before and after intraperitoneal telemetry probe implantation. We identified the Action Units (AUs) orbital tightening, nose bulging, cheek bulging, ear changes and whisker retraction as potential indicators of pain in ferrets. To evaluate whether these AUs could reliably be used to identify photographs taken before and after surgery, the photographs were scored 0, 1 or 2 (not, moderately or obviously present) by 11 observers that were blinded to the treatment and timing of the photographs. All AU-scores assigned to the photographs taken five hours after surgery were significantly higher compared to their time-matched baseline scores. Further analysis using the weights that were obtained using a Linear Discriminant Analysis revealed that scoring orbital tightening alone was sufficient to make this distinction with high sensitivity, specificity and accuracy. Including weighted scores for nose bulging, cheek bulging and ear change did not change this. As these AUs had more missing values than orbital tightening, their descriptions should be re-evaluated. Including whisker retraction, which had a negative weight, resulted in lower accuracy and should therefore in its current form be left out of the FGS. Overall, the results of this study suggest that the FGS and the AU orbital tightening in particular could be useful in a multifactorial pain assessment protocol for ferrets. However, before applying the FGS in practice, it should be further validated by incorporating more time points before and after applying (different) painful stimuli, and different levels of analgesia.

Introduction authors were kindly allowed to observe and photograph the ferrets that were undergoing a surgical procedure as part of this unrelated influenza study.

Facial musculature
To study the anatomy of the facial muscles, three humanely euthanized ferrets (from DEC201400137) were used to document the facial musculature. Dissection was started with a surgical cut through the skin layer in caudo-rostral direction. For proper investigation of the superficial musculature, the skin layer was removed in direction of the eye and muzzle at one lateral side. This was followed by the removal of the superficial muscles to further investigate the deep musculature. Methods of dissection were the same for both sides of the face. The facial muscles were compared to detailed anatomical drawings of the facial musculature of the cat, dog and rodents [22] and a professional veterinary anatomic illustrator (R. van Deijk) composed a detailed anatomical drawing of the ferrets' facial muscles.

Animals and husbandry
To compose and evaluate the FGS, we used 19 domestic intact female ferrets of different coat colour and length (12 regular sables, 3 angora-like sables, 4 regular black-eyed whites). The ferrets were 16-32 weeks old, weighed 783±86 grams (range: 660-940 grams), and were obtained from Schimmel BV, The Netherlands. The ferrets were group-housed indoors in open phenolic face plywood ground cages (94.5x 166 x 64.6 cm) on wood shavings (IRS LIGNOCELL1 Hygienic animal bedding). Ferrets were maintained on a 8:16 hour light:dark cycle (lights on from 8.00AM to 16.00PM) at 18-25˚C. Auditory stimulation in the form of a radio was present 24 hours/day. Food (Hope farms1 Ferret balance pellets) and water were given ad libitum in stoneware bowls. Cage enrichment consisted of a ferret ball (25 cm diameter with 6 holes of 7.5 cm each) and a large, 24L flexible sleeping bucket. The animals were acclimated to their housing conditions for 27 days prior to the surgery. The ferrets were weighed weekly and their health was monitored daily.

Surgery and recovery
Prior to anaesthesia a physical examination was performed on the ferrets to ascertain that they were in good health. Anaesthesia was induced by administering 0.2 ml of a 9:1 mixture of ketamine (Narketan, 100 mg/ml, range: 16-26 mg/kg)-dexmedetomidine (Dexdomitor1, 0.5 mg/ ml, range 8-12 μg/kg) in the left caudal thigh musculature. The ferrets were placed in a dorsal recumbence and the ventral aspect was shaved and disinfected with 70% alcohol. A midline incision, with a maximum length of two centimetres, was made in between the most cranial nipples, through the skin, linea alba and peritoneum to enter the abdomen. The time of incision was registered as T 0 . Two telemetry probes (DST micro-T and -HRT, STAR ODDI, 8.3 mm diameter, 25.5 long) were inserted into the abdomen. The linea alba and skin were separately closed with interrupted sutures (Vicryl1 3-0, Ethicon). All surgeries were performed by experienced technicians who were able to perform the surgery procedure on a single animal within 15 minutes, thereby enabling all surgeries to be completed between 9:00 and 11:30 AM of the same day. Because of the short duration of the surgery, no additional heat support was needed to allow the ferrets to maintain their body temperature. Following surgery, a single intra-muscular injection of 0.02 ml Antisedan1 (Atipamezole, 5 mg/ml) REG NL 7744, average: 121 μg/kg, range: 92-145 μg/kg) was given in the right caudal thigh area to reverse the anaesthesia. The ferrets were then returned to their group-housing for recovery, where the water bowl was temporarily removed for safety reasons. The ferrets were continuously monitored until they were fully awake, at this time the water bowl was returned. Slowly recovering animals were placed in a separate cage with extra towels and a warm-water bottle wrapped in a towel for thermoregulation. As part of the influenza study protocol and with approval of the IACUC of Intravacc no post-surgical antibiotics or analgesics and other pain-relieving medication were administered.

Photographs
In order to take photographs, the ferrets were individually taken out of their home cage and carried to an adjacent room. The ferret was then placed on a table in front of a PVC tube (75 mm diameter, 30 cm length), through which the ferret had to walk. For five days prior to the surgery the ferrets were individually habituated to this procedure, once per day for a period of 5 minutes.
Based on the anaesthesia recovery time [24,25] and grimace scale studies in other animal species (e.g. rats [12]), photographs were taken when the ferrets were fully recovered from the anaesthesia (i.e. 2 hours after surgery, T 2 ) and when they were expected to show the most obvious grimace (as was seen in rats [12]) (i.e. 5 hours after surgery, T 5 ). Baseline photographs were taken at time-matched moments on the day prior to surgery (i.e. 22 and 19 hours prior to surgery, T B2 and T B5 ). Additionally, time-matched photographs were taken on the day after surgery (i.e. 26 and 29 hours post-surgery, T P2 and T P5 ) (Fig 1).
The photographs were taken with a camera (Canon EOS 7D Mark II Body + Canon EF 50 mm 1.8 f II objective) that was placed lateral to the table at 58 cm distance from the end of the tube on a tripod. When the ferrets' head was just outside of the tube, lateral photographs of the ferrets' head were taken in burst mode (10 frames/second, 1/250, F = 5.0, ISO 10.000). This was repeated for the other side of the face. In total, each ferret walked through the tube 2-6 times per time point, depending on the suitability of the photographs taken. Selection of the photographs for further analysis was based on full visibility of the eyes, ears and whiskers. In addition, the position of the head needed to show the least angled (both horizontal and vertical). The selection of the photographs was performed blinded for time point and before identifying AUs to prevent selection bias. Selected photographs were cropped to contain only the head of the ferret and edited to contain a neutral background using Adobe Photoshop CC 1 .

Composition of the Ferret Grimace Scale
Time-point blind comparisons were made between individual baseline photographs from each animal and the matching post-surgery photographs (T B5 vs T 5 ) (within-subject comparison). Changes in facial expression within each ferret were recorded. Changes that were observed in five or more ferrets (>25%) were listed as action units (AUs). All AUs were listed in a Ferret Grimace Scale (FGS), designed to be scored on a three-point scale that is generally used for grimace scales (e.g. [11][12][13]). A score of '0' indicated that the change in the AU was not present, a score of '1' indicated that the change in the AU was moderately present and a score of '2' indicated that the change in the AU was obviously present. Additionally, a short training module was composed for each AU, which showed three example photographs of each score (i.e. 0, 1 and 2) with lines indicating the changes visible for the AU.

Evaluation of the Ferret Grimace Scale
To evaluate whether each of the five selected AUs could be used to differentiate ferrets before and after surgery, a survey was designed in Survey Monkey1 and distributed to eleven observers. These observers included three ferret veterinarians, five ferret owners, one ferret researcher and two members of a ferret society/rescue that were blinded to the study objectives and time points. Each observer was asked to complete a 7-part survey, with each survey part sent to the observer one week after completion of the previous part.
Five of these survey parts (i.e. part 2-6) dealt with the individual AUs. In each of these, the observers were asked to rate each photograph (N = 114: 19 ferrets, 6 time points, i.e. 1 photograph per ferret per time point) on the three-point scale (0 = not present, 1 = moderately present, 2 = obviously present or? = I don't know), after having completed a short training module for that specific AU. The order in which the AUs and the photographs were scored was randomized between observers (the order of the AUs was randomized using a balanced Latin square; the order of the photographs was randomized automatically by Survey Monkey1; Table 1).
Two survey parts (i.e. parts 1 and 7) comprised questions regarding the general assessment of pain. In these survey parts, the observers were asked to rate each photograph on the overall pain that the observer estimated the ferret to be in, using a subjective scoring method from an overall discomfort assessment scale used during animal experiments (0 = no pain, 1 = mild pain, 2 = moderate pain, 3 = severe pain or? = I don't know). The scores from the first part (overall pain score) are the 'pre-training' scores, as the observers are not (yet) familiar with the AUs and have to rely on their own experience to assign a score. The scores from the last part (also overall pain score) are the 'post-training' scores, as the observers completed the survey part and the accompanying training for each AU (orbital tightening, nose bulging, cheek bulging, ear changes and whisker retraction)( Table 1).

Statistical analysis
All analyses were performed using R (R core team, R Foundation for Statistical Computing). Packages are listed between brackets for each analysis. Differences were considered statistically significant if P<0.05. The p-values were corrected for multiple comparisons using the False Discovery Rate (FDR). Table 1. Time schedule of the 7-part survey with the scores that were assigned to the photographs of the ferrets. AU1-5 = action unit 1-5 of the Ferret Grimace Scale (orbital tightening, nose bulging, cheek bulging, ear changes and whisker retraction), presented in random order. Pre-training = intuitive overall pain assessment, post-training = overall pain assessment after being trained on the Ferret Grimace Scale. First, to get an indication whether the descriptions of the AUs were clear and whether training aided in assigning an overall score to the photographs, the percentage of missing data (scored with "I don't know") per AU per time point ð¼ P missing data observer 1À 11 on T x 209 ð11 observersÂ19 ferretsÞ Â 100Þ and per ferret ð¼ P missing data ferret x on T ðÀ 22ÞÀ 29 for AU 1À 5 30 ð6 timepointsÂ5 AUsÞ Â 100Þ were calculated. In addition, to further determine whether the descriptions of the AUs were clear and whether training aided in the agreement on the overall pain assessment, the inter-observer agreement was calculated for each AU and for the overall pain scores and the intra-observer agreement was calculated for the overall pain scores using an Intraclass Correlation Coefficient (ICC, "irr" package) [26]. A score of <0.40 is regarded as poor agreement, 0.40-0.59 as fair agreement, 0.6-0.74 as good agreement and 0.75-1.00 as excellent agreement [27].
Second, to assess whether the AUs could be used to differentiate photographs of ferrets' faces before and after surgery, a Cumulative Link Mixed Model (CLMM, "ordinal" package) [28] was performed to assess how the time-point, the observer and the ferret influenced the score for each AU. This method is analogous to a linear mixed model, but specific to ordinal response values. The CLMM was performed with time point as a fixed effect and observer and ferret as random effects. The baseline scores (T B2 and T B5 ) were compared with each other and with the time-matched post-surgery (T B2 was compared to T 2 and T P2 , T B5 was compared to T 5 and T P5) .
Third, the importance of each AU in discriminating the pain condition of a ferret was measured using a Linear Discriminant Analysis (LDA, "MASS" package) [29]. To quantitatively assess this importance, LDA assigns a weight to each AU, with the most important AUs having higher weights with respect to the others. If the weights are all equal, the linear weighted combination of the five AU scores is the mean of the scores. However, the weights are often not equal and the linear weighted combination of the five AU scores were used to maximize the discrimination between the photographs taken five hours after surgery (T 5 ) and the corresponding baseline photographs (T B5 ). The AUs that were assigned a negative or near-to-zero weight were removed from the analysis and another LDA was performed. Using the optimal weights of the AUs obtained by performing the LDA, the sensitivity, specificity and accuracy with which the AUs could be used to discriminate photographs taken five hours after surgery (T 5 ) from the corresponding baseline photographs (T B5 ) were calculated. The sensitivity (i.e. true positive rate) is the proportion of photographs taken at T 5 (i.e. five hours after surgery) that were assigned a high score for the AUs properly weighed by the LDA weights ð T 5 scored high according to LDA 19 ðall T 5 picturesÞ Þ. The specificity (i.e. true negative rate) is the proportion of photographs taken at T B5 (i.e. 19 hours before surgery) that were assigned a low score for the AUs properly weighed by the LDA weights ð T B5 scored low according to LDA The accuracy (i.e. the ratio of the sum of true positives and true negatives, to the total number of correct or incorrect classifications) is the proportion of photographs taken at T B5 and T 5 that were assigned a low and high score, respectively, properly weighed by the LDA weights ð T 5 scored high according to LDAþT B5 scored low according to LDA

Facial musculature
The detailed anatomical drawing of the ferrets' facial muscles is shown in Fig 2. The name, origin, insertion and action of these muscles are listed in Table 2.

The Ferret Grimace Scale
Five action units (AUs) were defined from the within-subject comparisons: orbital tightening, nose bulging, cheek bulging, ear changes and whisker retraction. Orbital tightening consists of closing of the eyelids, which can cause a wrinkle to become visible around the eye. Nose bulging consists of pulling the nose down, causing the nose to round off, the nostrils to point down instead of straight forward and bulging of the bridge of the nose. Cheek bulging occurs due to  constriction of the cheek muscles, making the contour of the cheeks become visible. Additionally, the cheek may be pulled up at the side of the ear. Ear changes consist of pulling back the ears against the body, possibly forming a pointed shape and folding over. Whisker retraction consists of pulling back the whiskers against the cheek, clumping together of the whiskers and caudal convergence of the whisker follicles (Fig 3).

Missing scores
The number [percentage] of photographs that were scored "I don't know" (missing scores) by the observers was different between the AUs (LMM: F 55 = 3.108, P = 0.022

Intra-and interrater agreement
The inter-observer agreement for all AUs and for overall pain score were excellent: More specifically, the scores at baseline for orbital tightening and whisker retraction were significantly lower at T B5 than at T B2 (P = 0.0024 and P = 0.0354, respectively). The scores for nose bulging and ear changes were significantly higher at T B5 than at T B2 (P = 0.0024 and P = 0.0162, respectively). When evaluating the AU orbital tightening, time was found to exert a significant effect on the score assigned by the observers (CLMM: LR = 210.9036, P<0.0001). More specifically, the post-surgery scores at T 2 were significantly lower than at T B2 (P<0.0001). Additionally, the post-surgery scores at T 5 , T P2 and T P5 were significantly higher than their respective baseline scores (T B5 : P<0.0001, T B2 : P<0.0001 and T B5 : P = 0.0428 respectively) (Fig 4A).
Similarly, differences were present in score over time for nose bulging (CLMM: LR = 220.8676, P<0.0001), cheek bulging (CLMM: LR = 232.0036, P<0.0001), ear changes (CLMM: LR = 180.4370, P<0.0001), and whisker retraction (CLMM: LR = 217.2884, P<0.0001). More specifically, for nose bulging the post-surgery scores at T 2 , T 5 and T P2 were significantly higher than their respective baseline scores (T B2 : P = 0.0003, T B5 : P = 0.0048, T B2 : P = 0.0002 respectively), whereas the post-surgery scores at T P5 were significantly lower than at baseline (T B5 : P = 0.0393) (Fig 4B). For ear changes, post-surgery scores at T 5 , T P2 and T P5 were significantly higher than their respective baseline scores (T B5 : P = 0.0004, T B2 : P = 0.0039, T B5 : P = 0.0107 The composition and evaluation of a grimace scale in ferrets after surgical implantation of a telemetry probe respectively) (Fig 4D). For cheek bulging and whisker retraction, only the post-surgery scores at T 2 and T 5 differed significantly from their respective baselines, with the post-surgery scores at T 2 being significantly lower than at baseline (T B2 : P = 0.0029 and P<0.0001 for cheek bulging and whisker retraction, respectively) and the post-surgery scores at T 5 being significantly higher than at baseline (T B5 : P<0.0001 and P = 0.0003 for cheek bulging and whisker retraction, respectively) (Fig 4C and 4E).

Weight, sensitivity, specificity and accuracy
From the LDA, it was clear that whisker retraction had a negative weight, close to zero ( Fig  5A). Therefore, whisker retraction was removed from the model and the weights of the other AUs were re-estimated by performing another LDA. In this model, like the first model, cheek bulging, nose bulging and ear changes had weights close to zero (Fig 5B).
Using the optimal weights of the AUs obtained by performing the LDA, the sensitivity, specificity and accuracy of the observer scores were calculated 1) using all five AUs, 2) using the FGS without whisker retraction and 3) using only orbital tightening. In all these three options, the score 1 for the orbital tightening AU was the cut-off value for which a photograph was classified as taken at T B5 (score of orbital tightening below 1, i.e. score of orbital tightening equal to 0) or at T 5 (score of orbital tightening equal to 1 or 2). The sensitivity (85%) and specificity (74%) were equal for all three options. The highest accuracy (80%) was achieved using the FGS without whisker retraction and using only orbital tightening.

Discussion
With this study, we aimed to compose a Ferret Grimace Scale (FGS) using action units (AUs) that were based on the facial musculature of ferrets. Additionally, we wanted to evaluate whether the descriptions of the AUs were clear and aided observers in assigning an overall pain score to photographs that were taken before and after telemetry probe implantation surgery. Furthermore, we aimed to determine whether the AUs could be used to distinguish ferrets before and after this surgery. Finally, we aimed to make an informed decision on which AUs could be included in a final FGS in their current form and which AUs should be re-evaluated by assessing the weight, sensitivity, specificity and accuracy of the AUs comprising the scale.

Facial musculature and the FGS
The ferrets' facial musculature is very similar to that of mice and rats [31,32], containing all the necessary muscles to show different facial expressions. Therefore, it is not surprising that the Table 3

Photographs with missing scores (of 209)
Orbital tightening

Nose bulging
Cheek bulging

Ear changes Whisker retraction
Pre-training Posttraining The composition and evaluation of a grimace scale in ferrets after surgical implantation of a telemetry probe potential AUs that were identified for the ferrets (orbital tightening, nose bulging, cheek bulging, ear changes and whisker retraction) are very similar to those in the other grimace scales, e.g. the mouse, rat and rabbit grimace scale [11][12][13]. Comparing the different grimace scales, the similarities are striking, raising the question whether it is necessary to develop a grimace scale for every animal species or whether we can reduce the amount of animals that are used in research by making a generalized vertebrate grimace scale. However, there are also species differences in the AUs of established Grimace Scales. For example, where rats and rabbits flatten their nose/cheek when in pain, mice (and ferrets) bulge their nose/cheek [11][12][13]. The only AU that seems to appear in the same form in all species is orbital tightening.

Missing values and agreement
Based on the number of missing values and the inter-observer reliability, the observers seemed to find the action unit (AU) orbital tightening the most clearly described and/or the most recognisable. The observers seemed to find the AUs ear changes and cheek bulging the least clear, as there were more missing values for these AUs. Therefore, it would be worthwhile to The composition and evaluation of a grimace scale in ferrets after surgical implantation of a telemetry probe re-evaluate the descriptions of at least ear changes and cheek bulging, but preferably also nose bulging and whisker retraction. The ability of observers to assign a score for the AUs could have potentially been influenced by the type of coat the ferret had, which seemed to be more difficult in the longer-haired ferrets. This has not previously been reported, but is not surprising, as the long hair might have obscured muscle tension and/or ear position. This should therefore be taken into account when applying the FGS and when selecting ferrets as animal models (e.g. selecting only shorthaired ferrets).
The observers scored the overall pain with excellent and good consistency (inter-and intraobserver, respectively), indicating that the observers agreed on which score to assign to a photograph, as well as that an observers' first (pre-training) assessment was equal to the second (post-training assessment). The intra-observer reliability might have lower agreement due to a training effect, i.e. as the observers have been trained to look more closely at the ferrets' faces in the survey-parts concerning the AUs. The observers did appear more confident to assign an overall pain score to the photographs after seeing the examples of and having scored the photographs on the AUs, as there were less missing values for the overall pain scores post-training than pre-training. This indicates that systematically looking at the ferrets' faces using the AUs can aid in formulating an overall pain score.

Scores over time
The baseline FGS scores at T B2 were significantly different from the baseline scores at T B5 for the AUs orbital tightening, whisker retraction, ear changes and nose bulging, which suggests that The composition and evaluation of a grimace scale in ferrets after surgical implantation of a telemetry probe the time of day influences the facial expression of the ferrets. These differences in AU-scores might possibly be explained by the polycyclic sleep-wake cycle of ferrets: ferrets sleep in bouts of approximately three hours [33], which is exactly the window between the two baseline measurements. As the ferrets were used to being disturbed in the morning (around T B2 ), but not in the afternoon (around T B5 ) and they are known to adapt their sleep-wake cycle to human activity, the ferrets might have been awake around T B2 and just woke up around T B5 . This is supported by asleep, waking or sedated animals showing false positives with other grimace scales (i.e. mice [11] and rats [12]). Using more than two measurements on one day and using a smaller inter-measurement period could provide more information on the background of these differences in scores in ferrets.
Two hours after the surgery (T 2 ), the scores for orbital tightening, nose bulging and cheek bulging were significantly lower than baseline (T B2 ), indicating that there might still have been a carry-over sedative or analgesic effect of ketamine (as the effects of dexmedetomidine were reversed using atipamezole). Studies in mink reported recovery within 60 or even 15 minutes after reversal with the chosen anaesthetic regime [24,25]. However, possible carry-over effects of ketamine have not been addressed in these studies. Therefore, the changes observed in the ferrets' facial expressions two hours after surgery might be a reflection of recovery from the anaesthesia. This hypothesis is supported by the observation that some ferrets were repeatedly yawning during handling at this time point, which was not seen at other time points (but would be expected when yawning was expressed as a normal behaviour when waking up). It has been shown that isoflurane, and not buprenorphine, increased the mouse grimace score in male DBA/2 mice [34], which underlines the need to carefully choose the post-surgery time to ensure the animals are no longer experiencing the major sedative effects of anaesthesia or the anaesthetic regimes for grimace scale research. Additionally, it should be taken into account that the ferrets received equal amounts of anaesthetics, while their bodyweights differed. Some of the ferrets might therefore have experienced stronger/longer lasting sedation/analgesia than others.
The scores for all AUs and overall pain scores peaked five hours after surgery (T 5 ), which is in line with the results found in other grimace scale studies (e.g. the rat grimace scale [12]). This high FGS-score suggests that the ferrets were experiencing pain at that moment, supported by the observation that the ferrets showed repeating bouts of pronounced shivering when they were handled at this time point. The ferrets did not show shivering during handling at T 2 or at any other time point, which makes it unlikely that the shivering was caused by hypothermia, stress or as a result of recovery from the anaesthesia. However, due to the lack of control animals receiving analgesics, clear conclusions cannot be drawn on the pain-state of the ferrets. Follow-up studies using multiple analgesic schedules should be performed to confirm this as this will further confirm that the changes in the FGS observed here are pain-related. In these follow-up studies, it should be taken into account that there might be individual and sex differences in pain perception and therefore in pain expression.
The scores of the photographs taken the day after surgery (T P2 and T P5 ) were significantly lower than the scores at T 5 , but were still higher than baseline scores for orbital tightening, nose bulging and ear changes. This could possibly indicate that the chosen time frame for this study was too short to capture the point at which the ferrets' grimace (and pain-state) was back to baseline. It was unexpected that the ferrets would still show changes in the AUs 26 and/or 29 hours after surgery as the recovery time in rats was shorter, their grimace scores were back to baseline twelve hours after a laparotomy [12]. However, these differences between the species/ studies might have occurred due to other pain-related causes, e.g. the size of the incision (relative to the animal) or the method with which the wounds were closed (sutures or tissue glue) or non-pain-related causes, e.g. the anaesthesia that was used.
The decline in AU scores over time might indicate a slow recovery from pain, but again, this should be further confirmed using a control group that receives analgesics. In a follow-up study, it would be informative to measure more than two time points per day to identify the FGS-peak and to take FGS-scores later than 29 hours after surgery, to determine the point at which the AU scores are all back to baseline. Additionally, it would be preferable to include other clinical signs that are considered indicative of pain (physiological and behavioural changes) to further validate this scale.

Weight, sensitivity, specificity and accuracy
Even though the scores for all AUs increased significantly from baseline to five hours after surgery (T 5 ), just the scores for orbital tightening were required to achieve a high sensitivity, specificity and accuracy, which is comparable to the findings of other studies [11][12][13][14]. Additionally, orbital tightening had the least missing values, indicating that it was the most clear/most visible AU for observers. It is not surprising that orbital tightening is the most important AU as this is also reported for other grimace scales (e.g. [19]) and humans tend to focus on the eyes of people/animals when assessing emotions such as pain [35,36].
Including the AUs nose bulging, cheek bulging and ear changes alongside orbital tightening in the FGS resulted in a similar accuracy of score assignment as orbital tightening alone. However, in comparison to orbital tightening, the weights assigned to these three AUs in the overall scoring were low, indicating that they contributed only slightly to the overall score. Moreover, many values were missing indicating that observers had difficulty in assigning a score for these three AUs. For the AU whisker retraction, a negative weight was even found and inclusion of this AU actually resulted in a lower accuracy, which indicates it might have confused the observers in the pain assessment. However, it should be taken into account that sensitivity, specificity and accuracy were calculated using the assumption (supported by LDA results) that all AUs were absent at T B5 and moderately or obviously present at T 5 in each photograph, which might not have been the case as the FGS is not yet validated. This assumption might have led to an erroneously low or high accuracy. Moreover, since this, to the authors' knowledge, is the first study to assess the weight of individual AUs, it is currently not possible to compare our results with other grimace scales that incorporate whisker retraction, nose bulging, cheek bulging and ear changes, such as those of the mouse, rat and rabbit [11][12][13]. Nevertheless, our findings suggest that the description of the four AUs that resulted in low or negative scores should be re-evaluated to explore whether clarity and/ or visibility can be increased. Additionally, the effects of live scoring, other photograph angles and settings should be explored as this might result in more reliable results for all AUs.

Conclusion
Overall, the results of this study suggest that the FGS and the AU orbital tightening in particular, could be useful for pain assessment of ferrets. The other AUs (whisker retraction, nose bulging, cheek bulging and ear changes) should be re-evaluated before they can be included in the FGS. Furthermore, prior to incorporating the FGS in a multifactorial pain assessment protocol, it should be further validated using different painful stimuli, analgesic regimens and measuring more time points.