A critical evaluation for validation of composite and unidimensional postoperative pain scales in horses

  • Paula Barreto da Rocha ,

    Contributed equally to this work with: Paula Barreto da Rocha, Bernd Driessen, Stelio Pacca Loureiro Luna

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Surgical Specialties and Anesthesiology, Medical School, São Paulo State University (Unesp), Botucatu, São Paulo, Brazil

  • Bernd Driessen ,

    Contributed equally to this work with: Paula Barreto da Rocha, Bernd Driessen, Stelio Pacca Loureiro Luna

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Clinical Studies, New Bolton Center, School of Veterinary Medicine, University of Pennsylvania, Kennett Square, Pennsylvania, United States of America

  • Sue M. McDonnell ,

    Roles Conceptualization, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Clinical Studies, New Bolton Center, School of Veterinary Medicine, University of Pennsylvania, Kennett Square, Pennsylvania, United States of America

  • Klaus Hopster ,

    Roles Conceptualization, Project administration

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Clinical Studies, New Bolton Center, School of Veterinary Medicine, University of Pennsylvania, Kennett Square, Pennsylvania, United States of America

  • Laura Zarucco ,

    Roles Data curation, Investigation

    ‡ These authors also contributed equally to this work.

    Affiliation Dipartimento di Scienze Veterinarie, Università degli Studi di Torino, Grugliasco, Italy

  • Miguel Gozalo-Marcilla ,

    Roles Data curation, Investigation

    ‡ These authors also contributed equally to this work.

    Affiliation The Royal (Dick) School of Veterinary Studies and the Roslin Institute, The University of Edinburgh, Edinburgh, Midlothian, United Kingdom

  • Charlotte Hopster-Iversen ,

    Roles Data curation, Investigation

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Veterinary Clinical Sciences, Section of Medicine and Surgery, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark

  • Pedro Henrique Esteves Trindade ,

    Roles Formal analysis

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Veterinary Surgery and Animal Reproduction, School of Veterinary Medicine and Animal Science, São Paulo State University (Unesp), Botucatu, São Paulo, Brazil

  • Thamiris Kristine Gonzaga da Rocha ,

    Roles Data curation, Investigation

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Veterinary Surgery and Animal Reproduction, School of Veterinary Medicine and Animal Science, São Paulo State University (Unesp), Botucatu, São Paulo, Brazil

  • Marilda Onghero Taffarel ,

    Roles Conceptualization, Methodology

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Veterinary Medicine, Maringá State University, Maringá, Paraná, Brazil

  • Bruna Bodini Alonso ,

    Roles Investigation

    ‡ These authors also contributed equally to this work.

    Affiliation Faculty of Animal Science and Food Engineering, Sao Paulo State University, Botucatu, Brazil

  • Stijn Schauvliege ,

    Roles Conceptualization, Project administration, Writing – original draft

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Anesthesiology and Domestic Animal Surgery, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium

  • Stelio Pacca Loureiro Luna

    Contributed equally to this work with: Paula Barreto da Rocha, Bernd Driessen, Stelio Pacca Loureiro Luna

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Veterinary Surgery and Animal Reproduction, School of Veterinary Medicine and Animal Science, São Paulo State University (Unesp), Botucatu, São Paulo, Brazil


Proper pain therapy requires adequate pain assessment. This study evaluated the reliability and validity of the Unesp-Botucatu horse acute pain scale (UHAPS), the Orthopedic Composite Pain Scale (CPS) and unidimensional scales in horses admitted for orthopedic and soft tissue surgery. Forty-two horses were assessed and videotaped before surgery, up to 4 hours postoperatively, up to 3 hours after analgesic treatment, and 24 hours postoperatively (168 video clips). Six evaluators viewed each edited video clip twice in random order at a 20-day interval; at each viewing they chose whether analgesia would be indicated and applied the Simple Descriptive, Numeric and Visual Analog scales, CPS, and UHAPS. For all evaluators, intra-observer reliability of UHAPS and CPS ranged from 0.70 to 0.97. Reproducibility was variable among the evaluators and ranged from poor to very good for all scales. Principal component analysis showed a weak association among 50% and 62% of the UHAPS and CPS items, respectively. Criterion validity based on Spearman correlation among all scales was above 0.67. Internal consistency was minimally acceptable (0.51–0.64). Item-total correlation was acceptable (0.3–0.7) for 50% and 38% of UHAPS and CPS items, respectively. UHAPS and CPS were specific (90% and 79%, respectively), but neither was sensitive (43% and 38%, respectively). Construct validity (responsiveness) was confirmed for all scales because pain scores increased after surgery. The cut-off point for rescue analgesia was ≥ 5 and ≥ 7 for the UHAPS and CPS, respectively. All scales presented adequate repeatability, criterion validity, and partial responsiveness. Both composite scales showed poor association among items, minimally acceptable internal consistency, and weak sensitivity, indicating that they are suboptimal instruments for assessing postoperative pain. Both composite scales require further refinement, with the exclusion of redundant or needless items and a reduction of the maximum score applied to each item, or should be replaced by other tools.


Assessing intensity and duration of nociception/pain in animals is one of the challenges veterinarians are faced with since animals, unlike humans, cannot verbally communicate whether they have an unpleasant or aversive sensory and emotional experience [1]. In the past decade, equine species have been in the center of attention with regard to developing improved pain monitoring tools [2]. However, as in any prey and flight animal, pain assessment in the horse is more complex than in some other domestic species because they have evolved to show little behavioral signs of discomfort whenever being approached or sensing a dangerous situation [3]. Previously employed nonspecific and unidimensional scales/scoring systems to assess pain in this species, such as visual analog, simple numeric, or simple descriptive scales, have major limitations. Shortcomings of those scoring methods include brief observation periods and the scoring system per se, particularly when assessing animals with supposedly mild to moderate pain [4–6]. Among those, the simple descriptive scale is presumably the least sensitive to identify discrete changes in pain expression [5]. Another limitation related to the evaluator is a lack of experience with the species and/or understanding of species-specific discomfort behaviors [5].

To improve pain evaluation in horses, more complex behavioral [7] or multidimensional scoring systems including the recording of physiological variables [8], have been proposed for specific conditions such as colic [6] and orthopedic trauma [8, 9]. More recently, also, horse grimace scales have been developed to supplement pain assessment [10, 11]. These scales have been tested in animals undergoing castration [10, 12] or suffering from musculoskeletal pain [13], acute pain after experimental noxious stimulation [14], somatic and visceral pain [15], colic [15, 16] or laminitis [17].

Intra- and inter-evaluator reliability and scoring system sensitivity and specificity have been determined for the Unesp-Botucatu composite horse pain scale (UHAPS) [12] and the behavior-based horse pain scales [7, 18] in animals experiencing acute pain. Still, testing the performance of a pain-scoring instrument in the clinical setting is a critical step in confirming its reliability, that is, the ability of the instrument to produce similar results when used at different times by different individuals [19], as well as its effectiveness in accurately capturing manifestations of pain and discomfort in equine patients. So far, the efficacy of the composite orthopedic pain scale (CPS), the Equine Utrecht University scale for facial assessment of pain (EQUUS-FAP) [9] and the Equine Acute Abdominal Pain Scale (EAAPS) [18] has been evaluated in a clinical setting.

Multiple confounding factors limit the value of currently available pain scoring systems for use in clinical practice. Among those are the requirement for extended observation periods, the physical proximity of an evaluator to the animal, which may interfere with the animal’s display of pain-related behaviors, interactions of the evaluator with the animal and physical recording of vital parameters (e.g. heart and respiratory rates and body temperature), and the effect of time of day, anesthesia, and analgesia affecting the spontaneous pain behaviors [20]. Furthermore, physical restraint of locomotion by for example bandages and casts, food intake restraint by muzzles, or limited continued visibility of the entire animal or its face (in case of horse grimace scale scoring), due to less than optimal light conditions in the stall, compromise the proper assessment of the animal and therefore scoring.

The primary objective of the present clinical study was to compare psychometric properties obtained with unidimensional (visual analog, simple numeric, and simple descriptive scales) and two composite pain scales (UHAPS and CPS) by evaluators with variable backgrounds in grading perioperative pain in horses. For this purpose, intra- and inter-observer reliability, criterion and construct validity, item-total correlation, internal consistency, responsiveness, sensitivity, specificity, and cut-off scores for the need of rescue analgesia were determined. We hypothesized that the two composite pain scales are more reliable and valid than unidimensional scoring systems for assessing perioperative pain under clinical practice conditions.

Materials and methods

The study was approved by the School of Veterinary Medicine (Unesp) Animal Use and Ethics Committee (protocol 1228/2017) and by the University of Pennsylvania Animal Care and Use Committee (protocol 806321). At Ghent University, no ethics committee protocol was required because the animals did not undergo any procedures except videotaping. Owners, however, signed an authorization permitting the collection of scientific data during the hospitalization and treatment of the animal at the hospital and the anonymous use of these data in scientific publications.

The horses enrolled in this study included equine patients from the veterinary teaching hospitals of the Schools of Veterinary Medicine at the University of Pennsylvania, USA, at Unesp-Botucatu, Brazil, and at Ghent University, Belgium. Client consent forms explaining the experimental protocol were obtained from each owner before enrolling an animal into the study. As this was a study performed in client-owned animals, all surgical procedures were performed upon the request of the owners. The anesthetic protocols and analgesic therapies used were those of routine clinical practice at each institution. The primary clinician solely defined the need for rescue analgesia without any interference by the investigators.


Thirty male and twelve female horses of different breeds, with a mean weight of 493 ± 87 kg (280–662 kg) and age ranging from 1 to 25 (8 ± 6.7) years, were studied (S1 Table). For inclusion in the study, horses were required to be over one year of age, weigh more than 200 kg, and accept halter placement. Of a total of 59 horses initially scheduled for inclusion in the study, 17 were excluded due to discharge before 24 hours post-surgery, technical problems with filming, or difficulty with handling. All horses underwent a physical examination and, when necessary, laboratory testing based on their specific needs; however, no animals were excluded because of the results of these tests. The majority of horses were acclimated for six or more hours in the stall. The acclimation interval was shorter for those that were admitted to the hospital on the morning of their surgical procedure.


Each horse was placed in an individual stall, without physical contact to other animals, although the stalls allowed visual, auditory, and olfactory contact with other hospital patients. Hay and water were available ad libitum to all horses, except during food and water withholding in the 4–12 hours prior to induction of anesthesia. Horses were kept in their stalls on straw or wood shaving bedding or on a concrete floor.

Anesthetic protocols varied based on the type of procedure performed (S2 Table). Patients were sedated with α2-agonists (xylazine, detomidine, dexmedetomidine, or romifidine). Acepromazine was also used in 22 horses as pre-anesthetic medication and perioperative opioids (butorphanol and/or morphine) were used in 24 horses (Fig 1). In 36 horses, non-steroidal anti-inflammatory drugs were administered IV or IM in either the preoperative or intraoperative period. Anesthesia was induced with ketamine and midazolam or diazepam. Two horses also received guaifenesin. For maintenance of anesthesia, isoflurane in 100% oxygen was used, except in three horses, in which anesthesia was maintained with desflurane in 100% oxygen. Two horses underwent procedures under sedation with continuous rate infusion of detomidine or romifidine. Thirty-two surgeries were performed in the morning and 10 in the afternoon. The responsible surgeon defined the need for and timing of postoperative rescue analgesia, which always consisted of phenylbutazone 4.4 mg/kg BID (Butatabs E® or ButaJect®, Henry Schein Animal Health, Ohio–USA) orally (n = 34) or IV (n = 2), or flunixin meglumine 1.1 mg/kg BID IV (n = 6) (Injectable Flunixin UCB®, UCB Vet Saúde Animal, Jaboticabal–Brazil). Additionally, one horse received morphine 0.1 mg/kg via an epidural catheter (Morphine sulfate injection, USP, West-Ward®, New Jersey–USA). Anesthetic recovery occurred in padded recovery rooms (n = 41) or a swimming pool (n = 1).

Fig 1. Timeline of interventions and points at which drugs were given and pain ratings were performed.

On-site evaluation

Preoperative evaluation was performed between 1 and 12 hours before sedation (M0) and postoperative evaluations were performed up to 4 hours after the end of surgery (M1), up to 3 hours after IV or oral administration of analgesic drugs (M2), and 24 hours after the end of surgery (M3). The camera (GoPro Hero5, GoPro, Inc., California–USA) used to video record the evaluations was positioned outside the stall but in view of the horse. The recording was performed for 10 minutes without the presence of the evaluator (lead investigator), who remained at a distance from the stall, with no eye contact with the horse, and for up to 5 minutes with the evaluator present, following the sequence of the CPS. Before entering the stall, the evaluator observed the horse’s behavior for one minute and then entered the stall to measure physiological variables (Fig 1). After the evaluation, the evaluator completed the scoring sheet in the following sequence: need or not for rescue analgesia according to the evaluator’s clinical experience (RA; 0 = no and 1 = yes), simple descriptive (0 –no pain, 1 –mild pain, 2 –moderate pain, 3 –severe pain, 4 –very severe pain, 5 –worst possible pain), simple numeric (ordinal numbers from 0 –no pain to 10 –worst possible pain), visual analog scale (10 cm line from 0 –no pain to 10 –worst possible pain), CPS [8] and the UHAPS [12]. Twenty of the 32 horses at the University of Pennsylvania that were included in this report simultaneously participated in another study that included uninterrupted 24-hour video recording for the duration of hospitalization [3].

Remote evaluation

The video recordings were cut into sequences lasting between 2 and 4 minutes per moment by the lead investigator, who produced clips that included examples of all behaviors displayed by the animals during the 15-minute recordings. The first 30 seconds showed the horse without the evaluator in the stall and the remaining time covered the on-site appraisal period, with the evaluator entering the stall, interacting with the horse, halter placement in cases where the horse was not already wearing a halter, free and guided locomotion, hay offering when allowed, and palpation of the surgery site(s). For remote evaluations, the video clips for the four time points (M0-M3) and for the horses were viewed in random order (Microsoft Excel® 2016). Six evaluators with varying degrees of equine clinical experience each independently viewed and evaluated the video clips in the order described before for on-site evaluation. These included one senior equine surgeon with many years of experience in assessing horses for pain and discomfort (later determined to be RE–reference evaluator); one anesthesiologist; one equine internist; the lead investigator who recorded and edited the video clips; an equine veterinary technician with over 20 years of clinical and research experience assessing discomfort and pain in horses; and a graduate student in veterinary medicine without clinical experience. These evaluators viewed each clip once (without any rewinding and reviewing), and immediately completed the scoring of the various scales in the same order as for the on-site evaluation. At least 20 days later, these evaluators repeated the procedure (second evaluation cycle). At the time of the first evaluation cycle, the evaluators had not been informed that they would be asked to repeat the evaluations. Physiological parameters that had been recorded during the on-site evaluation were used for remote scoring.

Statistical analysis

Statistical analyses were performed as described in previous studies [12, 21–25] using R software in the RStudio integrated development environment (Version 1.0.143 - ©2009–2016, RStudio, Inc.). For all analyses, α was set to 5%. The rationale behind the psychometric tests performed in this study was based on premises described in Table 1. The methods are also described in Table 1.

Table 1. Methods for statistical validation of the Unesp-Botucatu horse acute pain scale (UHAPS), Composite Pain Scale (CPS), and unidimensional scales to assess postoperative pain in horses.

The traditional methodology for examining concurrent criterion validity is to correlate the scale under investigation against another instrument, ideally a gold standard [19]. For lack of any definitive fully validated scale that allows reliable and accurate pain assessment in horses that could serve as a reference or ‘gold standard’, an alternative approach for testing concurrent criterion validity was (1) to correlate the total scores of the composite scales with the simple numerical, simple descriptive and visual analog scales and (2) to compare all scales among each other scored by the reference evaluator in the 2nd evaluation phase. Another method used for concurrent criterion validation was the agreement between the reference evaluator and the other five evaluators. These methods have been previously used for assessing pain scales in cats [22, 25], cattle [21], pigs [23], and sheep [24].
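The correlation step described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the R code used in the study; the function names and the example scores are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied scores sharing their average rank.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for eight video clips (not study data).
composite_total = [0, 2, 5, 7, 3, 1, 6, 4]
numeric_scale = [0, 1, 5, 8, 3, 2, 7, 3]
rho = spearman(composite_total, numeric_scale)
```

A coefficient close to 1 would indicate that the composite total and the unidimensional score order the clips in nearly the same way.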


According to the confidence intervals of all evaluators, intra-observer reliability (repeatability) varied from good (minimum of 0.70) to very good (maximum of 0.97) in the sum of scores for both composite scales and ranged from moderate (minimum of 0.58) to very good (maximum of 0.99) for the unidimensional scales (Table 2). Repeatability based on the Kappa coefficient for each behavioral item of the UHAPS and CPS evaluated by the reference evaluator was very good for the majority of items (0.86–1) and only good for ‘pawing’ (0.77) and ‘kicking the abdomen’ (0.66) for each respective scale (S3 Table). In the Spearman inter-observer correlation matrix, correlation was poor for the veterinary technician (0.46) and student (0.39) vs the equine internist for the UHAPS and for the veterinary student vs the other three evaluators for the CPS (0.47). Other correlations were moderate (0.53–0.71) or good (reference evaluator vs lead investigator; 0.84 and 0.97 for UHAPS and CPS, respectively) in 80% of the comparisons (Table 3). The reference evaluator, who showed the best repeatability of all evaluators for both composite scales, also presented the best inter-observer correlations (Table 3).
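As an illustration of item-level repeatability, the sketch below computes an unweighted Cohen's kappa between the scores one evaluator gave to the same item in two viewing phases. It is a Python sketch under stated assumptions: the study may have used a weighted kappa in R, and the function name and example scores are hypothetical.

```python
from collections import Counter

def cohen_kappa(first_pass, second_pass):
    """Unweighted Cohen's kappa for two equally long lists of category scores."""
    n = len(first_pass)
    # Proportion of clips that received the same score in both phases.
    observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n
    # Agreement expected by chance from the marginal score frequencies.
    c1, c2 = Counter(first_pass), Counter(second_pass)
    categories = set(first_pass) | set(second_pass)
    expected = sum(c1[c] * c2[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # both passes used a single identical category
    return (observed - expected) / (1 - expected)

# Hypothetical item scores (0-2) for eight clips in phase 1 and phase 2.
phase_1 = [0, 0, 1, 1, 2, 2, 0, 1]
phase_2 = [0, 0, 1, 1, 2, 1, 0, 1]
kappa = cohen_kappa(phase_1, phase_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters for items where most clips receive a score of 0.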

Table 2. Repeatability of the Unesp-Botucatu horse acute pain scale (UHAPS), Composite Pain Scale (CPS), and unidimensional scales to assess perioperative pain in horses.

Table 3. Observer matrix correlation for the Unesp-Botucatu horse acute pain scale (UHAPS) and the Composite Pain Scale (CPS) to assess perioperative pain in horses.

Reproducibility combining phases 1 and 2 of video analysis based on inter-observer reliability comparisons against the reference evaluator was reasonable in 10%, moderate in 20%, good in 35% and very good in 35% of comparisons, for both composite scales (Table 4). Reproducibility ranged from poor to very good when each behavior was evaluated separately (S4 Table) and for the indication of rescue analgesia according to clinical experience, simple descriptive, simple numerical and visual analog scales (Table 4). Reproducibility changed little when data from phase 1 were compared to phase 2.

Table 4. Reproducibility of the Unesp-Botucatu horse acute pain scale (UHAPS), Composite Pain Scale (CPS), and unidimensional scales to assess perioperative pain in horses.

Only one dimension of the principal component analysis of the CPS and UHAPS revealed a representative eigenvalue and variance. Therefore, both scales are unidimensional. Only 31% of the CPS items and 50% of the UHAPS items showed a significant association (Table 5).

Table 5. Load, eigenvalue, and variance of the Unesp-Botucatu horse acute pain scale (UHAPS) and the Composite Pain Scale (CPS) submitted to principal component analysis.

In the criterion analysis, when comparing the UHAPS and CPS with the simple descriptive, simple numeric, and visual analog scales, the Spearman correlation was between 0.72 and 0.77, and 0.67 between the composite scales (Table 6), demonstrating criterion validity.

Table 6. Criterion validity based on the correlation between the Unesp-Botucatu horse acute pain scale (UHAPS) and the Composite Pain Scale (CPS) vs unidimensional scales to assess perioperative pain.

When pooled data, i.e. ratings at all four time points, were considered for the UHAPS, item-total correlation was below the acceptable value (< 0.3) for ‘looking at the flank’, ‘kicking at the abdomen’, ‘lifting hind limbs’, ‘pawing on the floor’ and heart rate (50% of the items; Table 7). For the CPS, item-total correlation was below the acceptable value (< 0.3) for the items ‘kicking at the abdomen’, ‘response to the observer’ and ‘to palpation’, heart and respiratory rates, ‘digestive’, and ‘temperature’; therefore, 38% of the items were below the acceptable value (Table 8). In practical terms, the items below the acceptable correlation value are not contributing to the entire scale.
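The item-total check reported above can be illustrated as follows. This Python sketch assumes the "corrected" variant, in which each item is correlated with the sum of the remaining items, and uses a Pearson coefficient for brevity; the study's exact variant is described in its Table 1 and may differ. Item names and scores are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def item_total_correlations(items):
    """items maps an item name to its list of scores, one per assessment.
    Each item is correlated with the summed scores of all other items."""
    names = list(items)
    n = len(items[names[0]])
    result = {}
    for name in names:
        rest = [sum(items[other][k] for other in names if other != name)
                for k in range(n)]
        result[name] = pearson(items[name], rest)
    return result

# Hypothetical scores for six assessments (not study data).
scores = {
    "item_a": [0, 1, 2, 2, 0, 1],
    "item_b": [0, 1, 1, 2, 0, 2],
    "item_c": [0, 0, 2, 2, 1, 1],
    "item_d": [1, 0, 1, 0, 1, 0],  # deliberately unrelated to the others
}
r = item_total_correlations(scores)
flagged = [name for name, value in r.items() if value < 0.3]
```

Items collected in `flagged` fall below the 0.3 threshold quoted in the text and would be candidates for removal or revision.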

Table 7. Item-total correlation and internal consistency, specificity, and sensitivity for the Unesp-Botucatu horse acute pain scale (UHAPS).

Table 8. Item-total correlation and internal consistency, specificity, and sensitivity for composite pain scale.

Both UHAPS and CPS presented minimally acceptable internal consistency (Cronbach’s α coefficient ≈ 0.60) (Tables 7 and 8). Internal consistency describes how closely the items relate to one another. If a given item fits well with the other items, one would expect internal consistency to decrease after its removal compared to that of the full scale. For both the UHAPS and the CPS, the items not approved by item-total correlation (not in bold) were the same ones whose removal either changed internal consistency very little or even increased it, indicating that they do not contribute significantly to the scale.
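The "alpha if item deleted" reasoning above can be made concrete with a short sketch. It is written in Python with the standard library and uses the usual formula α = k/(k−1) · (1 − Σ item variances / variance of the total score); the function names and scores are hypothetical and this is not the analysis code used in the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """items is a list of equally long score lists, one list per scale item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per assessment
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_deleted(items):
    """Cronbach's alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores for six assessments (not study data).
items = [
    [0, 1, 2, 2, 0, 1],
    [0, 1, 1, 2, 0, 2],
    [0, 0, 2, 2, 1, 1],
    [1, 0, 1, 0, 1, 0],  # deliberately unrelated to the other items
]
full_alpha = cronbach_alpha(items)
without_each = alpha_if_deleted(items)
```

In this toy data set, removing the last item raises alpha above the full-scale value, which is the pattern the text describes for items that do not contribute to the scale.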

With the exception of ‘appetite’ in CPS, for both scales, the same items showing low load values in the principal component analysis were the ones that both were not approved by item-total correlation and showed weak internal consistency.

For the UHAPS, specificity ranged from 79% to 100% (Table 7) and for the CPS, specificity ranged from 69% (respiratory rate) to 100% (Table 8). Neither of the scoring systems showed proper sensitivity (Tables 7 and 8). Low sensitivity indicates that neither scale was good enough to recognize pain in horses that were possibly suffering pain (true positives), whereas the UHAPS and CPS correctly identified pain-free horses in 90% and 79% of the cases, respectively (specificity).
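For readers less familiar with these two measures, the sketch below shows how they follow from a cut-off applied to total scores. It is a Python illustration with hypothetical data; the reference judgement used here (pain present or absent) stands in for whatever criterion the study applied, which is an assumption.

```python
def sensitivity_specificity(total_scores, pain_present, cutoff):
    """Classify an assessment as painful when its total score reaches the cutoff,
    then compare with a reference judgement (True = pain present)."""
    tp = fn = tn = fp = 0
    for score, truth in zip(total_scores, pain_present):
        predicted = score >= cutoff
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and not predicted:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # share of painful assessments detected
    specificity = tn / (tn + fp)  # share of pain-free assessments below cutoff
    return sensitivity, specificity

# Hypothetical totals and reference judgements (not study data).
totals = [1, 2, 3, 6, 2, 4, 7, 1, 3, 8]
reference = [False, False, False, True, False, True, True, False, True, False]
sens, spec = sensitivity_specificity(totals, reference, cutoff=5)
```

The toy data reproduce the qualitative pattern reported here: most pain-free assessments fall below the cut-off (high specificity), while several painful ones are missed (low sensitivity).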

With regard to responsiveness, the scores given by all evaluators were greater at M1 (postsurgery, before analgesic rescue) and for four evaluators at M2 (after analgesic rescue) when compared to the baseline (M0) for the UHAPS (Fig 2; Table 9; S5 Table). Except for the veterinary student, the same result occurred for the unidimensional scales (Table 9; S5 Table). CPS scores were greater at M2 than at M0 only for the reference evaluator and lead investigator, and greater at M1 than M0 only for the anesthesiologist and veterinary technician (Fig 3; Table 9; S5 Table). Indication for the need of rescue analgesia based on the clinical experience-driven judgment was greater at M1 and M2 vs M0, only according to the veterinary student (S5 Table).

Fig 2. Box plots of the total scores of the Unesp-Botucatu horse acute pain scale (UHAPS) over time.

M0 –before surgery; M1 –up to 4 hours after surgery; M2—up to 3 hours after analgesic treatment; M3–24 hours after surgery. Small letters represent statistically significant differences (a > b).

Fig 3. Box plots of the total scores of the Composite Pain Scale (CPS) over time.

M0 –before surgery; M1 –up to 4 hours after surgery; M2—up to 3 hours after analgesic treatment; M3–24 hours after surgery. Small letters represent statistically significant differences (a > b).

Table 9. Median (range) scores of the Unesp-Botucatu horse acute pain scale (UHAPS), the Composite Pain Scale (CPS), and unidimensional scales in the perioperative period in horses.

The distribution of scores for each item is presented for each time point in Figs 4 and 5 for the UHAPS and CPS, respectively. This information assesses the relevance of each graduation within the item. ‘Position in the stall’ was the only item where score 2 appeared to be relevant for the UHAPS. ‘Appetite’ and ‘respiratory rate’ were the only items where score 3 was relevant in the CPS.

Fig 4. Distribution of scores for each item of the Unesp-Botucatu horse acute pain scale (UHAPS) at all perioperative moments.

Fig 5. Distribution of scores for each item of the Composite Pain Scale (CPS) at all perioperative moments.

The cut-off points for rescue analgesia were defined by the receiver operating characteristic (ROC) curve and the Youden index. They were ≥ 5 for the UHAPS (Area under the curve—AUC 0.89), ≥ 7 for the CPS (AUC 0.90), ≥ 6 for visual analog (AUC 0.99), ≥ 3 of 5 for simple descriptive (AUC 0.98) and ≥ 5 of 10 for simple numeric scales (AUC 0.99) (Fig 6; Table 10). When these scales are used to assess pain, these scores suggest that rescue analgesia is indicated, as they represent the optimal simultaneous result of sensitivity and specificity, minimizing the possibility of providing analgesia to a pain-free horse and maximizing the possibility of providing analgesia to a horse suffering pain.
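The Youden index selects the cut-off that maximizes J = sensitivity + specificity − 1 along the ROC curve. The Python sketch below scans every observed score as a candidate cut-off; the function name and data are hypothetical, and the study's own ROC analysis was run in R.

```python
def youden_cutoff(total_scores, pain_present):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1,
    testing each observed score as a 'score >= cutoff' rule."""
    positives = [s for s, t in zip(total_scores, pain_present) if t]
    negatives = [s for s, t in zip(total_scores, pain_present) if not t]
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(total_scores)):
        sensitivity = sum(s >= cutoff for s in positives) / len(positives)
        specificity = sum(s < cutoff for s in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff when values tie
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical totals and reference judgements (not study data).
totals = [0, 1, 2, 3, 4, 5, 5, 6, 7, 8]
reference = [False, False, False, False, True, False, True, True, True, True]
cutoff, j = youden_cutoff(totals, reference)
```

Here the selected rule is "total score ≥ 4", the point at which raising the threshold further would lose more sensitivity than it gains in specificity.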

Fig 6. ROC curve and area under the curve (AUC) for the Unesp-Botucatu horse acute pain scale (UHAPS) (left) and the Composite Pain Scale (CPS) (right).

*ROC: Receiver operating characteristic curve; AUC 1 excellent; AUC 0.9–0.8 very good; AUC 0.8–0.7 good; AUC 0.7–0.6 sufficient; AUC 0.6–0.5 bad; AUC < 0.5 not-discriminative.

Table 10. Youden index based on highest sensitivity and specificity for the Unesp-Botucatu horse acute pain scale (UHAPS), the Composite Pain Scale (CPS), and unidimensional scales.

The number of horses with reference evaluator pain scores equal to or above the Youden index cut-off increased progressively from M0 to M2 and decreased at M3 (24 h after surgery) in both composite scales (Table 11). Differences were observed between M0 and M2 for the UHAPS only (chi-square test, χ² = 10.41, p < 0.01). Indication of rescue analgesia based on clinical experience-driven judgment and based on the Youden index of unidimensional scales was similar at all moments (Table 11). At M2, the number of horses requiring rescue analgesia according to the composite scales was higher than that according to clinical experience and the unidimensional scales.

Table 11. Number of horses for which rescue analgesia would be indicated based on clinical experience and the Youden index.


The current study tested the UHAPS, CPS and unidimensional scales for assessment of perioperative pain in a clinical population of horses undergoing a wide variety of surgical procedures. To investigate whether observers with different backgrounds and skills could use the pain scoring instruments well, a rather diverse group of evaluators was selected, including veterinarians with different levels of experience in assessing horses in pain, a technician with extensive familiarity with behavioral assessment of horses and a student as an inexperienced (naive) observer. In addition, the contribution of each evaluation item to the total score was analyzed, thereby identifying its relevance within each scoring system.

While the repeatability of the total scores for both composite scales ranged from good to very good and was similar to that observed in the original studies presenting and evaluating the scales [8, 12], the reproducibility of the composite scales was rather variable. Most of the evaluators’ correlations against the reference evaluator were good to very good; however, except for the naive observer, our original hypothesis that the two composite scoring systems would offer the advantage of better reliability over the unidimensional scales must be rejected. This is a somewhat surprising result because in other species, composite behavioral scales perform better than unidimensional ones. The ability to distinguish different levels of pain/discomfort in rats was better with a behavioral vs a visual analog scale [38], and this was unrelated to previous evaluators’ experiences in laboratory animal management and behavior. In dogs, reproducibility of visual analog scale scoring of pain/discomfort was poor or moderate among anesthetists [39]. A very good agreement in pain scoring was found only between the reference evaluator and lead investigator, similar to previously reported data for the CPS scale, even when inexperienced persons (students) were the evaluators [9]. However, the raters in those previous studies were not blinded, just like the lead investigator in the present study, who was aware of the case management, had performed on-site evaluations and had video-recorded most of the equine patients. In contrast, the other five evaluators, for whom inter-observer reliability was lower, were blinded. Therefore, the high inter-observer reliability for the lead investigator was likely due to observer bias, a commonly noted confounding factor in animal behavior research [40].

The wide variation in reproducibility found in this study for all scales requires critical analysis. Differences in familiarity with the species in general, with pain assessment in particular, or with the pain scales tested do not, on their own, explain the highly variable inter-observer reliability data. At the time of the study, all evaluators except the veterinary student had 15–25 years of experience in clinical practice, and thus with assessment of discomfort and pain in equine patients, yet their pain scores varied considerably, independent of the scale. More experience obtained after phase 1 analysis did not improve reproducibility, even for the naive observer. Training evaluators in the use of the scoring systems and providing them with more detailed instructions and/or video demonstrations for each item of those scales, as are available online for pain scales for cats [22], cattle [21], sheep [24] and pigs [23], may improve intra- and inter-observer reliability, even with experienced evaluators [38].

It is noteworthy that the need for rescue analgesia according to clinical experience had the lowest intra-observer reliability and considerable variation in inter-observer reliability. This assessment depends on many factors other than the perceived level of pain. Some individuals believe that it is reasonable for postoperative patients to experience moderate pain, while others believe that any level of pain should be treated.

Unlike in human patients, where self-reporting of pain is possible, there is no gold standard instrument in animals against which a new scoring tool can be compared. An alternative model is to use a visual analog scale as a ‘gold standard’, as described in previous studies [7, 11, 12, 21–24] and in children and elderly people [41, 42]. In our case, the criterion validity analysis, aimed at determining the efficacy of the scales based on mutual comparisons, showed a moderate to good correlation among the different scoring tools. An alternative method used to assess criterion validity was to compare the inter-evaluator agreement against the reference evaluator, as reported before in cats [22, 25], cattle [21], sheep [24], pigs [23] and children [43]. According to those criteria, the inter-observer agreement was very good only for the lead investigator and ranged from reasonable to good for the other evaluators for both composite scales.

The item-total correlation coefficient provides information regarding the importance of each item to the scale and identifies items that make a relevant contribution to the total score. This correlation was acceptable for only 50% of the items of the UHAPS (5 out of 10) and 38% of the CPS (5 out of 13). In agreement with the item-total data, the principal component analysis revealed that both scales presented only one dimension and that only 50% and 31% of the items of the UHAPS and CPS, respectively, were associated with the representative dimension. Furthermore, low load values indicated that some items contribute little to both scales, suggesting that some of the evaluation criteria in those scales might be inadequate or obsolete and should be excluded from the rating system in future refinements before the scales are retested in clinical patients.
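The item-total analysis described above can be illustrated with a minimal sketch. It assumes a hypothetical score matrix (one row per assessment, one column per scale item) and uses a Pearson coefficient between each item and the sum of the remaining items; the coefficient and acceptance threshold actually applied in this study are defined in its Methods and are not restated here, so both the function names and the toy data are illustrative assumptions only.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant item carries no information
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(scores):
    """Correlate each item with the sum of all OTHER items.

    scores: list of rows; each row holds the item scores of one assessment.
    Excluding the item from its own total avoids inflating the correlation.
    """
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    # Hypothetical toy data: 4 assessments x 3 items (not study data).
    toy = [[0, 0, 1], [1, 1, 0], [2, 2, 1], [3, 3, 0]]
    for idx, r in enumerate(corrected_item_total(toy)):
        print(f"item {idx}: item-total r = {r:.3f}")
```

An item whose coefficient is low or negative moves independently of the rest of the scale, which is the pattern that would flag it as a candidate for exclusion.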

The internal consistency was close to but below 0.6 for most of the items in both scales, indicating that the relationship of the scale items to each other is minimally acceptable. The low internal consistency is probably related to the limited sensitivity of the scale items. When interpreting Cronbach’s α coefficient, it should be considered that this index is a characteristic of the test score, not of the test as a whole [37]; that is, it depends on both the population tested and the test itself. A high α value is a prerequisite for internal consistency but does not guarantee it. Multidimensional scales with many items tend to have higher α values, which are not always desirable [37]. Comparing the results of this study with studies in other species, the need for refinement of both composite scales is apparent. For example, in other studies examining the composite pain scales in cats [22], cattle [21] and pigs [23], Cronbach’s internal consistencies were equal to or above 0.84, and thus considered excellent. Results of internal consistency followed the same pattern as the principal component analysis and item-total correlation, strengthening the argument that the following items should be excluded from the scales because they do not contribute to the construct: looking at the flank, kicking at the abdomen, lifting hind limbs, pawing on the floor and heart rate for the UHAPS, and kicking at the abdomen, response to observer, response to palpation of the painful area, heart and respiratory rates, digestive sounds, temperature and sweating for the CPS. The unsatisfactory results for looking at the flank and lifting hind limbs are somewhat surprising. In a recent study, their frequency actually increased in horses with postorchiectomy pain, and they were unaffected by the time of day, anesthesia regimen, and analgesia. Otherwise, in accordance with our results, reduction in time spent eating (equivalent to the appetite for CPS), looking at the back of the stall (equivalent to positioning in the stall for UHAPS), and decreased walking (equivalent to locomotion for UHAPS) were the other behaviors associated only with pain [20].
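For readers less familiar with the coefficient, the following is a minimal sketch of Cronbach’s α and of the ‘α if item deleted’ statistic, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total score). The function names and the toy matrix are hypothetical and are not taken from the study data or its analysis code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = assessments, cols = items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores are constant; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(scores):
    """Recompute alpha after dropping each item in turn."""
    k = len(scores[0])
    out = []
    for j in range(k):
        reduced = [[v for i, v in enumerate(row) if i != j] for row in scores]
        out.append(cronbach_alpha(reduced))
    return out


if __name__ == "__main__":
    # Hypothetical toy data: 4 assessments x 3 items (not study data).
    toy = [[0, 0, 1], [1, 1, 0], [2, 2, 1], [3, 3, 0]]
    print("alpha:", round(cronbach_alpha(toy), 3))
    print("alpha if item deleted:",
          [round(a, 3) for a in alpha_if_item_deleted(toy)])
```

If removing an item raises α noticeably, that item is weakening the internal consistency of the scale, which is the reasoning behind proposing its exclusion.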

Sensitivity and specificity are commonly used for laboratory diagnostic purposes [19]; however, pain is not a binary response. Specificity was good for the UHAPS and moderate for the CPS, so these instruments correctly identify pain-free horses in most cases. Still, neither composite scale was sensitive, so they often fail to identify horses that are experiencing pain. In previous studies examining the value of pain scales in other species and horses [12, 21–25], the animals underwent elective surgeries and were pain-free preoperatively, unlike in this study. For that reason, only pain-free horses (at M0) were considered when calculating specificity (total scores of UHAPS < 5 and CPS < 7). Sensitivity was, nevertheless, weak with both scales, either because M1 and M2 did not necessarily represent periods of intense pain due to the variety of perioperative analgesic protocols used or because none of these scales is sensitive enough, a point that requires further studies.
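As an illustration of how these two indices follow from a cut-off, the sketch below classifies a total score at or above the cut-off as ‘pain’ and compares that classification with a reference judgment. The cut-off of 5 mirrors the UHAPS threshold quoted above; the scores, the reference labels and the function name are hypothetical.

```python
def sensitivity_specificity(scores, in_pain, cutoff):
    """Return (sensitivity, specificity) for a 'score >= cutoff' rule.

    scores:  total scale scores, one per assessment
    in_pain: reference judgment per assessment (True = painful)
    """
    tp = sum(1 for s, p in zip(scores, in_pain) if p and s >= cutoff)
    fn = sum(1 for s, p in zip(scores, in_pain) if p and s < cutoff)
    tn = sum(1 for s, p in zip(scores, in_pain) if not p and s < cutoff)
    fp = sum(1 for s, p in zip(scores, in_pain) if not p and s >= cutoff)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Hypothetical toy data (not study data).
    toy_scores = [2, 6, 4, 8, 1, 3]
    toy_reference = [False, True, True, True, False, False]
    sens, spec = sensitivity_specificity(toy_scores, toy_reference, cutoff=5)
    print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A painful horse scoring just below the cut-off (the score of 4 in the toy data) lowers sensitivity without affecting specificity, which is the pattern reported here for both composite scales.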

Responsiveness is ‘the ability of an instrument to measure a meaningful or clinically important change in a clinical state’ [44]. It is considered either a third measurement property of an instrument/tool, together with reliability and validity, or a test of construct validity [19]. All scales tested in this study demonstrated responsiveness, with pain scores higher postsurgery than presurgery for all observers for the UHAPS, for all evaluators except the veterinary student for the unidimensional scales, and for four of six observers for the CPS. Likewise, the number of horses judged as needing rescue analgesia according to the Youden index progressively increased from M0 to M2 for both composite scales. However, the low sensitivity of the scales is reflected in the low pain scores postsurgery (below the cut-off point) and in the small percentage of horses requiring rescue analgesia according to the cut-off point determined on the basis of the Youden index.

The postsurgery pain scores in this study are similar to those reported previously in horses undergoing orthopedic surgery [9]; however, in our patient cohort, postoperative analgesia with NSAIDs did not significantly reduce pain scores, as would be anticipated based on the study design used and on a number of previous reports [12, 21–25]. Instead, pain scores at M2 remained unexpectedly high, either because the analgesic treatment was not effective enough to suppress pain expression at the time the video clip was recorded, given that the time of treatment (ordered at a dose decided by the primary clinician) and the time of video recording were not matched, or because the choice of postoperative analgesia was oral phenylbutazone in most horses. Surprisingly, the time of greatest pain shifted from M1 (prior to postoperative analgesia) to M2 (1 to 3 hours after analgesic drug treatment), which was approximately 10 hours postsurgery. The fact that horses were offered hay ad libitum after surgery possibly affected the bioavailability of oral phenylbutazone, which depends on food availability. Hay given to horses near the time of medication prolongs intestinal drug absorption for at least 10 hours, with the maximum time to peak levels (Tmax) increasing to 12 hours [45, 46]. Thus, the period evaluated as ‘most painful’ ranged from M1 to M2.

An important limitation of all scales tested was that at the moment of supposedly maximum pain (M1), the median scores were 13% for the CPS, 18% for the UHAPS and about 25% for the unidimensional scales of the maximum possible scores. Watching the video clips in a random order of horses and moments blinded 5 of the 6 observers to the time points, even though certain clues, such as the presence of IV catheters and leg bandages or similar, allowed the attentive observer to at least distinguish between pre- and postoperative recordings. The differences in reproducibility among evaluators and the limited responsiveness of the scales in this study might be related to the methodological design, which differed from that used in previous studies. Although the evaluators were blinded in the original studies that developed the CPS [8] and the UHAPS [12], as in this one, there were differences: those studies were performed under controlled experimental conditions using only one standard nociceptive stimulus (i.e. amphotericin-B intra-articular injection for the CPS and orchiectomy for the UHAPS); unlike here, horses were pain-free at the baseline moment; only moments, rather than horses and moments (as here), were randomized; and evaluators were not allowed to rewind and re-review the videos. There are ‘pros’ and ‘cons’ to this approach. The ‘con’ is that the observers might miss some details/behaviors that could be better assessed or picked up by re-viewing the videos, and the ‘pro’ is that a single video viewing guarantees more uniformity in assessment among all evaluators. Pain assessment within veterinary hospitals is typically performed in real time, without video recordings that could be replayed.

The present study tried to mimic a clinical situation in which there is little time to evaluate the animals. In the study by Van Loon et al. [9], responsiveness was satisfactory even in a clinical setting, possibly due to a larger number of evaluated animals, the fact that the evaluators were not blinded to the conditions of the animals, and the fact that pain was assessed in real time and during a longer (i.e. 10-minute) observation period, compared to the short length of the video clips in this study, although the lead investigator carefully tried to condense into the clips all behaviors displayed in the 15-minute recordings. Those factors might have significantly influenced the evaluators’ judgments and contributed to the assignment of higher pain scores in the immediate postoperative period in that study [9]. In the present study, there was less expectation bias for the five blinded observers. Another factor that may have curtailed responsiveness was the fact that 32 of the 42 animals underwent orthopedic surgery (S1 Table) and thus most likely suffered some degree of pain prior to surgery. The important point is that sensitivity to change is not only a characteristic inherent to an instrument but is also related to the effects of an intervention [19]. Therefore, the low percentage change in pain scores observed in the current study also reflected the impact of the intervention, given the variety of procedures and intensities of preoperative pain, and the variable response to postoperative analgesia.

When analyzing the reference evaluator’s ratings individually, four and nine horses already had a score calling for rescue analgesia prior to surgery according to the UHAPS and the CPS scales, respectively. In addition, on-site evaluations or distraction of the horses by people and/or other animals passing through the barn may disrupt ongoing discomfort behavior in hospitalized horses, thereby compromising the pain rating and the detection of responsiveness. In parallel with the current study, a second study was performed in which 20 of the clinical patients were video-recorded for 24 hours, which included a caretaker visit (either to observe and examine or to administer treatment) that was both preceded and followed by one hour of no disturbance (no one interacting with the horse and no indication from the video of the presence of staff or other disturbance in the barn) [3]. That study revealed immediate changes in discomfort behavior in horses as soon as an observer or caretaker approached the animal or entered its stall. Consequently, even in moments of severe pain, horses may not have displayed certain pain-related behaviors as a self-protective strategy [3]. This emphasizes the importance of monitoring animals when they are undisturbed by using remote recording modalities [12, 20]. To allow for undisturbed pain/discomfort behavior monitoring, one should consider i) omitting any recording of physiological parameters and hence eliminating those items from the composite pain scales, or ii) obtaining them at a different time than remote observation, or iii) obtaining those data by means of telemetry technology, such as heart rate monitors. Still, the importance of physiological parameters in assessing pain in animals is disputable [47, 48], and therefore they may all be eliminated after refinement. Another point to consider is that after surgery, all animals were under the effect of perioperative analgesics, which probably decreased or even suppressed pain and reduced pain scores.

The maximum total scores reached 9 out of 17 in the UHAPS and 18 out of 39 in the CPS, corresponding to 53% and 46% of the maximum possible scores. These scales include some mutually exclusive items; most of them are probably unspecific (e.g. ‘pawing’, ‘head movements’, ‘locomotion’), while at least ‘kicking at abdomen’ is specific for abdominal pain; however, a single item contributes little to the total score. Based on the scales published to date, except for abdominal surgery pain, the maximum pain scores are low even in horses with severe pain, i.e., at most 60% in relation to the total sum of the scales [8, 10, 15]. In none of the previous CPS studies [8, 9, 49] did scores ever reach this maximum, including in horses not previously treated with analgesics [8] and even when the patient had to be euthanized due to unrelenting pain caused by hoof cancer. Trauma patients reached a maximum of 22 of 39 [9]. Considering the distribution of scores for each item recorded with the composite scales to rate perioperative pain (time-pooled data) (Figs 4 and 5), except for ‘positioning’, all score 2 items for the UHAPS seem redundant, suggesting that the scores should only be binary (0 and 1). For the same reason, CPS score levels could also be reduced to three instead of four. High scores indicative of intense pain would then be closer to the maximum total score.

A well-defined cut-off point defined by the Youden index was obtained for all scales, with an area under the curve between 0.89 and 1. These results agree with the study of Van Loon et al. [9], who suggested scores between 5 and 8 as signifying mild pain. Rescue analgesia scores derived from ROC curves represented 29% (5/17) and 18% (7/39) of maximally possible scores for the UHAPS and CPS scales, respectively, compared to 27% of maximally possible scores in cats [22], 41% in cattle [21] and 33% in pigs [23] and sheep [24], values slightly higher than those determined for the CPS. Interestingly, corresponding rescue analgesia scores were equal to 60% of the maximum visual analog scale and simple descriptive scores and to 50% of the maximum simple numeric score, thus well above values determined for the UHAPS and CPS and for the composite and unidimensional scales in other species [22–24]. At the moment of most intense pain (M2), the number of rescue analgesia indications according to the Youden index of the unidimensional scales was, therefore, lower than that indicated for the composite scales, showing that unidimensional scales may be even less sensitive in recognizing pain. Those cut-off points should serve the equine practitioner and clinician as a guide in prescribing analgesic treatment, as they ensure with greater certainty that an animal in significant pain receives proper analgesic treatment (sensitivity), whereas an animal not in pain will not receive such therapy (specificity).
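The selection of a cut-off by the Youden index can be sketched as follows: for every candidate cut-off, J = sensitivity + specificity − 1 is computed, and the cut-off with the largest J is retained. The function name, the toy scores and the reference labels (standing in for the evaluators’ clinical judgment on the need for rescue analgesia) are hypothetical, and the tie-breaking rule is an assumption of this sketch rather than a detail taken from the study.

```python
def youden_cutoff(scores, needs_analgesia):
    """Return (best_cutoff, best_J) for the rule 'score >= cutoff'.

    scores:          total scale scores, one per assessment
    needs_analgesia: reference judgment per assessment (True/False)
    Ties are resolved in favor of the lowest cut-off (assumption).
    """
    n_pos = sum(1 for p in needs_analgesia if p)
    n_neg = len(needs_analgesia) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, p in zip(scores, needs_analgesia) if p and s >= cutoff)
        tn = sum(1 for s, p in zip(scores, needs_analgesia) if not p and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1  # Youden's J statistic
        if j > best_j:  # strict '>' keeps the lowest cut-off on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Hypothetical toy data (not study data).
    toy_scores = [2, 6, 4, 8, 1, 3]
    toy_reference = [False, True, True, True, False, False]
    cutoff, j = youden_cutoff(toy_scores, toy_reference)
    print(f"cut-off = {cutoff}, Youden J = {j:.2f}")
```

Because the reference labels come from the same evaluators who assign the scores, a cut-off derived this way inherits any bias in those judgments, a point taken up in the limitations.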

The present study had limitations. Because evaluators made a subjective assessment of whether the animal needed analgesia before performing any scoring, they may have biased themselves to score patients they had already deemed in need of analgesia higher than patients they considered not in need of analgesia. This methodological approach has been used to define the cut-off point for rescue analgesia in cats, cattle, pigs, and sheep [21–25] and was necessary to build the ROC curve and define the Youden index. Regarding the lack of randomization for assigning scoring systems, the unidimensional scales were scored first because, if the composite scales had been employed first, their descriptive levels indicating pain behaviors might have influenced the assessment/outcome with the unidimensional scales, potentially leading to overestimation of their reliability and validity.

A number of factors might have confounded the outcome and thus the interpretation of the data analysis: i. one of the 6 evaluators (the lead investigator) was not blinded and therefore likely biased, ii. the reference evaluator was selected arbitrarily and iii. on-site evaluations and video recordings did not occur at predefined time points relative to analgesic treatments; thus, the impact of analgesic medications at the times of pain assessment might have varied from animal to animal.

A truly validated (‘gold standard’) method of pain assessment in horses is not available to reliably determine the accuracy and efficacy of an assessment tool in clinical practice. Meanwhile, studies should increase evidence towards validation of a pain assessment scoring system rather than consider the scoring system as validated. These studies should include not only the traditional approach of the three ‘Cs’ validities (content, construct and criterion), but follow psychometric scoring standards including, for example, test content, response processes, internal structure, relations to other variables, and consequences of testing, to broaden the spectrum of these analyses [50].

In summary, according to our results, the practical outcome we suggest for future studies aiming to validate the UHAPS and CPS is to reduce the maximal score of each item of these instruments to 1 and 2, respectively. According to the distribution frequency, the highest scores are not present in most of the items. Behaviors other than ‘positioning in the stall’, ‘locomotion’, ‘locomotion when led by the evaluator’, ‘response to palpation of the painful area’, and ‘head movements’ for the UHAPS and ‘appearance’, ‘pawing on the floor’, ‘posture’, and ‘head movement’ for the CPS should be individually reassessed to investigate their importance, because their results for PCA, item-total correlation, and internal consistency were below the ideal. Except for appetite, the contribution of other physiological data for equine pain assessment should be reinvestigated or changed to remote estimation.


Reliability and responsiveness of the CPS and UHAPS were similar to those of the unidimensional scales. Both the UHAPS and the CPS presented overall good or very good repeatability. In contrast, the overall reproducibility was variable. Criterion validity was good (UHAPS) and moderate (CPS), responsiveness occurred to postoperative pain but not to rescue analgesia, both instruments were specific, and rescue analgesia cut-off points were well defined. The UHAPS might be somewhat superior with regard to criterion validity, the association among items, item-total correlation, and responsiveness when compared to the CPS. However, both the UHAPS and CPS scales are, in their current form, suboptimal instruments for assessing pain in equine patients given the poor association between the items of each scale, minimally acceptable internal consistency, and weak sensitivity. Therefore, except for the definition of intervention analgesia, both composite scales are apparently not superior to unidimensional scoring systems when used by experienced observers, and they should either undergo further refinement to exclude unnecessary items or be replaced by other tools for a more consistent pain and discomfort assessment in horses.

Supporting information

S1 Table. Demographic data including sex, breed, age, weight, procedure and perioperative analgesia, and institution of each horse included in this validation study.


S2 Table. Anesthetic and analgesic drug protocols.


S3 Table. Repeatability of the UHAPS, CPS and unidimensional scales to assess perioperative pain in horses.


S4 Table. Reproducibility of the UHAPS, CPS and unidimensional scales to assess perioperative pain in horses.


S5 Table. Median (range) scores of the UHAPS and CPS and unidimensional scales in the perioperative period in horses.



The authors would like to thank Jaime Miller, CVT, for participating in the remote video evaluations; Dr. Juliana Alonso, surgeons Dr. João Pedro Pfeifer, MSc Gustavo dos Santos Rosa and DVM Heitor Cestari for performing the surgeries and postoperative care, especially during the colic syndrome emergency.


  1. Molony V, Kent JE. Assessment of acute pain in farm animals using behavioral and physiological measurements. J Anim Sci 1997;75:266–72. pmid:9027575
  2. De Grauw JC, Van Loon JPAM. Systematic pain assessment in horses. Vet J 2015. pmid:26831169
  3. Torcivia C, McDonnell S. In-person caretaker visits disrupt ongoing discomfort behavior in hospitalized equine. Animals 2020;10. pmid:32012670
  4. Lindegaard C, Thomsen MH, Larsen S, Andersen PH. Analgesic efficacy of intra-articular morphine in experimentally induced radiocarpal synovitis in horses. Vet Anaesth Analg 2010;37:171–85. pmid:20230568
  5. Ashley FH, Whay HR, Waterman-Pearson EA. Behavioural assessment of pain in horses and donkeys: application to clinical practice and future studies. Equine Vet J 2005;37:565–75. pmid:16295937
  6. Sutton GA, Paltiel O, Soffer M, Turner D. Validation of two behaviour-based pain scales for horses with acute colic. Vet J 2013;197:646–50. pmid:23993390
  7. Sutton Abells G, Dahan R, Turner D, Paltiel O. A behaviour-based pain scale for horses with acute colic: Scale construction. Vet J 2013;196:394–401. pmid:23141961
  8. Bussières G, Jacques C, Lainay O, Beauchamp G, Leblond A, Cadoré JL, et al. Development of a composite orthopaedic pain scale in horses. Res Vet Sci 2008;85:294–306. pmid:18061637
  9. van Loon JPAM, Van Dierendonck MC. Pain assessment in horses after orthopaedic surgery and with orthopaedic trauma. Vet J 2019;246:85–91. pmid:30902195
  10. Dalla Costa E, Minero M, Lebelt D, Stucke D, Canali E, Leach MC. Development of the Horse Grimace Scale (HGS) as a pain assessment tool in horses undergoing routine castration. PLoS One 2014;9. pmid:24647606
  11. van Loon JPAM, Van Dierendonck MC. Monitoring acute equine visceral pain with the Equine Utrecht University Scale for Composite Pain Assessment (EQUUS-COMPASS) and the Equine Utrecht University Scale for Facial Assessment of Pain (EQUUS-FAP): A scale-construction study. Vet J 2015;206:356–64. pmid:26526526
  12. Taffarel MO, Luna SPL, Oliveira FA, Cardoso GS, Moura Alonso J, Pantoja JC, et al. Refinement and partial validation of the UNESP-Botucatu multidimensional composite pain scale for assessing postoperative pain in horses. BMC Vet Res 2015;11. pmid:25888751
  13. Dyson S, Berger J, Ellis AD, Mullard J. Development of an ethogram for a pain scoring system in ridden horses and its application to determine the presence of musculoskeletal pain. J Vet Behav Clin Appl Res 2018;23:47–57.
  14. Gleerup KB, Forkman B, Lindegaard C, Andersen PH. An equine pain face. Vet Anaesth Analg 2015;42:103–14. pmid:25082060
  15. van Loon JPAM, Back W, Hellebrekers LJ, van Weeren PR. Application of a composite pain scale to objectively monitor horses with somatic and visceral pain under hospital conditions. J Equine Vet Sci 2010;30:641–9.
  16. Sutton GA, Atamna R, Steinman A, Mair TS. Comparison of three acute colic pain scales: Reliability, validity and usability. Vet J 2019;246:71–7. pmid:30902193
  17. Dalla Costa E, Stucke D, Dai F, Minero M, Leach MC, Lebelt D. Using the Horse Grimace Scale (HGS) to assess pain associated with acute laminitis in horses. Animals 2016;6:9. pmid:27527224
  18. Maskato Y, Dugdale AHA, Singer ER, Kelmer G, Sutton GA. Prospective feasibility and revalidation of the Equine Acute Abdominal Pain Scale (EAAPS) in clinical cases of colic in horses. Animals 2020;10:2242. pmid:33260428
  19. Streiner DL, Norman GR, Cairney J. Health measurement scales: a practical guide to their development and use. 5th ed. Oxford: Oxford University Press; 2015.
  20. Trindade PHE, Taffarel MO, Luna SPL. Spontaneous behaviors of post-orchiectomy pain in horses regardless of the effects of time of day, anesthesia, and analgesia. Animals;11:1629. pmid:34072875
  21. Oliveira FA, Luna SPL, do Amaral JB, Rodrigues KA, Sant’Anna AC, Daolio M, et al. Validation of the UNESP-Botucatu unidimensional composite pain scale for assessing postoperative pain in cattle. BMC Vet Res 2014;10:200. pmid:25192598
  22. Brondani JT, Vogel PR, Ambrosio J, Niyom S, Luna SPL, Padovani CR, et al. Validation of the English version of the UNESP-Botucatu multidimensional composite pain scale for assessing post-operative pain in cats. BMC Vet Res 2013;9:143. pmid:23867090
  23. Luna SPL, Araújo AL, Nóbrega Neto PI, Brondani JT, Oliveira FA, dos Santos Azerêdo LM, et al. Validation of the UNESP-Botucatu pig composite acute pain scale (UPAPS). PLoS One 2020;15:1–27. pmid:32480399
  24. Silva NEOF, Trindade PHE, Oliveira AR, Taffarel MO, Moreira MAP, Denadai R, et al. Validation of the Unesp-Botucatu composite scale to assess acute postoperative abdominal pain in sheep (USAPS). PLoS One 2020;15. pmid:33052903
  25. Brondani JT, Luna SPL, Padovani CR. Refinement and initial validation of a multidimensional composite scale for use in assessing acute post-operative pain in cats. Am J Vet Res 2011;72:174–83. pmid:21281191
  26. Kaiser HF. The varimax criterion for analytic rotation in factor analysis. Psychometrika 1958;23:187–200.
  27. Altman D. Practical Statistics for Medical Research. London, UK: Chapman and Hall/CRC; 1991.
  28. Schuster C. A Note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educ Psychol Meas 2004;64:243–53.
  29. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74. pmid:843571
  30. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–20. pmid:19673146
  31. Altman D. Some common problems in medical research. In: Practical Statistics for Medical Research. London, UK: Chapman and Hall/CRC; 1991. p. 404–8.
  32. DeVellis RF. Scale development: theory and applications. 4th ed. Thousand Oaks: SAGE Publications Inc; 2017.
  33. Scherer M, Blozik E, Himmel W, Laptinskaya D, Kochen MM, Herrmann-Lingen C. Psychometric properties of a German version of the neck pain and disability scale 2008:922–9. pmid:18437433
  34. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials 1991;12:142S–158S. pmid:1663851
  35. Streiner DL, Cairney J. What’s under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatry 2007;52:121–8. pmid:17375868
  36. Šimundić A-M. Measures of diagnostic accuracy: basic definitions. EJIFCC 2009;19:203–11. pmid:27683318
  37. Streiner DL. Starting at the beginning: An introduction to coefficient alpha and internal consistency. J Pers Assess 2003;80:99–103. pmid:12584072
  38. Roughan JV, Flecknell PA. Training in behaviour-based post-operative pain scoring in rats—An evaluation based on improved recognition of analgesic requirements. Appl Anim Behav Sci 2006;96:327–42.
  39. Holton LL, Scott EM, Nolan AM, Reid J, Welsh E, Flaherty D. Comparison of three methods used for assessment of pain in dogs. J Am Vet Med Assoc 1998;212:61–6. pmid:9426779
  40. Kaufman AB, Rosenthal R. Can you believe my eyes? The importance of interobserver reliability statistics in observations of animal behaviour. Anim Behav 2009;78:1487–91.
  41. Ferrell BA, Stein WM, Beck JC. The geriatric pain measure: Validity, reliability and factor analysis. J Am Geriatr Soc 2000;48:1669–73. pmid:11129760
  42. Hesselgard K, Larsson S, Romner B, Strömblad L-G, Reinstrup P. Validity and reliability of the behavioural observational pain scale for post-operative pain measurement in children 1–7 years of age. Pediatr Crit Care Med 2007;8:102–8. pmid:17273124
  43. Gauvain-Piquard Rodary, Rezvani Serbouti. The development of the DEGR(R): A scale to assess pain in young children with cancer. Eur J Pain 1999;3:165–76. pmid:10700346
  44. Liang MH. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care 2000;38:II84–90. pmid:10982093
  45. Maitho TE, Lees P, Taylor JB. Absorption and pharmacokinetics of phenylbutazone in Welsh Mountain ponies. J Vet Pharmacol Ther 1986;9:26–39. pmid:3701913
  46. Gerring EL, Lees P, Taylor JB. Pharmacokinetics of phenylbutazone and its metabolites in the horse. Equine Vet J 1981;13:152–7. pmid:7297544
  47. Manteca X, Deag JM. Use of physiological measures to assess individual differences in reactivity. Appl Anim Behav Sci 1993;37:265–70.
  48. McCann JS, Heird JC, Bell RW, Lutherer LO. Normal and more highly reactive horses. I. Heart rate, respiration rate and behavioral observations. Appl Anim Behav Sci 1988;19:201–14.
  49. Van Loon JPAM, Jonckheer-Sheehy VSM, Back W, Van Weeren PR, Hellebrekers LJ. Monitoring equine visceral pain with a composite pain scale score and correlation with survival after emergency gastrointestinal surgery. Vet J 2014;200:109–15. pmid:24491373
  50. American Educational Research Association, American Psychological Association, National Council on Measurement in Education, Joint Committee on Standards for Educational and Psychological Testing (U.S.). Standards for educational and psychological testing. Washington, DC: American Educational Research Association; 2014.