Developing a pain intensity prediction model using facial expression: A feasibility study with electromyography

The automatic detection of facial expressions of pain is needed to ensure accurate pain assessment of patients who are unable to self-report pain. To overcome the challenges of automatic systems for determining pain levels based on facial expressions in clinical patient monitoring, a surface electromyography method was tested for feasibility in healthy volunteers. In the current study, two types of gradually increasing experimental pain stimuli were induced in thirty-one healthy volunteers. We used a surface electromyography method to measure the activity of five facial muscles to detect facial expressions during pain induction. Statistical tests were used to analyze the continuous electromyography data, and a supervised machine learning model was applied for pain intensity prediction. Muscle activation of the corrugator supercilii was most strongly associated with self-reported pain, and the levator labii superioris and orbicularis oculi showed a statistically significant increase in muscle activation when the pain stimulus reached subjects’ self-reported pain thresholds. The two strongest features associated with pain, the waveform lengths of the corrugator supercilii and levator labii superioris, were selected for a prediction model. The pain prediction model reached a c-index of 0.64. In the study results, the most detectable difference in muscle activity during the pain experience was connected to eyebrow lowering, nose wrinkling and upper lip raising. As the performance of the prediction model remains modest, yet with a statistically significant ordinal classification, we suggest testing with a larger sample size to further explore the variables that affect variation in expressiveness and subjective pain experience.


PLOS ONE | https://doi.org/10.1371/journal.pone.0235545 | July 9, 2020

Introduction
Pain as a subjective experience is difficult to assess in situations in which patients have no ability to self-report their pain. These situations arise frequently in intensive care units (ICUs), where pain related to critical illnesses, major surgeries and everyday care procedures is common [1,2] and many of the patients are unable to communicate verbally due to mechanical ventilation or sedation. Most behavioral pain assessment scales validated for critically ill patients share an item measuring the facial expressions connected to pain [3], which is found to be the strongest marker associated with pain assessment in non-communicative patients [4,5]. On behavioral pain assessment scales, facial expressions connected to pain are often referred to as grimacing, frowning and wrinkling of the forehead [6-8], scored by observer interpretation according to the intensity of the expression. However, observer-based subjective evaluations often underestimate pain in others [5,9]. According to the study by Arif-Rahu et al. [10], the actions of critically ill patients' facial muscles during pain are similar to those of healthy volunteers. There is a consensus on four facial actions most associated with pain, forming the core pain expression: brow lowering, nose wrinkling and lip raising, orbit tightening, and eye closure [11,12]. In some studies, the muscles affecting the mouth have also been associated with pain [13,14]. However, facial expressions have been decoded on the level of action units (AUs) more often than on a single-muscle basis. The Facial Action Coding System (FACS), a framework for identifying facial expressions by Ekman and Friesen [15], divides facial movements into action units representing individual components of muscle movements. While the FACS has high reliability [16], using the FACS requires specific training, and the scoring is dependent on the subjectivity of the scorer [17].
Advances have been made in developing systems employing facial expressions of pain for automatic pain detection. Computer vision-based pattern recognition has been developed with fair to excellent classification performance [18] using the FACS coding, videotaped material of the BioVid heat pain database [19,20] and the UNBC-McMaster shoulder pain expression archive database [21,22], as well as in clinical situations in post-operative patients [23]. However, monitoring facial expressions using computer vision is challenging when coping with head orientation changes and the interference of medical accessories over a patient's face, such as tubes or oxygen masks. Most methods are based on coding every frame of a video based on the FACS, which could be a time-consuming process in real-time applications [18]. A promising technology used in automatic pattern recognition is surface electromyography (sEMG), a non-invasive technology measuring the electrical activity of superficial muscles with electrodes placed on the skin [24]. The advantage of an sEMG system is the ability to objectively detect subtle facial muscle activity that can be invisible to observers [25]. However, a recent review by Dawes et al. [26] found only one study using the sEMG method to detect facial pain expressions objectively. Furthermore, the suggested correlation between pain intensity and muscle tension has remained unproven.
The current study is a part of the Smart Pain Assessment Tool project, which is intended to develop a clinically useful automatic pain assessment tool for critically ill patients. The main objective of this study was to evaluate the feasibility of the sEMG method for pain detection using two kinds of gradually increasing pain stimuli in healthy subjects. The aim was to examine the facial muscles that can most feasibly be used to detect pain and investigate predictability of pain intensity with machine learning algorithms based on those muscles.

Study subjects
The study was approved by the Ethics Committee of the Hospital District of South West Finland (ETMK:83/1801/2015). Each study subject provided written informed consent. The study subjects were recruited via advertisements on the university campus and university websites. Thirty-one (15 male and 16 female) healthy volunteers aged 21-51 years (mean age 33 ± 9.0 years) were included in the study. Following the inclusion criteria, all the study subjects had healthy facial skin and no excess facial hair in the areas where the sEMG sensors would be placed. The general health of the study subjects was ensured through oral enquiry. The study exclusion criteria consisted of having chronic or acute illnesses, being pregnant and taking regular medication during or in the two weeks preceding the study. The data were collected between December 2015 and April 2016.

Materials and procedures
Facial muscles. Pain-related facial descriptors have been investigated in several studies. Facial muscles for the sEMG measurement were chosen by compiling previous study findings, as shown in Table 1 [17,27,28]. The facial muscles measured in the study were the corrugator supercilii, orbicularis oculi, levator labii superioris, zygomaticus major and risorius.
Electromyography device. For facial muscle activation recognition, the sEMG signal was captured with a multi-purpose biosignal acquisition device that was developed for health monitoring [29]. This device was designed and manufactured by the IoT4Health research group. Version 1.0 of the device used in the study has previously been tested and reported on by Jiang et al. [30]. The multi-purpose device is capable of eight-channel signal acquisition with 24-bit analogue-to-digital resolution. The sample rate is adjustable, and in this study, it was set to 1000 samples per second. The analogue signals were amplified 24-fold before digitization. Each channel was set as a single-ended connection. The muscle activities of selected facial muscle regions were captured with surface electrodes on the right side of the face in a monopolar configuration. Additionally, a reference electrode was placed on the bony area behind the ear. The frontalis sEMG channel on the same side was taken as a noise reference signal for adaptive noise cancellation.
Test procedures. The study was done using a randomized crossover design. The facial sEMG was recorded among other biosignals including heart rate, respiratory rate and skin conductance. The procedures are described and reported in greater detail in Jiang et al. [31]. The study procedures were conducted in a quiet room with a comfortable armchair. A study technician and a study nurse were present during each data-collection session. A non-harmful, slowly increasing pain stimulus was induced with heat and electrical stimulus (shown in Fig 1) on the right or left arm. The subjects were tested four times during each session, two times with each stimulus. The starting test number of the pain induction was randomized to control the order effect. We defined the four tests as 1-left heat, 2-right heat, 3-left electrical pulses and 4-right electrical pulses. A random number between 1 and 4 was generated before all the tests. For example, if the start number was 3, then the order of the tests was 3-4-1-2. Heat pain was induced using a round heating element with a diameter of 3 cm on the subject's inner arm. The heat increased slowly from 30˚C at a rate of 0.2-0.3˚C per second until it reached 52˚C, which was considered the safety limit, at which point the heating was stopped. However, in some cases in which the pain tolerance was not reached before 52˚C, the heating element kept warming one to two degrees past the limit through inertia, leading to a mean heat of 52.9˚C. A cold pad was applied to prevent any burn marks after each heating session. Electrical stimulus was induced in the fingertip of the ring finger with a transcutaneous electrical nerve stimulation device (TENS; Sanitas, Hans Dinlage GmbH, Germany). A pre-installed program with a pulse width of 250 μs and a frequency of 100 Hz was selected. The TENS output can be manually increased from level 0 to 50 (peak-to-peak voltage 2 V per level at 500 Ohm), and the level was increased every three seconds.
The main motivation for using two pain stimuli in this study was to model generalized experimental pain from physiological signals rather than one specific experimental type of pain. The perception of pain starts from nociceptors, the sensing neurons, sending signals to the spinal cord and brain in response to potentially damaging stimuli. A-Delta and C fibers both carry a certain type of sensory information. The slowly increasing contact heat activates A-Delta fibers, which are responsible for the sensation of a quick and shallow initial pain, and then C fibers, which respond to a deeper, secondary pain. By contrast, electrical stimuli excite nerve fibers directly in the epidermis including the aforementioned nociceptors as well as non-nociceptive fibers [32]. Due to this primary aim, the collected facial sEMG data from all the tests were analyzed altogether.
Electromyography data collection. The facial skin under the electrodes was cleaned using cleansing swabs with 70% alcohol before the electrode placement. The pre-gelled H124SG Ag/AgCl sensors (30 mm × 24 mm) were placed on the predetermined facial muscles unilaterally along the right side of the face. The lead wires were attached firmly with tape to avoid disturbing movements. A baseline recording was performed before pain induction began. The pain induction starting place was pre-randomized. Subjects were instructed not to talk during pain induction, to avoid speech-related muscle movement interference. The facial muscle sEMG signals were continuously collected throughout the sessions.
Pain intensity assessment. The subjects were instructed to press an alarm signal on two occasions during the pain induction: first, when the sensation was perceived as pain for the first time (pain threshold time point), and second, when the pain became intolerable (pain tolerance time point). The pain induction was stopped within the safety limits even if the pain tolerance was not reached.

Data analysis
Preprocessing and feature extraction. Signal processing and data recomposition were implemented using MATLAB R2014a. The sEMG signal was preprocessed with a 20-Hz Butterworth high-pass filter to remove the movement artefacts and baseline drifts. An adaptive filter was applied for removal of the electrical pulse signals caused by the electrical tests and 50-Hz power line interference. A high-pass filtered frontalis sEMG was used as a noise reference signal in the adaptive noise cancellation. Then, the data were recomposed and downsampled from 1000-Hz signals to 1-Hz features with root mean square (RMS) and waveform length (WL) (Table 2) transformations for each sEMG signal for analysis. The RMS models the sEMG as an amplitude-modulated Gaussian random process, whereas the WL represents the sEMG complexity over the time segment [33]. The recomposition produced ten separate features, named with the abbreviation of each muscle (cor = corrugator supercilii, orb = orbicularis oculi, lev = levator labii superioris, zyg = zygomaticus major, and ris = risorius) and feature transformation (rms = root mean square, and wl = waveform length): corrms, corwl, orbrms, orbwl, levrms, levwl, zygrms, zygwl, risrms and riswl (Table 3).
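As an illustration of the recomposition from a 1000-Hz signal to 1-Hz features, the following minimal sketch computes RMS and WL over consecutive one-second windows. The study itself used MATLAB, and the exact formulas of Table 2 are not reproduced in this text, so the sketch assumes the commonly used definitions (RMS as the square root of the mean squared sample, WL as the summed absolute difference between consecutive samples) and non-overlapping windows. The function names are hypothetical.

```python
import math


def rms(window):
    """Root mean square of the samples in one window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def waveform_length(window):
    """Cumulative absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))


def window_features(signal, fs=1000):
    """Collapse a signal sampled at `fs` Hz into one (rms, wl) pair per full second.

    Assumption: non-overlapping windows; a trailing partial window is dropped.
    """
    features = []
    for start in range(0, len(signal) - fs + 1, fs):
        window = signal[start:start + fs]
        features.append((rms(window), waveform_length(window)))
    return features
```

Applied to a single muscle channel, for example the corrugator supercilii, the two values of each pair would correspond to the features named corrms and corwl.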
Data analysis tasks were performed using Python versions 2.7.14 and 3.6.3 (Anaconda Python for data science). During the feature processing, an additional outlier removal in the form of Hampel filtering was applied to all the data to remove artifact spikes of extremely high amplitude [34] (Table 2). All features were standardized on an individual level for the intra- and inter-person comparison of the sEMG activity between muscles and subjects [35]. The standardization was applied on each subject's individual test level using z-transformation (zero mean and one standard deviation with a "no pain" period included).
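The outlier removal and standardization steps can be sketched as follows. This is an illustrative implementation under stated assumptions, not the study code: the standard deviation inside the Hampel window is estimated from the median absolute deviation (MAD) scaled by 1.4826, outliers are replaced with the window median, windows are truncated at the signal edges, and the z-transformation uses the population standard deviation. The defaults follow the half-window of 3 and the threshold of 3 standard deviations reported in Table 2.

```python
import statistics

# Assumed scale factor: converts the MAD into a standard deviation
# estimate for normally distributed data.
MAD_SCALE = 1.4826


def hampel(values, half_window=3, n_sigmas=3):
    """Replace local outliers with the median of their surrounding window."""
    cleaned = list(values)
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = statistics.median(window)
        mad = statistics.median([abs(v - med) for v in window])
        if abs(values[i] - med) > n_sigmas * MAD_SCALE * mad:
            cleaned[i] = med
    return cleaned


def zscore(values):
    """Standardize one test to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

Standardizing each test separately, as described above, removes between-subject differences in absolute signal level before the features are compared.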
Data labelling. The test periods defined by stimulus start (t1), subject-reported pain threshold (t2) and pain tolerance (t3) (Fig 2, Table 3) were split linearly to allow more detailed exploration of the level of pain intensity. This resulted in four shorter period labels; P1-P4 (see Table 3). After the data collection, additional test periods P0 were derived by including the collected sEMG signals captured during the 30-second period preceding the start of pain induction.
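A minimal sketch of the labelling rule is given below. It assumes that "split linearly" means dividing each of the two self-report intervals (t1 to t2 and t2 to t3) into two equal halves, which is consistent with P1 and P2 preceding the pain threshold and P3 and P4 following it; the handling of samples that fall exactly on a boundary is also an assumption. The function name `label_period` is hypothetical.

```python
def label_period(t, t1, t2, t3):
    """Label time point `t` (seconds) as "P0".."P4", or None if outside the test.

    t1: stimulus start, t2: reported pain threshold, t3: reported pain tolerance.
    """
    if not t1 < t2 < t3:
        raise ValueError("expected t1 < t2 < t3")
    if t1 - 30 <= t < t1:
        return "P0"  # 30-second period preceding pain induction
    if t1 <= t < (t1 + t2) / 2:
        return "P1"  # first half of the pre-threshold interval
    if (t1 + t2) / 2 <= t < t2:
        return "P2"  # second half of the pre-threshold interval
    if t2 <= t < (t2 + t3) / 2:
        return "P3"  # first half of the post-threshold interval
    if (t2 + t3) / 2 <= t <= t3:
        return "P4"  # second half of the post-threshold interval
    return None
```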
Statistical evaluation of the sEMG features. In the statistical comparison, first, the sEMG features were visually evaluated across the test periods P0-P4. For this, period-specific sEMG feature medians were calculated and plotted. Secondly, a pair-wise statistical comparison of the feature medians was performed. Subject level sEMG feature medians were computed for each period (P1-P4). Finally, inter-correlations between the sEMG features and the test period P1-P4 were determined.
Table 2 (excerpt). https://doi.org/10.1371/journal.pone.0235545.t002
Hampel filtering: the Hampel filter (K = 3, t = 3) [34], where K is the half-window and the outlier threshold is t standard deviations.
Z-score standardization on test level: $z = (x - \mu)/\sigma$, where μ is the mean and σ is the standard deviation.

Non-parametric methods were used in the data analysis due to an imbalance of label classes and non-normally distributed sEMG feature values and to avoid linearity assumptions. Thus, we computed Wilcoxon signed-rank statistics [36] for the pairwise comparison of sEMG across the pain periods and Spearman's rank-correlation for the monotonic relationships.
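To make the rank-based reasoning concrete, the sketch below computes Spearman's rank correlation as the Pearson correlation of tie-averaged ranks. It is a plain-Python illustration rather than the study code, and the function names are hypothetical. It assumes that neither input is constant, since the correlation is undefined in that case.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, a feature that rises monotonically but nonlinearly across periods P1-P4 still yields a coefficient of 1. Library routines such as `scipy.stats.spearmanr` and `scipy.stats.wilcoxon` provide these tests together with p-values; whether they were used in the study is not stated.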
Machine learning modeling. A supervised machine learning model was applied for pain intensity prediction. To study the primary aim, the feasibility of pain recognition using sEMG signals, the research focused on one method only. The model was constructed using selected sEMG features as the input and periods P1-P4 as the labelled output. The selection process of the sEMG features is described in more detail in the results section. The results from the sEMG feature comparison were used in addition to a table of Spearman's rank-order correlation coefficients [37] to limit the number of features used in the pain intensity prediction.
The k-nearest neighbour (kNN) algorithm was chosen as the machine learning model. The small sample size, unbalanced label classes, non-normally distributed feature values, and the individual differences between the subjects led us to use kNN, a simple but efficient non-parametric method that can capture linear patterns and, especially, the inherent nonlinearity of the problem. A c-index [38] was used as the performance measure to calculate the concordance between the real ordinal outcomes and model predictions. A c-index is a generalization of the area under the ROC curve (AUC) [39], and values above 0.50 correspond to concordance between the predicted and real categories; values below 0.50 correspond to discordance, and values of 0.50 correspond to random predictions. Unlike the commonly used accuracy, a c-index also gives a reliable measure for unevenly distributed output classes.
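The interpretation of the c-index can be illustrated with a pairwise sketch for ordinal labels. It counts, over all pairs of observations with different true labels, how often the predictions are ordered the same way. Counting tied predictions as half-concordant is an assumption made here for illustration; the exact tie handling of the measure in [38] is not described in this text.

```python
def c_index(y_true, y_pred):
    """Concordance between true ordinal labels and predictions.

    1.0 means every comparable pair is ordered correctly, 0.0 means every
    pair is reversed, and 0.5 corresponds to random or constant predictions.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true labels carry no ordering information
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5  # assumed convention for tied predictions
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("need at least two distinct true labels")
    return concordant / comparable
```

A classifier that always predicts the most frequent period can reach a high accuracy on unbalanced data, but under this definition it scores exactly 0.5, which is why the measure suits unevenly distributed classes.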
In meta learning, the model is designed to automatically select the optimal learning model. The feature selection and the tuning of the hyperparameter k were performed in the prediction model using nested cross-validation. Parameter optimizations were run within an inner loop, while an outer loop was applied for model selection. This method estimated the overall predictive performance on unobserved data without optimization bias [40]. The bias resulting from within-subject dependence was handled by using leave-subject-out cross-validation [41].
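The nested leave-subject-out scheme can be sketched in plain Python as follows. This is a simplified illustration and not the study implementation: it tunes only k (no feature selection), scores with accuracy instead of the c-index to keep the example short, and breaks voting ties in favour of the lowest label. All function names are hypothetical. The data are assumed to be a dictionary mapping each subject to a list of (features, label) pairs.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def accuracy(train, test, k):
    """Share of test rows whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def leave_subject_out(data, k):
    """Mean score when each subject in turn is held out entirely."""
    scores = []
    for held_out in data:
        train = [row for s, rows in data.items() if s != held_out for row in rows]
        scores.append(accuracy(train, data[held_out], k))
    return sum(scores) / len(scores)


def nested_leave_subject_out(data, k_grid):
    """Outer loop estimates performance; inner loop chooses k without seeing the held-out subject."""
    outer_scores, chosen = [], []
    for held_out in data:
        inner = {s: rows for s, rows in data.items() if s != held_out}
        best_k = max(k_grid, key=lambda k: leave_subject_out(inner, k))
        train = [row for rows in inner.values() for row in rows]
        outer_scores.append(accuracy(train, data[held_out], best_k))
        chosen.append(best_k)
    return sum(outer_scores) / len(outer_scores), chosen
```

Holding out whole subjects keeps samples from the same person out of both the training and the evaluation data, which is the within-subject dependence that leave-subject-out cross-validation addresses.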
Sometimes, the classifier may produce feasible analysis results without the data itself containing actual patterns due to a small dataset or having too many features [42]. This can be statistically tested using a permutation-based p-value. The competence of the meta learning classifier was assessed before the final model evaluation with a permutation-based p-value using randomized labels [42] in 1000 permutation tests; the null hypothesis was that the model classifier performance is a result of random chance. This evaluation tested if there is a real connection between the input data and class labels. Leave-subject-out nested cross-validation was applied to the real input values in each test in addition to randomized output classes. A plain non-nested leave-subject-out cross-validation evaluated the final model estimate, with the most suitable parameters produced by the meta learning model.
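The permutation test can be sketched generically as below. The callable `score_fn` is a hypothetical stand-in for the full cross-validated evaluation described above; it takes the features and a label sequence and returns a score for which higher is better. Adding one to both the numerator and the denominator, so that the estimate can never be exactly zero, is an assumption made here about the estimator.

```python
import random


def permutation_p_value(score_fn, features, labels, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # breaks any real link between inputs and labels
        if score_fn(features, shuffled) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_permutations + 1)
```

With 1000 permutations, the smallest attainable value under this estimator is 1/1001, so a result below 0.01 means that fewer than ten shuffled-label runs matched or exceeded the score obtained with the real labels.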

Results
In total, 120 tests were included in the analysis; four tests were excluded due to technical problems with the electrical stimulus device. The length of the tests varied between the study subjects and the nature of the pain stimuli. The average duration of a study test involved in the analysis was 110 seconds (SD = 42 seconds). In this study, all the tests were analyzed together, in alignment with our initial aim to build a general model rather than one specific to a single stimulus.
The distribution of the self-report-based test period lengths was almost balanced (Fig 3, 'All'), especially during the pain induction. A more detailed visual inspection of the average period lengths at the pain stimulus level (Fig 3, 'Heat', 'Electrical') revealed that the heat and electrical stimuli had different threshold period lengths and that the distributions of all sEMG signals were heavily skewed to the right. Therefore, the study data were considered to be non-normally distributed.
The sEMG signal across the test periods was analyzed visually and then statistically. The visual comparison of the standardized sEMG feature medians (see Fig 4) exhibited clear differences between different muscle areas across periods. The corrugator supercilii sEMG-derived RMS and WL feature medians were distinguishable from the other sEMG features. The corrugator muscle activity grew simultaneously with the pain stimulus, whereas the point estimates of the other muscle areas altogether followed similar patterns across study periods P0-P4 from high activation during no pain to low activation at period P1. Eventually, they exhibited slower growth throughout the increasing pain stimulus intensity during the periods P3-P4. The study subjects were not allowed to talk during the pain induction periods. Because this restriction was not present during the later-added no-stimulus period P0, and the visualization showed that period P0 sEMG signals were impacted by talking, all P0 period sEMG signals were excluded from the following analysis.
The statistical analysis over test periods P1-P4 included a pairwise comparison of sEMG features. According to the Wilcoxon signed-rank statistics, corrugator supercilii activity increased across periods. The increase was statistically significant (p < 0.05) during all test periods, except the RMS values during period P3 vs. P4. Corrugator features were the only ones that showed a statistically significant difference during the period P1 vs. P2 and P3 vs. P4.
The activity increase of the levator labii superioris was not statistically significant during the test period P1 vs. P2 or the period P3 vs. P4 but showed a significant increase in all tests when comparing the periods P1 and P2 before the pain threshold vs. periods P3 and P4 after the pain threshold. The pairwise test results of orbicularis oculi activation were similar to those of the levator labii superioris, with the difference that the RMS was not statistically significant when comparing periods P1 and P4.
The change in activation of all five tested muscles in both feature transformations was statistically significant when comparing periods of P2 and P3, representing the time that the subject reported the pain threshold. The p values of the Wilcoxon signed-rank statistics are presented in Table 4.
Irrelevant and correlated features were identified using the sEMG signal visual and statistical analysis together with Spearman's rank correlation coefficient [37] (Table 5). The features selected for the predictive meta learning phase were corrms, corwl, levwl, and orbwl. The overall predictability performance of the meta model produced an average c-index value of 0.63. The best performing feature combination in the prediction model included the waveform lengths of the corrugator supercilii and levator labii superioris.
The permutation test results of the meta model showed a statistically significant difference (p < 0.01) and therefore rejected the null hypothesis of classifier performance being a result of random chance, which confirmed the significance of the model classifier.

Discussion
The feasibility of sEMG for pain assessment can be examined from various angles. In this study, we tested the sEMG method measured using a non-commercial multipurpose biosignal acquisition device on healthy subjects to explore which facial muscles are the most feasible for a pain detection and prediction model. In our results the facial muscles most associated with pain were congruent with the muscle actions described as the core facial pain expression in existing literature [11,12,17]. The corrugator supercilii seemed to be the most feasible muscle area for the recognition of gradually increasing continuous experimental pain in all of our analyses. This is in line with previous findings, which have found the corrugator supercilii to be the "muscle of pain" [11,17,43]. The levator labii superioris also reacted strongly to the pain stimuli when the stimulus reached the pain threshold, causing cheek raising and nose wrinkling. Our findings are also partly consistent with the study of Wolf et al. [44], in which nine facial muscles were analyzed using the sEMG method during experimental pain induction with a laser system in a sample of 10 healthy male volunteers. They found two muscle groups to be mainly related to pain expression: the orbicularis oculi, as the strongest muscle related to eye narrowing, and the mentalis and depressor anguli oris, which cause movements around the mouth. The corrugator supercilii also showed significant results in their analysis as a part of eye narrowing.
The lower perioral muscles around the mouth were not included in our study due to our aim to avoid placing electrodes in impractical positions, considering the care procedures and medical accessories in the ICU. Furthermore, the anatomical differences in facial musculatures and soft tissues pose some challenges to the feasibility of sEMG use in clinical care. The medial part of the corrugator supercilii is located just superior to the eyebrow [45], forming an easily located landmark for electrode placement. By contrast, the upper perioral region muscles located on the cheek are more difficult to distinguish, because the facial muscles are typically very thin and located in layers [46]. The risorius in particular cannot be identified in many individuals [47]. Our findings suggest that the risorius might not be a feasible muscle to be used as a predictive measure of pain expression.
Using sEMG signals during the continuously increasing pain stimuli period, the data were classified into four pain intensities. The ordinal classification of the most feasible sEMG signals, the corrugator supercilii and levator labii superioris, resulted in a model performance of 0.64. This is better than a random result, but it does not reach a good prediction level. However, for some subjects, the ordinal classification provided a fair pain intensity prediction (eight subjects with c-index > 0.70), whereas some were somewhat randomly ranked (six subjects with c-index values of 0.43-0.55). This could be a result of the subjects' differences in facial expressiveness and non-expressiveness [48]. Pain experience is subjective and has variability; therefore, the predictive model could benefit from more items that support the individual differences rather than a one-fits-all generalization [49]. With 31 study subjects, a more detailed model was not realizable, but the sample size was justified by the feasibility nature of the study. Furthermore, in a study design with intentional pain induction, the sample size must be ethically considered. In addition to the subjectivity of the pain experience, pain expression is a diverse non-verbal action to communicate pain to others, involving both voluntary and involuntary actions. It may be influenced by situational differences, but not as easily as the self-report of pain. The social presence and characteristics of the experimenter may influence the pain tolerance and pain intensity, but the pain threshold seems to be more resistant to external influences [50].

Limitations
Our study design has weaknesses. Each participant was tested four times with experimental pain, and a carryover effect can occur when multiple interventions are tested on the same study subject [51]. In this study, the carryover effect might have affected the pain experience or the facial pain expression in spite of our attempts to control the carryover effect with randomization. Furthermore, adding a non-painful control intervention that distinguishes expressions unrelated to pain during the study and possibly monitoring a decreasing pain stimulus would have strengthened the design.
Demographic information collected from the subjects consisted of age and sex, as we aimed to have equal numbers of male and female subjects in our sample. These variables were not used in the analysis due to the limited sample. Adding demographic variables related to ethnic backgrounds would have strengthened the insight into the variability of facial actions [52]. Furthermore, the complex phenomenon of facial aging influences changes in the facial bones, soft tissues and skin [53]. Also, muscle contraction amplitude in facial muscle sEMG may be influenced by aging [54], and skinfold thickness may affect signal selectivity [55]. We did not test the system in the person's natural social situation, and the test might have involved contextual factors that affected the results of the study [56]. The participants were aware of the overall study aims to detect various biosignals during pain induction. The study subjects may have applied different emotion regulation strategies during pain induction that can moderate their pain expression [57]. However, in addition to facial electrodes, the participants wore electrodes on their fingers and in a belt around their chests to detect other biosignals (reported by Jiang et al. [31]). This might have distracted them from concentrating on the facial expressions only.
The electrodes were applied according to the guidelines of the human electromyography study by Fridlund and Cacioppo [58], but it remained unclear whether the recorded muscle activities actually reflected the muscle over which the electrodes were supposed to be fixed or a neighboring muscle. Therefore, when referring to the sEMG measurement of a specific muscle, it would be more appropriate to refer to the muscle group. Also, our choice to use the monopolar configuration may have lowered the selectivity but was justified by the better usability of the system.

Conclusions
In conclusion, the feasibility study results show that the muscles that gave the most information in the sEMG measurement during experimental pain were connected to eyebrow-lowering, nose wrinkling and upper lip-raising movements. This is congruent with the previously described core expression of pain. The performance of the prediction model remains modest, but the ordinal classification performance was statistically significant, confirming the pattern between the sEMG features and labels. Pain detection based on facial expressions may need further assurance of subject demographics and other relevant attributes due to the variability in expressiveness of the subjective pain experience.