Associations between facial emotion recognition and young adolescents’ behaviors in bullying

This study investigated whether the different behaviors young adolescents can adopt during bullying episodes were associated with their ability to recognize morphed facial expressions of the six basic emotions, expressed at high and low intensity. The sample included 117 middle-school students (45.3% girls; mean age = 12.4 years) who filled in a peer nomination questionnaire and individually performed a computerized emotion recognition task. Bayesian generalized mixed-effects models showed a complex picture, in which type and intensity of emotions, students’ behavior and gender interacted in explaining recognition accuracy. Results are discussed with a particular focus on negative emotions and suggest a “neutral” nature of emotion recognition ability, which does not necessarily lead to moral behavior but can also be used for pursuing immoral goals.


Introduction
The ability to recognize emotional facial expressions is important for everyday interpersonal relationships and for social adjustment [1][2]. Indeed, past research involving children and adolescents showed that accurate recognition of emotions is associated with higher social and academic competence, and with fewer externalizing and internalizing behaviors ([1][3][4][5]; see [2] for a meta-analysis).
Quite surprisingly, this basic ability has been almost overlooked in three decades of bullying research, despite the fact that scholars have repeatedly stressed the importance of emotions in this phenomenon (e.g., [6][7][8]). In particular, in the emotional domain, bullying research has widely focused on empathy, as recently summarized in a systematic review [8] and in two meta-analytic studies [9][10]. However, although these studies showed that understanding and sharing other people's emotions is related to the behavior students decide to adopt during bullying episodes, to date only two studies have specifically investigated a more basic skill, that is, emotion recognition ability [11][12].
These two studies certainly have merits and we built on them to develop the present work. Woods and colleagues [12] found that, controlling for gender, peer-nominated bullies did not differ from students not involved in bullying in their ability to recognize emotions, whereas victims scored lower both in the overall ability to identify emotions and, in particular, in recognition of anger and fear. In contrast, Ciucci and colleagues [11] recently found no significant relation between self-reported bullying and victimization and emotion recognition abilities. However, both studies only focused on bullying and victimization, neglecting the social nature of bullying [13]. Indeed, they compared students who bully or who are victims with a general, rather vague category of "uninvolved" students, which could lead to spurious results. Nonaggressive bystanders are not a homogeneous category: witnesses of bullying can adopt quite different behaviors (i.e., defending or passive bystanding; [14]), and different bystander behaviors are associated with different individual characteristics (e.g., [15][16][17]). Methodologically, both studies assessed emotion recognition through static photographs [18][19], which have been suggested to resemble facial expression in everyday communication to a lesser extent than dynamically morphed facial expressions [20][21]. Furthermore, movement is acknowledged to play an important role in emotion perception, affecting recognition accuracy [22]. Additionally, both studies investigated four basic emotions (i.e., happiness, sadness, fear, anger) and the rationale for excluding the other two basic emotions (surprise and disgust) was not provided. Finally, intensity of emotions was not considered. However, in real life, emotions are not always fully expressed and it has been suggested that presenting facial expressions at both lower and higher intensities could detect more subtle performance differences [21,23].

The current study
The novelty of the current study is to investigate the relations between four different behaviors in bullying, namely bullying others, being victimized, defending the victim, and passive bystanding, and young adolescents' ability to recognize facial expressions of the six basic emotions [24], dynamically expressed at different intensities. Although our theoretical interest was focused on participants' recognition of the negative emotions, which are more likely to characterize bullying episodes (e.g., sadness and fear of victims, disgust and anger of bullies), happiness and surprise were also considered. This allowed us to detect whether observed performances were limited to specific emotions or could be described as a more general (in)ability. Moreover, this may provide other researchers with a more complete picture useful as a basis for future studies.
At a general level, in the whole sample we expected to replicate previous findings about differences in recognition accuracy depending on specific emotions (e.g., [21]), that is, that some emotions (i.e., happiness and anger) are more easily recognized than others (i.e., fear and sadness). Concerning the main goal of this study, and given the paucity of studies on this topic in the bullying field, we deemed it inappropriate to formulate specific hypotheses on the relation between the recognition of each emotion, at both intensities, and each bullying-related behavior. However, on the basis of both findings in other fields of research (e.g., bystanders' intervention in social psychology) and previous results in the bullying literature involving other constructs, such as social-emotional skills, some anticipations can be made.
First, consistent with the view of bullies as "competent" individuals [25][26][27], it is not surprising that, in general, previous studies did not find particular deficits in emotion recognition in youth who bully. However, some differences may be expected considering different emotions at different intensities. We hypothesized that higher levels of bullying could be related to a higher ability to detect fear, which may be a useful means to identify more vulnerable victims within the group and, subsequently, to recognize (and even maximize) the "success" of the aggression. Second, regarding victimization, we expected to replicate previous findings both in bullying and in other fields of research, showing general difficulties in recognizing emotions (e.g., [12,28]). In particular, disgust and anger, if correctly identified, especially when they are not yet fully expressed, could be used to predict bullies' attacks and, potentially, avoid them. Third, in line with research that showed that emotion recognition abilities correlate with empathic skills and prosocial behavior [1], we hypothesized that defending behavior would be associated with higher emotion recognition accuracy. This hypothesis also stems from the picture of defenders as socially and emotionally competent [15,29]. Moreover, given that classic research on bystander intervention suggested that the recognition of the target's distress predicts the likelihood of helping [30][31], a link between recognition of sadness and fear and defending (positively) and passive bystanding (negatively) behavior was expected. Associations in the same directions were hypothesized for disgust and anger. Indeed, recognizing these emotions, for example in the bully's face, can help to distinguish playful from intentionally aggressive behaviors and could, therefore, represent one of the first steps for deciding to intervene in a potentially risky situation like bullying [32].
Finally, the literature has extensively shown that gender might affect both students' behavior during bullying episodes [17][33][34] and emotion recognition abilities [35][36]. Therefore, even though full exploration of gender differences was not the focus of the current study, participants' gender was considered both as a control variable and a potential moderator in the analyses.

Method
The research project was approved by the Ethical Committee for the Psychological Research of the University of Padova (number 17-2151).

Participants
Participants were recruited from one middle school (6th to 8th grade) located in a medium-sized city in the North of Italy. Of 129 students invited to participate, 127 (98.4%) obtained written parental consent. However, due to school absences on the days of data collection, the final sample consisted of 117 students (45.3% girls; mean age = 12 years, 3 months, SD = 9 months). All the students provided verbal assent to participate in this study.
Concerning socio-economic background, measured through the Family Affluence Scale (FAS; [37]), the majority of the participants came from medium- and high-class families (low FAS: 3.5%; medium FAS: 58.2%; high FAS: 38.3%). Consistent with national statistics about the student population [38], 81.2% of the participants had both parents born in Italy, 13.7% had one parent born outside Italy, and 5.1% had both parents born in foreign countries.

Measures and procedure
Behavior during bullying episodes. Participants were presented with sixteen behavioral descriptions (adapted from [39][40]) and asked to nominate an unlimited number of classmates who best fit each of them. Specifically, four items for each behavior (i.e., bullying, victimization, defending, and passive bystanding) were used (see the S1 Appendix for the complete list of items). In order to ensure anonymity, students nominated classmates by indicating their corresponding number on the class roster. For each behavior, a continuous score was computed by dividing the mean number of nominations received in the four items of each scale by the number of nominators. All scores showed satisfactory levels of internal consistency reliability (i.e., Cronbach's alphas were .92, .85, .91, .68 and McDonald's omegas were .93, .88, .92, .70 for bullying, victimization, defending, and passive bystanding, respectively).
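The scoring rule described above can be illustrated with a minimal Python sketch. The function name and the example counts are hypothetical and serve only to make the arithmetic concrete; they are not taken from the study materials.

```python
from statistics import mean


def behavior_score(nominations_per_item, n_nominators):
    """Mean nominations across a scale's items, divided by the number of nominators."""
    if n_nominators <= 0:
        raise ValueError("n_nominators must be positive")
    return mean(nominations_per_item) / n_nominators


# Hypothetical student: nominations received on the four items of one scale
# from 20 nominating classmates (illustrative numbers only).
score = behavior_score([6, 4, 5, 5], n_nominators=20)
print(round(score, 3))  # 0.25
```

Under this rule the score can be read as the average proportion of nominators who named the student on the items of that scale.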
Emotion Recognition Task. To assess the ability to recognize and label facial emotional expressions, the Emotion Recognition Task (ERT) was individually administered in a quiet room in the school. The ERT is a computerized paradigm in which morphed video clips of the six basic facial emotional expressions are presented at different intensities (40, 60, 80 and 100%; [21,23]) by four actors, for a total of 96 trials. Each clip shows a face gradually changing from a neutral expression to one of the six emotions at a different level of intensity. Participants are asked to label each expression using a six-alternative forced-choice response without time restriction. The order of presentation of the morphs was fixed for all participants, starting with the lower intensities. Instructions and other technical aspects (e.g., number of frames, length of the videos) are detailed in Kessels and colleagues' [23] paper. For the sake of simplicity and parsimony, and based on preliminary analyses indicating that intensity had a substantially dichotomous effect on emotion recognition, in the following analyses intensity was split into low (40-60%) and high (80-100%) intensity.
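As a minimal sketch of the trial structure and of the intensity split described above, the following Python snippet enumerates the 6 x 4 x 4 design and collapses the four intensities into two levels. The actor labels and the function name are hypothetical, and the enumeration order shown here is not meant to reproduce the fixed presentation order of the task.

```python
from itertools import product

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
INTENSITIES = [40, 60, 80, 100]  # percent of the full expression
ACTORS = ["actor_1", "actor_2", "actor_3", "actor_4"]  # hypothetical labels


def intensity_level(intensity_pct):
    """Collapse the four morph intensities into the two analysis levels."""
    return "low" if intensity_pct <= 60 else "high"


# One trial per emotion x intensity x actor combination.
trials = [
    {"emotion": e, "intensity": i, "level": intensity_level(i), "actor": a}
    for i, a, e in product(INTENSITIES, ACTORS, EMOTIONS)
]

print(len(trials))  # 96
print(sum(t["level"] == "low" for t in trials))  # 48
```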

Statistical approach
Given the complex structure of our data, a Bayesian Generalized Mixed-Effects Models approach was used. Specifically, data were characterized by the presence of: (1) a dichotomous dependent variable (i.e., accuracy); (2) observations nested within subjects; (3) between- and within-subjects factors; (4) quantitative independent variables. Furthermore, to evaluate our research questions we needed to test and explore 2-way, 3-way and 4-way interactions between independent variables, and to compare several models.
As is well documented in the statistical literature (e.g., [41][42]; see [43][44] for recent applications in psychology), the Bayesian approach is a valid alternative to the traditional frequentist approach for dealing with our data structure and research questions. Without going into philosophical reasons, which are beyond the scope of the present paper, the Bayesian approach allows us to: (1) accurately estimate mixed-effects models as suggested by Bolker and colleagues [45]; (2) coherently assess the variability of parameter estimates and provide associated inference via 95% Bayesian Credible Intervals (BCI). BCIs provide a direct representation of the most credible values of the estimated parameters given the prior distribution of the parameters and the observed data incorporated into the model. As a result, BCIs permit probabilistic statements to be made regarding confidence that the estimated parameters fall within any particular range. This is the interpretation that researchers often mistakenly give to frequentist confidence intervals [44]. Therefore, Bayesian modeling allows us to interpret results in a manner that is both intuitive and more rational than common alternatives (see also [43]). BCIs were calculated using the percentile method; (3) compare the models in terms of evidence within a unified framework. In particular, the Watanabe-Akaike Information Criterion (WAIC) was used to select the best model among a set of candidate models fitted to the same data, and WAIC weights are presented to compare the evidence of each model with regard to all candidate models. With this method, models were compared using a continuous and informative measure (i.e., evidence), rather than a series of simplified accept-reject dichotomous decisions typically adopted with the Null Hypothesis Significance Testing approach [42]; (4) appropriately evaluate the interaction effects using posterior distributions of planned comparisons between estimated parameters [46].
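To make the percentile method concrete, the following Python sketch computes a 95% interval from a set of posterior draws by taking the 2.5th and 97.5th percentiles. The draws are simulated stand-ins (a normal distribution with arbitrary mean and spread), and the helper names are hypothetical; this is an illustration of the general technique, not the code used for the reported analyses.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 1]) of a pre-sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval obtained with the percentile method."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    return percentile(ordered, tail), percentile(ordered, 1 - tail)


# Simulated stand-in for 4,000 posterior draws of a log-odds coefficient.
rng = random.Random(1)
draws = [rng.gauss(0.57, 0.10) for _ in range(4000)]
low, high = credible_interval(draws)
print(low < 0.57 < high)  # True
```

An interval computed this way that excludes zero (on the log-odds scale) corresponds to an Odds Ratio interval that excludes 1.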
Two sets of analyses were conducted. First, we focused on performances in the Emotion Recognition Task, considering type and intensity of emotion, and controlling for participants' gender. This allowed us both to pursue our first goal (i.e., to replicate findings concerning differences in recognition accuracy depending on specific emotions), and to verify for the first time the accuracy pattern in a sample of Italian young adolescents. Three hypothesized logistic mixed-effects models were estimated and compared. The most plausible model was interpreted by means of estimated parameters, graphical representations and planned comparisons.
Second, in order to answer our main research questions, participants' behaviors during bullying episodes (i.e., bullying, victimization, defending, and passive bystanding) were evaluated as predictors of accuracy. As a measure of effect size, Odds Ratios and associated 95% BCIs are presented and discussed.
We estimated our models using the no-U-turn sampler [47], a variant of Hamiltonian Monte Carlo [48], as implemented in the Stan probabilistic programming language [49]. The basic idea is to iteratively sample possible parameter values, starting from pre-specified prior distributions, until convergence upon those model parameters that best represent the data.
We used the default prior specifications of the R package brms [50][51]. These priors can be considered relatively uninformative, and lead to posterior distributions of estimated parameters that are mostly influenced by the observed data rather than by prior information outside the study of interest. In our case, as stated below, this choice allowed us to obtain appropriate parameter estimates and yielded satisfactory convergence of all tested models. Furthermore, from a reproducibility perspective, default priors allow other researchers to immediately reproduce our analyses and results.
Iterations of the estimation procedure were, as usual, split among independent "chains". The purpose of including independent chains is to ensure that the model reliably converges on the same parameters. However, because each chain is initialized with random starting parameters, the chains require a certain number of iterations before the optimal solution is reached, after which the posterior distribution is sampled directly and used for inference purposes. To ensure exclusion of this "warm-up" (also known as "burn-in") period, we discarded the initial samples from each chain prior to collapsing the chains for analysis [43]. All our models included 4 chains of 2,000 iterations each (8,000 in total) with a "warm-up" period of 1,000 iterations per chain (4,000 total), resulting in 4,000 usable samples.
Convergence was evaluated via visual inspection of the chains and using the Gelman-Rubin convergence statistic, R-hat, with values around 1 indicating convergence, and 1.100 considered the acceptable limit [41]. According to these diagnostics, our models showed satisfactory convergence, with stationary distributions of estimated parameters and all associated R-hat values ≤ 1.017. All related graphics and indices are available upon request from the authors.
Moreover, all models were also estimated with the traditional maximum likelihood approach using the lme4 package of R. In several cases, convergence was not reached. Overall, estimated model parameters were very similar to those produced by the Bayesian approach. Results of these analyses are available from the authors upon request.
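For readers less familiar with these diagnostics, the following Python sketch reproduces the chain bookkeeping (4 chains of 2,000 iterations, 1,000 warm-up draws discarded per chain) and computes the classic Gelman-Rubin statistic on the retained draws. The chains are simulated from a standard normal distribution purely for illustration, and the formula shown is the textbook (non-split) version; the R-hat variant computed by Stan and brms may differ in detail.

```python
import random
from statistics import mean, variance

N_CHAINS, N_ITER, N_WARMUP = 4, 2000, 1000


def gelman_rubin(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equally long chains."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return (pooled / within) ** 0.5


# Simulated, well-mixed chains standing in for posterior draws of one parameter.
rng = random.Random(2024)
raw_chains = [[rng.gauss(0.0, 1.0) for _ in range(N_ITER)] for _ in range(N_CHAINS)]

# Discard the warm-up draws of each chain before pooling.
kept = [chain[N_WARMUP:] for chain in raw_chains]
usable_samples = sum(len(chain) for chain in kept)
rhat = gelman_rubin(kept)

print(usable_samples)  # 4000
print(rhat < 1.1)      # True
```

Values close to 1 indicate that between-chain and within-chain variability agree; chains stuck around different values would inflate the between-chain term and push R-hat above the 1.100 threshold.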

Performances in the Emotion Recognition Task
Overall, the mean proportion of accuracy in emotion recognition was .62 (SD = .30). As for intensity of emotion, the marginal mean accuracy was .56 (SD = .29) for low intensity and .66 (SD = .30) for high intensity. For types of emotion, the marginal mean accuracy was .74 (SD = .23) for anger, .68 (SD = .27) for disgust, .39 (SD = .27) for fear, .44 (SD = .27) for sadness, .91 (SD = .13) for happiness, and .52 (SD = .22) for surprise. Overall, girls showed higher mean accuracy (.70, SD = .28) than boys (.56, SD = .30). In Table 1, the mean proportion of accuracy in emotion recognition by intensity, type of emotion and gender is presented.
Three plausible Bayesian logistic mixed-effects models were fitted to analyze the data. In each model, the dependent variable was accuracy in emotion recognition (0 = incorrect, 1 = correct). The first baseline-reference model (M1) was a null model including only the random effect of subjects (i.e., a random intercept term for subjects was used). In the second model (M2), intensity and type of emotion as well as gender were added as main fixed effects.
Finally, in the third model (M3) the interaction between intensity and type of emotion was also added. Results indicated that M3 (see Table 2) was clearly the most plausible model to have generated the observed data, having the lowest WAIC (WAIC M1 = 14,565, WAIC M2 = 12,600, WAIC M3 = 12,499) and a probability of being the best of .99.
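As a rough illustration of how WAIC values translate into relative model evidence, the sketch below converts the three reported WAIC values into Akaike-type weights. This assumes that the WAIC values are on the deviance scale and that the weights follow the usual exp(-0.5 x difference) rule; the function name is hypothetical and the exact computation behind the reported probability may differ.

```python
import math


def waic_weights(waic_by_model):
    """Akaike-type weights: exp(-0.5 * difference from the best WAIC), normalized."""
    best = min(waic_by_model.values())
    relative = {name: math.exp(-0.5 * (waic - best)) for name, waic in waic_by_model.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# WAIC values reported in the text for the three candidate models.
weights = waic_weights({"M1": 14565, "M2": 12600, "M3": 12499})
print(weights["M3"] > 0.99)  # True
```

Under these assumptions, a WAIC difference of about 100 points between M2 and M3 leaves essentially all of the weight on M3, which is in line with M3 being reported as clearly the most plausible model.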
Beyond the effect of gender (OR girls vs boys = 1.774; 95%BCI = 1.45-2.16), the recognition of emotion was moderated by intensity. The interaction effect between intensity and type of emotion is depicted in Fig 1. Bayesian comparisons across intensity showed that anger (95%BCI = .134-.211), disgust (95%BCI = .153-.236), happiness (95%BCI = .075-.121), and sadness (95%BCI = .065-.156) were better recognized in the high intensity condition than in the low intensity condition. No differences were found for fear (95%BCI = -.024-.066) and surprise (95%BCI = -.042-.051). Bayesian pairwise comparisons across emotions by type of intensity are presented in Table 3 for interested readers.
Note. Baseline category for Gender was "boy". Baseline category for Intensity was "low". Baseline category for Emotion was "Anger". BCI = Bayesian Credible Intervals.

Relations between emotion recognition and behaviors during bullying episodes
For the sake of transparency, in Table 4 descriptive statistics of emotion recognition accuracy by levels of behaviors, type and intensity of emotion, and participants' gender, are shown.
To examine the associations between emotion recognition and behaviors during bullying episodes, we started by comparing two Bayesian logistic mixed-effects models for each behavior, with emotion recognition accuracy as the dependent variable. The first model included participants' gender, intensity and type of emotion, the participant's score on the behavior of interest and the related 2-, 3- and 4-way interactions as fixed effects. Additionally, the scores on the other three behaviors were included (and thus controlled for) as main fixed effects. This approach allowed us to partially overcome issues regarding model complexity (in terms of number of parameters), multicollinearity among behaviors (correlations are reported in Table 5), and interpretability of results. In the second model, the 4-way interaction was dropped. In both models, subjects were treated as random effects (i.e., a random intercept term for subjects was used).
According to common guidelines [42,52], model comparisons showed that (i) for bullying and defending, the model with the 4-way interaction should be strongly preferred; (ii) for victimization, the model with the 4-way interaction and the one without it were substantially equally plausible; (iii) for passive bystanding, there was weak evidence in favor of the model without the 4-way interaction (see Table 6).
Consistent with these results and to facilitate interpretation, we chose to focus on the models that included the 4-way interactions (see Table 7).
The estimated 4-way interactions for the four models are presented in Fig 2. For each combination of gender, intensity and type of emotion the Odds Ratio associated with an increment of 10% in the bullying (Fig 2a), victimization (Fig 2b), defending (Fig 2c), and passive bystanding (Fig 2d) score and the associated 95%BCI are displayed. All the corresponding numerical indices are included in Table 8.
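To clarify how an Odds Ratio for a 10% increment in a behavior score can be read, the following Python sketch converts a logistic coefficient into such an Odds Ratio and shows its effect on a baseline accuracy. It assumes the behavior score is expressed as a proportion (so that a 10% increment equals 0.10); the coefficient and baseline values are hypothetical, and the function names are ours.

```python
import math


def odds_ratio_for_increment(beta_per_unit, increment=0.10):
    """Odds Ratio implied by a logistic coefficient for a given score increment."""
    return math.exp(beta_per_unit * increment)


def apply_odds_ratio(probability, odds_ratio):
    """Probability obtained after multiplying the baseline odds by an Odds Ratio."""
    odds = probability / (1 - probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical coefficient (log-odds per unit of score) and baseline accuracy.
beta = 2.0
baseline_accuracy = 0.40
or_10 = odds_ratio_for_increment(beta)
print(or_10 > 1)                                               # True
print(apply_odds_ratio(baseline_accuracy, or_10) > baseline_accuracy)  # True
```

An Odds Ratio above 1 therefore indicates that recognition accuracy is expected to increase as the behavior score increases, whereas a value below 1 indicates a decrease; a 95%BCI that includes 1 is compatible with no association.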
Bullying. Ninety-five percent BCIs indicated that higher levels of bullying were associated with better recognition of fear in both intensity conditions among boys, with worse recognition of low and high intensity sadness among girls and with less accuracy in recognizing disgust in the low intensity condition in both gender groups. Moreover, bullying was associated with better recognition of happiness, at high intensity among boys and at low intensity among girls (Table 7).
Victimization. Among boys, higher levels of victimization were associated with less accuracy in recognizing fear in the low intensity condition, and sadness in the high intensity condition. Among girls, higher levels of victimization were associated with less accuracy in recognizing disgust and sadness in both intensity conditions, surprise in the high intensity condition and with better recognition of happiness in the low intensity condition (Table 7).
Defending. Among girls, higher levels of defending were related to better recognition of anger in the low intensity condition, disgust, fear, and sadness in both intensity conditions and surprise at high intensity. Moreover, higher defending was associated with the recognition of happiness in the high intensity condition, so that it was lower among boys and higher among girls (Table 7).
Passive bystanding. Among girls, passive bystanding behavior was associated with more accuracy in recognizing disgust in the high intensity condition. Moreover, higher levels of passive bystanding in boys were related to better recognition of surprise in the high intensity condition (Table 7).

Discussion
The aim of this study was to offer, for the first time, the fullest possible overview of the relation between facial emotion recognition abilities and young adolescents' behavior during bullying episodes. In particular, four different behaviors in bullying (i.e., bullying others, being victimized, defending the victim, and passive bystanding) and recognition skills of morphed facial expressions of the six basic emotions, expressed at two different intensities, were considered. Given the complexity of both data structure and research questions, we used a Bayesian approach rather than the traditional frequentist approach. This represents an important novelty in the field. Beyond theoretical reasons, this approach allowed us to obtain robust estimates and model convergence in spite of the non-optimal ratio of sample size to number of estimated parameters. First, we verified that the accuracy in facial emotion recognition, measured through the recently developed Emotion Recognition Task [21], paralleled previous findings in the literature. Results showed a similar pattern in the recognition of the six basic emotions, with higher mean accuracy in the recognition of happiness, anger and disgust, and lower performances concerning surprise, sadness and fear. Moreover, the importance of considering different emotion intensities [21,23] was confirmed, in that, for four out of six emotions (i.e., anger, disgust, sadness, and happiness), intensity influenced the recognition performance. Finally, as expected, girls showed generally higher accuracy in recognizing emotions compared with boys (e.g., [36]). Concerning the main goal of this study, which dealt with the relation between the recognition of negative facial emotions and young adolescents' behaviors during bullying episodes, an interesting picture emerged.
In general, one notable result concerned the recognition of emotions as an ability that can be related to both moral (i.e., defending the victim) and immoral (i.e., bullying others) behavior in the context of bullying dynamics. A prominent example is represented by fear recognition, which was positively related to both higher levels of bullying, among boys, and defending, among girls. We could hypothesize that recognizing fear may help aggressive youth to identify vulnerable victims and make the aggressive behavior more efficacious; at the same time, it could promote prosocial behavior, for example by alerting bystanders that something wrong (and potentially dangerous) is happening and eliciting their empathic responses towards victims.
A link with empathic skills could also be hypothesized when analyzing sadness recognition in girls, which was negatively related to bullying and positively related to defending. It could be speculated that detecting sadness in victims, even when not fully expressed, could elicit empathic concern for them and make clear that what is happening is neither pleasant nor desirable for the victims; this, in turn, could increase the likelihood of helping. Likewise, girls' greater ability in recognizing disgust and low intensity anger could be positively associated with defending behavior because it allows them to better understand bullies' hostile intentions. Although we were not specifically interested in positive emotions, likely less crucial in bullying dynamics, it should be noted that among girls defending was also associated with higher recognition of surprise and happiness. Thus, we may hypothesize that defending behavior in girls is connected with a general ability in recognizing facial emotions. Overall, these findings confirm the growing literature showing that defending behavior in bullying is more frequent among girls and is associated with a pattern of social-emotional skills [15,29].
Conversely, our expectation on the associations between difficulties in recognizing negative emotions and higher levels of passive bystanding was not confirmed. Indeed, the only relevant result concerning passive bystanding behavior was a positive association with disgust recognition at high intensity among girls. This is another example of the idea that emotion recognition is a "neutral ability" that does not necessarily represent a driving force for moral and prosocial behavior. Regarding victimization, consistent with Woods and colleagues' study [12], our findings overall confirmed that higher levels of victimization were associated with a general difficulty in recognizing emotions. This result is not surprising and complements the large body of research that has documented the social-cognitive, emotional, and interpersonal deficiencies of frequently victimized youth (e.g., [26][53][54]). However, the pattern of results with respect to specific emotions that emerged from our analysis was neither clear-cut nor easy to interpret; future studies should try to replicate the current findings and test more precise hypotheses about victims' impairment in recognition of specific emotions, also adopting experimental (e.g., scenarios) and longitudinal designs.
This study also has some limitations. First, the collected data were cross-sectional. Although from a theoretical point of view emotion recognition abilities can be more easily conceived as a precursor of a conduct rather than a consequence of behavior, the cross-sectional design of this study did not allow us to draw conclusions about the direction of the effects. Therefore, the model proposed in the present study will need to be retested with longitudinal data. Second, our sample was small and restricted to young adolescents. Future studies should replicate these results and test the association between emotion recognition skills and students' behaviors during bullying episodes in both younger children and older adolescents.
Despite these limitations, taken together, the findings of this study documented the significance of considering a basic skill, namely recognizing facial emotions, for understanding different behaviors during bullying episodes. To date, this is the first study to offer a global picture of the association between these two variables, and it aims to represent a basis for future studies. Indeed, several new research questions can arise from the current results. For example, it would be interesting to investigate which individual and contextual variables may mediate or moderate the relation between an individual's ability to recognize a specific emotion and his/her behavior and may, at the same time, help distinguish among different behaviors. Moreover, future studies could explicitly investigate the relations between emotion recognition and empathy as precursors of defending behavior; for example, they could test whether the hypothesis about a possible direct link between better recognition of sadness and higher empathic concern is warranted. Furthermore, knowing which emotion students identify when they fail to recognize the correct one (that is, what kind of "error" they make) may provide new insight into the relation between emotion recognition and students' behavior.