Effect of robot’s vertical body movement on its perceived emotion: A preliminary study on vertical oscillation and transition

The emotion expressions of social robots are some of the most important developments in recent studies on human–robot interactions (HRIs). Several research studies have been conducted to assess effective factors to improve the quality of emotion expression of the robots. In this study, we examined the effects of a robot’s vertical oscillation and transition on the quality of its emotion expression, where the former indicates the periodic up/down movement of the body of the robot, while the latter indicates a one-time up or down movement. Short-term and long-term emotion expressions of the robot were studied independently for the four basic emotions described in the circumplex model of emotions: joy, anger, sadness, and relief. We designed an experiment with an adequate statistical power and minimum sample size of human subjects based on a priori power analysis. Human subjects were asked to evaluate the robot’s emotion expressions by watching its video with/without vertical movement. The results of the experiment showed that for the long-term emotions, the speed of vertical oscillation corresponded to the degree of arousal of the emotion expression as noted in the circumplex model; this indicated that fast oscillations improved the emotion expression with a higher degree of arousal, such as joy and anger, while slow or no oscillations were more suited to emotions with a lower degree of arousal, such as sadness and relief. For the short-term emotions, the direction of the vertical transition corresponded to the degree of valence for most of the expressed emotions, while the speed of vertical oscillation reflected the degree of arousal. The findings of this study can be adopted in the development of conversational robots to enhance their emotion expression.


Introduction
Emotion expression is one of the most important abilities of social robots in human-robot interactions (HRIs) [1][2][3]. This ability improves the quality of the robot's interactions with humans in many aspects [4][5][6][7] and is useful even for conveying a robot's intention to humans [8,9], e.g., by expressing the robot's state of mind [10] or establishing more empathy during interactions with people [11]. As an application, it was also shown that emotion expression enabled the robot to play better with children [12,13], or communicate better with elderly people [14,15]. However, although efforts to develop expressive social agents have been widely studied [16,17], designing expressive behaviors that can correctly convey a robot's emotions in different situations is not simple and these features are difficult to improve because of hardware constraints such as a limited number of joints, facial expression capabilities, and the degree of freedom (DoF) of the social robot. Previous studies have proposed several methods to improve the emotion expression of a robot in terms of both verbal and non-verbal communications. For verbal communication, several parameters of a robot's voice and speech, such as the pitch of the voice, the loudness of the speech, or the type of the vocabulary [18] have been reported as effective in expressing a robot's emotion. It was reported that the emotion of a speaker affected the voice parameter of his/her speech, where parameters such as the pitch, loudness, and prosody of the speech were the most influential ones [19]. To produce an emotional synthetic voice, a mapping model was proposed [20] in which the features of the emotion of a speech manipulated the parameters of the synthesizer, such as articulation, pitch of the voice, and quality of the voice. In another work [21], expressive utterance by an interactive robot was implemented by assembling strings of phonemes with the accents of the pitch. For a tour guide robot [22], the synthetic feature of the robot's speech was changed based on the emotion of the robot, that is, the robot changed the pitch and voice level of its speech based on its current emotion or change of emotion.
For non-verbal communication, several factors such as the gesture of the robot [23,24], gaze direction of the robot [25,26], facial expression [27][28][29][30], movement [31][32][33][34][35][36] and even the color [37,38] or vibration [39] of the robot were mentioned as effective parameters for improving the robot's emotion expression. For example, Fiore et al. [25] explored the effectiveness of different types of a robot's social cues in expressing its emotion, and showed that the proxemic behavior of a robot significantly affects human's perception about the emotion and social presence of the robot. For a robot with a simple facial expression [28], it was revealed that the recognition of the robot's facial expression about fear and surprise is difficult, while it is easier for the other basic emotions, i.e., anger, sadness and happiness. The movement of the robot was also mentioned as an effective factor to improve the robot's emotion expression [31].The speed of the movement, distance to human and the poses of the robot were reported as the important parameters. Additionally, studies about expressing a robot's emotion and/or naturalness during the robot's walk or movement have been widely conducted, e.g. determining a suitable set of body joints for expressing different emotions during the walk of a robot [32], realizing the involuntary motion and natural behavior of a robot by utilizing the oscillation of the robot [33], exploring the suitable walk characteristics for the emotion expression based on the stance phase, frequency and the length of footsteps [34], and adopting vertical movement of the robot to express the robot's emotion during the walk [35].
communication robot seems to be also feasible without damaging the impression of the robot's behavior.
As a hypothesis, we propose that by adding a vertical oscillation and/or transition to the body of the robot, the quality of the emotion expression represented by the previously designed gesture of the robot can be improved. Since the effective vertical oscillations and/or transitions for different emotions may be different, the study was separately conducted for four basic emotions, i.e., joy, anger, sadness and relief. Also, since short-term and long-term emotion expressions have different characteristics [40], the study was conducted for the shortterm and long-term emotion expression of the robot separately. To avoid complexity in the study, we focused on the emotion expression during the conversation of a conversational robot. In this study, the improvement of expression was evaluated subjectively by humans for the expressivity and clarity of the robot's expression. Here, expressivity is defined as the extent to which the robot could express its intended emotion, while clarity is defined as the extent to which the emotion of the robot conveyed to them clearly. To verify the proposed hypothesis, an experiment featuring human subjects to evaluate the quality of the emotion expression of the robot with/without the vertical movement was designed and conducted based on a priori power analysis in order to have minimum sample size and enough statistical power. As a preliminary stage of the study, the experiment was conducted utilizing the recorded video of the robot, and the exploration of physical interaction with the robot in a real-world scenario was left for future studies. A statistical hypothesis test was performed on the gathered data and based on the results, the suggested effective vertical movements for each type of emotion were reported. The suggested relation among the speed and/or direction of the vertical movement with the degree of the valence and/or the arousal of the expressed emotion is discussed. Also, the result of the study is compared with a previous report about the characteristics of the body movement of a robot for expressing different emotions [41] based on Laban's movement analysis [42], and the similarity, difference and the conceivable reasons for such similarity/difference are discussed.
The remainder of this paper is organized as follows. In the section "Materials and method", the general hypothesis of the study as well as the method of this research are explained in detail. In the section "Experiment", the conducted experiments are described. In the sections "Results" and "Discussion", the results of the experiment and the discussion about the findings of the study are reported. Finally, the conclusion of the study and future works are mentioned in the conclusion section.

Hypothesis and experiment design
The aim of this study is to find effective vertical movement(s) that can improve the emotion expression of a conversational humanoid robot. However, for transient emotion expressions and long-lasting ones, the gesture as well as the probable vertical movement of the robot seem to be totally different. For the transient one, the gesture and vertical movement of the robot would consist of short-term representations and movements. For the long term, it is expected that the robot will have long-term movements and expressions. From a psychological perspective, it has been reported that emotion and mood are two distinct phenomena, and usually the term affect is used as an umbrella term for them [40]. "While the emotions have a stimulus event, are comparatively intense, short in duration, and have behavioral implications, the mood consists of rather global, undirected, and mostly unconscious background sensations that are more stable than emotions" [43]. Although the definitions, similarities, and differences in terms such as emotion, feeling, mood, and affect have been widely discussed in psychology for many years [44], there is still discussion about the precise distinctions between them, especially emotion and mood [45]. In this work, as mentioned above, we focus on the transient/ long-lasting feature of the expression, and divided the emotion expression of the robot into two categories: short-term expressions and long-term expressions, and studied them independently. For the short-term emotion expression, the effects of both vertical oscillation and transition were studied, while for the long-term expressions, only the effects of vertical oscillation were considered because the non-periodic feature of the vertical transition does not match with the stable and long-lasting features of the long-term emotion expressions, but it is suitable for transient emotions.
Considering the description above, the hypothesis for the short-term and long-term emotion expression of the robot was set as follows: the vertical movement of the robot improves its emotion expression; for the short-term expression, the vertical movement was (the combination of) the vertical oscillation and the vertical transition of the robot as the two independent variables of the study, while for the long-term expression, only the vertical oscillation of the robot was considered. As a measure of the improvement of emotion expression, that is, the dependent variables of the study, the improvement of the expressivity as well as the clarity of the expression were considered, which was defined in the previous section (also see subsection "Apparatus, subjects, and procedure" in section "Experiment" for details about how to measure them in practice). To verify this hypothesis, an experiment containing human subjects was designed with a randomized control-group pretest-posttest design utilizing the analysis method of ANOVA on the gain scores [46], in which each subject firstly evaluates the expressivity and clarity of the emotion expression of the gesture of the robot without the vertical movement as the pretest score of the experiment, and after that evaluates same gesture with a vertical movement as the post-test score of the experiment; the difference of these scores are treated as the gain scores and utilized in the statistical analysis as the dependent variables of the study. Note that the randomization of the order of the videos, in terms of bringing the video of the no-vertical movement to the second order, was not considered in order to prevent probable concern/confusion of the participants about the stop of the vertical movement of the robot in the second video; they may concern if the robot was broken, or focus too much on the eliminated vertical movements but not on the evaluation of the gesture. For the post-hoc multiple comparison, Tukey's HSD test was considered in this study. See section "Experiment" for the details of the experiment. All the participant agreed with the written consent form approved by the ethics committee for research involving human subjects at the Graduate School of Engineering Science, Osaka University. The consent for the underaged participants were also obtained from their parents/guardians.

Basic emotions of the robot
To determine the basic emotions for the study, the circumplex model of emotions [47] was considered as the model describing the robot's emotions (Fig 1). This model explains the emotions based on two independent factors: 1) the degree of valence and 2) the degree of arousal. Because the model is divided into four areas based on the degree of each factor, namely high/ low degree for valence/arousal, a representative emotion that seemed to be easy to understand intuitively by both the designers of the robot's behavior as well as the participants of the experiment was chosen for each area. In this study, joy (high degree of valence and arousal), anger (low degree of valence but high degree of arousal), sadness (low degree of valence and arousal), and relief (high degree of valence but low degree of arousal) were utilized for the representative emotion of each area.

Adopted robot and its behavior
Robot specifications. A humanoid robot named locomotive-CommU was used in this study (Fig 2A). This robot consists of a mecanum rover at the bottom, a humanoid robot CommU on the top, and a vertical cylinder connecting these two components. The mecanum rover has four mecanum wheels and is able to move forward, backward, and rotate without changing its position, that is, without moving back and forth. The humanoid robot CommU is a small on-table robot with a cute appearance, which has 14 degrees of freedom (DoF); 2 DoF for its waist, 2 for each arm, 3 for its neck, 3 for the eyeball, 1 DoF for its eyelid, and 1 for its mouth. The vertical cylinder of locomotive-CommU can move up, down, and oscillate. The range of the movement of the cylinder is 30cm, where the highest and the lowest vertical heights of the robot are 1.3m and 1m, respectively.

PLOS ONE
Vertical oscillation and perceived emotion Robot vertical movements for emotion expression. The vertical movement of the robot was composed of two elements: vertical oscillation and vertical transition of the robot's cylinder. For the vertical oscillation, six different oscillation speeds for the cylinder including no oscillation were implemented (see Fig 2B). For the vertical transition, changing the height of the cylinder was adopted. It consisted of 3 types: moving up, moving down, and not changing the height (see Fig 2C). Then, the vertical movement of the robot was realized by combining the vertical oscillation and transition. To avoid unnatural vertical movement of the robot, when both the oscillation and transition were adopted, the robot did not start oscillating until the vertical transition was complete (see Fig 3 for examples of the movement). Note that since the utilized robot was ROS compatible including its vertical cylinder, the vertical movements were simply realized by sending the standard ROS locomotion message type, i.e., geome-try_msgs/Twist, with the signaling features shown in the figures.
Robot gestures for emotion expression. In order to study the effect of a robot's vertical movement on the improvement of its emotion expression, basic gestures expressing the robot's emotions are required, so that the vertical movement could be added to them and the improvement of the expression could be studied. Note that in order to avoid complexity in the study, the utterance of the robot was not considered or utilized and left for future works, although it does seem to be another important factor to improve the expression of the robot's emotions. To determine such gestures for each emotion of the study, we conducted a robot gesture contest in our laboratory. Three different researchers designed their best gesture for each emotion of the robot. Since short-term and long-term emotion expressions were considered in the study, each researcher designed a total of 8 gestures (short-term and long-term emotion expression for each of the 4 emotions, joy, anger, sadness, and relief). Then, the students of the laboratory were asked to compare the video of these gestures and choose the best one for each emotion.
Considering 3 different gestures designed for each 4 emotions, 12 videos were prepared for the contest of short-term emotions and 12 for long-term emotions. For the evaluation, first the participants were asked to watch all 12 videos in a random manner (the counterbalance was considered), and after watching each of them the participant was asked to answer the following question, Q1: "In your opinion, what was the emotion that the robot wanted to express in this video?" The participant had to answer by choosing from one of the 4 emotions. After that, the participant was asked to watch all 3 videos designed by different researchers for each emotion, and answer the following question, Q2: "Which video expressed the emotion X of the robot better?", where X replaced with the word related to each emotion, i.e., joy, anger, sadness, and relief. The most selected gesture in Q2 by the participants for each emotional expression was chosen as the gesture for expressing that emotion in the main experiment. Note that Q1 was asked in order to check whether the designed emotion was conveyed to the participants correctly by each gesture, and also to deepen the discussion of the result of the study (this will be discussed below). 22 university students (19 males and 3 females) at the age of 21 to 27 watched the video related to the short-term emotions while 21 students (18 males and 3 females) watched the video related to the long-term emotions. Fig 4A and 4B show the gestures elected for the short-term and long-term emotion expressions as the result of the contest, respectively.
The number of the votes for each elected gesture as well as the ratio of the perceived emotion for them were also analyzed to see how the participants perceived the emotion expressed by the elected gestures. The result of the number of the votes was as follows for the elected gestures expressing joy, anger, sadness, and relief, respectively: 68%, 50%, 82%, 46% for the shortterm, and 76%, 52%, 48%, 43% for the long-term expressions. This result indicates that some gestures won with relatively large number of votes (e.g., joy and sadness for short-term, and joy for long-term expression) and some by a relatively small margin (e.g., relief for short-term, and sadness and relief for long-term expression). For the elected gestures, the evaluation of the participants about Q1, i.e., their perception about the expressed emotion by the gesture, varied for different emotions: Table 1 shows the summary of the answers. As shown in the table, for the short-term expression of relief, the rate of correct answers, i.e., the ratio that the intended emotion of the designed gesture was conveyed to the participants correctly, was low: only 59% of the participants evaluated the gesture as a representation of relief. While for the other emotions, the rates of correct answers were high, i.e., 100%, 82%, and 77% for sadness, anger, and joy, respectively. As shown in the table, 36% of the participants evaluated the short-term expression of relief as the expression of sadness, and 5% evaluated it as a representation of anger. A similar result was obtained for the long-term expressions, as shown in the lower part of the table. Since more than the half of the participants distinguished the emotion of the elected gestures correctly even for the relief, and considering the limitation of the degree of the freedom of the joints and facial expression of the robot for expressing its emotions, as well as the fact that all three designers of the gestures agreed that designing better gestures with the current DoF would be difficult, the elected gestures for each emotion including the one for the relief were evaluated as the adequate ones to be distinguished by most of the people, and were therefore adopted without change in the main experiment of this study.

Short-term emotions
Hypothesis. The hypothesis for the short-term emotions is as follows: vertical oscillation and/or vertical transition of the robot improves subjective evaluation by humans regarding the expressivity and/or clarity of a robot's emotion expression.
Apparatus, subjects, and procedure. The robot mentioned in subsection "Robot specifications", locomotive-CommU, was utilized in the experiment. The experiment was conducted via crowd sourcing, wherein members were asked to answer an online survey over the internet. The participants were asked to watch the video of the robot expressing emotion and evaluate the quality of the emotion expression. For the short-term emotions, the emotion expression was implemented by combining the gestures elected in the gesture contest ( Fig 4A) and the vertical movement of the robot, that is, a combination of vertical oscillation and transition. For the vertical transition as the first independent variable of the experiment, all three patterns mentioned in the previous section were adopted as the level of the variable, that is, moving-up, moving-down, and no vertical transition, whereas for the vertical oscillation as the second independent variable of the experiment, three types of oscillations were utilized, that is, nooscillation, slow oscillation, and fast oscillation because the other oscillations (very fast, medium, and very slow oscillation, see Fig 2B for detail) did not match the short-term gestures of the robot based on our experience, i.e., the combination of the vertical oscillations with the short-term gestures led to very strange and not-synchronized movements. In order to omit the inclusion of such a negative impression, or in other words the factor of the degree of matching, but focusing only on the exploration of the effect of the oscillation speed, the mentioned oscillations were not considered in this experiment. In conclusion, nine different videos related to the different vertical movements were prepared (combination of three types of vertical transitions and three types of oscillations). The lengths of the videos were approximately 1-2 seconds based on the gesture of the robot.
To determine a suitable sample size for the experiment, a priori power analysis was conducted. The results showed that the suitable sample size was N = 2392 with the following parameters: α = 0.05, 1−β = 0.8, small effect size for interaction effect f 2 = 0.005 [48], number of groups = 9, df = 4. Note that since two independent variables were considered for the experiment and the interaction effect of the variables should be considered, the amount of the effect size in the power analysis was set based on the suggested one for the interaction effects, which was described in detail in previous work [48]. For the same reason, the degree of freedom regarding to the interaction effect was set as the parameter of the a priori power analysis, i.e., df = 4. As a result, the closest number to the calculated N that the crowd sourcing practically could gather for the experiment was set for the sample size, that is, N = 2250. Table 2 shows the number of participants for each range of age and gender for the experiment. As mentioned in the table, we tried to have the same distribution of the number of subjects for the age and gender ranges. Table 3 shows the number of subjects gathered for the experiment related to each emotion. Fig 5 shows the procedure of the experiment for each participant. The participants were first asked to watch a video of the robot expressing its emotion with its gesture but without the vertical movement of its cylinder. This gesture was the one elected through the robot gesture contest described in subsection "Robot gestures for emotion expression". The participants were asked to evaluate the expressivity of the robot's emotion, that is, the extent to which the robot could express its emotion, by answering question q1: In this video, the robot expressed its emotion X. How much do you think the robot could express it?, where X was replaced with one of the following words depending on the emotion that was programmed for the robot in the video: joy, anger, sadness, or relief. The questions were answered using a 7-level Likert scale, where 7 meant "could express very well" and 1 meant "could not express at all". Thereafter, the participants watched the video of the robot expressing the same emotion with the same gesture but with the vertical movement. Then the participants were asked to reevaluate the expressivity using the same question, namely q2. Further, the participants were asked to evaluate the improvement in the clarity of the emotion expression of the robot, that is, how much the emotion of the robot was conveyed to the participant, by answering question q3: Compared to the previous video, was the emotion X of the robot conveyed to you more clearly?, where one of the words joy, anger, sadness, or relief replaced the term X. The questions were  answered using a 7-level Likert scale, where 7 meant "conveyed very much" and 1 meant "did not convey at all". The improvement of expressivity, that is, the differences between the scores for q2 and q1, were utilized for the first dependent variable of the experiment, and the improvement of clarity, that is, scores for q3, were used as the second dependent variable of the experiment. As briefly mentioned in subsection "Hypothesis and experiment design", these dependent variables were utilized as the score gain of the statistical analysis, while the design of the experiment was a randomized control-group pretest-posttest design. In this experiment, the two independent variables (vertical oscillation and transition) were treated as between-subject factors, therefore each participant watched only one video with vertical movement in the post-test stage. The subjects were almost equally distributed to watch one of the nine videos of vertical motion of the robot.

Long-term emotions
Hypothesis. The hypothesis for long-term emotions is as follows: vertical oscillation of the robot improves subjective evaluation by humans regarding the expressivity and/or clarity of a robot's emotion expression.
Apparatus, subjects, and procedur. For the long-term emotions, the same robot, experiment procedure, and evaluation method as those of the short-term emotions were utilized. However, as mentioned before, the vertical transition was not utilized for long-term emotions, and only the vertical oscillation was considered as the independent variable of the experiment. For that, four types of oscillations were adopted: very slow, mid-speed, very fast, and no oscillations (see Fig 2B for details). The other oscillation speeds, that is, slow and fast oscillations in Fig 2B, were not utilized because they did not match with the gestures of the robot based on our experience (see the explanation for the short-term gestures for a detailed explanation). The lengths of the videos for the long-term emotions were 3 to 5 seconds, which were approximately equal to the length of the gesture for each emotion expression.
To determine a suitable sample size for the experiment, a priori power analysis was conducted utilizing the following parameters: α = 0.05, 1−β = 0.8, small effect size for main effect f 2 = 0.01 [49], number of groups = 4, df = 3. Note that since only one independent variable was considered for the experiment and the main effect was the only expected effect, the amount of the expected effect size in the power analysis was set based on the suggested one for the main effects, which is described in detail in the previous work [49]. For the same reason, the degree of freedom regarding to the main effect was set as the parameter of the priori power analysis, i.e., df = 3. Because a suitable sample size was obtained as N = 1095, we set the number of participants to the closest number to the calculated sample size considering the practically equal number of the participants that the crowd sourcing could gather for all conditions, that was, N = 1120. Table 4 shows the number of participants designed for each range of age and gender. Table 5 shows the number of subjects who attended the experiments related to each emotion. The procedure of the experiment was the same as that explained for the short-term emotions in the previous section and Fig 5. The only difference was that the adopted gestures for the robot (gestures elected for long-term emotions were utilized instead of those examined in the short-term emotions) and the number of the videos for each emotion (four videos per longterm emotion).

Results
As noted in the previous sections, for the short-term emotions, two-way ANOVA was conducted to evaluate the effects of two factors (vertical oscillation and vertical transition of the robot) on the improvement of the robot's short-term emotion expression, that is, the improvement of expressivity and clarity (see subsection "Apparatus, subjects, and procedure" in section "Experiment" for the precise definitions). For the long-term emotions, one-way ANOVA was conducted to evaluate the effect of vertical oscillation of the robot on the improvement. For the post-hoc multiple comparison, Tukey's HSD test was used. Also, the observed posterior power of the tests (mentioned with 1−β) and the effect size of the comparisons in terms of Cohen's d (mentioned with d) were calculated for the statistical tests. The alpha level of the statistical significance of the tests was set to 0.05 and IBM SPSS Statistics Version 24.0 was used for the analysis. In this section, the results of the statistical test for the shortterm and long-term emotions are reported for each of the emotions expressed by the robot, that is, joy, anger, sadness, and relief.
Relief. For expressivity, the main effect of the oscillation was significant (F(3,1210) = 5.99, p < .001, 1−β = .958). Post-hoc analysis revealed that the no oscillation condition had a higher score (M = 0.12, SD = 0.91) than the very fast oscillation (M = -0.21, SD = 1.01), where p < .001, d = 0.33 (see Fig 8E). In addition, the very slow oscillation had a marginally significant higher score (M = -0.02, SD = 0.84) than the very fast oscillation, where p = .073, d = 0.20 (same figure). For clarity, the main effect of the oscillation was significant (F(3,1210) = 3.85, p = .009, 1−β = .823). The result of the multiple comparison test showed that the no oscillation condition had a higher score (M = 4.00, SD = 1.51) than the very fast oscillation condition (M = 3.61, SD = 1.53), where p = .008, d = 0.32 (see Fig 8F). Additionally, the normal oscillation had a marginally significant higher score (M = 3.91, SD = 1.47) compared to the very fast oscillation, where p = .066, d = 0.25 (same figure).  moving-down behavior was effective in improving the representation of emotions with a low degree of valence, except for relief. Furthermore, having the oscillation was effective in improving the representation of emotions with a high degree of arousal, and having no oscillation was effective for improving the representation of emotion with a low degree of arousal, except for relief. These conclusions were derived from the following facts, which are also summarized in Fig 9. For joy, i.e., the top-right area of the figure, moving-up behavior was effective for improvement of emotion expression. Moreover, having oscillations (both fast and slow) was effective compared to having no oscillation. For anger, i.e., the top-left area of the figure, moving-down behavior was effective. Further, the fast oscillation had a marginally significant effect on the improvement. For sadness, i.e., the bottom-left area of the figure, no movement was better than moving up or down, while having no oscillation was marginally effective. In addition, in cases where the robot was moving up, having no oscillation improved the expression of sadness. When the robot had fast oscillation, it was better to move down or have no vertical transitions. Finally, for relief, i.e., the bottom-right area of the figure, fast oscillation was effective.

Short-term emotions
As mentioned above, the only exception to the rule was relief. According to the rule found, the moving-up behavior as well as no-oscillation behavior had to be effective for improvement of the expression of relief. However, while relief has a high degree of valence, there was no effect of the moving-up behavior on the improvement of emotion expression. Further, while relief had a low degree of arousal, there was no effect of the no-oscillation behavior. Instead, the fast oscillation was the effective expression, which is contradictory to the above conclusion. The reason for this seems to be the poor gesture by the robot for expression of relief, as mentioned in the description of Table 1 in subsection "Robot gestures for emotion expression". Therefore, it appears that in the main experiment, the participants could not distinguish the expression for relief from other emotions such as sadness and anger; thus, the evaluation for adding the vertical movement to the gesture of relief was not similar to those of the other emotions and did not follow the rule found. However, designing a gesture of relief for the robot does not seem to be a simple task, since some previous works reported that humans have a tendency to attach greater weights to negative entities [50], are biased for using negative information [51], and make incorrect judgment of positive faces compared to negative ones [52]. Therefore, for the gestures corresponding to an emotion with a positive valence like relief, more careful designs are required.
Further, the results of this study were compared with those of a previous work [41] in which the relation between the movement of the robot and expressed emotion was analyzed based on the main features of the Laban movement analysis (LMA) [42]. The findings of our study, i.e., the effective factors for improving individual emotion expressions of the robot, were supported by the previous work as follows. In the previous work, the following relationships between improvement of emotion expression of the robot and robot body movement were reported: 1) joy correlates with the straightness of posture, range of body, backward posture, quickness, and powerfulness; 2) anger correlates with the range of body, movement of parts of the body in different directions, quickness, and powerfulness; 3) sadness correlates with low posture, narrowness of body, forward posture, slowness, and weakness; and 4) relief correlates with straightness of posture, backward posture, movement of parts of the body in the same direction, slowness, and weakness. Comparing the effective behaviors revealed in our work with the factors reported in the previous work, the following correspondences could be suggested.
For joy, the moving-up behavior of the robot reported in our work corresponds to the straightness of posture reported in the previous work, since the robot's height increases via the cylinder's moving-up motion and consequently provides straightness of posture for the robot.
This also corresponds to the range of the body, since the volume of space that the robot occupies increases with the moving-up behavior. The vertical oscillation of the robot, especially the fast movement, is compatible with quickness and powerfulness. For anger, the combination of the moving-down behavior with the looking-left gesture of the robot seems to play the same role as "movement of parts of the body in different directions" reported in the previous work, since the movement of the moving-down behavior is orthogonal with the looking-left gesture of the robot in which the scalar product of these two movement is maximum (for details, see the definition of the space factor in the previous work [53]). Moreover, the fast oscillation of the robot corresponds to quickness and powerfulness.
For sadness, the no-movement behavior of the robot corresponds to narrowness of body, while the moving-down behavior of the robot (in the case where the robot oscillates quickly; see the subsection "Sadness" in the section "Results" for the "Short-term emotions") is compatible with low posture. Further, the no oscillation behavior corresponds to slowness and weakness. For relief, none of the behaviors of the robot in our study matched with those of the factors mentioned in the previous work. As mentioned before, the poor gestures utilized for the expression of relief by the robot would be a plausible reason for having no corresponding behaviors. Improvement of the gesture for relief may therefore result in finding related behavior between our work and the previous work. Fig 10 shows a summary of the effective vertical oscillation for each long-term emotion of the robot. For joy and anger, the fast oscillation was effective. For sadness, having no oscillation was effective. Finally, for relief, no oscillation was effective, while the slow and normal oscillations speeds were marginally effective. In the circumplex model of emotions, joy and anger are categorized as emotions with a high degree of arousal, while sadness and relief are classified as having a low degree of arousal. From these results, it is suggested that the speed of the vertical oscillation corresponds to the degree of arousal of emotions in the circumplex model. In other words, the fast vertical oscillation of the robot improves its expression of emotions with high arousal, while the slow or no vertical oscillation enhances expression of low arousal emotions.

Limitations
Although the effect of the vertical oscillation and transition of the robot on its emotion expression could be summarized based on the significant difference of the mean score of the conditions as shown in Figs 9 and 10, the effect size of the factors was not the same. While the study was conducted with the assumption of small expected effect size as set in the priori power analysis, some factors had a relatively large effect size, e.g., fast oscillation in long-term expression of joy (d = 0.56), while others had a small effect size, e.g., fast oscillation in short-term expression of joy (d = 0.15). Therefore, further studies are required to investigate how the movements of the robot should be improved, especially for the short-term emotions, which had relatively small effect sizes in the result of this study. Also, in the experiment the order of the videos was fixed in order to prevent the probable negative evaluation of the participants about the stopped movement of the robot. However, the extent of such negative effect was not studied objectively and the findings of this study may include the effect of the fixed order of the videos. To solve such limitations, further research would be required to enable a standard psychological experiment with randomized order of the stimuli as well as excluding the probable negative effect.
On the other hand, in the pilot experiment of the study, that is the robot gesture contest held in the laboratory, the number of female participants was quite lower than the number of male participants. In addition, the majors of the all participants were robotics, computer engineering, or computer science. Therefore, the gestures elected for the expression of robot's emotions might include a strong tendency as a result of such a biased sample, and consequently do not reflect enough the general opinion about robots' gestures. In future works, conducting such gesture contest using a crowed sourcing study may lead to a more concrete evaluation of a robot's gesture and the preparation of more precise studies about the effect of robot's movement on their emotion expression. Also, the cultural context in which the experiment was conducted might play an important role in the selection of the gesture for each emotion. Since it was not controlled in this experiment, a cross-cultural study in future works is necessary in order to reveal details about the way to express the emotions through the body movement of the robot in different cultural contexts. In addition, the gestures were designed by researchers who have no specific specialty in terms of standardized design of the expression, such as facial action coding system (FACS) [54,55] in the case of face. Systematic and organized design of the gesture and analysis before adopting to the experiment would lead to more precise results and concrete findings in future works.
The other limitation related to the utilized robot was its limited expressive capability. Since the embodiment and human likeness of communicational robots were reported as the important factors for the expressiveness [56][57][58], gestures created by the simplified humanoid robots (similar to the one used in our experiment) would have limitations. Consequently, the results obtained under such a limitation would have constraints in terms of generalizability: it cannot be extended to the other types of social robots. Therefore, a comparative study on social robots with different levels of expressive capability is required to find out how the same movements may produce a different perception of expression. In this study, a vertical cylinder system was utilized to realize the robot's vertical movement. However, adopting such a system could be difficult for some communication robots due to their specifications, or even for some interaction scenes. Therefore, study of the vertical movements realized by more common or easy-toimplement systems for the vertical movement and/or oscillation of the robot, such as robot's torso movement, up-down behavior of the pitch of the robot or the other oscillation generators, can be considered as further possible future studies to evaluate the scalability of the findings of this work.

Conclusion
In this study, we examined the emotion expression capability of a robot and proposed its vertical oscillation and vertical transition movements as effective factors to improve expressivity and clarity. The expressions of the robot were divided into short-term and long-term emotions and studied independently. Four basic emotions were considered for the robot from the circumplex model of emotions, namely joy, anger, sadness, and relief. To verify the proposed idea, a video experiment was conducted for each emotion, in which human subjects were asked to watch the emotion expression of the robot and subjectively evaluate its expressivity and clarity. Statistical analyses on data gathered from human participants showed that for long-term emotions, the fast oscillation of the robot improved emotion expression with a higher degree of arousal, while slow or no oscillations were better for expressing emotions with a lower degree of arousal. In other words, it was suggested that for long-term emotions, the speed of the vertical oscillation would correspond to the degree of arousal in the circumplex model of the emotion. For the short-term emotions, except for relief, the moving-up transition improved emotion expression with a higher degree of valence, while the moving-down or no vertical transition situations were better for emotions with a lower degree of valence. Thus, it is suggested that that the direction of vertical transition of the robot, or more generally, the degree of expansion of the robot's body corresponds to the degree of valence in the circumplex model of emotion. Additionally, except for relief, the effect of vertical oscillation of the robot was marginally the same as that for long-term emotions, that is, the speed of vertical oscillation corresponded to the degree of arousal for the emotion. Comparing the findings of this study with the results reported in a previous work [41] in which the relationship between the body movement of the robot and its emotion expression was studied using the main features of the LMA, it is observed that our findings are supported in terms of the effective factors for enhancement of a robot's emotion expression.
However, to avoid complexity in the study, some of the important modalities of the robot that affect its emotion expressions, such as utterance, voice, color expressions on the cheek, and language were not considered in this work. Studying the effects of a combination of such modalities with the vertical movement, especially language and voice parameters of the robot (which are the most general parameters examined in other social robots), remains as a topic for further research. In addition, the findings of this research were not supported for expression of the relief by the robot. Since the poor gesture for relief was assumed to be the main reason for such failure, adopting more expressive gestures for representation of relief and conducting similar experiments could be considered as avenues for future work. Also, analysis about the effect of age and gender of the participants on the perceived emotion is required to be considered as another essential future work. Finally, conducting real-world human-robot interaction experiments with actual constraints would be an important next step for this study. For a real-word interaction with human, ability of the robot to adapt its behavior based on human's reaction to prevent the probable short fall in the interaction is one of the essential issues, e.g., producing empathetic response to the emotion of the interlocutor [59]. Research on the human-like interaction skills for the robots to improve emotion recognition and consequently produce adequate response as well as emotional expression is required as another important future work.