From Motion to Emotion: Accelerometer Data Predict Subjective Experience of Music

Melanie Irrgang; Hauke Egermann

doi:10.1371/journal.pone.0154360

Abstract

Music is often described as emotional because it reflects expressive movements in audible form. Thus, a valid approach to measuring musical emotion could be to assess movement stimulated by music. In two experiments we evaluated the discriminative power of mobile-device generated acceleration data, produced by free movement during music listening, for predicting ratings on the Geneva Emotion Music Scales (GEMS-9). The quality of prediction for some GEMS dimensions varied between experiments: tenderness (R₁²(first experiment) = 0.50, R₂²(second experiment) = 0.39), nostalgia (R₁² = 0.42, R₂² = 0.30), wonder (R₁² = 0.25, R₂² = 0.34), sadness (R₁² = 0.24, R₂² = 0.35), peacefulness (R₁² = 0.20, R₂² = 0.35), joy (R₁² = 0.19, R₂² = 0.33) and transcendence (R₁² = 0.14, R₂² = 0.00). For others, such as power (R₁² = 0.42, R₂² = 0.49) and tension (R₁² = 0.28, R₂² = 0.27), results were almost reproduced. Furthermore, we extracted two principal components from the GEMS ratings, one representing arousal and the other the valence of the experienced feeling. Both arousal and valence could be predicted from acceleration data, indicating that these data carry information on both the quantity and the quality of experience. On the one hand, these findings show how music-evoked movement patterns relate to music-evoked feelings. On the other hand, they contribute to integrating findings from the field of embodied music cognition into music recommender systems.

Citation: Irrgang M, Egermann H (2016) From Motion to Emotion: Accelerometer Data Predict Subjective Experience of Music. PLoS ONE 11(7): e0154360. https://doi.org/10.1371/journal.pone.0154360

Editor: Gonzalo G. de Polavieja, Champalimaud Foundation, PORTUGAL

Received: November 3, 2015; Accepted: April 12, 2016; Published: July 14, 2016

Copyright: © 2016 Irrgang, Egermann. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Due to lack of consent to publish raw data from participants, data are available upon request from Melanie Irrgang, TU Berlin, Fachgebiet Audiokommunikation, Sekr. EN-8, Einsteinufer 17c, 10587 Berlin, melanie.irrgang@citypedia.eu. Data for the app is available in the following repository: https://github.com/mirrgang/motion2emotioncc.

Funding: The authors have no support or funding to report.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Music is often used to regulate emotions, for example to reduce stress or to influence one’s mood, as shown by [1] and [2]. Listening to music is thus closely linked to the experience of emotions [3–5]. Nonetheless, these subjective qualities of music still play a minor role in the field of Music Information Retrieval (MIR) and Music Recommender Systems (MRS), i.e. in the retrieval and recommendation of music offered by web-based services [6].

In the field of Affective Computing [6] there is increasing effort to connect the physical characteristics of music to the emotional valence and arousal values of the circumplex model of [7]. However, considering the acoustical features of music and their implications for the perception of emotions does not yet take into account the motor origin hypothesis of emotion in music as claimed by [8]. Accordingly, music is often described as emotional because it reflects expressive movements in audible form [9–11]. Furthermore, the term emotio is derived from the Latin movere, to move, and is used as a synonym for being moved. That is why Leman calls for new non-verbal, embodied possibilities to describe music and its experience [12]. He suggests using corporeal articulations as a bridge between linguistic self-report measures and measurements of physical energy like pitch, loudness or tempo because “human action can realize the transformation from physical energy to cultural abstraction, and vice versa” ([12] p. 77).

Describing music by moving a mobile device such as a smartphone could build a bridge between physical energy and subjective experience. Smartphones are increasingly popular devices for listening to music [13]. Once corporeal articulations can be obtained from smartphone-assessed motion data and interpreted emotionally, missing emotional descriptions of music in MIR and MRS could be provided based on these embodied descriptions. Hence, a model able to translate between corporeal and verbal descriptions of music would not only offer innovative, multimodal access to music retrieval, but also provide additional semantic annotations of the emotional qualities of music that could enrich conventional verbal search.

1 Verbal Models of Musical Emotion

The most widespread models to describe the emotional qualities of music listening experience are the basic emotions model, the circumplex model and the Geneva Emotion Music Scales (GEMS) [14].

The basic emotion model assumes that there are four to six basic emotions that have evolutionary meaning and are culturally universal. They often include fear, anger, happiness, sadness, disgust, and surprise [14]. The circumplex model maps these basic emotions and many other emotional feelings onto a two-dimensional space that is spanned by valence (“how pleasant or unpleasant is the experience?”) and arousal (“how intense is the experience?”) [7]. Accordingly, the quality of each emotion can be described by these two underlying qualities.

The GEMS have been iteratively developed and evaluated in four studies [15]. The aim was to find, for music-related emotions, “a more nuanced affect vocabulary and taxonomy than is provided by current scales and models of emotion” ([15] p. 513). The original version of the GEMS comprises 45 terms, including terms such as ‘feeling of transcendence’, ‘nostalgic’, ‘solemn’ or ‘impatient’ that are not part of any other emotion model. The GEMS-9, a shortened version of the GEMS-45, is, like the long version, grouped into the categories of sublimity (wonder, transcendence, tenderness, nostalgia, peacefulness), vitality (power, joyful activation) and unease (tension, sadness) [15]. Torres-Eliard, Labbé and Grandjean collected self-report measures of the GEMS and suggest that it is a suitable model to assess musical emotion [16]. They concluded that “the results indicate a high reliability between listeners for different musical excerpts and for different contexts of listening including concerts, i.e. a social context, and laboratory experiments” ([16] p. 252).

2 Corporeal Articulation of Musical Emotion

Several studies have shown that there is a close link between movement and emotion in music. Sievers et al. asked participants to adjust the features rate, jitter, consonance/smoothness, step size and direction for each of the following five emotions: anger, happiness, peacefulness, sadness and fear [10]. For one group, adjusting the features changed the movement and appearance of a bouncing ball. For the second group, adjusting the features changed the melody and expression of a piano piece. Experiments were conducted both in the U.S.A. and in Cambodia. The settings used for the different emotions were highly similar for motion and music in both cultures. Accordingly, the authors conclude that emotion expression in music and movement seems to be based on the same universal features. Giordano et al. studied the relationship between walking and emotion and its implications for the expression of musical performances [8]. Slow, quiet, and irregular walking sounds were associated with sadness, whereas fast, loud, and regular walking sounds were associated with happiness. Similar patterns are also used in music performance (see [17]). Thus, they concluded that their findings “support the motor-origin hypothesis of musical emotion expression that states that musicians and listeners make use of general movement knowledge when expressing and recognizing emotions in music” ([8] p. 29). This connection between musical rhythm and motor activities had already been supported by Parncutt in 1987 [18].

There are many possibilities to express the music listening experience in an embodied way. Among them are tapping or moving parts of the body along with the beat, singing, imitating playing a musical instrument, or dancing. Hedder also evaluated an approach based on facial expression as a form of embodiment [19]. Drawing, as described by De Bruyn, Moelants, and Leman [20] and in [12], is another alternative, serving as a means of graphically attuning to the experience. Finally, the use of acceleration sensor data generated by arm gestures, as by Amelynck, Grachten, van Noorden and Leman [21], has been described as a very promising approach to multimodal querying on mobile devices.

Amelynck et al. [21] investigated how motion can be linked to emotion in the context of MIR. They asked participants to perform arm gestures while holding a Wii remote controller in order to describe the music. Afterwards, the emotional qualities of the musical excerpts were rated on the dimensions of valence and arousal. Motion features recorded with the Wii controller yielded fairly good predictions for arousal, but less precise ones for valence. The authors argue that this might be due to people rating sad music as pleasant, as described in [22], and conclude that the circumplex model might be unsuitable for musical emotions. Juslin and Västfjäll also note that perceiving mixed emotions, which are positive and negative at the same time, limits the applicability of the circumplex model [23]. Accordingly, Amelynck et al. suggest the GEMS as an alternative to the circumplex model.

Aims and Experimental Design

The goal of the present study is to explore which of the GEMS can be predicted by mobile-device generated acceleration data, and how well. In this way, the present study continues the work of Amelynck et al. [21], testing an alternative emotion model and different motion sensors. These findings will contribute to understanding how acceleration data can be used to integrate embodied music cognition into Music Recommender Systems. Furthermore, the study tests the often described similarities between certain motion features and emotional qualities [8][10].

First, a pilot study was conducted to develop a measurement instrument for the following experiments. Afterwards, we conducted two experiments to test if and how accelerometer data can be used to describe musical experience. The second experiment tested whether results could be replicated for different music samples and whether results changed when participants were free to choose the songs they felt like moving to. We also tested how different GEMS qualities relate to different movement patterns (rhythmic vs. gestalt), e.g. whether music experienced as sad was less suited to rhythmic movement patterns.

General Methods

Ethics Statement

Prior to participating in either experiment, individuals were informed of its general goals and of the procedures involved, i.e. describing the music corporeally and rating its emotional qualities. They gave oral consent to participating in the study and to the storage of the data collected during the experiment. No ethics approval was required from the Technical University Berlin for behavioral studies such as those reported in this manuscript. There was no institutional review board available at the department where the experiment was conducted. Neither of the experiments involved deception or stressful procedures. Participants were informed that they were free to leave the experiment at any time and that their data would be analyzed anonymously. Participants in both experiments were recruited on a voluntary basis from among students and interested acquaintances. Some students received course credit for participation. Others had a professional or private interest in the study and its methodology and therefore volunteered to participate. The research reported in this manuscript was carried out according to the principles expressed in the Declaration of Helsinki. No personal identifying information other than that reported in this manuscript was collected (see Methods section).

Development of Measurement Smartphone App

We conducted a pilot study to design and accompany the development of the measurement instrument. In this phase, we interviewed 11 persons with different backgrounds (i.e. different ages and genders, musicians and non-musicians) about their preferred way of describing music experience in an embodied way. After two favorites had been tested as prototypes, participants preferred performing free movements while holding their smartphone over drawing lines. Hence, an Android app was developed iteratively, applying the think-aloud method to integrate participants’ feedback as described by [24].

We chose Android because of the wider spread of Android devices, which could enable us to repeat the experiment with a larger sample in the future. The app presented the music stimuli to participants and simultaneously recorded accelerometer data from the smartphone’s sensors (for details, see the Android Developers documentation [25]). Afterwards, it presented the nine emotional attributes of the GEMS-9 short version, on which participants were instructed to rate the emotional qualities of the music excerpts presented. Screenshots of the app are available as Supporting Information S2 Fig.

Stimulus Selection

Music used in both experiments was selected in a participatory approach [26] based on suggestions by the participants of the pilot study according to the following criteria:

account for the variety of the participants’ preferences
cover the range of the GEMS-9
keep the balance between female and male artists
avoid covering emotions in a stereotypical way, e.g. tenderness by female artists or tension by male artists
include artists of different ethnic backgrounds
cover a variety of genres

For each musical piece, an excerpt of ~40 s duration was chosen such that it was as homogeneous as possible with respect to GEMS qualities.

Motion Data Analysis

The general workflow for motion data analysis was as follows:

get raw acceleration data from motion sensor for x, y and z in 3D space
cut beginning (first 5s) and end to standardize duration d of signals to d = 35s
resample with sample rate ~5.7Hz
apply PCA to x, y and z
extract motion features
normalize range of motion features intra-individually
split data set into training (50%) and test (50%) set
stepwise select features on training set and fit linear regression model for each GEMS feeling
evaluate model quality on test set for each GEMS feeling
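The first preprocessing steps of this workflow (trimming, resampling and PCA) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors’ implementation: the timestamp input, the use of linear interpolation for resampling and the function name `preprocess` are our own choices; only the 5 s trim, the 35 s duration, the ~5.7 Hz rate and the PCA that keeps all three dimensions come from the text.

```python
import numpy as np

TRIM_START_S = 5.0     # first 5 s discarded while participants settle into the movement
DURATION_S = 35.0      # standardized signal duration d
TARGET_RATE_HZ = 5.7   # approximate common sample rate from the workflow


def preprocess(t, xyz):
    """Trim, resample and PCA-rotate one accelerometer recording (hypothetical helper).

    t   : 1-D increasing timestamps in seconds (recording must span about 40 s)
    xyz : array of shape (n_samples, 3) with raw x, y, z acceleration
    Returns the uniform time grid and an (n_out, 3) array of PC1..PC3 scores.
    """
    t = np.asarray(t, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    # Uniform grid starting 5 s after the first sample and spanning the 35 s window.
    n_out = int(DURATION_S * TARGET_RATE_HZ)
    grid = t[0] + TRIM_START_S + np.arange(n_out) / TARGET_RATE_HZ
    # Resample each axis by linear interpolation (an assumed resampling method).
    resampled = np.column_stack([np.interp(grid, t, xyz[:, k]) for k in range(3)])
    # PCA without dimensionality reduction: centre the data, then project it onto
    # the covariance eigenvectors ordered by decreasing eigenvalue (variance).
    centred = resampled - resampled.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return grid, centred @ eigvecs[:, order]
```

Because every recording is rotated into its own eigenbasis, PC1 always carries the dominant movement axis regardless of how the phone was held, which is the comparability argument made in the text.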

Motion Feature Extraction.

As participants needed a few seconds to fully get into the movement, the first five seconds of each motion recording were discarded. Prior to feature extraction, we applied a Principal Component Analysis (PCA) to each recording of accelerometer data (per stimulus and participant). In this way, the x-, y- and z-dimensions were transformed into three principal components PC1, PC2 and PC3. This enhanced the comparability of movements between participants, e.g. by accounting for different ways of holding the device. We did not apply PCA to compress the data; all dimensions were kept. Furthermore, we did not extract direction-dependent features, which would have argued against applying a PCA. Table 1 gives an overview of the features extracted to characterize the movement, categorized into tempo, size, regularity and smoothness. The statistics absolute skewness, median and standard deviation (std) were computed to obtain a time-compressed representation of the features extracted from the motion time series. As most features were not normally distributed, we computed the median rather than the mean. During feature selection we found that the standard deviation still served as a significant feature representing the degree of variance in the distribution. To remove inter-individual differences in movement size, all features were subsequently range-normalized intra-individually to the interval [0, 1]. Fig 1 shows an example of one accelerometer recording before and after applying PCA and illustrates the extraction of different features. When describing results, we refer to positive acceleration in eigenspace as forward movement and to negative acceleration in eigenspace as backward movement.
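The summary statistics and the intra-individual range normalization described above can be sketched as follows. The exact definitions behind Table 1 (e.g. how peaks, rises and falls are detected) are not given in this section, so this NumPy sketch only illustrates one tempo-related feature, the distances between midcrosses, and assumes the mid-reference level lies halfway between a signal’s minimum and maximum. The helper names (`midcross_times`, `abs_skewness`, `summarize`, `range_normalize_within`) are hypothetical.

```python
import numpy as np


def midcross_times(signal, rate_hz):
    """Times (s) where a uniformly sampled signal crosses its mid-reference level.

    Assumption: the mid level is halfway between the signal's minimum and maximum,
    and crossing instants are linearly interpolated between neighbouring samples.
    """
    s = np.asarray(signal, dtype=float)
    d = s - (s.min() + s.max()) / 2.0
    idx = np.flatnonzero(d[:-1] * d[1:] < 0)       # strict sign changes
    frac = d[idx] / (d[idx] - d[idx + 1])          # sub-sample crossing position
    return (idx + frac) / rate_hz


def abs_skewness(values):
    """Absolute moment-based (biased) skewness; 0 for constant input."""
    x = np.asarray(values, dtype=float)
    sd = x.std()
    return 0.0 if sd == 0 else float(abs(np.mean((x - x.mean()) ** 3)) / sd ** 3)


def summarize(values, name):
    """Time-compressed summary (median, std, absolute skewness) of one feature series."""
    x = np.asarray(values, dtype=float)
    return {f"median_{name}": float(np.median(x)),
            f"std_{name}": float(np.std(x)),
            f"skewness_{name}": abs_skewness(x)}


def range_normalize_within(features, participant_ids):
    """Rescale each feature column to [0, 1] separately within each participant."""
    X = np.asarray(features, dtype=float).copy()
    ids = np.asarray(participant_ids)
    for pid in np.unique(ids):
        rows = ids == pid
        lo = X[rows].min(axis=0)
        span = X[rows].max(axis=0) - lo
        span[span == 0] = 1.0      # constant feature: map to 0 instead of dividing by 0
        X[rows] = (X[rows] - lo) / span
    return X
```

For example, `summarize(np.diff(midcross_times(pc1, 5.7)), "dist_midcrosses")` yields entries named like `std_dist_midcrosses`, following the naming pattern of Table 4; the feature matrix of all recordings would then be passed to `range_normalize_within` together with the participant labels.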

Table 1. List of Extracted Motion Features.

https://doi.org/10.1371/journal.pone.0154360.t001

Fig 1. Example of Extracted Features for the Movements of one Participant to the Music Excerpt Wargasm, Experiment 1: The figure shows a participant performing large and fast movements.

After about 12 s, the direction of movement changed, probably for ergonomic reasons (first subplot). Principal components of the accelerometer data are shown in the second subplot. The third subplot indicates that the maximum of the magnitude spectrum lies at a high frequency. The fourth subplot visualizes the computation of the signal’s amplitudes (only for PC1). The stars in the fifth subplot mark the midcrosses of the motion signal.

https://doi.org/10.1371/journal.pone.0154360.g001

Model Fitting.

We fitted one linear regression model for every GEMS feeling, with all motion features as predictor variables. There was no strong multicollinearity between motion features: the variance inflation factor (VIF) was smaller than 10 for every predictor. Stimulus order effects could have resulted in an emotional afterglow effect (e.g. the rating of the second piece being influenced by the rating of the first one), violating the assumption of independent observations in linear regression models. However, no such effect was observed in diagnostic plots of the autocorrelation of residuals or of presentation order against residuals. Features were selected using the forward stepwise algorithm for linear regression [27]. Before selecting the features and fitting the model, the data were partitioned into a training and a test set of ~50% each, in order to evaluate each model’s ability to generalize to unseen data. The test set was compiled by randomly selecting five observations from each participant.
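The model-fitting step can be sketched as a per-participant test split of five observations, a VIF check for multicollinearity and forward stepwise selection. The selection criterion of the forward stepwise algorithm [27] is not stated in this section; the sketch assumes the Akaike information criterion (AIC) as the entry and stopping rule, a common but not necessarily identical choice. All function names are hypothetical.

```python
import numpy as np


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit with intercept (X may have zero columns)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) of that column on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        xj = X[:, j]
        tss = float(np.sum((xj - xj.mean()) ** 2))
        r2 = 1.0 - ols_rss(np.delete(X, j, axis=1), xj) / tss
        out.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(out)


def split_per_participant(participant_ids, n_test=5, seed=0):
    """Train/test masks: n_test randomly drawn observations per participant form the test set."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(participant_ids)
    test = np.zeros(len(ids), dtype=bool)
    for pid in np.unique(ids):
        rows = np.flatnonzero(ids == pid)
        test[rng.choice(rows, size=min(n_test, len(rows)), replace=False)] = True
    return ~test, test


def forward_stepwise(X, y):
    """Greedy forward selection: add the feature that lowers AIC most; stop when none does."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    def aic(rss, k):
        # Gaussian AIC up to an additive constant; k counts the fitted coefficients.
        return n * np.log(rss / n) + 2 * k

    selected = []
    best = aic(ols_rss(np.empty((n, 0)), y), 1)
    while len(selected) < p:
        candidates = [(aic(ols_rss(X[:, selected + [j]], y), len(selected) + 2), j)
                      for j in range(p) if j not in selected]
        score, j = min(candidates)
        if score >= best:
            break
        best = score
        selected.append(j)
    return selected
```

In use, selection runs on the training rows only (e.g. `sel = forward_stepwise(X[train], y[train])`); the model restricted to the columns in `sel` is then fitted on the training set and evaluated on the held-out rows.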

Experiment 1

2.1 Method

The following sections summarize the first experiment that was conducted after developing the measurement app in the pilot study.

Stimuli.

The musical stimuli for the first experiment were compiled by the pilot study’s participants according to the criteria described in the General Methods section. Table 2 shows the final list of musical excerpts. The list of samples was presented in random order for each participant; however, participants were free to choose the order in which they assessed the samples.

Table 2. List of Music Excerpts First Experiment.

https://doi.org/10.1371/journal.pone.0154360.t002

Participants.

For this experiment, we recruited 22 participants from the Master of Science program Audio Communication and Technology at TU Berlin. Their average age was 27 years (SD = 2.36). 73% identified as ‘male’, 18% as ‘female’, 5% as ‘rather male’ and another 5% did not identify with any gender. 91% were experienced in playing an instrument, producing music or singing in a choir; 5% only had short-term experience in making music beyond classes in school, and 5% indicated that they had no experience at all. 36% had participated in dancing classes or similar activities in which movement is related to music, 50% danced only occasionally in clubs or at concerts, and 14% had no experience at all in moving to music. 82% regularly used a smartphone, 9% were experienced in using a smartphone but did not use one at the time, and 9% did not use one at all. Participants also indicated that the music genres most formative for them were Rock* (68%), Electro* (55%), Hip Hop* (32%), Classical (27%), Metal (22%), Jazz (23%), Punk (23%), Pop* (18%), Dub(step) (18%) and Reggae (14%). (Note: *several similar subgenres were grouped into one.)

Procedure.

The laboratory was dimly lit and offered enough space to move freely. The app ran on a Motorola Moto G with Android version 4.4. Participants wore AKG headphones with a 1.5 m cable. During the experiment, they were alone in the lab with the doors closed. Beforehand, a guided test tour through the app was given to familiarize participants with the experiment. They were informed that the study was about describing the music corporeally and rating it in terms of the GEMS. They did not know that their GEMS ratings would subsequently be predicted from their movements. We also told them that there was no right or wrong way to move to the music.

After a participant selected a song, the first step was to listen to it in order to prepare for the corporeal articulation. Participants could stop the presentation of the excerpt early when they felt they knew the music well enough to describe it.

Afterwards, the movements were actually recorded by the device’s acceleration sensor synchronized to the music. For this part of the study participants were instructed as follows: “Please move now with the device according to the music. It is important that you stand and don’t sit during motion capturing. You can move freely, i.e. all parts of the body, but keep in mind that only movement of the device can be captured.”

After each embodied description, participants rated the perceived emotional qualities of the musical excerpt on the GEMS-9 using a 100-point unipolar intensity scale initialized to ‘0’ (Table 3). They were instructed as follows: “Please rate the perceived emotional quality of the music according to the GEMS-9. Do not rate how you felt during listening.” After the GEMS ratings, participants were asked how suitable they considered the embodied and the verbal descriptions for the music excerpt. At the end of the experiment, participants filled out a short socio-biographical questionnaire.

Table 3. Emotion Ratings Items: Participants rated their experience of nine German GEMS categories (each defined by three additional adjectives).

https://doi.org/10.1371/journal.pone.0154360.t003

Results and Discussion

Similarities between Movements and GEMS for First Experiment.

The fixed effects in Table 4 indicate that music perceived as transcendent was related to a rather irregular tempo of movement (std_dist_midcrosses). For wonder, the movement’s size (std_peak) was irregular in the second component and regular (skewness_peak) in the first principal component. Power related to regular and large movements. Tenderness was described by small (median_peak) movements with regular backward (std_fall) and irregular forward (std_rise) gestures. Nostalgia was also described by small movements (median_peak), with irregularly smooth backward movements in the third component and regular ones in the second component (skewness_fall). In contrast to power, peacefulness was characterized by small movements. When participants rated music as joyful, they performed movements with regular backward phases and less regular forward movements, like jumping. For sadness, movements were slow, while for tension movements were primarily large with irregular tempo.

Table 4. Fixed Effects Modeling Parameter Estimates for GEMS Ratings and their Principal Components of First Experiment Predicted via Motion Features Related to Size, Rate, Regularity and Smoothness of the Movement.

https://doi.org/10.1371/journal.pone.0154360.t004

Fig 2 shows that participants preferred the GEMS ratings for describing their experience when they perceived nostalgia, sadness, tenderness or peacefulness. For joy, power and wonder, participants preferred embodied descriptions. For tension and transcendence, we did not observe such an association between feeling rating and description preference.

Fig 2. Spearman Correlation of GEMS and Description Preferences (Embodied vs. GEMS) for First Experiment.

https://doi.org/10.1371/journal.pone.0154360.g002
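Spearman’s rank correlation, as used for Fig 2, is the Pearson correlation of ranks, with tied values receiving the average of the ranks they occupy. A minimal NumPy sketch follows. How description preference was coded numerically is not specified in this section, so any preference variable passed in (for instance a hypothetical scale on which higher values favor the embodied description) is an assumption; the function names are also hypothetical.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of a and b."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```

Rank-based correlation suits this analysis because the 100-point GEMS ratings are bounded and often skewed, so only the monotone association with preference is assessed.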

Prediction Results.

Table 5 indicates that tenderness, power and nostalgia were predicted best, followed by tension, wonder and sadness. For peacefulness and joy, around 20% of the variance in the data could be explained by the fitted regression models. Transcendence was the most difficult to predict, with only 14% of the variance explained by the motion data. Since a comparable amount of variance was explained in both the training and the test set, there was no evidence of the model overfitting the data.

Table 5. R-Squared and RMSE for first experiment ranked according to their R² on the training set.

https://doi.org/10.1371/journal.pone.0154360.t005
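The R² and RMSE values in Table 5 compare training and test performance. The sketch below assumes ordinary least squares with an intercept and computes R² on each set against that set’s own mean (one common convention; this section does not state which one was used). The function names are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot (can be negative on unseen data)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the rating scale."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def evaluate(X_train, y_train, X_test, y_test):
    """Fit OLS with intercept on the training set; report R^2 and RMSE on both sets."""
    def design(X):
        return np.column_stack([np.ones(len(X)), X])

    beta, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
    return {name: {"R2": r_squared(y, design(X) @ beta), "RMSE": rmse(y, design(X) @ beta)}
            for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test))}
```

R² computed this way becomes negative on unseen data when a model predicts worse than the test-set mean; similar training and test values, as reported above, are what indicate the absence of overfitting.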

Since we asked participants to rate the intensity of their experience on the different GEMS, most feelings are likely to be correlated with the overall intensity or arousal of the emotional experience (see Fig 2). Here, most feelings show small to large correlations with power. Therefore, we applied a PCA to these rating data in order to represent emotional experience with fewer dimensions. A resulting arousal component of the GEMS ratings could be interpreted as describing the quantity or intensity of the feeling, whereas a valence component could be interpreted as describing its quality. This allowed us to identify whether there are differences in how well the quantity/intensity and the quality of emotional experience can be predicted from movement data.

Based on the elbow method, we extracted two components explaining most of the variance in the data. Table 6 lists the item loadings of the two eigenvectors with the highest eigenvalues. The first PC might be interpreted as the degree of relaxation, as it has high positive loadings for tenderness, peacefulness and nostalgia, but high negative loadings for power and tension. Accordingly, it represents the opposite of arousal. The second PC might be interpreted as positive valence, as it has high positive loadings for joy and wonder, but a negative loading for sadness. Hence, the interesting question was whether the motion features only predicted the degree of relaxation (vs. arousal and activation) or whether they could also explain degrees of valence inherent in the experienced emotionality. The fixed-effect estimates in Table 4 show that relaxation was expressed by small movements, while positive valence related to fast movements. Furthermore, the prediction results in Table 5 indicate that the fitted regression models explained about 44% of the variance in relaxation, but only 8% of the variance in positive valence. As prediction accuracy was similar on the training and the test set, there was no overfitting of the model to the data.

Table 6. Principal Component Loadings of GEMS Ratings of First Experiment.

https://doi.org/10.1371/journal.pone.0154360.t006
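The PCA on the GEMS ratings can be sketched as follows. This section does not say whether the covariance or the correlation matrix was decomposed; since all nine items share the same 100-point scale, the sketch assumes the covariance matrix of the centred ratings. The elbow was judged from the eigenvalues (scree plot), so here the number of retained components is simply fixed at two, as reported. `rating_pca` and `GEMS_9` are hypothetical names.

```python
import numpy as np

# Column order assumed for the (n_observations, 9) rating matrix.
GEMS_9 = ["wonder", "transcendence", "tenderness", "nostalgia", "peacefulness",
          "power", "joyful activation", "tension", "sadness"]


def rating_pca(ratings, n_components=2):
    """PCA of an (n_observations, 9) GEMS rating matrix.

    Returns all eigenvalues in decreasing order (for a scree/elbow plot), the item
    loadings (eigenvectors) of the first n_components and the component scores.
    """
    R = np.asarray(ratings, dtype=float)
    centred = R - R.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]
    return eigvals, loadings, centred @ loadings
```

Because eigenvectors are only defined up to sign, the first component may come out either as relaxation or as its mirror image, arousal; reading the loadings (positive tenderness and negative power, as in Table 6) fixes the interpretation. The component scores can then serve as regression targets in the same way as the individual GEMS ratings.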

Discussion.

The results for the relaxation and valence components indicate that not only arousal or intensity was predicted in the GEMS ratings, but also the quality of the perceived emotion, i.e. positive valence. However, only 8% of the variance in valence was explained by the approach. In this respect, the results are similar to those of [21]. This could be explained by the fact that several GEMS had low mean intensity and little variance, as can be seen from the polarity profile in Fig 3. Examples are transcendence, sadness, wonder, and tension. Even though joy showed a rather high degree of variance according to the polarity profile, its prediction turned out to be difficult in this experiment. There might not have been a common movement pattern across participants for music perceived as joyful. In general, it should also be noted that there were not enough music excerpts to sufficiently cover all states of the GEMS. We therefore conducted a second experiment with a different set of stimuli chosen to cover a wider range of GEMS feelings.

Fig 3. Polarity Profile of Mean GEMS Ratings for the First Experiment.

https://doi.org/10.1371/journal.pone.0154360.g003

Experiment 2

Experiment 2 was conducted to test whether the findings from Experiment 1 could be replicated. Furthermore, different music stimuli were chosen in order to cover more states of the GEMS. We also asked participants for more feedback on the movement patterns they used. In doing so, we tested the following two hypotheses: “Participants prefer embodied rhythmic-related descriptions when they perceive power, tension or joy” and “Participants prefer embodied gestalt-related descriptions or verbal descriptions (GEMS) when they perceive sublimity-related feelings like nostalgia or transcendence”. We assumed that sublimity-related musical emotions were more closely related to musical contour and melody. Thus, we expected participants to imitate musical contour by performing gestures that are more gestalt-like and less rhythmic.