
Affective computing of multi-type urban public spaces to analyze emotional quality using ensemble learning-based classification of multi-sensor data

  • Ruixuan Li ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Resources, Software, Validation, Visualization, Writing – original draft

    s1920040@jaist.ac.jp

    Affiliations School of Art and Design, Dalian Polytechnic University, Dalian City, Liaoning Province, China, Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan

  • Takaya Yuizono ,

    Contributed equally to this work with: Takaya Yuizono, Xianghui Li

    Roles Conceptualization, Investigation, Supervision, Writing – review & editing

    Affiliation Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan

  • Xianghui Li

    Contributed equally to this work with: Takaya Yuizono, Xianghui Li

    Roles Data curation, Funding acquisition, Validation, Visualization

    Affiliations School of Art and Design, Dalian Polytechnic University, Dalian City, Liaoning Province, China, Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan

Abstract

The quality of urban public spaces affects the emotional response of users; therefore, the emotional data of users can be used as indices to evaluate the quality of a space. Emotional response can be evaluated to effectively measure public space quality through affective computing and obtain evidence-based support for urban space renewal. We proposed a feasible evaluation method for multi-type urban public spaces based on multiple physiological signals and ensemble learning. We built binary, ternary, and quinary classification models based on participants’ physiological signals and self-reported emotional responses through experiments in eight public spaces of five types. Furthermore, we verified the effectiveness of the model by inputting data collected from two other public spaces. Three observations were made based on the results. First, the highest accuracies of the binary and ternary classification models were 92.59% and 91.07%, respectively. After external validation, the highest accuracies were 80.90% and 65.30%, respectively, which satisfied the preliminary requirements for evaluating the quality of actual urban spaces. However, the quinary classification model could not satisfy the preliminary requirements. Second, the average accuracy of ensemble learning was 7.59% higher than that of single classifiers. Third, reducing the number of physiological signal features and applying the synthetic minority oversampling technique to address unbalanced data improved the evaluation ability.

Introduction

Affective computing has attracted significant interest in psychology, cognitive science, and computer science. Researchers have attempted to identify emotions and influencing factors through scientific and digital methods [1, 2]. Emotions can either be short- or long-term [3]. Short-term emotions are primarily related to stimuli and the corresponding response. Long-term emotions, on the other hand, are affected by more complex factors such as cultural background, politics, and time. Most researchers have focused on quantitatively revealing the relationships between people, stimuli, and emotions based on short-term emotions [4].

The emotional theories of James-Lange, Cannon-Bard, and Schachter-Singer, and the cognitive appraisal theory of Lazarus indicate that emotions are closely related to environmental stimuli and physiological responses [5–8]. Environmental stimuli emanate from events, weather, people, sounds, images, and scenery. Physiological responses include external responses (facial expression, language, and action) and internal physiological responses (the peripheral and central nervous systems) [9, 10]. In psychology, cognitive science, and computer science, researchers have used various typical or ordinary stimuli to elicit physiological responses from participants, collected the participants’ internal and external physiological data using instruments, and then built emotion recognition models through data processing and feature extraction [10–12]. Most of these researchers collected data in the laboratory and selected typical pictures, videos, and sounds as stimuli. However, actual applications occur in complex external environments that produce more data noise, and the stimuli are frequently not typical images or sounds but ordinary, everyday stimuli. In addition, although some researchers have used the same physiological signals as indicators, the features and classifiers differed significantly [12–25], which makes related studies difficult to compare. Therefore, it is imperative to screen the main features and compare classifiers to obtain a more reliable evaluation model. Furthermore, some researchers in urban design and geography have introduced emotion recognition methods and conducted related experiments in urban spaces [26–32]. However, they selected only a single type of space for data collection, such as a predefined route in the city center [27], a shopping route in a city center [28], a specific route around a city center [29], or a predetermined route in a neighborhood [30], which simplifies data collection but limits the scope of application of the resulting model.

Generally, two approaches are used to evaluate the quality of urban public spaces: expert and user evaluations. The expert evaluation focuses on physical attributes, such as spatial form, color, material, and ecological conditions, and social attributes such as safety, function, and aesthetic quality. User evaluation, on the other hand, focuses on users’ perceptions, behaviors, and physiological data for a comprehensive evaluation. Because the evaluation indices of the two approaches differ significantly, it is difficult to integrate them into one framework. Emotional quality evaluation belongs to the latter approach and is a quality evaluation method based on the user’s emotional experience. With the aid of contemporary physiological signal processing, feature extraction, and machine learning technology, emotional evaluation can be used to evaluate space quality based on the emotional responses of many users while reducing the influence of individual factors. Ten public spaces of five types in Japan and China were selected for this study. We collected emotional signals from the participants through real-world experiments and applied ensemble learning to build emotional classification models and a spatial emotional quality evaluation process. The proposed method is suitable for multiple types of urban public spaces and is easy to operate. The final evaluation results support spatial design and urban renewal decisions.

Related research

Urban public spaces are a combination of the physical and social environments. The former includes artificial facilities, such as buildings and roads, and natural elements, such as microclimates, vegetation, and water. The combination and attributes of these elements in space can affect the environmental quality. The social environment includes security, function, aesthetic quality, and business conditions, which are partly related to the user’s experience and perception [28–30]. Therefore, it is difficult to determine the weight of each factor when evaluating the environmental quality as a whole through physical and social environments, particularly for different types of public spaces, because researchers assign different weights. Thus, it is difficult to apply a spatial quality evaluation system to new spaces.

Emotion is a comprehensive human response to environmental stimuli. As an evaluation index, emotion can avoid the problem of weighting the evaluation factors. Although psychologists have not developed a widely accepted cognitive model for evaluating the quality of emotion, which is a black-box process, they generally consider two main processes when a person receives an external stimulus. The first is a lower-level appraisal, a relatively automatic evaluation producing the initial cognitive and emotional responses to the stimulus. The second is a higher-level appraisal, involving more explicit recognition and evaluation of the stimuli [7, 33]. Lazarus argued that cognitive activity precedes emotions, and emotions affect subsequent perception activities [8, 33]. Overall, scholars generally consider this process as an interaction between cognition and emotion; cognitive evaluation can elicit emotional responses that influence new cognition and judgment [8, 33, 34].

However, although users’ emotions are indicators of the quality of the public space, emotions are often influenced by subjective intentions. Thus, it is difficult to obtain accurate emotional data, and it is important to determine the appropriate methods for measuring emotions. For this purpose, researchers have proposed two methods of emotional measurement: subjective and objective. The tools for measuring emotions subjectively include the self-assessment manikin (SAM), mood adjective scales, and positive and negative emotion scales. Although it is easy to obtain emotional responses using these tools, they are prone to subjective influences. The tools for measuring emotions objectively include physiological measurements, facial motion coding systems, and text analysis measurement methods [35–40]. These methods are advantageous because they prevent the subjective and deliberate influences of the participants. However, owing to technical limitations, the results of objective measurements cannot accurately reflect actual emotions. Therefore, most researchers combine subjective and objective measurement methods to reduce the data noise.

Over the past two decades, researchers in computer science, psychology, cognition, and physiology have used different methods to study emotion recognition. These researchers built various emotion recognition models by acquiring human physiological signals and extracting signal features [12, 13, 21, 22, 24, 41–46]. Typically, two types of emotional stimuli are selected. The first is virtual objects, such as pictures, videos, and music, and the other is actual environments, such as an in-car environment, building environment, street, and park. Virtual objects often include emotion labels. They are strong stimuli that are independent of the participants’ emotional responses [46, 47]. Actual environments are often weak stimuli without emotion labels and depend on the participant’s emotional response. Therefore, in contrast to using space photos or street view pictures as stimuli, experiments in actual three-dimensional space can yield emotional responses and physiological data that are more reflective of the actual scenario. Table 1 shows the related studies on emotion recognition using physiological sensors in urban spaces over the past decade.

thumbnail
Table 1. Related studies on emotion recognition using physiological sensors in urban spaces over the past decade.

https://doi.org/10.1371/journal.pone.0269176.t001

Emotion recognition based on physiological signals includes seven steps: 1) selecting physiological signal feedback instruments and related equipment, 2) selecting emotional stimuli, 3) conducting experiments and collecting physiological signals, 4) extracting and reducing signal features, 5) fusing data, 6) selecting classifiers, and 7) verifying models. Among the related studies shown in Table 1, six researchers selected a single campus or space in a city center as a stimulus, and five researchers collected more than two physiological signals. The researchers mainly used the single classifiers support vector machine (SVM), k-nearest neighbors (KNN), naïve Bayes (NB), convolutional neural network–long short-term memory (CNN-LSTM), and multilayer perceptron (MLP), as well as the ensemble classifier random forest (RF), and ultimately developed binary, ternary, and quinary emotional classification models.

Physiological signal acquisition

The physiological signals related to emotions include external physiological responses and internal physiological signals. Owing to variations in cultures and habits, external physiological responses are diverse and may be subject to self-control. However, internal physiological signals are almost impossible to control; therefore, they can be used as markers of emotional response [13, 17, 18, 24, 25, 34–40, 48–50]. The main physiological signals include electrodermal activity (EDA), electrocardiograms (ECG), electromyograms (EMG), electroencephalograms (EEG), and heart rate variability (HRV). Most studies have indicated that using multiple physiological signals for emotion recognition is more effective than using a single physiological signal [15, 21–23, 46, 51–53].

Extracting and reducing signal features

The features of physiological signals include time-domain, frequency-domain, and nonlinear features. The number of features extracted by different researchers varies significantly because of the complexity of the features: the six researchers listed in Table 1 extracted 8 to 188 features, which led to different results. To date, the degree of correlation between features and emotions remains inconclusive. Researchers have frequently used principal component analysis (PCA) and factor analysis to reduce the number of features [52–58].

Selecting classifiers

The selection of classifiers has a significant impact on recognition accuracy. Common classifiers suitable for emotion recognition include logistic regression (LR), SVM, decision trees (DT), artificial neural networks (ANN), and ensemble models such as RF [28–30, 59–62]. Beyond the choice of classifier, the number of target variables has an even greater impact on recognition accuracy; generally, accuracy is inversely proportional to the number of target variables. Although the number of target classes in related studies ranged between two and five, the reported accuracies were not significantly different [13, 25, 29, 30, 47, 48]. Meanwhile, the results of related studies are not directly comparable [28, 30, 53, 60, 62], and the reported accuracies of emotion recognition differ considerably.

Methods

Typical urban spaces were selected as the stimuli to elicit the participants’ physiological and emotional signals. We built emotion recognition models using signal processing, feature extraction, and reduction. Fig 1 shows a flowchart of the study. In this process, we attempted to optimize the method of spatial emotion recognition and applied the proposed model to the public space of another city to further verify its effectiveness.

thumbnail
Fig 1. Flowchart of the experiment and data analysis process.

https://doi.org/10.1371/journal.pone.0269176.g001

According to the provisions of Article 5, Paragraph 1 of the Regulations on the Conduct of Research Involving Human Subjects of the Japan Advanced Institute of Science and Technology (JAIST), we submitted a human body research plan to the Research Ethics Committee of JAIST and obtained research permission before the experiments. The research process followed the principles of the Declaration of Helsinki. The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details.

Data collection of urban public spaces

We collected data from 10 public spaces of five types: five in Nomi City and Kanazawa City, Japan, and five in Dalian City, China. The five types of spaces were campus public spaces, residential areas, park spaces, memorial spaces, and historical pedestrian street spaces. In each space, we selected a linear space with a length of approximately 300–1000 m as the experimental route and divided each route into four sections with different spatial characteristics (function and structure), for a total of 10 × 4 = 40 sections. Additionally, we divided these 10 spaces in a ratio of 8:2: the data from eight spaces were used for model training and testing, and the data from the other two spaces were used for the external validation of the built model. Figs 2 and 3 show the route maps and photos of each section. The location, function, sections, and length of the selected spaces and experimental routes are listed in Tables 2 and 3. We used the data from the spaces in Fig 2 and Table 2 to train and test the models and those in Fig 3 and Table 3 to verify the model performance through external validation.

thumbnail
Fig 2. Route maps and photos of each section in the eight spaces; their data were used for model training and testing.

https://doi.org/10.1371/journal.pone.0269176.g002

thumbnail
Fig 3. Route maps and photos of each section in the two spaces; their data were used for model external validation.

https://doi.org/10.1371/journal.pone.0269176.g003

thumbnail
Table 2. Basic information on the eight spaces; their data were used for model training and testing.

https://doi.org/10.1371/journal.pone.0269176.t002

thumbnail
Table 3. Basic information on the two spaces; their data were used for model external validation.

https://doi.org/10.1371/journal.pone.0269176.t003

A total of 20 students (7 men and 13 women; average age, 28.6 years; 14 aged 20–29, four aged 30–39, and two aged 40–49) participated in the experiment. Nine participated in the experiments in Nomi City and Kanazawa City, Japan, and 11 participated in the experiment in Dalian City, China.

Except for the two campuses, none of the participants had visited any of the sites before the experiment. Prior to the experiment, the aims and experiment content were explained to each participant, and all the participants signed a formal consent form. During the experiment, the participants wore a BITalino portable physiological signal feedback instrument (BITalino (r)evolution Plugged kit, PLUX Wireless Biosignals Ltd., Portugal), carried a GPS device (Nav-u NV-U73T, Sony), and walked through the five spaces. The physiological signal feedback instrument collected the participants’ EDA, ECG, and EMG signals, which were stored on a laptop in a backpack; the GPS device simultaneously recorded the participants’ location (Fig 4). Each participant filled out the SAM immediately after walking through each space (Fig 5).

thumbnail
Fig 4. The participants wore a physiological signal feedback device (BITalino (r)evolution Plugged kit), carried a GPS device (Nav-u NV-U73T, Sony) and a laptop computer as they walked through the selected space route at a natural pace and constant speed.

They could turn their heads to look at their surroundings. The computer automatically recorded the EDA, ECG, and EMG. The electrodes of the physiological signal feedback instrument were placed as follows: the two EDA electrodes were fixed on the first phalanges of the index and middle fingers; the ECG electrodes were fixed at the carotid arteries on both sides of the neck; and the EMG electrodes were fixed on the inner side of the forearm.

https://doi.org/10.1371/journal.pone.0269176.g004

Data processing and analysis

Emotional valence and arousal.

The SAM is a straightforward and universal tool that can track individual responses to emotional stimuli in various environments and rapidly evaluate emotional responses. During the experiment, affective information was obtained from the participants’ responses to the SAM questionnaire. Because the SAM offers odd numbers of options, we could obtain three, five, or seven emotion levels. To build a binary classification model, we deleted the samples whose emotional valence was zero, considered emotions with a valence of -2 or -1 as negative and marked them as “-1,” and considered those with a valence of 1 or 2 as positive and marked them as “1.”
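The relabeling described above can be sketched in a few lines of Python (an illustrative snippet only; the column name and data values are invented, and the study itself handled data in SPSS):

```python
# Collapse five-level SAM valence scores into binary labels:
# drop neutral (0) samples, map {-2, -1} -> -1 and {1, 2} -> 1.
import pandas as pd

def to_binary_valence(df: pd.DataFrame, col: str = "valence") -> pd.DataFrame:
    out = df[df[col] != 0].copy()                      # delete neutral samples
    out["label"] = out[col].apply(lambda v: -1 if v < 0 else 1)
    return out

samples = pd.DataFrame({"valence": [-2, -1, 0, 1, 2, 0, 1]})
binary = to_binary_valence(samples)
print(binary["label"].tolist())                        # [-1, -1, 1, 1, 1]
```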

In addition, the statistical results of the SAM scale indicated that, compared to the meaning of emotional valence (positive or negative), emotional arousal was less well understood by the participants, who found it difficult to distinguish between emotional arousal and psychological stress. Furthermore, some participants stated that they experienced psychological stress, varying with individual differences, as they walked through the public space while wearing the instruments, and this stress can interfere with emotional arousal. Therefore, we did not use emotional arousal; rather, we used only emotional valence as the target variable to build the valence classification model (S1 Table).

Physiological signal preprocessing.

Noise reduction was necessary because the physiological signals collected in urban public spaces contained more noise. The interference in the ECG signal primarily results from power-frequency interference, electrode contact noise, electromyographic noise, and breathing. Therefore, we low-pass filtered the ECG signals with a Butterworth filter and applied a zero-phase-shift filter to correct the baseline drift. The denoising of the EDA signal included smoothing, denoising, and filtering using a second-order Butterworth filter with a cut-off frequency of 0.3 Hz. The EMG signal is a waveform of the action potentials generated by muscle contraction. Because of the influence of the participants’ walking movements, we applied the Blackman window algorithm to the EMG signal for high- and low-pass filtering (5–50 Hz).
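A minimal SciPy sketch of this preprocessing is shown below. The sampling rate, the ECG cut-off, and the filter orders/tap counts not stated in the text are illustrative assumptions; only the EDA settings (second-order, 0.3 Hz) and the EMG band (5–50 Hz, Blackman window) come from the description above.

```python
# Sketch of the three denoising steps described in the text.
import numpy as np
from scipy.signal import butter, filtfilt, firwin

FS = 1000  # Hz, assumed sampling rate

def denoise_ecg(sig, cutoff=40, order=4):
    # Low-pass Butterworth; filtfilt gives zero-phase filtering,
    # avoiding the phase delay that would distort the baseline.
    b, a = butter(order, cutoff, btype="low", fs=FS)
    return filtfilt(b, a, sig)

def denoise_eda(sig, cutoff=0.3, order=2):
    # Second-order Butterworth low-pass at 0.3 Hz, as in the text.
    b, a = butter(order, cutoff, btype="low", fs=FS)
    return filtfilt(b, a, sig)

def denoise_emg(sig, band=(5, 50), numtaps=201):
    # FIR band-pass (5-50 Hz) designed with a Blackman window.
    taps = firwin(numtaps, band, pass_zero=False, window="blackman", fs=FS)
    return filtfilt(taps, [1.0], sig)

# Toy check: a 1 Hz "signal" plus 120 Hz "noise".
t = np.arange(0, 2, 1 / FS)
noisy = np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)
clean = denoise_ecg(noisy)   # 120 Hz component is strongly attenuated
```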

Feature extraction and reduction.

Based on the GPS positioning, we divided each participant’s EDA, ECG, and EMG signals in each space into four segments; thus, each signal had 400 samples. As 22 samples were incomplete, 378 were valid.

To ascertain the number and effectiveness of the features, we applied different software packages to extract features from the EDA, ECG, and EMG signals. First, we used AcqKnowledge (ver. 4.2) [64] to analyze the EDA signal and obtained seven time-domain and five nonlinear features; after the Fourier transform, we obtained four frequency-domain features (S1 Fig). We then used Kubios HRV Premium software (ver. 3.4.3) [65] to extract the features of the ECG signal and obtained 17 time-domain features, 16 frequency-domain features, and 33 nonlinear features (S2 Fig). We used the plug-in EMG Toolbar V5.30 [66] of the software Origin 2019 [67] to extract the features of the EMG signals and obtained five time-domain features and two frequency-domain features (S3 Fig). We obtained a total of 68 signal features (Table 4 and S2 Table).
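The commercial tools above (AcqKnowledge, Kubios, Origin) are not reproduced here; purely as an illustration of what time- and frequency-domain features are, a few generic examples can be computed from any 1-D signal:

```python
# Generic feature examples: mean, SD, and RMS (time domain) and the
# power-weighted mean frequency of the spectrum (frequency domain).
import numpy as np

def basic_features(sig, fs=1000):
    sig = np.asarray(sig, dtype=float)
    spectrum = np.abs(np.fft.rfft(sig)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(sig.size, d=1 / fs)
    return {
        "mean": sig.mean(),
        "sd": sig.std(),
        "rms": np.sqrt(np.mean(sig ** 2)),
        "mean_freq": np.sum(freqs * spectrum) / np.sum(spectrum),
    }

# A pure 5 Hz sine: RMS is 1/sqrt(2), mean frequency is 5 Hz.
feats = basic_features(np.sin(2 * np.pi * 5 * np.arange(0, 1, 0.001)))
```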

thumbnail
Table 4. Sixty-eight extracted signal features (all) and 50 reduced features (Italics are deleted features).

https://doi.org/10.1371/journal.pone.0269176.t004

We then used SPSS (IBM SPSS Statistics 24) to perform PCA on the 68 signal features. Bartlett’s test of sphericity was significant (P < 0.01) and KMO = 0.795; thus, PCA was applicable. The components with eigenvalues greater than 1 explained 85.78% of the cumulative variance. After calculating and comparing the weight of each feature, we selected 50 features (shown in bold text in Table 4) that were highly correlated with emotions (S3 Table).
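The study ran PCA in SPSS; a rough scikit-learn analogue of the eigenvalue-greater-than-1 criterion and loading-based feature screening might look like the following (random data stands in for the 68 real signal features):

```python
# PCA-based feature screening sketch: keep components with
# eigenvalue > 1 (Kaiser criterion), score each original feature by
# its absolute loadings on those components, retain the top 50.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(378, 68))            # 378 samples x 68 features

Xs = StandardScaler().fit_transform(X)    # PCA on standardized data
pca = PCA().fit(Xs)

keep = pca.explained_variance_ > 1.0      # Kaiser criterion
loadings = pca.components_[keep]          # (n_kept, 68)

scores = np.abs(loadings).sum(axis=0)     # importance per feature
selected = np.argsort(scores)[::-1][:50]  # indices of 50 kept features
```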

Building models and evaluation methods

We obtained a total of 10 datasets, comprising valence and feature data from the 10 spaces. We used eight of them (Table 2) for model training and testing (S1 Text); the other two datasets (Table 3) were used as new data to verify the classification capability of the proposed model. We then used SPSS Modeler 18.1 to establish the training and validation models for the binary, ternary, and quinary classifications.

Unbalanced data and synthetic minority oversampling technique (SMOTE).

The public space built in a city is primarily a place for citizens’ daily leisure and entertainment; thus, the emotions elicited by the space stimulation are primarily positive or calm. Accordingly, in the collected data, the samples with valence = -2 and valence = -1 were significantly fewer than the other samples, which resulted in poor recognition of negative emotions in the training model. We therefore introduced SMOTE to address the problem of unbalanced data. Class imbalance refers to an unbalanced distribution of classes in the training set, where the proportion of the minority class is equal to or less than 10% of the dataset. When the data are unbalanced, the minority classes do not provide sufficient “information,” and the model cannot accurately predict them. SMOTE is an improved oversampling method [68] that randomly selects a sample from the minority class and determines its k nearest neighbors (KNN) (k = 5 in this study). The algorithm then randomly selects one of these neighbors and generates a new sample at a random point on the line segment between the two samples in the feature space; these steps are repeated until the majority and minority classes are balanced (Fig 6).
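The interpolation step can be sketched directly in NumPy (a minimal illustration of the SMOTE idea following Chawla et al. [68], not the SPSS Modeler implementation used in the study):

```python
# Minimal SMOTE sketch: each synthetic sample is a random point on
# the segment between a minority sample and one of its k nearest
# minority neighbors.
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbors
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))           # random minority sample
        j = nn[i, rng.integers(k)]             # one of its neighbors
        lam = rng.random()                     # interpolation weight
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# 20 minority samples; synthesize 80 more to balance against 100.
minority = np.random.default_rng(2).normal(size=(20, 3))
synthetic = smote(minority, n_new=80)
```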

thumbnail
Fig 6. SMOTE algorithm: The blue square and green circle represent the minority and majority classes, respectively.

The KNN of point O in the minority set was obtained by calculating the Euclidean distance between O and each sample in the set. Based on the k (k = 5), the algorithm connected the k (k = 5) minority points (a1, a2, a3, a4, a5) around O, and finally inserted new synthetic points (O1, O2, O3, O4, O5) on the line of the two points, until the number of all the minority types and insertion points was balanced with the number of majority types.

https://doi.org/10.1371/journal.pone.0269176.g006

Single and ensemble classifiers.

In related research, the classifiers used were single classifiers, including LR, SVM, DT C5.0, and ANN, as well as the RF ensemble classifier [28–30, 59–62]. We used three single classifiers and three ensemble classifiers for model training. The single classifiers were LR, DT, and ANN, and the three ensemble classifiers were DT C5.0 (boosting), RF (bagging), and a neural network (boosting).

Ensemble learning achieves better predictive performance by combining predictions from multiple models. The three main classes of ensemble learning methods are bagging, stacking, and boosting; among these, bagging and boosting are used more often than stacking. Bootstrap aggregation (bagging) achieves a diverse group of ensemble members by varying the training data. Boosting reduces bias in supervised learning by learning a series of weak classifiers and combining them into a robust classifier. To avoid overfitting and achieve high classification accuracy, we compared the performance indices of the models and finally selected the models with solid generalization ability.
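The bagging/boosting contrast can be sketched with scikit-learn; these classifiers are stand-ins for the SPSS Modeler ones used in the study (random forest illustrates bagging of trees, and AdaBoost over decision stumps approximates boosted decision trees), and the dataset is synthetic:

```python
# Bagging vs. boosting on a synthetic 50-feature dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=50,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging (RF)", bagging),
                    ("boosting (AdaBoost)", boosting)]:
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))
```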

Selection of performance indicators of the model.

The confusion matrix, also known as the error matrix, is a standard format for accuracy evaluation. It can be used to calculate the performance indices of the classification model: accuracy, recall, and F1-score. Note:

TP = No. of true positives among total predictions;

FP = No. of false positives among total predictions;

FN = No. of false negatives among total predictions;

TN = No. of true negatives among total predictions.

The indices are calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1-score = 2 × Precision × Recall / (Precision + Recall)

In addition to the above three indices, we also selected the area under the curve (AUC) and the Gini coefficient as the performance indices of the binary classification model. The AUC is a popular measure of the degree of separability: it indicates the extent to which the model is capable of distinguishing between the two classes. The value of the AUC ranges between 0.5 and 1, where 0.5 indicates the worst performance, and the closer the AUC is to 1.0, the better the performance of the model. The Gini coefficient compares the Lorenz curve of a ranked empirical distribution to the line of perfect equality. It measures the degree of concentration (inequality) of a variable within the distribution of its elements and, for a binary classifier, is calculated from the AUC as Gini = 2 × AUC − 1.

For the ternary and quinary classification models, we also selected Cohen’s kappa coefficient to test the consistency of the classification results. Cohen’s kappa is a statistical coefficient that represents the degree of accuracy and reliability of the classification. It measures the agreement between two raters who classify items into mutually exclusive categories [69]. The kappa value is always less than or equal to one, with one indicating perfect agreement. The Cohen’s kappa coefficient is calculated as κ = (po − pe) / (1 − pe), where po is the relative observed agreement among raters, and pe is the hypothetical probability of chance agreement.
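All of these indices can be computed directly from the four confusion-matrix counts; the sketch below uses invented counts purely for illustration:

```python
# Performance indices from a toy binary confusion matrix.
tp, fp, fn, tn = 40, 5, 10, 45
n = tp + fp + fn + tn

accuracy = (tp + tn) / n
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Cohen's kappa: observed agreement vs. chance agreement derived
# from the marginal totals of the confusion matrix.
po = (tp + tn) / n
pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
kappa = (po - pe) / (1 - pe)

print(round(accuracy, 3), round(f1, 3), round(kappa, 3))  # 0.85 0.842 0.7
```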

Results

The effect of feature reduction on the models

The PCA algorithm was used to reduce the extracted 68 features to 50. However, although the PCA algorithm reduced the dimension of the independent variables, the significance of these independent variables to the target variable was not clear. To verify whether the reduction in the number of features had a positive effect on valence classification, we built binary and ternary classification models using both the 68 and the 50 signal features, with RF (bagging) and ANN (boosting) as classifiers. Table 5 presents the model performance before and after feature reduction.

thumbnail
Table 5. Comparison of the performance of the models based on 68 features and 50 features.

https://doi.org/10.1371/journal.pone.0269176.t005

Classification results and performance comparison

Binary classification.

We divided the eight datasets used for training and testing the models into two parts in a ratio of 8:2, randomly selected as the training and test sets, respectively (S4 Table). The values of the target variable for binary classification were “-1” and “1,” and the 50 signal features served as the independent variables. The model performance results are presented in Table 6 and S4 Fig.

thumbnail
Table 6. Performance comparison of binary classification models with different classifiers.

https://doi.org/10.1371/journal.pone.0269176.t006

The results of the binary classification indicated that the recognition accuracies of the models based on the ANN and ANN (boosting) were higher than 90%, and these models had better classification performance. The results also indicate that the two models were effective for evaluating the affective quality of urban public spaces.

Ternary classification.

The values of the target variable for ternary classification were “-1,” “0,” and “1,” and all the valid sample data were used for model training or testing. The sample data were divided into training and test sets at a ratio of 8:2, and SMOTE was used for data oversampling (S5 Fig). After testing the models, we obtained the classification accuracy and the average of each model performance index per class, as presented in Table 7 and Fig 7.

thumbnail
Fig 7. Confusion matrices of the ternary class classification using DT C5.0 (boosting) (a), RF (bagging) (b), and ANN (boosting) (c).

https://doi.org/10.1371/journal.pone.0269176.g007

thumbnail
Table 7. Performance comparison of ternary classification models with different classifiers.

https://doi.org/10.1371/journal.pone.0269176.t007

The performance indices of each class classification in the ternary classification model are listed in Tables 8 and 9.

thumbnail
Table 8. Performance indexes of each class classification in the ternary classification model with three single classifiers.

https://doi.org/10.1371/journal.pone.0269176.t008

thumbnail
Table 9. Performance indexes of each class classification in the ternary classification model with ensemble learning.

https://doi.org/10.1371/journal.pone.0269176.t009

From the results of the ternary classification, we observed that the models based on the ANN (boosting) and RF (bagging) had higher performance index values and their recognition accuracies were 91.07% and 90.18%, respectively. Moreover, the models exhibited better classification abilities for each class (Fig 7). The results indicated that both models could also effectively evaluate the affective quality of urban public spaces.

Quinary classification.

The values of the target variable for quinary classification were "-2," "-1," "0," "1," and "2," and all the valid sample data were used to build the models. We divided the sample data into training and test sets at a ratio of 8:2 and used SMOTE for data oversampling (S6 Fig). After testing the models, we obtained the classification accuracy and the averages of recall, F1-score, and kappa for each class, as presented in Table 10 and Fig 8.

Fig 8. Confusion matrices of the quinary class classification using DT C5.0 (boosting) (a), RF (bagging) (b), and ANN (boosting) (c).

https://doi.org/10.1371/journal.pone.0269176.g008

Table 10. Performance comparison of quinary classification models with different classifiers.

https://doi.org/10.1371/journal.pone.0269176.t010

The performance indices of each class classification in the quinary classification model are listed in Tables 11 and 12.

Table 11. Performance indexes of each class classification in the quinary classification model with three single classifiers.

https://doi.org/10.1371/journal.pone.0269176.t011

Table 12. Performance indexes of each class classification in the quinary classification model with ensemble learning.

https://doi.org/10.1371/journal.pone.0269176.t012

The results of the quinary classification indicated that the model incorporating DT C5.0 (boosting) had the best classification performance. However, its accuracy was only 69.86%, and its kappa coefficient was low, which demonstrated that recognition performance was very uneven across classes, even though some classes reached 100% accuracy. Thus, in practice, none of the six models can support a quinary classification of the affective quality of a space.
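The kappa coefficient penalizes exactly this kind of unevenness: it measures agreement beyond chance, so a model can reach a respectable accuracy while kappa and per-class recall expose weak classes. A small computational illustration with a hypothetical confusion matrix (the numbers are ours, not the paper's):

```python
import numpy as np

def cohen_kappa(cm):
    """Cohen's kappa from a confusion matrix (rows = true, cols = predicted):
    observed agreement corrected for agreement expected by chance."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                        # observed accuracy
    pe = (cm.sum(0) * cm.sum(1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

def per_class_recall(cm):
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Hypothetical 5-class matrix: outer classes perfect, middle class poor,
# mirroring the uneven behaviour described above.
cm = np.array([[10, 0, 0, 0, 0],
               [0, 8, 2, 0, 0],
               [0, 3, 4, 3, 0],
               [0, 0, 2, 8, 0],
               [0, 0, 0, 0, 10]])
kappa = cohen_kappa(cm)
recalls = per_class_recall(cm)
```

Here overall accuracy is 80% (40/50), yet per-class recall ranges from 1.0 down to 0.4; inspecting both kappa and the per-class indices, as Tables 11 and 12 do, is what reveals the imbalance.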

A comparison of the four indices of the best-performing binary, ternary, and quinary classification models is shown in Fig 9. The results indicated that classification ability declined sequentially, with a pronounced drop for the quinary classification. The binary and ternary classification models were shown to satisfy the practical requirements.

Fig 9. Comparison of accuracy and main performance indexes (binary: AUC; ternary and quinary: Kappa) of binary, ternary, and quinary classification models.

https://doi.org/10.1371/journal.pone.0269176.g009

External validation

In addition to internal testing, the models were subjected to external validation. We input the two previously selected spatial datasets (collected in Japan and China) into the built binary, ternary, and quinary classification models to verify their effectiveness at predicting the emotional quality of new spaces (S7 Fig). The models then output classification results for the two spaces. By comparing these results with the raw valence values, we obtained the classification accuracies and confusion matrices shown in Table 13 and Figs 10 and 11 (S5 Table).

Fig 10. Confusion matrices of the ternary class classification for external validation using DT C5.0 (boosting) (a), RF (bagging) (b), and ANN (boosting) (c).

https://doi.org/10.1371/journal.pone.0269176.g010

Fig 11. Confusion matrices of the quinary class classification for external validation using the DT C5.0 (boosting) (a), RF (bagging) (b), and ANN (boosting) (c).

https://doi.org/10.1371/journal.pone.0269176.g011

Table 13. Classification accuracy of the emotional quality of the two new spaces using the proposed binary, ternary, and quinary-class classification models.

https://doi.org/10.1371/journal.pone.0269176.t013

The results indicated that the highest external-validation accuracy for binary classification was 80.9%, whereas those of the ternary and quinary models were 65.3% and 61.1%, respectively. Moreover, the accuracies of the ensemble classifiers were generally higher than those of the corresponding single classifiers. The confusion matrices of the ternary classification indicated that samples with a valence of -1 were classified less accurately than those of the other classes. Because the new data contained no sample with a valence of -2, the corresponding quinary classification result was zero, and samples with valences of 0 and 1 were classified more accurately than the others.
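The external-validation step amounts to applying the already-fitted model to data it never saw during training or internal testing. A sketch with synthetic stand-in data (RF (bagging) shown via scikit-learn's RandomForestClassifier; the data, sizes, and names are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(2)
# Stand-in for training data pooled from the eight original spaces.
X_train = rng.normal(size=(400, 50))
y_train = rng.choice([-1, 0, 1], size=400)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# External validation: data collected in the two *new* spaces,
# never used for training or internal testing.
X_new = rng.normal(size=(72, 50))
y_new = rng.choice([-1, 0, 1], size=72)
pred = model.predict(X_new)
acc = accuracy_score(y_new, pred)
cm = confusion_matrix(y_new, pred, labels=[-1, 0, 1])
```

Fixing `labels=[-1, 0, 1]` keeps the confusion matrix shape stable even when a class (such as valence -2 in the paper's new data) is absent from the external set.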

Application process of the proposed model

The trained models were designed for evaluating the quality of public spaces in practice. Furthermore, the external validation described in the previous section served not only to verify the models but also to demonstrate their application in practice. Together, these two steps verified the models' practical effectiveness.

Therefore, we developed a process for evaluating the affective quality of urban public spaces based on multiple physiological signals (Fig 12). The process entails the following steps. First, determine the experimental routes and divide them into several sections. Then, invite local community residents to participate and sign the consent form. In the data collection stage, collect the physiological signals while the participants walk through the routes. After feature extraction, fusion, and reduction, input the features into the classification model. According to the results, a space with a positive valence can maintain the status quo, whereas a space with a negative valence requires renovation.
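The final decision rule of this process might be sketched as follows (a minimal illustration; the section names, the averaging of per-participant valences, and the "review" case for a neutral average are our assumptions, not part of the published process):

```python
from statistics import mean

def evaluate_sections(predictions):
    """Map per-participant valence predictions for each route section
    to a renewal decision, following the rule described above:
    positive average valence -> keep as-is, negative -> renovate."""
    decisions = {}
    for section, valences in predictions.items():
        avg = mean(valences)
        if avg > 0:
            decisions[section] = "maintain"
        elif avg < 0:
            decisions[section] = "renovate"
        else:
            decisions[section] = "review"  # assumed handling of neutral
    return decisions

# Hypothetical ternary model outputs for two route sections.
preds = {"plaza": [1, 1, 0, 1], "underpass": [-1, -1, 0, -1]}
decisions = evaluate_sections(preds)
```

Aggregating across participants before deciding keeps a single outlier reading from triggering a renovation flag.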

Fig 12. Emotional quality evaluation process of urban built public space.

https://doi.org/10.1371/journal.pone.0269176.g012

Discussion

Models suitable for evaluating the affective quality of multi-type public spaces were built and examined in this study. We not only improved the models' performance through feature selection, SMOTE, and ensemble classifiers but also used external validation to verify their real-world performance.

The aims and methods of the proposed approach differed from those of extant approaches. First, to ensure the adaptability of the model, the scope of this study covered multiple space types across two countries. Second, we used three ensemble classifiers and compared their performance with that of single classifiers. Over the past 10 years, ensemble classifiers have demonstrated strong classification performance. Compared with the models established using classifiers such as SVM [28, 30, 47], KNN [28], BEP-tree [30], MLP [30], and RF [28, 46] in related studies, the ensemble classifiers used in this study exhibited higher classification accuracy, 92.59% in binary classification and 91.07% in ternary classification, supported by higher performance indices. For the quinary classification, the highest accuracy in this study was 69.86%, which was lower than the 79% obtained by Kalimeri and Saitis [46]. We attribute their higher figure to single-space experiments and similar emotional responses: the features of different parts of the same space generally do not differ significantly, so although high accuracy was achieved, the diversity of the spatial emotions and the adaptability of the model were reduced. Third, the proposed model was subjected to external validation to circumvent the limitations arising from sourcing the training and validation data from the same spaces. When new data were input into the model, its performance decreased significantly; specifically, for the multiclass classification models, the decline ranged from 5% to 30%. Therefore, we confirmed that classification studies cannot be performed using only a single unified dataset, and external validation is necessary.

As shown in Table 1, previous researchers extracted 8–188 features from physiological signals. This considerable variation in the number of features is due to differences in the number of physiological signals and in the feature extraction methods. Therefore, to ensure comparability across studies and facilitate practical application, we selected three commonly used physiological signals, EDA, ECG, and EMG, and applied PCA, which is widely used to reduce feature dimensionality. As shown in Table 10, with the same classifier, the recognition accuracy of the model increased by 6.35% on average after the number of features was reduced from 68 to 50; other indices improved as well. These results indicated that PCA effectively eliminated data redundancy and noise and improved the classification ability of the model. However, determining the optimal number of features remains a challenge, and solving this problem requires scholarly consensus following extensive experiments.
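The 68-to-50 feature reduction can be sketched with PCA in scikit-learn (assuming PCA is the intended dimensionality-reduction method; the data here are synthetic stand-ins, and standardization before PCA is our recommended practice rather than a documented detail of the study):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Stand-in for 68 raw features fused from EDA, ECG, and EMG.
X_raw = rng.normal(size=(300, 68))

# Standardise (PCA is scale-sensitive), then project onto 50
# principal components to strip redundancy and noise.
X_scaled = StandardScaler().fit_transform(X_raw)
pca = PCA(n_components=50, random_state=42)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of total variance the 50 components retain.
retained = pca.explained_variance_ratio_.sum()
```

Inspecting `explained_variance_ratio_` is one practical way to approach the open question noted above, i.e., how many features to keep.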

Compared with spaces eliciting positive emotions, relatively few spaces elicit negative emotions, unless they are undeveloped or poorly managed. Consequently, our data sample contained insufficient examples of negative emotions, occasionally fewer than one-tenth of the positive-emotion samples, and these unbalanced samples resulted in inaccurate predictions. In general, this problem can be addressed at the data level, by up-sampling or down-sampling, or at the algorithm level; however, simply duplicating data to increase its amount affects a model's adaptability, whereas directly reducing the sample size causes information loss. Oversampling techniques such as SMOTE increase the number of minority-class samples while having minimal effect on the information contained in the data, making it possible to obtain a model with better classification ability.

By calculating the average difference between the accuracies of the three ensemble classifiers and the three single classifiers in Tables 5, 6 and 8, we observed that the average accuracy of the ensemble classifiers was 7.59% higher than that of the single classifiers. A comparison of the Gini and Kappa coefficients yielded similar results, indicating that these ensemble classifiers adapted better to the noisy data collected in urban public spaces. Moreover, the models with ANN (boosting) and RF (bagging) classifiers performed better than the model with DT C5.0 (boosting). This may be attributed to the greater data fault tolerance of ANN (boosting) and RF (bagging) compared with DT C5.0 (boosting). Users' emotions are affected by a variety of spatial factors; therefore, the fault tolerance of the models was significant.
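The single-versus-ensemble comparison can be sketched with scikit-learn on noisy synthetic data (a hedged illustration: scikit-learn does not reproduce DT C5.0 or the paper's exact ANN, so a plain decision tree plus generic bagging and boosting stands in):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data standing in for multi-sensor field recordings
# (flip_y injects label noise, as ambient conditions would).
X, y = make_classification(n_samples=600, n_features=50, n_informative=10,
                           flip_y=0.1, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42)

single = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=42),
                           n_estimators=100, random_state=42).fit(X_tr, y_tr)
boosted = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

scores = {"single tree": single.score(X_te, y_te),
          "bagging": bagged.score(X_te, y_te),
          "boosting": boosted.score(X_te, y_te)}
```

Bagging averages many trees fitted on bootstrap resamples, and boosting reweights misclassified samples across rounds; both mechanisms are what give ensembles the noise tolerance discussed above.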

External validation assesses the predictive ability of a model on a newly entered dataset. Related studies have shown that good test results do not guarantee good adaptability: a model's predictive ability on new data is often lower than its test performance [70–72]. Similar results were obtained in our study. The external validation results of the quinary classification were significantly worse than the test results. We attributed this to the use of different spatial data and participants, as well as the limited sample size of the external validation; quinary classification requires a larger sample size than binary and ternary classification. Meanwhile, as a comparison of Figs 8 and 11 reveals, the two classification results were almost opposite. In the classification of the test set, samples with valences of -2, -1, and 2 were classified better than the others, whereas in the external validation, samples with valences of 0 and 1 were classified better. This may result from the use of SMOTE, which increases the minority-class samples through oversampling, adds samples carrying information similar to the original samples, and ultimately reduces the model's ability to classify new minority-class samples. In the binary and ternary classifications, the impact of SMOTE was limited owing to the large sample size; although SMOTE suits large sample sizes, as the number of classes increases, the sample size of each class decreases, and its effect becomes very limited. Therefore, external validation was a necessary further step toward verifying the model's actual performance.

Limitations

In this study, an affective quality evaluation model for multi-type urban public spaces was built. However, the proposed approach had three limitations. First, although the binary and ternary classification models can be used to evaluate multiple types of public spaces, the results of the quinary classification were poor, and its performance can only be improved by increasing the total number of samples and the number of samples per category. Second, because the emotional quality assessment was based on personal experience, the data could not reflect the comprehensive features of the public space. Therefore, commercial and spatial-behavior data must be added to the evaluation model to obtain detailed information about the public space. Third, human emotions include short-term and long-term effects. Users who enter a public space for the first time rely primarily on their physical senses to perceive it; after long-term use, factors such as spatial function, public social interaction, and place attachment become the main factors affecting evaluation. Thus, it is necessary to further analyze the long-term emotions evoked by a space to obtain a more comprehensive evaluation of its affective quality.

Conclusions and future research

Despite the above limitations, we can confidently report that the binary and ternary affective evaluation of multiple types of spaces based on multiple physiological signals can satisfy the requirements of decision-making on urban public space renewal.

Whether conducted through expert or user evaluation, the assessment of public spaces of different regions, styles, and functions has always been a contentious problem in urban science. Our focus was on enhancing the adaptability and classification capabilities of the proposed model. To obtain a model with better adaptability, we collected data from five types of spaces in two countries to ensure the diversity of the spatial data. In addition, we improved the classification performance of the model using efficient feature reduction, the SMOTE algorithm, and ensemble learning. We also compared the performance of the binary, ternary, and quinary classification models. Finally, through external validation, we observed that the binary and ternary classification models outperformed the quinary model in satisfying practical requirements.

In future research, we will attempt to study the effects of long-term emotions, spatial function, and neighborhood interaction on the evaluation of spatial affective quality. Through multimodal signal extraction and new machine learning technologies, we can continuously improve the performance of the spatial quality evaluation model and provide technical support for the construction of intelligent cities.

Supporting information

S1 Table. Valence statistics of participants in 10 public spaces.

https://doi.org/10.1371/journal.pone.0269176.s001

(XLSX)

S2 Table. Dataset of 68 signal features extracted from EDA, EMG and ECG signals.

https://doi.org/10.1371/journal.pone.0269176.s002

(XLSX)

S3 Table. Dataset of 50 signal features after feature reduction.

https://doi.org/10.1371/journal.pone.0269176.s003

(XLSX)

S4 Table. Dataset from eight spaces for binary classification.

https://doi.org/10.1371/journal.pone.0269176.s004

(XLSX)

S5 Table. Data and calculation tables output after external validation.

https://doi.org/10.1371/journal.pone.0269176.s005

(XLSX)

S1 Fig. Example of feature extraction from EDA signal.

https://doi.org/10.1371/journal.pone.0269176.s006

(TIF)

S2 Fig. Example of feature extraction from ECG signal.

https://doi.org/10.1371/journal.pone.0269176.s007

(TIF)

S3 Fig. Example of feature extraction from EMG signal.

https://doi.org/10.1371/journal.pone.0269176.s008

(TIF)

S4 Fig. Data flow for training and testing a binary classification model.

https://doi.org/10.1371/journal.pone.0269176.s009

(TIF)

S5 Fig. Data flow for training and testing a ternary classification model.

https://doi.org/10.1371/journal.pone.0269176.s010

(TIF)

S6 Fig. Data flow for training and testing a quinary classification model.

https://doi.org/10.1371/journal.pone.0269176.s011

(TIF)

S7 Fig. Data flow for external validation.

https://doi.org/10.1371/journal.pone.0269176.s012

(TIF)

Acknowledgments

We would like to acknowledge the participants in this study.

References

  1. Poria S, Cambria E, Bajpai R, Hussain A. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion. 2017; 37: 98–125.
  2. Picard RW. Emotion research by the people, for the people. Emotion Review. 2010; 2(3): 250–254.
  3. Kozma A, Stone S, Stones MJ, Hannah TE, McNeil K. Long- and short-term affective states in happiness: Model, paradigm and experimental evidence. Social Indicators Research. 1990; 22(2): 119–138.
  4. Houben M, Noortgate W, Kuppens P. The relation between short-term emotion dynamics and psychological well-being: A meta-analysis. Psychological Bulletin. 2015; 141(4): 901–930. pmid:25822133
  5. James W. Psychology, briefer course. vol. 14. Harvard University Press; 1984. Available: https://muse.jhu.edu/book/52455
  6. Cannon WB. The James-Lange theory of emotions: A critical examination and an alternative theory. The American Journal of Psychology. 1927; 100(3/4): 567–586. Available: https://www.jstor.org/stable/1422695
  7. Schachter S, Singer J. Cognitive, social, and physiological determinants of emotional state. Psychological Review. 1962; 69(1): 378–399. https://doi.org/10.1037/h0038845 pmid:14497895
  8. Lazarus RS. Cognition and motivation in emotion. American Psychologist. 1991; 46(4): 352–367. pmid:2048794
  9. Kanjo E, Al-Husain L, Chamberlain A. Emotions in context: Examining pervasive affective sensing systems, applications, and analyses. Personal and Ubiquitous Computing. 2015; 19(7): 1197–1212.
  10. Kreibig SD. Autonomic nervous system activity in emotion: A review. Biological Psychology. 2010; 84(3): 394–421. pmid:20371374
  11. Ferreira BRA. Emotions recognition based on sensor fusion and machine learning techniques [Master Thesis]. Universidade do Minho. Portugal; 2018. Available: http://www3.dsi.uminho.pt/pimenta/supmsdsis/bd/PT4ILMPFOKCQU/ProjDissertacaoBrunoFerreiraA72285.pdf
  12. Al Machot F, Elmachot A, Ali M, Al Machot E, Kyamakya K. A deep-learning model for subject-independent human emotion recognition using electrodermal activity sensors. Sensors. 2019; 19(7): 1659. pmid:30959956
  13. Ali M, Machot FA, Mosa AH, Jdeed M, Machot EA, Kyamakya K. A globally generalized emotion recognition system involving different physiological signals. Sensors. 2018; 18(6): 1905. pmid:29891829
  14. Aniket V. Biosignal processing challenges in emotion recognition for adaptive learning [Doctoral dissertation]. Univ. Central Florida. Florida; 2010. Available: http://purl.fcla.edu/fcla/etd/CFE0003301.
  15. AlZoubi O, Hussain MS, D'Mello S, Calvo RA. Affective modeling from multichannel physiology analysis of day differences. In: Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science. vol. 6974. Berlin, Heidelberg: Springer; 2011. pp. 4–13. https://doi.org/10.1007/978-3-642-24600-5_4
  16. Geiser M, Walla P. Objective measures of emotion during virtual walks through urban environments. Applied Sciences. 2011; 1(1): 1–11.
  17. Jerritta S, Murugappan M, Nagarajan R, Wan K. Physiological signals based human emotion recognition: A review. In: 2011 IEEE 7th International Colloquium on Signal Processing and its Applications. IEEE; 2011. pp. 410–415. https://doi.org/10.1109/CSPA.2011.5759912
  18. Kołakowska A, Szwoch W, Szwoch M. A review of emotion recognition methods based on data acquired via smartphone sensors. Sensors. 2020; 20(21): 6367. pmid:33171646
  19. Ragot M, Martin N, Em S, Pallamin N, Diverrez JM. Emotion recognition using physiological signals: Laboratory vs. wearable sensors. In: Advances in Human Factors in Wearable Technologies and Game Design. AHFE 2017. Advances in Intelligent Systems and Computing, vol 608. Springer, Cham. 2018. pp. 15–22. https://doi.org/10.1007/978-3-319-60639-2_2
  20. Fatma N, Kaye A, Christine LL, Neal F. Emotion recognition from physiological signals using wireless sensors for presence technologies. Cognition, Technology & Work. 2004; 6(1): 4–14.
  21. Torres-Valencia C, Álvarez-López M, Orozco-Gutiérrez Á. SVM-based feature selection methods for emotion recognition from multimodal data. Journal on Multimodal User Interfaces. 2017; 11(1): 9–23.
  22. Zhang X, Xu C, Xue W, Hu J, He Y, Gao M. Emotion recognition based on multichannel physiological signals with comprehensive nonlinear processing. Sensors. 2018; 18(11): 3886. pmid:30423894
  23. Verma GK, Tiwary US. Multimodal fusion framework: A multiresolution approach for emotion classification and recognition from physiological signals. NeuroImage. 2014; 102 Pt 1: 162–172. pmid:24269801
  24. Wen W, Liu G, Cheng N, Wei J, Shangguan P, Huang W. Emotion recognition based on multi-variant correlation of physiological signals. IEEE Transactions on Affective Computing. 2014; 5(2): 126–140.
  25. Li W, Zhang Z, Song A. Physiological-signal-based emotion recognition: An odyssey from methodology to philosophy. Measurement. 2020; 172(4): 108747. https://doi.org/10.1016/j.measurement.2020.108747
  26. Hogertz C. Emotions of the urban pedestrian: sensory mapping. Pedestrians' quality needs. Walk21 Cheltenham (UK); 2010. pp. 31–52. Available: http://files.designer.hoststar.ch/hoststar10546/file/1-pqn_final_report_part_b4_meas_walking.pdf#page=32
  27. Birenboim A, Dijst M, Scheepers FE, Poelman MP, Helbich M. Wearables and location tracking technologies for mental-state sensing in outdoor environments. The Professional Geographer. 2019; 71(3): 449–461.
  28. Kanjo E, Younis EMG, Sherkat N. Towards unravelling the relationship between on-body, environmental and emotion data using sensor information fusion approach. Information Fusion. 2018; 40(1): 18–31.
  29. Kanjo E, Younis EMG, Ang CS. Deep learning analysis of mobile physiological, environmental and location sensor data for emotion detection. Information Fusion. 2019; 49(10): 46–56.
  30. Ojha VK, Griego D, Kuliga S, Bielik M, Buš P, Schaeben C, et al. Machine learning approaches to understand the influence of urban environments on human's physiological response. Information Sciences. 2019; 474(1): 154–169.
  31. Resch B, Puetz I, Bluemke M, Kyriakou K, Miksch J. An interdisciplinary mixed-methods approach to analyzing urban spaces: The case of urban walkability and bikeability. International Journal of Environmental Research and Public Health. 2020; 17(19): 6994. pmid:32987877
  32. Yao W, Zhang X, Gong Q. The effect of exposure to the natural environment on stress reduction: A meta-analysis. Urban Forestry & Urban Greening. 2021; 57: 126932.
  33. Smith CA, Lazarus RS. Emotion and adaptation. Handbook of personality: Theory and research. 1990; pp. 609–637. https://doi.org/10.2307/2075902
  34. Brosch T, Pourtois G, Sander D. The perception and categorisation of emotional stimuli: A review. Cognition and Emotion. 2010; 24(3): 377–400.
  35. Zadra JR, Clore GL. Emotion and perception: The role of affective information. Wiley Interdisciplinary Reviews Cognitive Science. 2011; 2(6): 676–685. pmid:22039565
  36. Brosch T, Pourtois G, Sander D. The perception and categorisation of emotional stimuli: A review. Cognition and Emotion. 2009; 24(3): 377–400.
  37. Lazarus RS. Thoughts on the relations between emotion and cognition. American Psychologist. 1982; 37(9): 1019. Available: http://gruberpeplab.com/3131/Lazarus_1982.pdf
  38. Blair KS, Smith BW, Mitchell DG, Morton J, Vythilingam M, Pessoa L, et al. Modulation of emotion by cognition and cognition by emotion. Neuroimage. 2007; 35(1): 430–440. pmid:17239620
  39. Marcus GE, Neuman WR, MacKuen MB. Measuring emotional response: Comparing alternative approaches to measurement. Political Science Research and Methods. 2017; 5(4): 733–754.
  40. Mauss IB, Robinson MD. Measures of emotion: A review. Cognition and Emotion. 2009; 23(2): 209–237. pmid:19809584
  41. Thanapattheerakul T, Mao K, Amoranto J, Chan JH. Emotion in a century: A review of emotion recognition. In: Proceedings of the 10th International Conference on Advances in Information Technology—IAIT 2018. New York, USA: ACM Press; 2018. pp. 1–8. https://doi.org/10.1145/3291280.3291788
  42. Chen J, Hu B, Wang Y, Moore P, Dai Y, Feng L, et al. Subject-independent emotion recognition based on physiological signals: A three-stage decision method. BMC Medical Informatics and Decision Making. 2017; 17(Suppl 3): 167. pmid:29297324
  43. Hui TKL, Sherratt RS. Coverage of emotion recognition for common wearable biosensors. Biosensors. 2018; 8(2): 30. pmid:29587375
  44. Kim KH, Bang SW, Kim SR. Emotion recognition system using short-term monitoring of physiological signals. Medical and Biological Engineering and Computing. 2004; 42(3): 419–427. pmid:15191089
  45. Xun L, Zheng G. ECG signal feature selection for emotion recognition. Telkomnika. 2013; 11(3): 1363–1370.
  46. Kalimeri K, Saitis C. Exploring multimodal biosignal features for stress detection during indoor mobility. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction; 2016. pp. 53–60. https://doi.org/10.1145/2993148.2993159
  47. Olsen AF, Torresen J. Smartphone accelerometer data used for detecting human emotions. In: The 2016 3rd International Conference on Systems and Informatics (ICSAI 2016). Shanghai, China: IEEE; 2016. pp. 410–415. Available: http://doi.org/10.1109/ICSAI.2016.7810990
  48. Londhe S, Borse R. Emotion recognition based on various physiological signals-A review. ICTACT Journal on Communication Technology. 2018; 9(3): 1815–1822.
  49. Nikolova D, Petkova P, Manolova A, Georgieva P. ECG-based emotion recognition: Overview of methods and applications. In: ANNA'18; Advances in Neural Networks and Applications 2018. VDE; 2018. pp. 1–5. Available: http://ieeexplore.ieee.org/servlet/opac?punumber=8576698
  50. Shu L, Xie J, Yang M, Li Z, Li Z, Liao D, et al. A review of emotion recognition using physiological signals. Sensors. 2018; 18(7): 2074. pmid:29958457
  51. Alberdi A, Aztiria A, Basarab A. Towards an automatic early stress recognition system for office environments based on multimodal measurements: A review. Journal of Biomedical Informatics. 2016; 59: 49–75. pmid:26621099
  52. Tan C, Ceballos G, Kasabov N, Subramaniyam NP. Fusionsense: Emotion classification using feature fusion of multimodal data and deep learning in a brain-inspired spiking neural network. Sensors. 2020; 20(18): 5328.
  53. Nweke HF, Teh YW, Mujtaba G, Al-garadi MA. Data fusion and multiple classifier systems for human activity detection and health monitoring: Review and open research directions. Information Fusion. 2019; 46(Part 1): 147–170.
  54. Christelle G, Fabrice PB, Aurélie C, Charbonnier S, Bonnet S, Vidal A. Features relevance analysis for emotion classification with physiological sensors. In: Proceedings of the 2nd International Conference on Physiological Computing Systems. Scitepress—Science and Technology Publications; 2015. pp. 17–25. https://doi.org/10.5220/0005238600170025
  55. Banda N, Engelbrecht A, Robinson P. Feature reduction for dimensional emotion recognition in human-robot interaction. In: 2015 IEEE Symposium Series on Computational Intelligence. IEEE; 2015. pp. 803–810. https://doi.org/10.1109/SSCI.2015.119
  56. Gu Y, Tan SL, Wong KJ, Ho MHR, Qu L. Using GA-based feature selection for emotion recognition from physiological signals. In: 2008 International Symposium on Intelligent Signal Processing and Communications Systems. IEEE; 2009. pp. 1–4. https://doi.org/10.1109/ISPACS.2009.4806747
  57. Johannes W, Jonghwa K, Elisabeth A. Physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification. In: 2005 IEEE International Conference on Multimedia and Expo. IEEE; 2005. pp. 940–943. https://doi.org/10.1109/ICME.2005.1521579
  58. Shukla J, Barreda-Angeles M, Oliver J, Nandi GC, Puig D. Feature extraction and selection for emotion recognition from electrodermal activity. IEEE Transactions on Affective Computing. 2019; 12(4): 857–869.
  59. Chen S, Jiang K, Hu H, Kuang H, Yang J, Luo J, et al. Emotion recognition based on skin potential signals with a portable wireless device. Sensors. 2021; 21(3): 1018. pmid:33540831
  60. Iliou T, Anagnostopoulos CN. Comparison of different classifiers for emotion recognition. In: 2009 13th Panhellenic Conference on Informatics. IEEE; 2009. pp. 102–106. https://doi.org/10.1109/PCI.2009.7
  61. Keelawat P, Thammasan N, Numao M, Kijsirikul B. A comparative study of window size and channel arrangement on EEG-emotion recognition using deep CNN. Sensors. 2021; 21(5): 1678. pmid:33804366
  62. Molavi M, Yunus Jb, Akbari E. Comparison of different methods for emotion classification. In: 2012 Sixth Asia Modelling Symposium. IEEE; 2012. pp. 50–53. https://doi.org/10.1109/AMS.2012.53
  63. Bradley MM, Lang PJ. Measuring emotion: the self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry. 1994; 25(1): 49–59. pmid:7962581
  64. Acqknowledge ver. 4.2 [Computer software]. 2012. Supplied by Biopac Systems, Inc, USA. Retrieved from https://www.biopac.com/acqknowledge-4-2-released
  65. Kubios HRV Premium ver. 3.4.3 [Computer software]. 2020. Supplied by Kubios Oy, Kuopio, Finland. Retrieved from https://www.kubios.com/hrv-premium
  66. Couturier A. EMG Toolbar ver. 5.30 [Computer software]. 2017. Retrieved from https://www.originlab.com/fileExchange/details.aspx?fid=420
  67. Origin 2019 [Computer software]. 2019. Supplied by OriginLab Corporation, USA. Retrieved from https://www.originlab.com/2019
  68. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research. 2002; 16: 321–357.
  69. Uebersax JS. A generalized kappa coefficient. Educational and Psychological Measurement. 1982; 42(1): 181–183.
  70. Consonni V, Ballabio D, Todeschini R. Evaluation of model predictive ability by external validation techniques. Journal of Chemometrics. 2010; 24(3–4): 194–201.
  71. Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JDF. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. Journal of Clinical Epidemiology. 2005; 58(5): 475–483. pmid:15845334
  72. Collins GS, Groot JA, Dutton S, Omar O, Shanyinde M, Tajar A, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Medical Research Methodology. 2014; 14(1): 1–11. pmid:24645774