
Expansion quantization network: A micro-emotion detection and annotation framework

Abstract

Text emotion detection constitutes a crucial foundation for advancing artificial intelligence from basic comprehension to the exploration of emotional reasoning. Most existing emotion detection datasets rely on manual annotation, which is associated with high costs, substantial subjectivity, and severe label imbalance. This is particularly evident in the inadequate annotation of micro-emotions and the absence of emotional intensity representation, which fail to capture the rich emotions embedded in sentences and adversely affect the quality of downstream tasks. By proposing an all-labels and training-set label regression method, we map label values to energy intensity levels, thereby fully leveraging the learning capabilities of machine models and the interdependencies among labels to uncover multiple emotions within samples. On this basis, we establish the Emotion Quantization Network (EQN) framework for micro-emotion detection and annotation. Using five commonly employed sentiment datasets, we conducted comparative experiments with various models, validating the broad applicability of our framework across NLP machine learning models. Based on the EQN framework, emotion detection and annotation are carried out on the GoEmotions dataset. A comprehensive comparison with results reported in the literature demonstrates that the EQN framework has a high capability for the automatic detection and annotation of micro-emotions. The EQN framework is the first to achieve automatic micro-emotion annotation with energy-level scores, providing strong support for further emotion detection analysis and quantitative research in affective computing.

Introduction

Emotions represent one of the most intricate intrinsic experiences of humanity. Artificial intelligence can gain profound insights into and analyze human emotions through emotion detection, allowing for a better understanding of people’s emotional responses [1]. This capability can help individuals comprehend their motivations, enhance the rationality of decision-making, improve interpersonal relationships [2], treat psychological disorders such as depression and anxiety [3], and develop robots with greater emotional understanding to enhance user experiences, among other benefits.

Emotion datasets are a vital resource in NLP research. Currently, the most widely used datasets are single-label datasets [4–7], where each sample can belong to only one emotional category. Traditional single-label emotion datasets have ranged from early binary or ternary categories (such as “positive,” “negative,” and “neutral”) [8] to the six basic emotions of joy, sadness, anger, fear, disgust, and surprise proposed by Ekman [9], and Plutchik’s eight basic emotions [10]. Single-label datasets are relatively straightforward to annotate, incur lower manual costs, and are less prone to errors. However, a single sentence or passage often contains multiple emotions, which cannot be adequately summarized with simple labels, leading to the emergence of multi-label emotion datasets. In multi-label datasets, each sample can simultaneously belong to two or more emotional categories. As NLP technology continues to innovate and evolve, multi-label emotion datasets enriched with nuanced micro-emotions are expected to replace the initial, singular annotations of single-label datasets [11]. SemEval-2007 is a micro-emotion dataset that uses news headlines as its corpus, annotated with emotion labels carrying affective values, yet it comprises a relatively small sample size of 1,250. In 2020, at the ACL conference, Google released the large, manually annotated GoEmotions multi-label emotion dataset [12]. Despite being considered the largest fine-grained emotion dataset currently available [13], the GoEmotions dataset still faces issues of label sparsity and imbalance. Specifically, single-label samples account for 83%, while dual-label samples comprise only 15%, and samples with three or more labels make up a mere 2%. The sparsity of labels may hinder model learning [14], particularly due to imbalances where some labels appear infrequently, resulting in poor predictive performance, while others occur frequently, leading to biased predictions favoring those labels.

In recent years, with the rapid development of human-machine alignment [15] and humanoid robots [16], there has been an increasing demand for machines to understand human emotions, making affective computing [17] and micro-expressions [18,19] prominent research topics. While macro-expressions provide relatively straightforward and direct representations of emotions, micro-expressions more accurately reflect subtle, unconscious, or fleeting emotional states. Annotating and detecting micro-expressions presents a greater challenge. Current methodologies for capturing and detecting micro-expressions through images or videos have resulted in the development of several datasets with emotional intensity related to micro-expressions [20–24], alongside published research findings. Compared to micro-expressions, the subtle micro-emotions embedded in natural language expressions can more comprehensively capture the nuanced emotional fluctuations present in human language and text. Achieving human-machine alignment in humanoid robots and human-machine dialogue necessitates that the recognition and detection of micro-emotions in textual communication be regarded as equally important as that of micro-expressions. Currently, the scarcity of publicly available datasets for text-based micro-emotions poses a challenge for research in this domain.

In the context of natural language, micro-emotions refer to fleeting, low-intensity, and often subconscious emotional states expressed subtly through text. These emotions are typically harder to detect than macro-emotions such as joy or anger, as they are conveyed through nuanced wording, implicit sentiment, or slight linguistic variations. For instance, a sentence like “That’s exactly the kind of brilliant nonsense I expected” may carry sarcasm mixed with disappointment and helplessness—emotional shades that would be missed by coarse labeling systems. Micro-emotions provide a deeper view into the speaker’s internal psychological state, making them highly valuable in fields such as sentiment analysis, mental health screening, and user experience optimization. Our study considers micro-emotions as integral components of fine-grained emotion modeling and aims to capture them systematically through quantification and annotation.

The capacity of machines to understand text emotion has been a long-term research objective in natural language processing (NLP). Currently, emotion datasets—whether annotated for single-label, multi-label, or micro-expressions—primarily rely on manual annotation. This reliance is inevitably influenced by external and subjective factors, resulting in high costs, low efficiency, and challenges in annotating micro-emotions. To better capture the various subtle nuances of human emotions in emotion detection, exploring machine or machine-assisted micro-emotion annotation datasets has become increasingly vital. We have sought to establish a simple yet effective framework for micro-emotion detection and annotation, termed the Expansion Quantization Network (EQN), which incorporates energy scores. Within the EQN framework, the automatic multi-label annotation assigns the highest energy values to macro-emotions and lower energy values to micro-emotions, thereby enhancing the model’s ability to understand and predict emotional nuances more effectively. Our EQN framework is adaptable to manually annotated single-label or multi-label emotion datasets.

Contributions of this paper:

  1. Introduction of continuous emotional intensity: The EQN framework adds continuous energy values to samples based on manually annotated single-label or multi-label emotion datasets. By quantifying emotional intensity with continuous values, the framework distinguishes between macro-emotions and micro-emotions, addressing the subjectivity inherent in manual annotations.
  2. Full label mapping numerical method: This method learns the interdependencies among data labels to annotate label values without the need for prior knowledge or emotional lexicons, thereby reducing the risk of data contamination.
  3. Label regression method for training sets: By learning from the fully annotated training set, this approach regresses the labels that have already been manually annotated to a maximum value while retaining the values of automatically annotated labels. This method enhances the performance of training iterations.
  4. Validation of the EQN framework’s generalization ability in NLP models: Comparative experiments conducted on five distinct single-label and multi-label emotion detection datasets using various NLP models demonstrate that the EQN framework is widely applicable across NLP models.
  5. Supplementary annotation of the GoEmotions micro-emotion dataset and public release: GoEmotions is a 28-class emotion dataset, which presents significant challenges for emotion classification and micro-emotion detection. Utilizing the EQN framework, we first fully annotated the GoEmotions dataset and applied the proposed label regression method to supplement the micro-emotion annotations, which have now been publicly released.

Related work

Machine-assisted annotation of micro-emotion labels with energy values is particularly crucial in applications such as customer emotion management, psychological health analysis, and brand monitoring. It provides more nuanced emotional feedback and has garnered significant attention from scholars in the field of Natural Language Processing (NLP). The SemEval-2007 dataset [11], which includes affective value annotations, is a multi-label micro-emotion dataset based on manual annotation. Although it offers micro-emotion data for machine-assisted labeling, its size is relatively limited. Early research in emotion focused primarily on emotional polarity (such as positive, negative, and neutral), typically employing bag-of-words models or emotion dictionaries for classification [1]. Each lexical entry in these dictionaries is assigned a score to evaluate the accuracy of its emotional sentiment. S. Saifullah et al. [25] employed machine learning methods, utilizing data that underwent preprocessing through tagging, filtering, stemming, tokenization, and emoji conversion. By leveraging 24 combinations of machine learning (ML) and feature extraction (FE) algorithms, they achieved optimal performance in anxiety emotion detection. The Chinese EmoBank [26] provides a manually annotated Chinese dimensional emotion lexicon, which includes various modal words to express emotional intensity. Additionally, research has proposed methods to generate word-level emotional distribution (WED) vectors by integrating domain knowledge with dimensional dictionaries [27]. The latest study by S. Saifullah et al. [28] employed semi-supervised learning techniques for automated annotation, achieving remarkable results in hate speech detection. In semi-supervised learning, their model learns from labeled data, which provides explicit information, while also extracting implicit knowledge from unlabeled data.
This hybrid approach enables the model to generalize effectively and make informed predictions even when labeled data is limited. Moreover, this method enhances the model’s ability to handle real-world scenarios where annotated data is scarce. In 2024, the latest research by Wang Yaoqi [29] and colleagues attempted to introduce emotional distance among emotions, utilizing a text EDLE method that incorporates VAD emotional knowledge to enhance label accuracy based on emotion dictionaries.

Despite the presence of energy intensity scores in emotion dictionaries, these resources are primarily utilized for determining the categorical attributes of emotions and do not yet facilitate the automatic annotation of emotional energy intensity values. In multi-label learning (MLL) methods, the objective is to identify multiple emotions for each sentence [30]. This approach involves setting a threshold, whereby emotions scoring above this threshold are marked as relevant, while others are deemed irrelevant. However, MLL methods are ineffective in learning the intensity of each individual emotion.

To address this issue, Geng (2016) [31] proposed a novel machine learning paradigm known as label distribution learning (LDL). Subsequently, the emotional distribution learning (EDL) algorithm improved upon the label distribution framework [32]. However, these methods necessitate the design of complex textual features, which require substantial human resources. In 2024, EmoLLMs [33], based on large language models such as ChatGPT, employed instruction data to fine-tune various LLMs with the aim of predicting both the emotional category and intensity of the input text. EmoLLMs are capable of generating micro-emotion labels accompanied by numerical values. Although this approach has demonstrated promising results, the process of developing instruction tuning data remains intricate, with the resultant emotional classification and intensity largely contingent upon the cognitive capabilities of the large models.

In summary, scholars in the field of NLP have been dedicated to uncovering the emotional energy values embedded in text, employing a variety of distinctive methods. However, these approaches still exhibit limitations in practical applications, failing to achieve the automatic annotation of large-scale micro-emotion datasets. Our EQN framework is capable of automatically annotating labels with continuous values, enabling multi-label datasets to encompass both macro and micro emotional characteristics. The principles and methods underlying this framework are relatively straightforward, yet its applicability is broad.

Micro-emotion detection is one of the crucial downstream tasks for multi-label micro-emotion datasets. Traditional machine learning models typically classify text by converting it into word vectors and extracting features through feature engineering, with commonly employed methods including Naive Bayes [34], Support Vector Machines (SVM) [35], and Logistic Regression [36]. In contrast, deep learning models leverage neural networks to autonomously learn hierarchical feature representations from data, extracting rich and complex features from the raw input. This process often necessitates substantial amounts of training data. For instance, Convolutional Neural Networks (CNNs) [37] and Recurrent Neural Networks (RNNs) [38], including Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), are effective in extracting textual features and performing classification. Currently, fine-tuning methods based on large language models, such as BERT and GPT, are widely applied to multi-label micro-emotion datasets, demonstrating favorable results.

In the section dedicated to validating the EQN framework, we conducted comparative experiments using five models, including Artificial Neural Networks (ANN), Deep Convolutional Networks (CNN), Recurrent Neural Networks (RNN), TextCNN, and BERT. In these experiments, all models employed default parameters without any specialized tuning, with the primary objective of assessing the usability and generalization capabilities of the EQN framework. During the process of annotating a large micro-emotion dataset using the EQN framework, we supplemented the GoEmotions micro-emotion dataset with additional annotations based on the BERT model, and we performed various evaluations of the annotation results, simultaneously comparing them with relevant literature.

The main components of this paper are as follows: The first and second sections present the introduction and related work; the third section outlines the fundamental structure and operational mechanisms of the EQN; the fourth section conducts comparative experiments on five sets of single-label and multi-label emotion datasets, examining the differences between using and not using the EQN framework to validate its generalization capabilities through traditional neural networks, deep learning, and large language models; the fifth section employs the complete EQN framework to experiment with the GoEmotions dataset, which encompasses 28 categories of emotions, and compares the results with evaluation metrics from relevant literature [13], thereby further validating the effectiveness of the EQN framework; finally, the paper concludes with a summary of findings.

Methods

This section provides a comprehensive overview of the EQN, including a flowchart depicting the overall process of the EQN framework, definitions of key terms, the operational steps of the EQN framework, the core structure, as well as the design of both the input and output components.

EQN framework

The process of using machine models to detect data samples generally comprises three components: data input, model learning and processing, and classification output. The EQN proposed in this paper primarily focuses on enhancing the input and output components, making it compatible with any machine learning model designed for NLP classification tasks. The structure of the EQN framework is illustrated in Fig 1.

Terms involved in the EQN.

Full label: Refers to the data samples that are initialized or output with complete category labels, each accompanied by a real number representing emotional intensity. In this paper, the range of real values for full labels is specified to be between 0.0 and 10.0. For samples lacking corresponding predefined label attributes, the minimum emotional intensity is marked as 0.0, while the maximum emotional intensity is set at 10.0. After automatic annotation, thresholds can be adjusted according to actual conditions.

Full label initialization: This process maps the manually annotated single or multi-labels from the original dataset to values of 0.0 or 10.0 prior to the initial run of the training set. Labels that have been manually annotated are assigned the maximum value of 10.0, whereas unannotated labels receive the minimum value of 0.0.

Training set label regression: Based on the core framework of EQN, this step involves annotating the training set with full labels, assigning the maximum value of 10.0 to the labels that have been manually annotated, while other values remain unchanged. In other words, it replaces the 0.0 values used during the initialization of full labels with the micro-emotional values learned by the model.

Operational process of the EQN framework.

The EQN framework consists of a two-stage training pipeline for enhancing emotion classification through full-label regression. Its operation is straightforward and can be summarized as follows:

  1. Data Preparation: Each training sample with a single emotion label is transformed into a multi-dimensional one-hot vector representing the full emotion label space (e.g., 28 dimensions for the 28 emotions in GoEmotions).
  2. Stage 1 – Full Label Initialization & Model 1 Training: A base classification model (Model 1) is trained on the initialized dataset using standard loss functions (e.g., MSE loss for regression). Model 1 learns to map input texts to emotion label vectors.
  3. Soft Label Generation: Model 1 is used to predict probability distributions over the entire label space for each training sample. These predicted probabilities serve as soft labels for all emotions not originally annotated, while the originally annotated labels are preserved at the maximum value.
  4. Stage 2 – Model 2 Training with Refined Labels: A second model (Model 2) is trained on the soft-labeled dataset to learn a more robust and generalized representation of emotional features.
  5. Inference: Model 2 can be used to classify the emotions of new texts or assign multi-dimensional emotional scores for downstream tasks.

Note: Steps 1–3 are referred to as the core EQN module (CoEQN), which is also used as a standalone component in Section 4 for ablation experiments.

A full Python-style pseudocode is provided in the Supplementary Material to ensure reproducibility.
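As a complement to that pseudocode, the five steps above can be sketched in plain numpy. This is a minimal illustration only, not the authors’ implementation: per-label linear least squares stands in for Models 1 and 2, and all function names are hypothetical.

```python
import numpy as np

def init_full_labels(onehot):
    # Step 1: full label initialization -- manually annotated labels map
    # to the maximum intensity 10.0, all other labels to 0.0.
    return onehot.astype(float) * 10.0

def train_linear(X, Y):
    # Fit one linear regressor per label via least squares (bias folded in).
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict(W, X):
    # Predicted energy scores, clipped to the 0.0-10.0 intensity range.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.clip(Xb @ W, 0.0, 10.0)

def eqn_pipeline(X, onehot):
    # Step 2 (Stage 1): train Model 1 on the initialized full labels.
    W1 = train_linear(X, init_full_labels(onehot))
    # Step 3: soft label generation, then training-set label regression --
    # model-learned micro-emotion scores are kept, while manually
    # annotated labels are regressed back to the maximum value 10.0.
    Y_soft = predict(W1, X)
    Y_soft[onehot.astype(bool)] = 10.0
    # Step 4 (Stage 2): train Model 2 on the regressed full labels.
    W2 = train_linear(X, Y_soft)
    # Step 5: W2 is used for inference on new texts via predict().
    return W2
```

With a neural text encoder in place of the raw feature matrix, Models 1 and 2 would be full networks; the two-stage flow is identical.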

Based on the aforementioned EQN framework model diagram, the primary distinction between the EQN framework and conventional text classification models lies in the input and output components. The following sections will focus on detailing the input and output components of the EQN framework.

EQN input component.

Typical text input: In traditional NLP tasks for single-label or multi-label text classification, the input text undergoes preprocessing and feature extraction (Tokenization, Embedding) before being converted into numerical features to serve as input. The input component generally comprises the text along with its corresponding labels. Labels are typically represented as either integers or one-hot encodings and possess the following characteristics:

  1. Integer label representation: For the i-th sample, the input feature Xi corresponds to the label category Yi, with N categories represented as 0, 1, 2,...,N − 1, where N denotes the total number of categories.
  2. One-Hot encoding of labels: Each label is treated as an independent feature and represented as a binary encoding, with only two possible values: 0 and 1. Here, 0 indicates the absence of the emotional label in the sample, while 1 signifies the presence of that emotional label.

Input for the EQN framework: In the EQN framework, the processing of input text data aligns with traditional methods; however, the representation of labels differs significantly. In conventional approaches, whether using integer or one-hot encoding, labels merely indicate the presence or absence of a category. In contrast, the labels for input text in the EQN framework are annotated as full label values.

The input method for full label values has the following characteristics:

  1. Full category label annotation: It employs full category labels, with each label assigned a continuous real number representing the intensity of the emotion.
  2. Value range for labels: The numerical range corresponding to the labels can be defined according to task requirements. In this study, the values are set between 0.0 and 10.0, where 0.0 signifies the minimum emotional intensity for the label, and 10.0 denotes the maximum intensity.
  3. Two input methods: The initialization of full category label values for samples and the label regression of the training set correspond to two distinct frameworks: CoEQN and EQN, respectively.

As shown in Fig 1, full-label numerical initialization mapping and label regression pertain to the input component. In the CoEQN framework, the full-label numerical initialization mapping assigns an initial value to each emotion label, using real numbers between 0 and 10 to represent the intensity of emotions. For each label in sample i, initialization mapping can be performed using the following formula.

\( y_{ij} = f(l_{ij}) \)  (1)

Here, \( f \) is a mapping function that assigns the value mapped from label \( j \) of sample \( i \) to \( y_{ij} \):

\( f(l_{ij}) = \begin{cases} 10.0, & l_{ij} = 1 \text{ (manually annotated)} \\ 0.0, & l_{ij} = 0 \end{cases} \)  (2)

Label regression in the EQN framework involves using the CoEQN-trained model to annotate the training set, followed by performing label regression on the annotated training set. Let \( E_{ij} \) represent the intensity value annotated by the model for label \( j \) of sample \( i \); the label regression formula is as follows:

\( y_{ij} = \begin{cases} 10.0, & l_{ij} = 1 \\ E_{ij}, & l_{ij} = 0 \end{cases} \)  (3)

EQN is a framework that takes full-label text input, processes it through model learning, and outputs the intensity value of each label via linear regression, providing full-category label intensity values for each predicted sample. By individually training a linear regression model for each label, the framework generates an intensity prediction for each label. This intensity value, a continuous measure (ranging from 0 to 10 in this study), reflects the relevance or association between the label and the current text. By setting an intensity threshold, the framework determines which labels are present, thereby enabling macro-emotion and micro-emotion annotation. Emotion classification is achieved by ranking the labels based on the annotated intensity values.

Examples of traditional input and EQN framework input: To clarify the differences between the full label method and the integer or one-hot encoding labeling approaches, the following comparative examples are presented.

Assuming the dataset X contains three samples—Sample 1, Sample 2, and Sample 3—with three predefined labels labeled as 0, 1, and 2. The manual annotation results indicate that Sample 1 has labels 0 and 1, Sample 2 has label 1, and Sample 3 has label 2. The input data for Samples 1, 2, and 3 can be represented using full label initialization, full label regression, integer encoding, and one-hot encoding, as shown in Table 1.

Table 1. Examples of full Label method and labels represented by integers or one-hot encoding.

https://doi.org/10.1371/journal.pone.0333930.t001
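The encodings in the example above can be computed directly; the following minimal sketch (variable names are illustrative) contrasts one-hot encoding with full label initialization for the three samples.

```python
import numpy as np

# Manual annotations from the example: Sample 1 has labels 0 and 1,
# Sample 2 has label 1, and Sample 3 has label 2.
annotations = [{0, 1}, {1}, {2}]
n_labels = 3

# One-hot encoding: 1 marks the presence of a label, 0 its absence.
one_hot = np.array([[1 if j in a else 0 for j in range(n_labels)]
                    for a in annotations])

# Full label initialization: annotated labels -> 10.0, others -> 0.0.
full_init = one_hot * 10.0

print(one_hot.tolist())    # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
print(full_init.tolist())  # [[10.0, 10.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
```

After CoEQN training, the 0.0 entries would be replaced by model-learned micro-emotion intensities, while the 10.0 entries are retained by the label regression step.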

EQN output component.

The design of the output component in the EQN framework differs from that of traditional multi-label classification due to the distinct problems it addresses. In conventional multi-label classification, input text is assigned to predefined categories, with the initialized label values being discrete and the output resulting in discrete categories.

While the results produced by the EQN framework can be utilized for classification tasks, it primarily addresses regression tasks, yielding continuous values. This approach is akin to systems used for stock price prediction or real estate valuation, where the output is expressed as numerical values.

In the EQN framework designed to solve regression problems, the model’s output component first connects to a fully connected Dense layer. Each neuron in the Dense layer is linked to all neurons in the previous layer, with each connection assigned a weight that learns the relationships between different features, thereby preparing data for subsequent output.

The final layer of the network is a linear layer (using a linear activation function) that consists of C units, where C represents the total number of categories. Corresponding to the full label input of samples in the EQN framework, the output section of the linear layer produces C channels, each outputting the intensity score of the sample for each label, thereby achieving full label output of emotional values.

The output of the EQN framework provides specific numerical values, which not only address regression problems but can also be utilized to solve sample classification issues using the annotated values. Each of the C channels in the linear layer outputs the intensity score for each label, with scores ranging from 0.0 to 10.0. Here, 0.0 indicates the absence of the corresponding emotional label, while 10.0 signifies a very strong presence of that emotional label. Values between 0.0 and 10.0 represent varying levels of emotional intensity.

By setting a threshold, multi-label classification can be performed (as demonstrated in the fourth part of this paper, where annotated data is used for classification, serving as one method to validate the annotation effectiveness of the EQN framework).
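The output component described above (a fully connected Dense layer followed by a linear layer with C units, trained with MSE) can be sketched as a plain numpy forward pass. The dimensions and random weights here are arbitrary stand-ins for illustration, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b, activation=None):
    # A fully connected layer: every unit is linked to all previous units.
    z = x @ W + b
    return np.maximum(z, 0.0) if activation == "relu" else z

C = 28                 # total number of categories (e.g., GoEmotions)
d_in, d_hidden = 64, 32  # illustrative feature and hidden sizes

W1, b1 = rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, C)), np.zeros(C)

x = rng.normal(size=(4, d_in))            # a batch of 4 encoded texts
h = dense(x, W1, b1, activation="relu")   # Dense layer
scores = dense(h, W2, b2)                 # linear layer: one intensity per label

def mse_loss(pred, target):
    # Eq.-style MSE between predicted and true full-label values.
    return float(np.mean((pred - target) ** 2))
```

In a real model the encoder would be a CNN, RNN, or BERT; only this output head and the MSE objective are specific to the EQN design.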

Assume the dataset contains \( m \) samples. For each text, an \( n \)-dimensional feature vector is represented as \( \mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in}) \). These features may include term frequency, TF-IDF, word embeddings, etc. With \( N \) labels, the model’s objective is to predict an intensity value \( \hat{y}_{ij} \) for each label \( j \). The model outputs an energy level score for each label, forming an energy level prediction vector; \( \hat{y}_{ij} \) is the prediction for the \( j \)-th label of the \( i \)-th sample, corresponding to the true value \( y_{ij} \).

\( \hat{\mathbf{y}}_i = (\hat{y}_{i1}, \hat{y}_{i2}, \ldots, \hat{y}_{iN}) \)  (4)

\( y_{ij} = \begin{cases} 10.0, & \text{label } j \text{ is present in sample } i \\ 0.0, & \text{label } j \text{ is absent from sample } i \end{cases} \)  (5)

Here, \( y_{ij} = 10.0 \) indicates that the true value for the \( j \)-th label is present, while \( y_{ij} = 0.0 \) indicates that the true value for the \( j \)-th label is absent. The intensity prediction value for the \( j \)-th label is estimated via linear regression as follows:

\( \hat{y}_{ij} = \mathbf{w}_j^{\top} \mathbf{x}_i + b_j \)  (6)

Here, \( \mathbf{w}_j \) represents the weight vector corresponding to label \( j \), and \( b_j \) is the bias term. When selecting the optimal model, the EQN framework employs the Mean Squared Error (MSE) as the loss function to measure the difference between predicted values and true label values. By minimizing the gap between predicted and true labels, the framework optimizes the weights \( \mathbf{w}_j \) and bias \( b_j \). The loss function is calculated as follows:

\( \mathcal{L} = \frac{1}{mN} \sum_{i=1}^{m} \sum_{j=1}^{N} \left( \hat{y}_{ij} - y_{ij} \right)^2 \)  (7)

After the model has been trained, the final predicted value \( \hat{y}_{ij} \) is selected as the output for the EQN framework, which provides full-category labels with emotional intensity. For emotion annotation, a threshold \( h \) is set based on the specific context. Labels with values below this threshold are considered to indicate the absence of that particular emotion, and their value is set to 0. The specific annotation formula is as follows:

\( \tilde{y}_{ij} = \begin{cases} \hat{y}_{ij}, & \hat{y}_{ij} \ge h \\ 0, & \hat{y}_{ij} < h \end{cases} \)  (8)
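The thresholding rule described above reduces to a single masking operation; a small sketch, with an arbitrary illustrative threshold of h = 1.0:

```python
import numpy as np

def annotate(scores, h):
    # Intensities below the threshold h are treated as absent (set to 0.0);
    # intensities at or above h are kept unchanged.
    out = scores.copy()
    out[out < h] = 0.0
    return out

# One sample's predicted energy scores over four labels:
pred = np.array([9.2, 0.4, 2.7, 0.05])
print(annotate(pred, h=1.0))  # macro-emotion 9.2 and micro-emotion 2.7 survive
```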

EQN framework evaluation metrics.

Statistical classification in emotion detection is a crucial downstream task. Key evaluation metrics in dataset classification include Precision, Recall, and F1-score. For single-label classification in emotion datasets, the full-category output for each sample is processed using Max(), where the label with the highest energy level is selected as the predicted label, which is relatively straightforward. In multi-label classification, since each sample may have multiple labels, the computation of evaluation metrics becomes more complex.

For emotion classification, in the case of single-label classification, the full-category output for each sample is processed using Max(), and the label with the highest energy level is chosen as the predicted label. In multi-label classification, the number of labels for the sample and the threshold are used to select the predicted labels. Assuming sample \( i \) has \( k_i \) true labels, we select the top \( k_i \) labels with the highest energy scores as the predicted labels. The predicted label set is as follows:

\( \hat{Y}_i = \operatorname{Top}_{k_i}\!\left( \{ \hat{y}_{ij} \}_{j=1}^{N} \right) \)  (9)

The following are the calculation formulas for the evaluation metrics of the EQN framework. These formulas compute the label matching ratio at the sample level, and the overall evaluation metric is obtained by averaging across all samples.

Precision measures how many of the predicted labels are correct. The precision for the \( i \)-th sample is calculated as:

\( P_i = \frac{\left| Y_i \cap \hat{Y}_i \right|}{\left| \hat{Y}_i \right|} \)  (10)

where \( Y_i \) is the true label set of the \( i \)-th sample, and \( \hat{Y}_i \) is the label set predicted based on the energy scores. \( \left| Y_i \cap \hat{Y}_i \right| \) represents the size of the intersection of the true label set and the predicted label set, i.e., the number of correctly predicted labels.

The overall multi-label precision of the EQN framework is the average of the precision values for all samples. The calculation formula is as follows:

\( P = \frac{1}{m} \sum_{i=1}^{m} P_i \)  (11)

Recall measures how many of the true labels are correctly predicted. The recall for the \( i \)-th sample is calculated as follows:

\( R_i = \frac{\left| Y_i \cap \hat{Y}_i \right|}{\left| Y_i \right|} \)  (12)

The overall multi-label recall of the EQN framework is the average of the recall values for all samples. The calculation formula is as follows:

\( R = \frac{1}{m} \sum_{i=1}^{m} R_i \)  (13)

The F1-score is the harmonic mean of precision and recall, balancing both metrics. The F1-score for the \( i \)-th sample is calculated as follows:

\( F1_i = \frac{2 \, P_i \, R_i}{P_i + R_i} \)  (14)

The overall F1-score of the EQN framework is the average of the F1-scores for all samples. The calculation formula is as follows:

\( F1 = \frac{1}{m} \sum_{i=1}^{m} F1_i \)  (15)
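The sample-level metrics above can be computed directly from the energy scores. A minimal sketch follows; helper names are illustrative, and the predicted set is chosen as the top \( k_i \) highest-energy labels, as described above.

```python
import numpy as np

def topk_labels(scores, k):
    # Predicted label set: the k highest-energy labels for the sample.
    return set(np.argsort(scores)[::-1][:k])

def sample_prf(true_labels, scores):
    # Per-sample precision, recall, and F1 with k_i = |true_labels|.
    pred = topk_labels(scores, len(true_labels))
    tp = len(true_labels & pred)          # |Y_i ∩ Ŷ_i|
    p = tp / len(pred)
    r = tp / len(true_labels)
    f1 = 0.0 if p + r == 0 else 2 * p * r / (p + r)
    return p, r, f1

def overall_prf(all_true, all_scores):
    # Overall metrics: average the per-sample values across all samples.
    prf = [sample_prf(t, s) for t, s in zip(all_true, all_scores)]
    return tuple(np.mean(prf, axis=0))
```

Note that when exactly \( k_i \) labels are predicted per sample, per-sample precision and recall coincide; they diverge when a threshold rather than a label count selects the predicted set.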

The EQN framework comprises two processes: CoEQN and EQN, with the latter encompassing the entire workflow for the automatic annotation of emotional datasets and micro-emotion detection. CoEQN includes only steps 1–3 of the EQN framework.

To validate the applicability of the EQN framework, experiments were conducted using the same processes and parameter settings as conventional models. This approach enhances the comparability of the experimental results and also demonstrates the generalizability of the EQN framework.

Experiments

To validate the broad applicability of the EQN framework, we selected five commonly used datasets (four single-label and one multi-label), along with various algorithms and language models for comparative analysis. The experimental setup, including the equipment specifications, experimental rules, and model evaluation methods, will be detailed.

The results of the tests comparing “with” and “without” the CoEQN framework will be presented in different formats, including tables and Pearson correlation coefficient heatmaps. These visual representations will highlight the outcomes of the same model under identical conditions, thereby validating the usability of the EQN framework.

Datasets used for comparative experiments

7health dataset.

The 7health dataset [4] is a mental health emotion analysis dataset designed to reveal psychological health patterns through statements. This comprehensive dataset is a meticulously curated collection of mental health statuses tagged from various statements. It amalgamates raw data from multiple sources, which have been cleaned and compiled to create a robust resource for developing chatbots and conducting emotion analysis.

The dataset comprises 51,074 entries, annotated with seven categories of emotions: anxiety, bipolar, depression, normal, personality disorder (PD), stress, and suicidal. The distribution of sample counts for each category is presented in Table 2.

thumbnail
Table 2. Distribution of sample counts in the 7health dataset.

https://doi.org/10.1371/journal.pone.0333930.t002

As indicated by the data in Table 2, the sample counts in this dataset are severely imbalanced. With the exception of the normal category, the other six categories represent negative emotional states that exhibit significant similarity, making this a particularly challenging dataset.

6emotions dataset.

The 6emotions dataset [5] is an English corpus comprising six categories of emotions. Each entry in this dataset consists of a text segment representing a Twitter message, along with a corresponding label that indicates the predominant emotion conveyed. The emotions are classified into six categories: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5). This dataset provides a rich foundation for exploring the nuanced emotional landscape within the realm of social media.

3TFN dataset.

The Twitter Financial News dataset (3TFN dataset) [6] is an English-language dataset containing an annotated corpus of finance-related tweets, used to classify the sentiment of finance-related tweets. The dataset holds 11,932 documents annotated with three labels: Bearish, Bullish, and Neutral.

3TSA dataset.

The Twitter Sentiment Analysis Dataset (3TSA dataset) [7] is a three-class dataset comprising approximately 163,000 tweets, each associated with sentiment labels. The dataset consists of two columns: the first column contains the cleaned tweets and comments, while the second column indicates the corresponding sentiment label.

All tweets have been cleaned using Python’s regular expressions and natural language processing techniques, with sentiment labels ranging from −1 to 1. A label of 0 indicates a neutral tweet, 1 denotes a positive tweet, and −1 signifies a negative tweet.

GoEmotions dataset.

At the 2020 ACL conference, researchers from Google released the GoEmotions dataset [12], the largest and most finely grained multi-label micro-emotion dataset to date, comprising 58,000 manually annotated Reddit comments. This dataset expands the emotion categories to 28, providing an opportunity to better uncover users’ latent emotions.

The dataset is divided into three parts: the training dataset contains 43,410 samples, the test dataset includes 5,427 samples, and the validation dataset comprises 5,426 samples. The emotion categories are as follows: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral.

Models and configurations

This study employs five models—ANN [39], CNN [40], LSTM [41], TextCNN [1], and BERT [42]—for comparative testing across the selected datasets.

ANN model.

Artificial Neural Networks (ANN) are computational models that mimic biological neural systems and are widely applied in machine learning and artificial intelligence. ANN serves as the foundation of modern deep learning, with many complex models (such as Convolutional Neural Networks and Recurrent Neural Networks) developed based on this fundamental structure. It primarily comprises an input layer, hidden layers, and an output layer. In this study, the parameters for the ANN model are set as follows: the number of neurons in the input layer is 512, the number of neurons in the hidden layer is 256, and the activation function is ReLU.

CNN model.

Convolutional Neural Networks (CNN) are extensively used in text classification due to their effectiveness in capturing local features and contextual information. A typical CNN architecture includes an input layer, convolutional layers, pooling layers, dense layers, and an output layer. For the CNN model utilized in this study, the parameters are set as follows: `max_features = 15000`, the number of input channels is 32, the number of convolutional filters is 7, and the activation function is ReLU.

LSTM model.

Long Short-Term Memory networks (LSTMs) are a specialized type of Recurrent Neural Network (RNN) capable of learning long-term dependencies. By incorporating a complex internal structure with multiple gating mechanisms, LSTMs effectively regulate the flow of information, allowing the network to retain long-term memories when necessary and discard irrelevant information when it is no longer needed. In this study, the LSTM module provided by TensorFlow is employed directly. The parameters for the LSTM model are set as follows: `input_dim = 5000`, `output_dim = 150`, `input_length = 150`, and the hidden layer size of the LSTM layer is 128.

TextCNN model.

TextCNN is a convolutional neural network model specifically designed for text classification, improving upon standard CNN architectures by modifying the convolutional layers. TextCNN employs convolutional layers with filters of varying sizes, where each filter is responsible for extracting specific n-gram features. Different-sized filters (e.g., 1-gram, 2-gram, 3-gram) capture contextual information of varying lengths. The parameters for the TextCNN model in this study include three 1D convolutional layers, each with filter sizes of 3, 4, and 5, and a channel size of 256.

BERT model.

BERT excels in emotion detection due to its robust contextual understanding and flexible training strategies, making it highly effective for emotion detection tasks [43]. It processes both left and right contexts in sentences, allowing the model to comprehend word meanings and contexts more accurately. This bidirectional processing is crucial for emotion detection, as words may carry different emotions based on contextual variations. We fine-tune the BERT-base-cased pre-trained model for text classification tasks in our study.

Experimental environment and rules

The experimental platform and key parameters for this study are as follows:

  1. GPU: NVIDIA GeForce RTX 3090 GPU;
  2. BERT model runtime environment: python=3.7, pytorch=1.9.0, cudatoolkit=11.3.1, cudnn=8.9.7.29;
  3. Other models’ runtime environment: python=3.8, tensorflow=2.6.0, cudatoolkit=11.3.1, cudnn=8.2.1;
  4. Parameter settings: The text length or sequence length is uniformly set to 150 for all models. For text input, except for BERT, TensorFlow’s Tokenizer and sequence representations are consistently utilized. The text input for BERT employs a summation of three types of embeddings (Token, Segment, Position) to generate the final input representation for each word.
  5. Rules: To ensure consistency with the operational workflow of the five comparative models, the CoEQN framework is employed for validation in this section. For the same model, in the comparative experiments of “using” versus “not using” the CoEQN framework, the fundamental structure, parameter settings, training set, and test set samples remain unchanged. In the experiments where the CoEQN framework is utilized, only the input and output portions are modified accordingly, while the parameters follow those detailed in Section “Experiments”. The labels of the training set are mapped to full labels and initialized with energy level values, while the output for the test set generates full label energy scores. Classification is performed based on the full label energy scores of the test set using either MAX() (for single-label classification) or a predetermined threshold (for multi-label classification), with results compared to those obtained from conventional model methods on the same test set.
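The input-side modification described in rule 5, mapping each training label to a full-label vector of energy levels, can be sketched as follows. The maximum energy value of 10 follows the label regression description used later for GoEmotions; the function name and other details are our illustrative assumptions:

```python
import numpy as np

def to_full_label_vector(label_ids, num_classes, max_energy=10.0):
    """Map a sample's manual label id(s) to a full-label energy vector.

    True labels receive the maximum energy level; all other labels start at 0,
    so the model regresses continuous targets instead of one-hot classes.
    """
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[list(label_ids)] = max_energy
    return vec

# A single-label sample (class 2 of 7) and a multi-label sample (classes 0 and 5)
single = to_full_label_vector([2], 7)
multi = to_full_label_vector([0, 5], 7)
```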

Results and discussion

Comparison of test results.

In this section, we present a detailed comparison of the results obtained from the different classification models—specifically focusing on single-label, multi-label, and EQN full label mapping (EQN) approaches. Based on the aforementioned rules, the testing results for five datasets and five models utilizing conventional single-label and multi-label classification methods, as well as the EQN full-label approach, are presented in Table 3.

thumbnail
Table 3. Testing results of single-label, multi-label, and full-label mapping classification models.

https://doi.org/10.1371/journal.pone.0333930.t003

Across all comparisons, from the traditional ANN to deep learning models such as CNN, LSTM, TextCNN, and BERT, the basic structure, parameter settings, training datasets, and test datasets were kept identical between the runs with and without the CoEQN framework. That the full-label approach improves results under these unchanged configurations suggests that EQN is highly compatible with a wide range of models.

The full label mapping method employed by CoEQN has enhanced model performance, demonstrating particularly significant improvements in accuracy on datasets with a larger number of categories. Although the performance gains are less pronounced on datasets with fewer categories, the fundamental accuracy of the models has not been compromised. The full label numerical mapping method utilized within the EQN framework effectively capitalizes on the interdependencies among labels, providing continuous numerical annotations that enhance the model’s understanding of subtle features. This demonstrates the excellent generalization capabilities of the EQN framework, making it suitable for a wide range of NLP models.

Pearson correlation coefficient heatmap.

To further assess and demonstrate the quality of the automatic labeling achieved by the CoEQN framework, we calculated the Pearson correlation coefficients among the labels based on the distribution of the full label scores we annotated on the test set. According to Pearson’s theory, the Pearson correlation coefficient ranges from −1 to +1; the greater the absolute value of the coefficient, the higher the degree of correlation. A negative value indicates an inverse correlation, while a positive value signifies a direct correlation.
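Given the matrix of annotated full-label energy scores, the label-wise Pearson coefficients underlying the heatmaps can be obtained in one call with NumPy (the score values below are toy data, not the 7health annotations):

```python
import numpy as np

# Rows are samples, columns are emotion labels; values are annotated energy scores.
# Toy data: labels 0 and 1 fire on disjoint samples, so they should anti-correlate.
scores = np.array([[9.1, 0.4, 1.2],
                   [0.3, 8.7, 0.9],
                   [8.5, 0.2, 2.1],
                   [0.6, 9.0, 0.5]])

# Pairwise Pearson coefficients between label columns, values in [-1, 1]
corr = np.corrcoef(scores, rowvar=False)
```

A heatmap then only requires passing `corr` to a plotting routine such as `matplotlib.pyplot.imshow` or `seaborn.heatmap`.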

For clarity, we use the test results from the 7health dataset as an example. By employing the BERT-based CoEQN framework, we annotated the full label scores on the 7health test set, and based on these scores, we generated a Pearson correlation coefficient heatmap, as depicted in Fig 2.

thumbnail
Fig 2. Pearson correlation heatmap for the 7health dataset.

https://doi.org/10.1371/journal.pone.0333930.g002

The Pearson correlation heatmap for the 7health dataset reveals that the personality disorder category exhibits low correlations with the other six health indicators. This aligns with the definition of personality disorders, which are characterized by persistent behavioral patterns stemming from genetic, congenital, and adverse environmental factors during individual development.

Furthermore, the correlation between depression and normal is notably negative, with a coefficient of −0.43, followed by suicidal and normal at −0.38. This indicates that both depression and suicidal ideation significantly impact an individual’s health. Interestingly, the correlation between depression and suicidal is merely −0.07, suggesting a minimal relationship, which may seem counterintuitive. Literature [44] suggests that suicidal thoughts are common in depression but are only moderately correlated with severe depression. Since the 7health dataset focuses on mental health rather than specific psychological disorders, the low correlation between depression and suicidal ideation is consistent with psychological understanding.

These evident correlations underscore the overall rationality of utilizing the EQN framework for emotional annotation, indirectly validating its practicality and applicability. It is hoped that these insights will provide a solid theoretical foundation for research conducted by mental health professionals and psychologists.

Application: Annotation of the GoEmotions dataset based on the EQN

GoEmotions is a fine-grained, multi-label emotion dataset characterized by a substantial manual annotation workload and significant classification challenges, providing valuable support for emotion detection. However, the labeling may be insufficient. To further evaluate the EQN framework’s capability in capturing subtle emotions, we aim to supplement the annotation of the 28 categories within the GoEmotions dataset.

Utilizing the BERT model, we employ both the CoEQN framework and the EQN approach for automated annotation of the GoEmotions dataset. The analysis encompasses statistical evaluations of the annotation results, calculations of assessment metrics, and the generation of a Pearson correlation coefficient heatmap, which will be compared against the findings published in Google’s dataset literature [13].

GoEmotions data distribution

The distribution of sample labels in the training and testing sets of the GoEmotions dataset is illustrated in Fig 3. The training set comprises 43,410 samples across 28 categories: 36,308 samples are single-label, accounting for 84% of the total, while only 28 samples carry four labels and 1 sample carries five labels. The GoEmotions testing set contains 5,427 entries, with a maximum of 4 labels per entry. It includes 4,590 single-label samples, 774 samples with two labels, 61 samples with three labels, and 2 samples with four labels.

thumbnail
Fig 3. Distribution of sample label counts in the GoEmotions training and testing sets.

https://doi.org/10.1371/journal.pone.0333930.g003

CoEQN and EQN annotated datasets and comparative experiments

Section “Comparison of Test Results” details the detection experiments conducted on the GoEmotions dataset utilizing the CoEQN framework. This section presents the results of the comprehensive experiments based on the EQN framework, which also employs the basic BERT model with standard parameter settings, focusing on the automatic supplementary annotation of the GoEmotions dataset (both training and testing sets).

Initialization of the training set and label regression of the training set.

Based on the CoEQN experiment, the full-label initialization method was employed to map the original GoEmotions training set, generating an initialized training set. Subsequently, the optimal BERT model was identified through learning to annotate both the GoEmotions training set and test set.

In the EQN experiment, building upon the CoEQN findings, the full-label regression method was applied to regress the manually annotated labels of the GoEmotions training set. After this regression, the test set was automatically annotated, and the results were analyzed and statistically presented.

The label regression training set method follows the principle of prioritizing manual annotations, where the selected model from CoEQN annotates the GoEmotions training set. The original manually annotated label values are restored to 10, while other learned micro-emotion intensity values remain unchanged. The segments for the initialized training set and the label regression training set are illustrated in Figs 4 and 5.
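The label regression step described above can be sketched as follows: model-annotated energy scores are kept, except that positions carrying manual annotations are restored to the maximum energy level of 10 (the function name and toy scores are ours):

```python
import numpy as np

def label_regression(model_scores, manual_label_ids, max_energy=10.0):
    """Restore manually annotated labels to the maximum energy level,
    keeping the model-learned micro-emotion intensities everywhere else."""
    regressed = model_scores.copy()
    for i, ids in enumerate(manual_label_ids):
        regressed[i, list(ids)] = max_energy
    return regressed

# Two samples: sample 0 was manually labeled class 0; sample 1 classes 1 and 2
scores = np.array([[9.2, 0.0, 2.3],
                   [0.1, 7.8, 1.5]])
out = label_regression(scores, [[0], [1, 2]])
```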

thumbnail
Fig 4. Segments of the initialized training set in the CoEQN framework.

https://doi.org/10.1371/journal.pone.0333930.g004

thumbnail
Fig 5. Segments of the label regression training set in the EQN framework.

https://doi.org/10.1371/journal.pone.0333930.g005

Annotation of the GoEmotions test set and comparative analysis.

In Section “Comparison of Test Results”, the GoEmotions test set was annotated based on the CoEQN framework, as depicted in Fig 6. Under the same BERT model and parameter settings, the GoEmotions test set was automatically annotated using the EQN framework and the label regression training set. The annotation segments for the GoEmotions test set based on CoEQN are illustrated in Fig 6, while the segments based on EQN are shown in Fig 7.

thumbnail
Fig 6. Segments of the GoEmotions test set annotated automatically using the CoEQN framework.

https://doi.org/10.1371/journal.pone.0333930.g006

thumbnail
Fig 7. Segments of the GoEmotions test set annotated automatically using the EQN framework.

https://doi.org/10.1371/journal.pone.0333930.g007

Based on the two annotated GoEmotions test sets mentioned above, the statistical analysis and comparison of results from the experiments are presented in Table 4.

thumbnail
Table 4. Comparison of annotation results for the GoEmotions test set using CoEQN and EQN.

https://doi.org/10.1371/journal.pone.0333930.t004

The experimental results indicate that the EQN model, which is based on full-label regression for the GoEmotions training set, demonstrates a significant enhancement in recognition performance. This finding substantiates that the full-label numerical method employed in this study allows the model to effectively learn more nuanced textual features, thereby improving the quality of the manually annotated dataset.

To further observe the framework’s ability to capture subtle emotions, we use the prediction results from the test set based on Table 4 as an example. For simplicity, we only compare the single-label samples, which make up the largest proportion of the test set, totaling 4,590 samples. The top three predicted intensity values, TOP1, TOP2, and TOP3, are extracted from the predicted energy values across the 28 categories for each sample. The corresponding prediction hit counts are presented in Table 5.

The statistical results presented in Table 5 indicate that applying the full-label method to originally single-labeled instances and performing label regression for dual-label annotation raised the hit rate by 17%. If the annotation is further widened to three labels, the hit rate improves by 25%. This outcome suggests that a single manually labeled instance may correspond to two, three, or even more related or similar emotions, and demonstrates that our framework has a strong ability to uncover subtle emotional nuances in text. We leave further exploration of this phenomenon to future work.
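The TOP-k hit counting used for Table 5 can be sketched for single-label samples as follows (toy energy scores, not the GoEmotions predictions):

```python
import numpy as np

def topk_hit_rate(energy, true_labels, k):
    """Fraction of single-label samples whose true label appears among
    the k highest-energy predicted categories."""
    topk = np.argsort(energy, axis=1)[:, -k:]   # (n, k) indices of highest scores
    hits = [t in row for t, row in zip(true_labels, topk)]
    return sum(hits) / len(true_labels)

energy = np.array([[0.2, 5.1, 4.9],
                   [3.0, 0.1, 2.9],
                   [1.0, 2.0, 0.5]])
true_labels = [2, 0, 0]
top1 = topk_hit_rate(energy, true_labels, 1)
top2 = topk_hit_rate(energy, true_labels, 2)   # widening to TOP2 adds two hits here
```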

Pearson coefficient heat map comparison.

To visually represent the results of the test set data annotated using the CoEQN and EQN frameworks, Pearson correlation coefficient heatmaps were generated based on the annotated intensity scores. These heatmaps, depicted in Figs 8 and 9, illustrate the degree of correlation among the various emotion labels. The data presented in the heatmaps effectively convey the relationships between the labels, providing insights into how different emotions are interconnected within the dataset.

thumbnail
Fig 8. Pearson correlation coefficient heatmap for the test set annotated using the CoEQN framework.

https://doi.org/10.1371/journal.pone.0333930.g008

thumbnail
Fig 9. Pearson correlation coefficient heatmap for the test set annotated using the EQN framework.

https://doi.org/10.1371/journal.pone.0333930.g009

A comparison of the two figures reveals that the Pearson correlation coefficients for the test set annotated using the EQN framework have largely increased, indicating a more pronounced correlation among the emotions. The interrelatedness of the 28 emotion categories reflects the nuances between them. Consequently, the adoption of the EQN framework provides more specific insights into the study of human emotional expression.

Comparison of CoEQN and EQN annotation results with published results

In the development and application of machine learning models, evaluating the model’s performance is a crucial step. Key indicators for assessing a model include Precision (prec), Recall, F1 score, macro-average, and standard deviation. When Google researchers released the GoEmotions dataset, they published relevant literature [13], which classified predictions on the GoEmotions test set based on the BERT model and provided statistical analysis and evaluation of the results.

To further evaluate the EQN framework, we similarly employed the BERT model and utilized the CoEQN and EQN frameworks described earlier to automatically annotate the energy scores for the 28 category labels of the GoEmotions test set. We computed the evaluation metrics for EQN based on the annotation results and conducted a comprehensive comparison with the results from the literature, thoroughly assessing the efficacy of the EQN framework.

Both the experiments based on the EQN framework and the findings in literature [13] utilized the basic BERT model and the GoEmotions dataset. The literature, however, does not detail the basic parameter settings used for BERT; the settings adopted in our study can be found in Section 2.2. The computed results for the various evaluation metrics are compared with those presented in the literature and summarized in Table 6.

thumbnail
Table 6. Comparison of test results for CoEQN and EQN with literature [13].

https://doi.org/10.1371/journal.pone.0333930.t006

The comparison table above reveals that, relative to the literature, the test results based on the EQN frameworks demonstrate varying degrees of improvement in precision and F1 score across the label categories. The F1-score is a composite metric suited to evaluating overall model performance, especially under imbalanced class distributions. Our method improves the F1-score of 21 out of 28 categories. The macro-averaged F1 score for the EQN experiment is 0.52, exceeding both the 0.50 achieved by the CoEQN experiment and the literature score of 0.46. The Precision of the EQN framework stands at 0.56, 4 percentage points above the 0.52 achieved with the CoEQN framework and 16 percentage points above the literature’s 0.40. Moreover, the standard deviations of Precision, Recall, and F1-score in the two EQN framework experiments are lower than those reported in the literature, indicating that the EQN results are more stable. Our two models perform moderately on Recall, with macro-averaged Recall of 0.49 and 0.51, below the literature. Overall, the results show that both the CoEQN and EQN frameworks enhance F1 and Precision, with the EQN framework yielding particularly stable and reliable results.
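For reference, the per-class (label-wise) metrics behind Table 6 differ from the sample-averaged metrics of Eqs (10)–(15): each category’s F1 is computed from its own true/false positives and negatives, then macro-averaged and summarized by standard deviation. A minimal sketch under our own naming:

```python
import numpy as np

def per_class_f1(true_sets, pred_sets, num_classes):
    """Per-class F1 over label sets, with macro-average and standard deviation."""
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        tp = sum(c in Y and c in P for Y, P in zip(true_sets, pred_sets))
        fp = sum(c not in Y and c in P for Y, P in zip(true_sets, pred_sets))
        fn = sum(c in Y and c not in P for Y, P in zip(true_sets, pred_sets))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1, f1.mean(), f1.std()

# Three samples over two classes, with one spurious and one missed prediction
f1, macro, std = per_class_f1([{0}, {1}, {0, 1}], [{0}, {0}, {1}], num_classes=2)
```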

Conclusion and outlook

This paper presents the Expansion Quantization Network (EQN) framework, designed to automate the annotation of micro-emotion datasets with emotional energy intensity scores. The framework is both straightforward and effective, addressing several critical challenges, including the high costs of manual annotation, subjective biases, significant label imbalance, and limitations in annotating quantitative data. Furthermore, it enhances micro-emotional labeling within manually annotated datasets.

The EQN framework leverages the relationships among labels and employs a comprehensive numerical mapping method, which enables it to better capture the intricate complexities of the data and the interdependencies between labels. By effectively identifying unlabeled micro-emotional features within the original data, the framework significantly enhances the performance of automatic annotation and detection.

A series of extensive experiments utilizing multiple natural language processing (NLP) models were conducted to validate the framework’s usability and generalizability. By employing the label regression training set, the EQN framework automatically annotated the GoEmotions training and testing sets with multi-label micro-emotions, complete with energy level scores, thereby enriching the GoEmotions dataset.

The continuous numerical representation of emotional energy levels provided by the automatic annotation process is particularly advantageous for quantitative emotional research. This innovation is expected to contribute positively to fields such as psychology and emotional computing, facilitating a more nuanced understanding of human emotions by machines.

Although this framework is based on micro-emotion labeling and detection, it is also applicable to text-related classification tasks.

Supporting information

S1 File. Pseudocode for the Python Implementation of the EQN Framework (Using the BERT Model as an Example).

https://doi.org/10.1371/journal.pone.0333930.s001

(PDF)

Acknowledgments

This research is partially supported by the 242 National Information Security Projects, PR China under Grant 2020A065.

References

  1. Wang H, He J, Zhang X, Liu S. A short text classification method based on N-gram and CNN. Chin J Electron. 2020;29(2):248–54.
  2. Ma Y, Nguyen KL, Xing FZ, Cambria E. A survey on empathetic dialogue systems. Inform Fusion. 2020;64:50–70.
  3. Zhang T, Yang K, Ji S, Ananiadou S. Emotion fusion for mental illness detection from social media: a survey. Inform Fusion. 2023;92:231–46.
  4. Dataset: 7health-dataset [Internet]. Available from: https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health
  5. Dataset: 6emotions-dataset [Internet]. Available from: https://www.kaggle.com/datasets/nelgiriyewithana/emotions?select=text.csv
  6. Dataset: 3TFN-dataset [Internet]. Available from: https://www.kaggle.com/datasets/borhanitrash/twitter-financial-news-sentiment-dataset
  7. Dataset: 3TSA-dataset [Internet]. Available from: https://www.kaggle.com/datasets/cosmos98/twitter-and-reddit-sentimental-analysis-dataset
  8. Go A, Bhayani R, Huang L. Twitter sentiment classification using distant supervision. CS224N Proj Rep. 2009;1(12):2009.
  9. Ekman P. An argument for basic emotions. Cogn Emot. 1992;6(3–4):169–200.
  10. Plutchik R. A general psychoevolutionary theory of emotion. In: Theories of emotion. Elsevier; 1980. p. 3–33.
  11. Strapparava C, Mihalcea R. SemEval-2007 task 14: affective text. Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007). Prague, Czech Republic: Association for Computational Linguistics; 2007. p. 70–4.
  12. GoEmotions [Internet]. Available from: https://github.com/google-research/google-research/tree/master/goemotions
  13. Demszky D, et al. GoEmotions: a dataset of fine-grained emotions. arXiv:2005.00547 [Preprint]. 2020.
  14. Zhang Y, Ma Y. Sparse multi-label feature selection via dynamic graph manifold regularization. Int J Mach Learn Cyber. 2022;14(3):1021–36.
  15. Wu S, Gao Y, Yang W, Li H, Zhu G. End-to-end video captioning based on multiview semantic alignment for human–machine fusion. IEEE Trans Automat Sci Eng. 2025;22:4682–90.
  16. Tong Y, Liu H, Zhang Z. Advancements in humanoid robots: a comprehensive review and future prospects. IEEE/CAA J Autom Sin. 2024;11(2):301–28.
  17. Amin MM, Mao R, Cambria E, Schuller BW. A wide evaluation of ChatGPT on affective computing tasks. IEEE Trans Affect Comput. 2024;15(4):2204–12.
  18. Yadav R, Priyanka, Kacker P. AutoMEDSys: automatic facial micro-expression detection system using random Fourier features based neural network. Int J Inf Tecnol. 2023;16(2):1073–86.
  19. Takalkar M, Xu M, Wu Q, Chaczko Z. A survey: facial micro-expression recognition. Multimed Tools Appl. 2017;77(15):19301–25.
  20. Talaat FM, Ali ZH, Mostafa RR, El-Rashidy N. Real-time facial emotion recognition model based on kernel autoencoder and convolutional neural network for autism children. Soft Comput. 2024;28(9–10):6695–708.
  21. Zhang Y, Zhai C. Affective computing: challenges and opportunities. IEEE Trans Affect Comput. 2018;9(4):397–409.
  22. Singh S. Emotion recognition for mental health prediction using AI techniques: an overview. IJARCS. 2023;14(03):87–107.
  23. Jayakodi J, Jayamali G, Hirshan R, et al. Creating a Sri Lankan micro-emotion dataset for a robust micro-expression recognition system. 2022 International Research Conference on Smart Computing and Systems Engineering (SCSE), vol. 5. IEEE; 2022. p. 102–7.
  24. Weismayer C, Pezenka I. Cross-cultural differences in emotional response to destination commercials. ENTER e-Tourism Conference. Cham: Springer Nature Switzerland; 2024. p. 43–54.
  25. Saifullah S, Dreżewski R, Dwiyanto FA, Aribowo AS, Fauziah Y. Sentiment analysis using machine learning approach based on feature extraction for anxiety detection. In: Mikyška J, de Mulatier C, Paszynski M, Krzhizhanovskaya VV, Dongarra JJ, Sloot PM, editors. Computational science – ICCS 2023. Lecture notes in computer science, vol 14074. Cham: Springer; 2023.
  26. Lee L-H, Li J-H, Yu L-C. Chinese EmoBank: building valence-arousal resources for dimensional sentiment analysis. ACM Trans Asian Low-Resour Lang Inf Process. 2022;21(4):1–18.
  27. Li Z, Xie H, Cheng G, Li Q. Word-level emotion distribution with two schemas for short text emotion classification. Knowl-Based Syst. 2021;227:107163.
  28. Saifullah S, Dreżewski R, Dwiyanto FA, Aribowo AS, Fauziah Y, Cahyana NH. Automated text annotation using a semi-supervised approach with meta vectorizer and machine learning algorithms for hate speech detection. Appl Sci. 2024;14(3):1078.
  29. Wang Y, Wan Z, Zeng X, Zuo J. Valence-arousal-dominance emotion knowledge-based text emotion distribution label enhancement method. J Tsinghua Univ (Sci Technol). 2024;64(5):789–800.
  30. Zhang M-L, Zhou Z-H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng. 2014;26(8):1819–37.
  31. Geng X. Label distribution learning. IEEE Trans Knowl Data Eng. 2016;28(7):1734–48.
  32. Zhou D, Zhang X, Zhou Y, Zhao Q, Geng X. Emotion distribution learning from texts. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016.
  33. Liu Z, Yang K, Xie Q, Zhang T, Ananiadou S. EmoLLMs: a series of emotional large language models and annotation tools for comprehensive affective analysis. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024. p. 5487–96.
  34. Jefriyanto J, Ainun N, Ardha MAA. Application of naïve Bayes classification to analyze performance using stopwords. JISTE. 2023;1(2):49–53.
  35. Wang H, Li G, Wang Z. Fast SVM classifier for large-scale classification problems. Inf Sci. 2023;642:119136.
  36. Geng Y, Li Q, Yang G, Qiu W. Logistic regression. In: Practical machine learning illustrated with KNIME. Springer Nature Singapore; 2024. p. 99–132.
  37. Emanuel RHK, Docherty PD, Lunt H, Möller K. The effect of activation functions on accuracy, convergence speed, and misclassification confidence in CNN text classification: a comprehensive exploration. J Supercomput. 2023;80(1):292–312.
  38. Al-Qerem A, et al. Utilizing deep learning models (RNN, LSTM, CNN-LSTM, and Bi-LSTM) for Arabic text classification. In: Artificial intelligence-augmented digital twins: transforming industrial operations for innovation and sustainability. Cham: Springer Nature Switzerland; 2024. p. 287–301.
  39. Moraes R, Valiati JF, Gavião Neto WP. Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Syst Appl. 2013;40(2):621–33.
  40. Xing L, Qiao Y. DeepWriter: a multi-stream deep CNN for text-independent writer identification. IEEE. 2017.
  41. Tawong K, Pholsukkarn P, Noawaroongroj P, Siriborvornratanakul T. Economic news using LSTM and GRU models for text summarization in deep learning. J Data Inf Manag. 2024;6(1):29–39.
  42. Devlin J. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 [Preprint]. 2018.
  43. Kodiyala VS, Mercer RE. Emotion recognition and sentiment classification using BERT with data augmentation and emotion lexicon enrichment. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA); Pasadena, CA, USA; 2021. p. 191–8.
  44. Keilp JG, Grunebaum MF, Gorlyn M, LeBlanc S, Burke AK, Galfalvy H, et al. Suicidal ideation and the subjective aspects of depression. J Affect Disord. 2012;140(1):75–81. pmid:22406338