Abstract
The convergence of Metaverse technologies, the Internet of Things (IoT), and consumer electronics has created a pressing need for scalable, real-time sentiment analysis that can process heterogeneous, high-velocity media flows. Traditional approaches often fail to preserve the contextual, emotional, and temporal dynamics that pervade cross-platform settings. To address these shortcomings, this work proposes a deep learning-based framework for sentiment analysis that integrates IoT-enabled consumer devices and Metaverse media interactions. At its core, the BG-Hybrid model blends BERT-based bidirectional encoding with GPT-based generative modeling to achieve subtle emotion detection and context-aware comprehension. The architecture comprises five interconnected modules: (i) multi-source data collection using RESTful APIs; (ii) weighted preprocessing pipelines using tokenization, lemmatization, and normalization; (iii) model training that minimizes cross-entropy loss with the Adam optimizer; (iv) adaptive real-time processing using dynamic window segmentation; and (v) a continuous refinement loop driven by a user feedback mechanism. Predictive thresholding is employed to manage temporal sentiment variations, and anomaly detection ensures data trustworthiness. Experimental analyses on the Twitter Sentiment140 and Amazon Reviews datasets validate the effectiveness of the system, which obtained 94.5% accuracy, a 91.5% F1-score, an average response latency of 250 ms, and demonstrated scalability exceeding 91.5%.
Citation: Wang H, Wang S, Lu Y, Ivanovich Vatin N, Huang J (2025) Enhanced audience sentiment analysis in IoT-integrated metaverse media communication. PLoS One 20(10): e0332106. https://doi.org/10.1371/journal.pone.0332106
Editor: Hung Thanh Bui, Industrial University of Ho Chi Minh City, VIET NAM
Received: May 13, 2025; Accepted: August 21, 2025; Published: October 30, 2025
Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The Twitter Sentiment140 dataset has been uploaded as supporting information. The Amazon Customer Reviews dataset is publicly available via the AWS Registry of Open Data at https://registry.opendata.aws/amazon-reviews/. The Amazon Customer Reviews dataset can also be found via reference 46, which is cited in the manuscript.
Funding: This research was funded by the 2024 Special Research Project on People-To-People Exchange of the Center for International People-To-People Exchange and the Institute of Research and Practice of International People-To-People Exchange of the Ministry of Education (CCIPE-YXSJ-20240060 to S.W.), the Ministry of Education’s Employment Education Project for Supply-Demand Matching in 2025 (2024120908493 to S.W.), the 2022 Annual Research Project: “Online Open Courses Promoting Ideological and Political Education in Courses – Practice and Reflection on Red Classics’ Education through Dance” (2022ZXKC361 to S.W.), the Natural Science Foundation of Guangdong Province, China (Grant No. 2024A1515011162 to J.H.), the Natural Science Foundation of Shandong Province, China (Grant No. ZR2024QE021 to J.H.), and the Ministry of Science and Higher Education of the Russian Federation within the framework of the state assignment No. 075-03-2022-010 dated 14 January 2022 and No. 075–01568-23-04 dated 28 March 2023 (Additional agreement 075-03-2022-010/10 dated 9 November 2022, Additional agreement 075-03-2023-004/4 dated 22 May 2023 to N.V.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The integration of Metaverse technologies, consumer electronics, and the Internet of Things (IoT) is reshaping human–machine interactions [1]. Persistent connectivity through smart devices (such as wearables, home automation systems, and augmented reality platforms) enables synchronization between virtual and physical environments [2]. To support this interoperability, systems must optimize data throughput, energy efficiency, and communication latency [3,4]. However, achieving real-time sentiment analysis in these heterogeneous, dynamic ecosystems remains a challenge due to evolving linguistic patterns, limited cross-device compatibility, and asymmetric computational resources [5,6].
Current sentiment analysis frameworks often fall short when addressing multimodal, multilingual, and temporally dynamic data streams common in IoT-enabled Metaverse applications [7,8]. Challenges such as cultural idioms, regional dialects, and rapidly shifting online expressions further complicate the accurate interpretation of user emotions [9,10]. This paper introduces a high-fidelity, real-time sentiment analysis framework designed to adapt dynamically to diverse media data. The architecture embodies contextual, emotional, and temporal subtleties while preserving computational efficacy and scalability to achieve deployment on diverse IoT devices.
This research is motivated by an increasing need for scalable, reliable, and context-aware tools that perform sentiment analysis on emerging Metaverse-led platforms. Existing methods give little attention to cross-platform interoperability, multilingual processing, and the adaptive response routines essential to accurate sentiment detection. To address these gaps, this work offers a resilient, deep learning-based framework that supports precise sentiment analysis in real-time settings and can run on very different consumer endpoints.
Central to the proposed system is a new BG-Hybrid deep learning framework that combines BERT-based bidirectional contextual encoding with GPT-based generative functionality. The architecture comprises five interrelated modules (Fig 1): (i) data gathering through RESTful API scraping to achieve multi-platform aggregation; (ii) preprocessing through weighted normalization, tokenization, lemmatization, and multilingual translation; (iii) training using a cross-entropy loss function optimized by the Adam optimizer with adaptive learning rates; (iv) real-time sentiment analysis using dynamic window segmentation and anomaly detection; and (v) feedback-driven refinement using entropy-directed parameter adjustment to achieve user-aligned prediction. This architecture allows the system to process divergent data streams efficiently, adjust dynamically to temporal changes, and repeatedly improve its performance within IoT-integrated Metaverse settings. The main contributions of the proposed approach can be summarized as:
- Hybridized deep learning model combining BERT’s bidirectional contextual encoding and GPT’s generative sentiment understanding, tailored for real-time, fine-grained emotion analysis in heterogeneous IoT-Metaverse environments.
- Real-time adaptive feedback mechanism that incorporates user corrections and evolving linguistic trends to iteratively refine model parameters through entropy minimization strategies.
- Temporal segmentation strategy dynamically adapting to data stream velocities and contextual variations, enhancing real-time sentiment detection accuracy under fluctuating media traffic conditions.
The remainder of this article is organized as follows. The Background and related work analysis section reviews existing approaches. The Materials and methods section details the proposed methodology, describing the design and implementation of the deep learning-based sentiment analysis system. The Discussion section presents the discussion and outcomes. The article concludes with the Conclusion section.
Background and related work analysis
Sentiment analysis involves identifying opinions within textual data and classifying them as positive, negative, or neutral [11]. In media communication, it is a critical tool for understanding public opinion, analyzing audience reactions, and guiding content strategies [12]. Earlier methods predominantly used keyword-based extraction techniques, which often failed to capture subtle sentiment variations, particularly in cases of sarcasm, ambiguity, or context-dependent language [13].
Recent research has advanced sentiment analysis methodologies to address these limitations, particularly in media-rich environments. Rodríguez-Ibáñez et al. reviewed sentiment analysis applications across major social media platforms, highlighting methodological innovations and their growing relevance for strategic decision-making [14]. Hartmann et al. reported significant improvements in algorithmic precision, particularly in marketing and consumer behavior studies where audience sentiment directly influences outcomes [15]. Zhu explored issues caused by cultural differences in sentiment interpretation, highlighting the difficulties of processing globalized streams of communication [16]. Likewise, Van der Velden et al. studied sentiment analysis applied to politics, illustrating its value to monitor changes in public discourse [17].
Domain-specific research has further broadened the scope of sentiment analysis. Errami et al. explored sentiment modeling for Moroccan dialects, contributing to more diverse and culturally informed analytical paradigms [18]. Mehra studied emotion-based and aspect-specific sentiment in user-generated content, offering new perspectives on behavioral dynamics in tourism [19]. Omuya et al. compared dimensionality reduction and NLP approaches for improving sentiment classification precision on social networks [20]. In public health, Sussman et al. applied sentiment analysis to estimate public sentiment on COVID-19 vaccination and reported findings on vaccine reluctance and communication approaches [21].
Deep learning techniques for NLP
Deep learning has substantially improved natural language processing (NLP), enabling models to capture intricate linguistic patterns and achieve better performance on a variety of tasks. Table 1 offers an overview of prominent studies, providing readers with an account of recent advances in this area. Gupta et al. [22] showed that deep learning models outperform classic statistical approaches on tasks such as sentiment analysis, machine translation, and speech processing. Rodzin et al. [23] pointed to the capacity of these models to process and understand human language with high accuracy. Johri et al. [24] traced the trajectory from rule-based to neural network-based architectures in NLP, illustrating the shift toward data-driven methodologies. Zhou [25] described improvements in neural network architecture that further boosted NLP capabilities.
Goyal et al. [26] presented a broad review of deep learning techniques applied to NLP and introduced the fundamental concepts and methods. Pattanayak [27] focused specifically on recurrent neural networks (RNNs). Wang and Gang [28] explored the application of convolutional neural networks (CNNs), typically used in computer vision, to NLP tasks. Kazakova and Sultanova [29] examined contemporary challenges in NLP, identifying key areas where further research is required. Krutilla and Kovari [30] outlined the historical development and primary applications of NLP, contextualizing the role of deep learning in expanding its scope. Mankolli and Guliashki [31] analyzed various machine learning models and optimization techniques, contributing to a critical survey of methodologies and their relative performance in NLP applications.
Cross-platform sentiment analysis
Cross-platform sentiment analysis has emerged as a critical area of study, addressing the complexities and opportunities inherent in analyzing data from multiple social media platforms. Table 2 summarizes key studies reviewed in this section, highlighting their focus areas and principal contributions. Pearce et al. [32] investigated visual cross-platform analysis, focusing on methods for examining images shared across diverse social media platforms. Their work underscores the significant role of visual content in sentiment detection and reveals platform-specific characteristics influencing emotional expression. Yang et al. [33] compared topic framing on Twitter and Weibo using machine learning techniques, identifying differences in sentiment expression shaped by cultural and platform-specific dynamics. Similarly, Ruan et al. [34] analyzed public reactions to the 2019 Ridgecrest earthquake on Twitter and Reddit, demonstrating the necessity of cross-platform approaches to capture a comprehensive view of user sentiment during real-world events.
Yarchi et al. [35] investigated political polarization through time-series cross-platform analysis, incorporating interactional, positional, and affective dimensions. Novielli et al. [36] analyzed the transferability of sentiment analysis tools among various software engineering communities, underscoring the difficulties of finding universally applicable models. Tao and Peng [37] comparatively analyzed Weibo and Twitter posts on the Russian-Ukrainian conflict, finding convergence and divergence between platforms in sentiment expression and issue framing. Matassi and Boczkowski [38] foregrounded comparative research that employs cross-national, cross-media, and cross-platform orientations, arguing for inclusive approaches to sentiment analysis research. Boumhidi and Benlahbib [39] suggested a cross-platform reputation generation system employing aspect-based sentiment analysis, showing that combined approaches facilitate unified reputation scoring on multiple platforms. Kaufhold et al. [40] dealt with information overload on social media during crises by designing a cross-platform alerting system, reaffirming the necessity to process multi-platform, large-volume information flows efficiently to permit accurate sentiment evaluation and expedited information sharing.
Materials and methods
This section introduces the research methodology, an original deep learning-based framework for enhanced sentiment classification in media communication. The methodology combines state-of-the-art Natural Language Processing (NLP) approaches with advanced deep learning architectures to improve the accuracy and context-awareness of sentiment classification.
System modules and their working
The proposed sentiment analysis system comprises several interlinked modules, each with a defined role in the analysis. These modules cover data collection, data preprocessing, model training, real-time processing, and the feedback loop, as described below and illustrated in Fig 2:
- Data Collection Module: This module implements focused data acquisition strategies to collect textual data from the available digital platforms. API-based scraping routines extract relevant user-generated content (comments, reviews, and posts) to build a dataset rich in diverse public opinion and sentiment.
- Data Preprocessing Module: This module performs data cleansing after collection. It filters out irrelevant elements such as advertisements and spam, and standardizes text for uniform analysis using NLP techniques, including language-specific preprocessing steps such as tokenization and lemmatization. Translation algorithms are also applied to handle multilingual data.
- Model Training Module: This module trains neural network architectures specialized for sentiment analysis. Through the hybrid approach, the model is able to extract complex emotional nuances and contextual subtleties from the data.
- Real-time Analysis Module: This module lies at the core of the system, applying the trained neural networks to continuous streams of data. High-speed processing within this module allows large datasets to be interpreted quickly, giving instant insight into the sentiments prevailing in various media channels while retaining the contextual subtlety of the data.
- Feedback Loop Module: This module is integral to the system’s adaptive process. It relies on machine learning techniques that assess how well the sentiment analysis model is working and identify areas for improvement. By linking user feedback with system-generated performance metrics, it iteratively refines the model so that it adapts to evolving linguistic trends and user behaviors.
Data collection and preprocessing
The sentiment analysis framework relies on comprehensive and high-quality datasets to ensure accurate and robust predictions. The sequential flow of multi-source data aggregation, transformation, and weighting is systematically formalized in Algorithm 1, which guarantees structural integrity and linguistic consistency prior to deep learning model ingestion.
Algorithm 1. Data collection and preprocessing workflow.
After the initial collection, each dataset $D_i$ is subjected to a transformation process using the function $T_i$ to normalize and standardize its contents. This transformation, formalized in Eq 1,

$$D_i' = T_i(D_i) \tag{1}$$

prepares the raw data for homogeneous processing by performing tokenization, lemmatization, removal of stop words, and structural normalization. Here, $D_i'$ represents the transformed dataset ready for integration. To account for varying levels of reliability and relevance across sources, a weighting function is applied to each transformed dataset. This process, described in Eq 2,

$$D_i^{w} = W_i \cdot D_i' \tag{2}$$

assigns a weight $W_i$ to each dataset such that sources with higher quality or greater relevance contribute proportionally more to the final dataset. This ensures that noise from less reliable sources is mitigated during aggregation. The fully integrated and preprocessed dataset is obtained through the final aggregation step, expressed in Eq 3e:

$$D_{\text{final}} = \sum_{i} W_i \cdot T_i(D_i) = W \cdot \mathcal{F}(D) \tag{3e}$$

where $\mathcal{F}$ and $W$ represent the composite function of transformation and collection and the aggregate weighting matrix applied to the combined dataset $D$, respectively. The above operation produces $D_{\text{final}}$, a normalized, cleaned, and weighted dataset ready for deep learning-based sentiment analysis. Further processing stages enhance the quality of $D_{\text{final}}$: removal of noise, normalization of linguistic constructs, and preparation of a data format suitable for deep learning models. The whole pipeline is encapsulated by Eqs 4a–4c:

$$D_{\text{clean}} = f_{\text{clean}}(D_{\text{final}}) \tag{4a}$$
$$D_{\text{norm}} = f_{\text{norm}}(D_{\text{clean}}) \tag{4b}$$
$$D_{\text{prep}} = f_{\text{prep}}(D_{\text{norm}}) \tag{4c}$$

where $f_{\text{clean}}$ eliminates noise and irrelevant entries, $f_{\text{norm}}$ applies linguistic and structural normalization, and $f_{\text{prep}}$ prepares the data for compatibility with the BG-Hybrid model.
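The transformation, weighting, and aggregation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `transform` and `aggregate`, the small stop-word list, and the example source weights are all hypothetical, and lemmatization is omitted for brevity.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of"}

def transform(dataset):
    """Per-source transformation: lowercase, tokenize, drop stop words."""
    out = []
    for text in dataset:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        out.append([t for t in tokens if t not in STOP_WORDS])
    return out

def aggregate(sources, weights):
    """Attach each source's weight to its transformed samples and merge them."""
    final = []
    for name, dataset in sources.items():
        w = weights[name]
        final.extend((tokens, w) for tokens in transform(dataset))
    return final

sources = {
    "reviews": ["The battery is great", "Screen cracked in a week"],
    "tweets": ["Love the new headset"],
}
weights = {"reviews": 0.7, "tweets": 0.3}  # assumed reliability weights
d_final = aggregate(sources, weights)
```

Carrying the weight alongside each sample, rather than duplicating or dropping samples, lets a downstream training loop use it as a per-sample loss weight; that choice is an assumption of this sketch.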
Proposed model development
Building an adaptive and robust sentiment analysis model requires both a clearly designed architecture and a suitable training plan. This section presents the basic structure of the proposed hybrid deep learning model, BG-Hybrid, together with its optimization process and assessment approach.
Architectural formulation of the model
To support real-time sentiment detection over dynamic, multi-source data streams, the BG-Hybrid technique combines the bidirectional encoding strengths of BERT with the generative modeling capability of GPT [41,42]. The high-level representation of the model is given by Eq 5a:

$$\text{BG-Hybrid}(x) = \mathrm{GPT}(\mathrm{BERT}(x)) \tag{5a}$$

where $x$ denotes the input text sequence. Here, the BERT module processes the input to generate context-rich embeddings, which are then passed to the GPT module for sequence modeling and sentiment generation. The transformations within BERT and GPT are defined in Eqs 6a and 6b:

$$h = \mathrm{BERT}(x) = \mathcal{T}_{\text{enc}}(E(x)) \tag{6a}$$
$$\mathrm{GPT}(h) = \mathcal{T}_{\text{dec}}(h) \tag{6b}$$

where $E(x)$ represents the embedded vector of input $x$, and $\mathcal{T}_{\text{enc}}$ and $\mathcal{T}_{\text{dec}}$ refer to the encoder and decoder transformer stacks, respectively. The intermediate hidden state $h$ from BERT serves as the input to GPT for further generative processing. Sentiment classification is performed using a linear transformation followed by a softmax activation, as described in Eq 7a:

$$\hat{y} = \mathrm{softmax}\big(W_o \cdot \text{BG-Hybrid}(x) + b_o\big) \tag{7a}$$

where $\hat{y}$ denotes the predicted probability distribution over sentiment classes $C$, $W_o$ is the weight matrix, and $b_o$ is the bias vector of the output layer. The training objective minimizes the cross-entropy loss defined in Eq 8a:

$$\mathcal{L}(\Theta) = -\sum_{(x,y) \in \mathcal{D}} \sum_{c \in C} y_c \log \hat{y}_c \tag{8a}$$

where $\mathcal{L}$ represents the total loss, $\Theta$ includes all model parameters, and $(x,y)$ are data-label pairs from the training dataset $\mathcal{D}$. The loss penalizes discrepancies between the true labels $y$ and predicted probabilities $\hat{y}$. Model parameters are updated iteratively using the Adam optimizer, as described in Eq 9a:

$$\Theta_{t+1} = \Theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{9a}$$

where $\eta$ is the learning rate, and $\hat{m}_t$, $\hat{v}_t$ are bias-corrected estimates of the first and second moments, respectively. $\epsilon$ ensures numerical stability. In addition to classification, the model also generates contextual embeddings useful for downstream tasks, expressed in Eq 10a:

$$z = \mathrm{Embed}(x; \Theta) \tag{10a}$$

where $z$ denotes the intermediate representation extracted from the embedding layers. The dataset $\mathcal{D}$ is partitioned into training, validation, and testing subsets as defined in Eq 11a:

$$\mathcal{D} = \mathcal{D}_{\text{train}} \cup \mathcal{D}_{\text{val}} \cup \mathcal{D}_{\text{test}} \tag{11a}$$

During training, loss minimization is performed over $\mathcal{D}_{\text{train}}$, while validation on $\mathcal{D}_{\text{val}}$ helps monitor overfitting. Model performance is assessed using accuracy and F1-score, defined in Eqs 12a and 12b:

$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(\hat{c}_i = y_i) \tag{12a}$$
$$\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{12b}$$

where $\mathbb{1}(\cdot)$ is the indicator function that equals 1 if the predicted class $\hat{c}_i$ matches the true label $y_i$ and 0 otherwise, and $N$ is the number of evaluated samples. The F1-score balances precision and recall, making it suitable for imbalanced class distributions. The specific hyperparameter settings and architectural configurations employed in the BG-Hybrid model, including the adaptive streaming buffer and optimizer parameters, are detailed in Table 3, which serves as a comprehensive reference for reproducibility and fine-tuning.
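The softmax output layer, the cross-entropy loss, and a single Adam update can be made concrete with a small pure-Python sketch. It is illustrative only and does not reproduce the BG-Hybrid training code: the scalar-parameter `adam_step` helper is hypothetical, and the defaults for `lr`, `beta1`, `beta2`, and `eps` are the commonly used Adam defaults, assumed here rather than taken from Table 3.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Cross-entropy for one sample with a one-hot true label."""
    return -math.log(probs[true_index])

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

probs = softmax([2.0, 0.5, -1.0])   # three sentiment classes
loss = cross_entropy(probs, 0)      # true class is index 0
theta, m, v = adam_step(theta=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected moments cancel the gradient scale, so the parameter moves by approximately the learning rate regardless of the gradient magnitude.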
Sentiment analysis techniques
The BG-Hybrid model performs sentiment analysis by generating probability distributions over predefined sentiment classes. As shown in Algorithm 2, the workflow commences with the computation of $P(c \mid x; \Theta)$ (Eq 13a) and $\hat{c}$ (Eq 14a), iteratively enriching predictions through emotion intensities, contextual scoring, and temporal smoothing.

Algorithm 2. Sentiment analysis procedure for BG-Hybrid model.

Given an input text sequence $x$ and a set of sentiment classes $C$, the model first computes the probability distribution $P(c \mid x; \Theta)$ over all classes as shown in Eq 13a:

$$P(c \mid x; \Theta) = \mathrm{softmax}\big(W_s \cdot \text{BG-Hybrid}(x; \Theta) + b_s\big)_c \tag{13a}$$

where $W_s$ and $b_s$ represent the trainable weights and biases of the classification layer, respectively. $\Theta$ denotes the parameters of the BG-Hybrid model. The softmax activation ensures a normalized probability distribution across sentiment classes. The predicted sentiment class is then determined by selecting the class with the maximum probability, as defined in Eq 14a:

$$\hat{c} = \arg\max_{c \in C} P(c \mid x; \Theta) \tag{14a}$$

where $\arg\max$ identifies the class $c \in C$ that maximizes the predicted probability $P(c \mid x; \Theta)$. Beyond categorical prediction, the model computes an emotion intensity score $I(c)$ for each class $c$. This score aggregates hidden representations $h_j$, weighted by attention coefficients $\alpha_j$, as expressed in Eq 15a:

$$I(c) = \sum_{j=1}^{n} \alpha_j \, h_j \tag{15a}$$

where $n$ is the number of hidden states contributing to the aggregation, $\alpha_j$ represents the attention weight assigned to the $j$-th hidden state, and $h_j$ denotes its corresponding feature vector. To incorporate semantic coherence across adjacent sentences, a contextual sentiment score $S_{\text{ctx}}(s_k)$ is calculated using Eq 16a:

$$S_{\text{ctx}}(s_k) = \frac{1}{|C_k|} \sum_{s_j \in C_k} S(s_j) \tag{16a}$$

where $S(s_j)$ is the sentiment score of sentence $s_j$, $C_k$ is the set of surrounding sentences within a contextual window, and $|C_k|$ denotes its cardinality. Temporal sentiment dynamics are handled by a function $T(t)$, which balances current sentiment predictions with historical trends, as defined in Eq 17a:

$$T(t) = \lambda \, S(t) + (1 - \lambda) \, H(t) \tag{17a}$$

where $\lambda$ controls the weighting between the current prediction $S(t)$ and its temporal context. $H(t)$ aggregates historical sentiment data relevant to time $t$. For document-level analysis, the average sentiment representation $\bar{S}_D$ is computed across all sentences in a document $D$, as given in Eq 18a:

$$\bar{S}_D = \frac{1}{|D|} \sum_{s \in D} S(s) \tag{18a}$$

where $|D|$ denotes the total number of sentences in the document. Finally, the model quantifies prediction uncertainty using a confidence uncertainty score $U(x)$, as shown in Eq 19a:

$$U(x) = 1 - P(\hat{c} \mid x; \Theta) \tag{19a}$$

where $1 - P(\hat{c} \mid x; \Theta)$ reflects the complement of the predicted probability for class $\hat{c}$, indicating lower confidence when $P(\hat{c} \mid x; \Theta)$ is close to 0.5.
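The class selection, uncertainty score, contextual averaging, and temporal balancing described in this section can be illustrated with a short Python sketch. The helper names (`predict`, `contextual_score`, `temporal_blend`) and the weighting parameter `lam` are assumptions made for this example, and the historical aggregate is taken to be a plain mean, which the text does not specify.

```python
def predict(probs, classes):
    """Return the most probable class and the uncertainty 1 - max probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return classes[best], 1.0 - probs[best]

def contextual_score(scores):
    """Mean sentiment score over the sentences in a context window."""
    return sum(scores) / len(scores)

def temporal_blend(current, history, lam=0.7):
    """Blend the current score with the mean of past scores (assumed aggregate)."""
    if not history:
        return current
    return lam * current + (1 - lam) * (sum(history) / len(history))

classes = ["negative", "neutral", "positive"]
label, uncertainty = predict([0.1, 0.2, 0.7], classes)
ctx = contextual_score([0.5, 0.75, 1.0])
smoothed = temporal_blend(1.0, [0.0, 0.5], lam=0.5)
```

A larger `lam` makes the output track the newest prediction more closely, while a smaller value favors the historical trend and damps short-lived spikes.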
Real-time stream-based sentiment analysis framework
The proposed framework provides a mathematically structured approach for analyzing real-time data streams, aiming for precise sentiment scoring and dynamic model refinement. The operational flow of real-time sentiment processing and adaptive feedback refinement is systematically outlined in Algorithm 3, where dynamic window management and path optimization ensure scalable performance under fluctuating data rates.
Algorithm 3. Real-time stream-based sentiment analysis workflow.
Stream processing and segmentation architecture.
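Incoming data streams are denoted as $D(t) = \{d_i(t)\}_{i=1}^{N}$, constructed by aggregating individual atomic elements $d_i(t)$ arriving at time $t$. The complete sentiment output transformation from the stream is defined in Eq 20:

$$S(t) = \sum_{i=1}^{N} \text{BG-Hybrid}\big(d_i(t); \Theta, \Delta\big) \tag{20}$$

where $S(t)$ represents the cumulative sentiment output at time $t$, $\Theta$ denotes model parameters, $N$ is the total number of data elements, and $\Delta$ is the dynamic segmentation window. Segment-wise sentiment aggregation for localized analysis is described in Eq 21:

$$S_{\Delta}(t) = \frac{1}{N_{\Delta}} \sum_{i=1}^{N_{\Delta}} \text{BG-Hybrid}\big(d_i(t); \Theta\big) \tag{21}$$

where $N_{\Delta}$ denotes the number of elements in the current segment. The normalization of sentiment outputs to stabilize trends is given in Eq 22:

$$\tilde{S}(t) = \frac{S(t) - \mu_S}{\sigma_S} \tag{22}$$

where $\mu_S$ and $\sigma_S$ represent historical mean and standard deviation. Anomaly detection for data integrity is modeled in Eq 23:

$$A\big(d_i(t)\big) = \begin{cases} 1, & |d_i(t) - \mu_D| > k \, \sigma_D \\ 0, & \text{otherwise} \end{cases} \tag{23}$$

with $\mu_D$ and $\sigma_D$ denoting the mean and standard deviation of the data stream, and $k$ controlling sensitivity.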
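The normalization and anomaly-detection steps amount to a z-score and a k-sigma test. The sketch below shows one way to express them with the Python standard library; the function names are hypothetical, the population standard deviation is an assumed choice, and the numeric values are placeholders rather than data from the experiments.

```python
import statistics

def z_normalize(value, history):
    """Standardize a value against the historical mean and standard deviation."""
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history)
    return 0.0 if sigma == 0 else (value - mu) / sigma

def is_anomaly(value, stream, k=3.0):
    """Flag a value lying more than k standard deviations from the stream mean."""
    mu = statistics.mean(stream)
    sigma = statistics.pstdev(stream)
    return abs(value - mu) > k * sigma

history = [2, 4, 4, 4, 5, 5, 7, 9]   # placeholder scores: mean 5, std 2
z = z_normalize(9, history)
flag_inlier = is_anomaly(9, history, k=3.0)
flag_outlier = is_anomaly(12, history, k=3.0)
```

Lowering `k` makes the detector more sensitive, at the cost of flagging more legitimate bursts of unusual sentiment as anomalies.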
Adaptive window management for dynamic data rates.
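The window size $\Delta$ adjusts dynamically with data velocity. Its update is defined in Eq 24:

$$\Delta(t+1) = \Delta(t) + \phi\big(r(t), \dot{r}(t)\big) \tag{24}$$

where $r(t)$ indicates the data arrival rate and $\dot{r}(t)$ is its temporal derivative. The threshold function for adjusting $\Delta$ is given in Eq 25:

$$\phi\big(r(t), \dot{r}(t)\big) = a \, r(t) + b \, \dot{r}(t) + c \tag{25}$$

where $a$, $b$, and $c$ are tunable coefficients ensuring responsiveness to sudden changes in data rate. When historical trends are available, predictive window updates are performed using Eq 26:

$$\Delta_{\text{pred}}(t+1) = \Delta(t) + P\big(\{r(\tau)\}_{\tau \le t}\big) \tag{26}$$

where $P(\cdot)$ forecasts window adjustments based on prior data rate patterns. If user feedback is received, dynamic refinement of the window size occurs as expressed in Eq 27:

$$\Delta_{\text{fb}}(t+1) = \Delta(t+1) + F(t) \tag{27}$$

where $F(t)$ integrates correctional signals from users. To maintain stability, smoothed updates of $\Delta$ are computed using Eq 28:

$$\Delta_{\text{smooth}}(t+1) = \gamma \, \Delta(t+1) + (1 - \gamma) \, \Delta(t) \tag{28}$$

where $\gamma$ is a smoothing factor balancing the influence of new and previous window sizes.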
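One possible reading of the adaptive window logic is sketched below: an adjustment driven by the arrival rate and its rate of change, followed by smoothing with the factor γ. The linear form of the adjustment, the coefficient values, the lower bound `min_size`, and the function name `update_window` are all assumptions introduced for illustration; the paper's exact functional forms may differ.

```python
def update_window(prev, rate, rate_change, a=-0.01, b=-0.05, c=1.0,
                  gamma=0.5, min_size=1.0):
    """Propose a new window size from the data rate, then smooth it.

    a, b, c are assumed tunable coefficients; negative a and b shrink the
    window when traffic is heavy or accelerating.
    """
    adjustment = a * rate + b * rate_change + c
    proposed = max(min_size, prev + adjustment)   # keep the window positive
    return gamma * proposed + (1 - gamma) * prev  # blend new and previous sizes

w_calm = update_window(prev=10.0, rate=50.0, rate_change=0.0)
w_burst = update_window(prev=10.0, rate=500.0, rate_change=40.0)
```

With these placeholder coefficients, steady low traffic lets the window grow slightly, while a burst shrinks it, and the smoothing step halves the size of either move.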
Real-time path optimization for feedback loops.
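Path selection and feedback prioritization in the system rely on maximizing expected efficiency metrics. The process is captured in Eqs 29a–29c:

$$P_i^{*} = \arg\max_{j \in \mathcal{N}(i)} E_j \tag{29a}$$
$$\mathrm{Score}_j = w_1 T_j - w_2 L_j + w_3 \dot{E}_j \tag{29b}$$
$$\dot{E}_j = \frac{dE_j}{dt} \tag{29c}$$

where:
- $P_i^{*}$ (Eq 29a) is the optimal path for node $i$ based on neighboring nodes $\mathcal{N}(i)$ and their expected efficiency $E_j$.
- $\mathrm{Score}_j$ (Eq 29b) is the path score, balancing throughput $T_j$, latency $L_j$, and efficiency change rate $\dot{E}_j$ with weights $w_1$, $w_2$, and $w_3$.
- $\dot{E}_j$ (Eq 29c) represents the temporal derivative of the expected efficiency at node $j$.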
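The path selection described by these three quantities can be sketched as a weighted score followed by an argmax over neighbors. In the example below, subtracting latency from the score is an assumption (the text only says the score balances the three terms), and the node names, measurements, and unit weights are hypothetical.

```python
def path_score(throughput, latency, eff_change, w1=1.0, w2=1.0, w3=1.0):
    """Weighted path score; latency is penalized (assumed sign convention)."""
    return w1 * throughput - w2 * latency + w3 * eff_change

def best_neighbor(neighbors, **weights):
    """Pick the neighbor with the highest path score."""
    return max(neighbors, key=lambda name: path_score(*neighbors[name], **weights))

# name -> (throughput, latency, efficiency change rate), placeholder values
neighbors = {
    "node_a": (0.9, 0.30, 0.00),
    "node_b": (0.7, 0.05, 0.10),
    "node_c": (0.5, 0.10, -0.05),
}
choice = best_neighbor(neighbors)
choice_low_latency_weight = best_neighbor(neighbors, w2=0.1)
```

Changing the weights shifts the trade-off: with a small latency weight the highest-throughput node wins, whereas equal weights favor the low-latency, improving node.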
Experimental simulation setup
This section describes the general setup for system evaluation, namely the hardware settings, software environment, datasets, and preprocessing methods. The hardware platform used an Intel Core i9-10900K processor running Ubuntu 20.04 LTS, alongside an NVIDIA GeForce RTX 3080 GPU to accelerate deep learning computations. The software suite comprised Python 3.8 with TensorFlow 2.4 and PyTorch 1.7 to build, train, and run predictions with the models.
To establish comparative benchmarks, system performance was evaluated against three prominent approaches in sentiment analysis: the Competence-Based e-Assessment (CBA-Assessment) by Amraouy et al. [43], the Facial Expression Recognition model (FER-Audience) by Kanipriya et al. [44], and Emote-Based Sentiment Analysis on Twitch Comments (Emote-Twitch) proposed by Kobs et al. [45]. These baselines were selected due to their distinct methodologies and relevance in capturing affective states across diverse user-generated content.
The datasets used in these simulations were selected to cover diverse linguistic structures and domain settings. The baseline dataset, Twitter Sentiment140 [46], consists of 1.6 million tweets labeled by polarity (positive, negative, and neutral). The dataset captures dynamic, concise social media text patterns. As its counterpart, the Amazon Customer Reviews dataset [47] consists of about 400,000 reviews across several product categories, providing longer-form text and cross-domain diversity.
Before being integrated into the experimental framework, the datasets went through extensive preprocessing to ensure data quality and consistency. For Twitter Sentiment140, URLs, user mentions, and non-textual entities were removed, followed by tokenization, lemmatization, and stop-word removal to achieve linguistic normalization and noise reduction. The same steps were applied to the Amazon Customer Reviews dataset, which was additionally cleaned of inline HTML tags and formatting anomalies. Fig 3 shows the tokenization process applied to the Twitter Sentiment140 dataset, illustrating the systematic conversion of raw text into structured tokens that are then ready to be passed to the BG-Hybrid model.
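The cleaning steps listed above can be approximated with standard-library regular expressions. The sketch below is a simplified stand-in, not the pipeline used in the experiments: the regular expressions, the short stop-word list, and the `clean_text` name are assumptions, and lemmatization is left out because it would require an external NLP library.

```python
import re
from html import unescape

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TAG_RE = re.compile(r"<[^>]+>")
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "this"}

def clean_text(text):
    """Strip HTML tags, URLs, and mentions, then tokenize and drop stop words."""
    text = unescape(TAG_RE.sub(" ", text))   # remove tags, decode entities
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tweet_tokens = clean_text("@anna Loving this headset!! http://t.co/xyz #VR")
review_tokens = clean_text("<b>Great</b> sound &amp; the fit is perfect<br/>")
```

The same function handles both sources here; in practice the tweet-specific and review-specific steps could be split so that each dataset only pays for the cleaning it needs.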
Table 4 provides an overview of datasets and experimental conditions, such as dataset sizes, training-validation-test divisions, and preprocessing approaches. The two datasets were divided into 80% training, 10% validation, and 10% test parts to ensure consistency among experiments.
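An 80/10/10 partition of this kind can be produced by shuffling sample indices and slicing. The following sketch assumes a seeded shuffle and a hypothetical helper name `split_indices`; the paper does not state how the split was drawn. Applied to the 1.6 million tweets, these proportions correspond to 1.28 million training samples and 160,000 each for validation and testing.

```python
import random

def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # seeded for reproducibility
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(1000)
```

Splitting indices rather than the records themselves keeps the three subsets disjoint by construction and avoids copying the underlying text.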
Evaluation metrics and comparative results
This section presents an in-depth evaluation of the proposed BG-Hybrid sentiment analysis framework using rigorous experimental protocols. All results were obtained through ten-fold cross-validation, ensuring statistical robustness and minimizing bias across diverse data splits. The system’s performance is benchmarked against three prominent methods—CBA-Assessment [43], FER-Audience [44], and Emote-Twitch [45]—to highlight improvements in accuracy, precision, recall, F1-Score, and response time.
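For readers unfamiliar with the protocol, ten-fold cross-validation partitions the data into ten folds and uses each fold once for testing while training on the other nine. The generator below is a minimal sketch with contiguous, unshuffled folds; the name `k_fold_indices` is hypothetical, and shuffling or stratification, which the paper does not describe, would be layered on top.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds over n samples."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(25, k=10))
```

Each metric would then be computed once per fold and averaged over the ten folds.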
Accuracy
The accuracy metric reflects the proportion of correctly classified sentiment instances relative to the total number of instances. Using ten-fold cross-validation, the BG-Hybrid model achieved an average accuracy of 94.5%, outperforming all baseline methods (Fig 4). Specifically, CBA-Assessment yielded an accuracy of 88.7%, effective within e-assessment domains but less adaptable to heterogeneous data. FER-Audience, relying on facial expression cues, reported 82.3%, while Emote-Twitch achieved 89.5% in sentiment detection across Twitch comment streams.
Precision
Precision evaluates the proportion of true positive predictions within all positive predictions. As depicted in Fig 5, the BG-Hybrid system achieved an average precision of 92.3% across folds, demonstrating superior discrimination capabilities in identifying relevant sentiment categories. Comparatively, CBA-Assessment recorded 85.4%, Emote-Twitch achieved 86.7%, and FER-Audience trailed at 78.9%.
Recall
Recall measures the system’s ability to correctly identify positive instances out of all actual positive cases. The BG-Hybrid framework attained an average recall of 90.8% (Fig 6), outperforming CBA-Assessment (83.2%), Emote-Twitch (81.7%), and FER-Audience (76.4%). These results emphasize the framework’s robustness in capturing a broad spectrum of sentiment signals across domains.
F1-Score
The F1-Score, a harmonic mean of precision and recall, provides a balanced assessment of model performance. The BG-Hybrid model achieved an average F1-Score of 91.5% (Fig 7), surpassing CBA-Assessment (84.5%), Emote-Twitch (84.1%), and FER-Audience (77.5%). This improvement highlights the system’s capability to maintain high accuracy in both identifying and categorizing sentiments across diverse input data.
Table 5 consolidates the performance metrics of all methods, reinforcing the BG-Hybrid model’s superiority in achieving consistently high scores across all evaluation parameters.
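The four classification metrics defined above can be computed from confusion-matrix counts as in the following sketch. The function names and the toy counts are assumptions made for illustration; only the 92.3% precision and 90.8% recall values come from the text.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical confusion counts, for illustration only.
toy = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(toy)

# Consistency check on the averaged BG-Hybrid values reported in the text (percent).
reported_f1 = f1_from(92.3, 90.8)
print(round(reported_f1, 1))
```

The harmonic mean of the reported average precision and recall rounds to the reported 91.5% F1-Score. An F1-Score averaged over folds need not equal the harmonic mean of fold-averaged precision and recall exactly, so small differences between the two are expected.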
Response time
Response time is critical for real-time sentiment analysis applications. The BG-Hybrid framework demonstrated an average response time of 250 ms across varied scenarios, including social media streams, customer reviews, news articles, and live commentary (Fig 8). Scenario-specific timings are provided in Table 6. These results affirm the system’s readiness for deployment in latency-sensitive environments. By comparison, CBA-Assessment reported a mean response time of 320 ms, Emote-Twitch averaged 350 ms, and FER-Audience exhibited the highest latency at 410 ms, reflecting constraints in dynamic, high-throughput contexts.
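One way an average response time of this kind could be measured is sketched below. The text does not describe the authors' timing harness, so `mean_latency_ms`, the injectable `clock` parameter, and the `dummy_classifier` handler are assumptions introduced for illustration.

```python
import time
from statistics import mean


def mean_latency_ms(handler, requests, clock=time.perf_counter):
    """Return the average wall-clock time of handler(request), in milliseconds."""
    durations = []
    for request in requests:
        start = clock()
        handler(request)
        durations.append((clock() - start) * 1000.0)
    return mean(durations)


# Hypothetical stand-in for the sentiment model's inference call.
def dummy_classifier(text):
    return "positive" if "good" in text else "negative"


stream = ["good stream", "bad latency", "good product"]
print(f"{mean_latency_ms(dummy_classifier, stream):.4f} ms")
```

Passing the clock as a parameter keeps the measurement testable with a deterministic fake clock, while `time.perf_counter` remains the default for real runs.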
User feedback alignment
The effectiveness of the proposed sentiment analysis framework in aligning with user feedback was evaluated through ten-fold cross-validation, ensuring methodological rigor and robustness. This assessment was performed across six distinct evaluation scenarios: social media sentiment analysis, customer review analysis, news article sentiment classification, live event commentary analysis, forum discussion sentiment detection, and sentiment analysis of product feedback. Together, these scenarios provided a comprehensive overview of system adaptability across varied application contexts.
The results indicate close agreement between system predictions and end-user judgments. The framework reached alignment scores of 90% for social media sentiment analysis, 88% for customer reviews, 87% for news articles, 89% for live event commentary, 91% for forum discussions, and 86% for product feedback. The mean alignment score across the six scenarios was 88.5%, as shown in Fig 9.
By comparison, CBA-Assessment yielded an alignment score of 78%, FER-Audience 70%, and Emote-Twitch 75%. These results suggest that the proposed system reflects user preferences and intent more closely than the baselines and does so consistently across the sentiment analysis tasks considered.
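The per-scenario alignment figures and their mean can be reproduced with a few lines of Python. The sketch assumes that alignment is the percentage of system predictions that match user-provided labels; that definition, the `alignment_rate` helper, and the dictionary keys are assumptions, while the six percentages are the values reported in the text.

```python
from statistics import mean


def alignment_rate(predicted, user_labels):
    """Percentage of predictions that match the user-provided labels."""
    if len(predicted) != len(user_labels) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == u for p, u in zip(predicted, user_labels))
    return 100.0 * matches / len(predicted)


# Per-scenario alignment scores reported in the text (percent).
REPORTED_ALIGNMENT = {
    "social_media": 90,
    "customer_reviews": 88,
    "news_articles": 87,
    "live_event_commentary": 89,
    "forum_discussions": 91,
    "product_feedback": 86,
}

overall_alignment = mean(list(REPORTED_ALIGNMENT.values()))
print(overall_alignment)
```

The mean of the six reported scores matches the 88.5% overall figure.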
Scalability analysis
Scalability, defined here as the system’s ability to maintain consistent performance as data volume increases, was assessed with the same ten-fold cross-validation protocol. The tests covered the same six scenarios used for the user feedback alignment evaluation. In each scenario, data throughput was increased incrementally to verify the responsiveness and efficiency of the system.
The proposed framework demonstrated notable scalability across all scenarios. It achieved scores of 93% in social media sentiment analysis, 91% in customer reviews, 90% in news articles, 92% in live event commentary, 94% in forum discussions, and 89% in product feedback analysis. The mean scalability score across these scenarios was 91.5%, as depicted in Fig 10.
For context, CBA-Assessment attained an average scalability score of 80%, reflecting limitations in handling larger datasets effectively. FER-Audience, which relies heavily on computationally intensive visual data analysis, scored 72%, and Emote-Twitch, optimized for Twitch comments, achieved 78%. These comparative results underscore the efficiency and robustness of the proposed system, particularly in high-volume and dynamic environments where traditional approaches tend to degrade.
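The scalability comparison reduces to simple arithmetic on the reported scores, sketched here. The dictionary keys and variable names are assumptions; the numbers are the percentages given in the text, and the sketch does not model how the scores themselves were measured.

```python
from statistics import mean

# Per-scenario BG-Hybrid scalability scores reported in the text (percent).
REPORTED_SCALABILITY = {
    "social_media": 93,
    "customer_reviews": 91,
    "news_articles": 90,
    "live_event_commentary": 92,
    "forum_discussions": 94,
    "product_feedback": 89,
}

# Average scalability scores reported for the baselines (percent).
BASELINE_SCALABILITY = {
    "CBA-Assessment": 80,
    "FER-Audience": 72,
    "Emote-Twitch": 78,
}

bg_hybrid_mean = mean(list(REPORTED_SCALABILITY.values()))
# Margin of BG-Hybrid over each baseline, in percentage points.
margins = {name: bg_hybrid_mean - score for name, score in BASELINE_SCALABILITY.items()}

print(bg_hybrid_mean)
for name, margin in margins.items():
    print(f"{name}: +{margin} points")
```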
Ablation study and detailed performance analysis
This section includes ablation experiments isolating key components, a comparison of alternative configurations, and a qualitative analysis of best-case and worst-case results. Table 7 illustrates the individual and cumulative contributions of major components: BERT encoder, GPT decoder, dynamic feedback loop, and temporal segmentation. The removal of any component leads to a noticeable decline in performance across all metrics.
The data in Table 7 demonstrate that the feedback loop and temporal segmentation significantly improve user alignment and scalability. The combination of BERT and GPT provides the highest precision and recall, validating the hybrid design. Table 8 presents an analysis of best-performing and worst-performing scenarios across different datasets and contexts. This highlights where the system excels and where it faces challenges.
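A leave-one-out ablation over the four named components can be enumerated as in the sketch below. The component identifiers and the `leave_one_out_configs` helper are hypothetical names chosen for illustration; the text does not specify how the authors organized their ablation runs.

```python
# Hypothetical identifiers for the four components named in the text.
COMPONENTS = ("bert_encoder", "gpt_decoder", "feedback_loop", "temporal_segmentation")


def leave_one_out_configs(components=COMPONENTS):
    """Return the full configuration followed by one variant per removed component."""
    full = {name: True for name in components}
    configs = [("full", full)]
    for removed in components:
        variant = dict(full)
        variant[removed] = False  # disable exactly one component
        configs.append((f"without_{removed}", variant))
    return configs


for label, config in leave_one_out_configs():
    enabled = [name for name, on in config.items() if on]
    print(f"{label}: {', '.join(enabled)}")
```

Evaluating each variant with the same cross-validation protocol as the full model isolates the contribution of the removed component.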
As seen in Table 8, performance dips slightly in contexts with high linguistic variability (e.g., product feedback), underscoring potential avenues for enhancement. Table 9 compares different variants of the BG-Hybrid model with alternative configurations and baseline models.
The full BG-Hybrid configuration outperforms all alternatives, reinforcing the merit of integrating bidirectional and generative transformers. Table 10 details how variations in key hyperparameters affect performance metrics.
Optimal settings (learning rate 0.001, batch size 32) demonstrate balanced performance. Table 11 highlights common error types and their potential causes.
These findings highlight the model’s robustness while identifying areas where targeted improvements, such as sarcasm detection modules, could be beneficial. The ablation results underscore the importance of each system component in delivering high accuracy and scalability. The dynamic feedback mechanism and temporal segmentation enhance adaptability to diverse contexts, while the integration of BERT and GPT yields balanced performance across precision and recall. Best-case scenarios demonstrate the system’s strength in structured textual environments, while worst-case analyses point to challenges in handling linguistic diversity and sarcasm.
Discussion
This work proposes an enhanced sentiment analysis framework designed to handle dynamic and diverse media content. The proposed BG-Hybrid framework combines state-of-the-art NLP components with a dynamic feedback loop to enable real-time adaptability and context-aware sentiment detection. The evaluation results show that the BG-Hybrid framework outperformed the compared methods on every evaluation metric considered. Notably, the model achieved 94.5% accuracy, compared with 82.3% to 89.5% for the baselines. This level of accuracy highlights the system’s capacity to identify subtle sentiment shifts within varying textual content. The model also achieved 92.3% precision and 90.8% recall, indicating that it keeps both false positives and false negatives low in sentiment classification tasks.
The F1-Score of 91.5% further reflects the balance between precision and recall and supports the system’s reliability in practical use. Beyond classification metrics, the system showed an average response time of 250 ms and a scalability score of 91.5%. These findings support its feasibility for processing large-scale data streams in real time, an essential requirement in fast-changing digital contexts. Comparative evaluation against current approaches (Competence-Based e-Assessment (CBA-Assessment), Facial Expression Recognition (FER-Audience), and Emote-Based Sentiment Analysis on Twitch Comments (Emote-Twitch)) showed that the proposed framework performed better on all reported metrics. Its flexibility stands out particularly in its ability to integrate user feedback on the fly, so that the model is refined continuously to track changing linguistic patterns and user engagement. This makes the BG-Hybrid model an adaptive platform for sentiment analysis across diverse domains and data sources.
Conclusion
Sentiment analysis remains a crucial research priority, particularly as content on digital platforms becomes more diverse, dynamic, and context-dependent. This work presents a framework that addresses these issues by integrating an adaptive feedback mechanism with the BG-Hybrid model to achieve real-time sentiment interpretation that is context-aware and temporally aware while still scaling and generalizing to large datasets. The approach is distinguished by its hybrid architecture, which combines BERT-based bidirectional encoding with GPT-based generative modeling, and it offers a new path to fine-grained emotion detection in large-scale datasets. In the experiments, the model achieved 94.5% accuracy, 92.3% precision, 90.8% recall, and an average F1-score of 91.5%, exceeding the baselines CBA-Assessment (88.7% accuracy), FER-Audience (82.3% accuracy), and Emote-Twitch (89.5% accuracy). Scalability testing yielded a score of 91.5%, and the average response time was 250 ms across diverse test setups. Alignment with user feedback was 88.5%, and the framework’s ability to adapt to dynamically evolving sentiment contexts deserves emphasis. Future work will extend the framework to multimodal data streams, such as visual and audio signals, and extend the adaptive processes to handle emerging linguistic trends and domain-specific terms.
Supporting information
S1 File.
The Sentiment140 Twitter sentiment dataset analyzed in this study is publicly available and can be downloaded directly from: (https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/investigating-sentiment-analysis/data/training.1600000.processed.noemoticon.csv.zip). The Amazon Customer Reviews dataset is publicly available via the AWS Registry of Open Data at: (https://registry.opendata.aws/amazon-reviews/).
https://doi.org/10.1371/journal.pone.0332106.s001
(ZIP)
References
- 1. Liebovitch LS, Powers W, Shi L, Chen-Carrel A, Loustaunau P, Coleman PT. Word differences in news media of lower and higher peace countries revealed by natural language processing and machine learning. PLoS One. 2023;18(11):e0292604. pmid:37910443
- 2. Hasan M, Ahmed T, Islam MR, Uddin MP. Leveraging textual information for social media news categorization and sentiment analysis. PLoS One. 2024;19(7):e0307027. pmid:39008472
- 3. Kastrati Z, Dalipi F, Imran AS, Pireva Nuci K, Wani MA. Sentiment analysis of students’ feedback with NLP and deep learning: a systematic mapping study. Applied Sciences. 2021;11(9):3986.
- 4. Kastrati Z, Ahmedi L, Kurti A, Kadriu F, Murtezaj D, Gashi F. A deep learning sentiment analyser for social media comments in low-resource languages. Electronics. 2021;10(10):1133.
- 5. Hunte MR, McCormick S, Shah M, Lau C, Jang EE. Investigating the potential of NLP-driven linguistic and acoustic features for predicting human scores of children’s oral language proficiency. Assessment in Education: Principles, Policy & Practice. 2021;28(4):477–505.
- 6. Alachram H, Chereda H, Beißbarth T, Wingender E, Stegmaier P. Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks. PLoS One. 2021;16(10):e0258623. pmid:34653224
- 7. Gupta I, Joshi N. Feature-based Twitter sentiment analysis with improved negation handling. IEEE Trans Comput Soc Syst. 2021;8(4):917–27.
- 8. Chen K, Wei G. Public sentiment analysis on urban regeneration: a massive data study based on sentiment knowledge enhanced pre-training and latent Dirichlet allocation. PLoS One. 2023;18(4):e0285175. pmid:37104499
- 9. Xu QA, Chang V, Jayne C. A systematic review of social media-based sentiment analysis: emerging trends and challenges. Decision Analytics Journal. 2022;3:100073.
- 10. Jiang Z, Seyedi S, Haque RU, Pongos AL, Vickers KL, Manzanares CM, et al. Automated analysis of facial emotions in subjects with cognitive impairment. PLoS One. 2022;17(1):e0262527. pmid:35061824
- 11. Zad S, Heidari M, Jones JH, Uzuner O. A survey on concept-level sentiment analysis techniques of textual data. In: 2021 IEEE World AI IoT Congress (AIIoT). 2021. p. 285–91. https://doi.org/10.1109/aiiot52608.2021.9454169
- 12. Suryawati CT, Nugroho RD, Ainie I, Irmayanti D, Listyaningsih, Andarwati TW, et al. Sentiment analysis on investment education from Twitter using ensemble learning. In: 2023 International Seminar on Application for Technology of Information and Communication (iSemantic). 2023. p. 364–9. https://doi.org/10.1109/isemantic59612.2023.10295308
- 13. He L, Wang Z, Wang L, Li F. Multimodal mutual attention-based sentiment analysis framework adapted to complicated contexts. IEEE Trans Circuits Syst Video Technol. 2023;33(12):7131–43.
- 14. Rodríguez-Ibáñez M, Casánez-Ventura A, Castejón-Mateos F, Cuenca-Jiménez PM. A review on sentiment analysis from social media platforms. Expert Systems with Applications. 2023:119862.
- 15. Hartmann J, Heitmann M, Siebert C, Schamp C. More than a feeling: accuracy and application of sentiment analysis. International Journal of Research in Marketing. 2023;40(1):75–87.
- 16. Zhu L. Retracted article: cultural adaptation and sentiment analysis based on neuron coding in the context of new media communication. Soft Comput. 2023;28(S2):521–521.
- 17. van der Velden M, Umansky N, Pipal C. Sentiment analysis. Encyclopedia of Political Communication. 2023.
- 18. Errami M, Ouassil MA, Rachidi R, Cherradi B, Hamida S, Raihani A. Sentiment analysis on Moroccan dialect based on ML and social media content detection. IJACSA. 2023;14(3).
- 19. Mehra P. Unexpected surprise: emotion analysis and aspect based sentiment analysis (ABSA) of user generated comments to study behavioral intentions of tourists. Tourism Management Perspectives. 2023;45:101063.
- 20. Omuya EO, Okeyo G, Kimwele M. Sentiment analysis on social media tweets using dimensionality reduction and natural language processing. Engineering Reports. 2022;5(3).
- 21. Sussman KL, Bouchacourt L, Bright LF, Wilcox GB, Mackert M, Norwood AS, et al. COVID-19 topics and emotional frames in vaccine hesitation: a social media text and sentiment analysis. Digit Health. 2023;9:20552076231158308. pmid:36896330
- 22. Gupta M, Verma SK, Jain P. Detailed study of deep learning models for natural language processing. In: 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN). 2020. p. 249–53. https://doi.org/10.1109/icacccn51052.2020.9362989
- 23. Rodzin S, Bova V, Kravchenko Y, Rodzina L. Deep learning techniques for natural language processing. In: Computer Science On-line Conference. Springer; 2022. p. 121–30.
- 24. Johri P, Khatri SK, Al-Taani AT, Sabharwal M, Suvanov S, Kumar A. Natural language processing: history, evolution, application, and future work. In: Proceedings of 3rd International Conference on Computing Informatics and Networks: ICCIN 2020. 2021. p. 365–75.
- 25. Zhou Y. Natural language processing with improved deep learning neural networks. Scientific Programming. 2022;2022:1–8.
- 26. Goyal P, Pandey S, Jain K. Introduction to natural language processing and deep learning. In: Deep learning for natural language processing: creating neural networks with Python. 2018. p. 1–74.
- 27. Pattanayak S. Natural language processing using recurrent neural networks. In: Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python. 2017. p. 223–78.
- 28. Wang W, Gang J. Application of convolutional neural network in natural language processing. In: 2018 International Conference on Information Systems and Computer Aided Education (ICISCAE). 2018. p. 64–70. https://doi.org/10.1109/iciscae.2018.8666928
- 29. Kazakova MA, Sultanova AP. Analysis of natural language processing technology: modern problems and approaches. Vestnik Donskogo gosudarstvennogo tehničeskogo universiteta. 2022;22(2):169–76.
- 30. Krutilla Z, Kovari A. The origin and primary areas of application of natural language processing. In: 2022 IEEE 22nd International Symposium on Computational Intelligence and Informatics and 8th IEEE International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics (CINTI-MACRo). 2022. p. 000293–8. https://doi.org/10.1109/cinti-macro57952.2022.10029432
- 31. Mankolli E, Guliashki V. Machine learning and natural language processing: review of models and optimization problems. In: ICT Innovations 2020. Machine Learning and Applications: 12th International Conference, ICT Innovations 2020, Skopje, North Macedonia, September 24–26, 2020, Proceedings. Springer; 2020. p. 71–86.
- 32. Pearce W, Özkula SM, Greene AK, Teeling L, Bansard JS, Omena JJ, et al. Visual cross-platform analysis: digital methods to research social media images. Information, Communication & Society. 2018;23(2):161–80.
- 33. Yang Y, Hsu J-H, Löfgren K, Cho W. Cross-platform comparison of framed topics in Twitter and Weibo: machine learning approaches to social media text mining. Soc Netw Anal Min. 2021;11(1):75.
- 34. Ruan T, Kong Q, McBride SK, Sethjiwala A, Lv Q. Cross-platform analysis of public responses to the 2019 Ridgecrest earthquake sequence on Twitter and Reddit. Sci Rep. 2022;12(1):1634. pmid:35102161
- 35. Yarchi M, Baden C, Kligler-Vilenchik N. Political polarization on the digital sphere: a cross-platform, over-time analysis of interactional, positional, and affective polarization on social media. Political Communication. 2021;38(1–2):98–139.
- 36. Novielli N, Calefato F, Dongiovanni D, Girardi D, Lanubile F. Can we use SE-specific sentiment analysis tools in a cross-platform setting? In: Proceedings of the 17th International Conference on Mining Software Repositories. 2020. p. 158–68. https://doi.org/10.1145/3379597.3387446
- 37. Tao W, Peng Y. Differentiation and unity: a cross-platform comparison analysis of online posts’ semantics of the Russian–Ukrainian war based on Weibo and Twitter. Communication and the Public. 2023;8(2):105–24.
- 38. Matassi M, Boczkowski P. An agenda for comparative social media studies: the value of understanding practices from cross-national, cross-media, and cross-platform perspectives. International Journal of Communication. 2021;15:22.
- 39. Boumhidi A, Benlahbib A, Nfaoui EH. Cross-platform reputation generation system based on aspect-based sentiment analysis. IEEE Access. 2022;10:2515–31.
- 40. Kaufhold M-A, Rupp N, Reuter C, Habdank M. Mitigating information overload in social media during conflicts and crises: design and evaluation of a cross-platform alerting system. Behaviour & Information Technology. 2019;39(3):319–42.
- 41. Talaat AS. Sentiment analysis classification system using hybrid BERT models. J Big Data. 2023;10(1).
- 42. Chumakov S, Kovantsev A, Surikov A. Generative approach to Aspect based sentiment analysis with GPT language models. Procedia Computer Science. 2023;229:284–93.
- 43. Amraouy M, Bellafkih M, Bennane A, Talaghzi J. Sentiment analysis for competence-based e-assessment using machine learning and lexicon approach. In: The International Conference on Artificial Intelligence and Computer Vision. 2023. p. 327–36.
- 44. Kanipriya M, Krishnaveni R, Krishnamurthy M. Recognizing audience feedback through facial expression using convolutional neural networks. International Journal of Engineering Research and Technology. 2020;13(12):4230–5.
- 45. Kobs K, Zehe A, Bernstetter A, Chibane J, Pfister J, Tritscher J. Emote-controlled: obtaining implicit viewer feedback through emote-based sentiment analysis on comments of popular Twitch.tv channels. ACM Transactions on Social Computing. 2020;3(2):1–34.
- 46. Harjule P, Gurjar A, Seth H, Thakur P. Text classification on Twitter data. In: 2020 3rd International Conference on Emerging Technologies in Computer Engineering: Machine Learning and Internet of Things (ICETCE). 2020. p. 160–4. https://doi.org/10.1109/icetce48199.2020.9091774
- 47. Haque TU, Saber NN, Shah FM. Sentiment analysis on large scale Amazon product reviews. In: 2018 IEEE International Conference on Innovative Research and Development (ICIRD). 2018. p. 1–6. https://doi.org/10.1109/icird.2018.8376299