
RETRACTED: Enhanced E-commerce decision-making through sentiment analysis using machine learning-based approaches and IoT

Retraction

The PLOS One Editors retract this article [1] because it was identified as one of a series of submissions for which we have concerns about peer review integrity and potential manipulation of the publication process. These concerns call into question the validity and provenance of the reported results. We regret that the issues were not identified prior to the article’s publication.

SK and OMG did not agree with the retraction. YF, AE, EA, IBP, and KK either did not respond directly or could not be reached.

5 Feb 2026: The PLOS One Editors (2026) Retraction: Enhanced E-commerce decision-making through sentiment analysis using machine learning-based approaches and IoT. PLOS ONE 21(2): e0342322. https://doi.org/10.1371/journal.pone.0342322

Abstract

E-commerce is a vital component of the world economy, giving people a simple and convenient way to shop and enabling businesses to expand into new global markets. Using IoT and machine intelligence to improve e-commerce decision-making is an important area of application for these technologies. Our objective is to elevate online shopping to a new level, making it a practical and genuinely delightful experience for customers. By employing IoT devices to collect data on customer behavior and preferences and using machine learning (ML) algorithms to analyze them, businesses can gain valuable insights to improve their operations and sales strategies. In addition, companies can apply machine learning to the collected data to generate simple recommendations. Our implementation of ML algorithms extends beyond simple recommendations: it also includes demand forecasting, helping ensure that popular products remain in stock, reducing disappointment, and increasing consumer satisfaction. We applied several techniques, including logistic regression, Naïve Bayes, Support Vector Machine (SVM), Random Forest (RF), AdaBoosting, Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM). AdaBoosting outperformed the deep learning (DL) techniques LSTM and GRU and the four ML techniques logistic regression, Naïve Bayes, SVM, and RF in terms of F1-score, accuracy, precision, and recall. It achieved an accuracy of 88%, an F1-score of 0.927, a precision-1 of 0.908, and recall-0 and recall-1 (the proportions of actual negatives and actual positives that are correctly identified) of 0.569 and 0.947, respectively. Except for SVM, the ML techniques showed little performance difference between the count vectorizer and the TF-IDF vectorizer. This study advances e-commerce capabilities through IoT and machine learning and paves the way for customer-centric, efficient, and adaptive retail strategies.

1. Introduction

The emergence of online commerce has revolutionized business operations and the shopping experience for consumers globally. Many people now prefer to buy and sell on e-commerce platforms from their homes because it is more convenient. To increase sales, many brick-and-mortar companies have established their own online platforms. This shift has reshaped business practices and operations by allowing companies to reach global markets and offer customers tailored, more convenient shopping journeys, creating new opportunities for growth and expansion. At the same time, a new set of challenges has emerged, such as managing the supply chain, competing with global rivals, and meeting customer expectations. Businesses must therefore find effective strategies to keep pace with these advancements, handle the challenges, and provide personalized, seamless customer experiences.

This is where machine learning (ML) and the Internet of Things (IoT) come into play. ML and IoT technologies have grown substantially and become popular for their problem-solving capabilities. IoT devices such as sensors, smartphones, screens, smartwatches, and interconnected systems produce large volumes of streaming data, which ML methods and models analyze to generate meaningful insights. Many companies have adopted these technologies to understand customer behaviors and preferences, informing their decision-making, business operations, and strategic planning.

The integration of ML and IoT into the e-commerce industry is still in its early stages, and both the benefits these technologies provide and the obstacles to their widespread adoption require further examination. Despite apparent progress in their use, a critical gap remains in the literature regarding the integration of IoT and ML in recommendation systems, especially for real-time, data-driven decision-making in e-commerce. Most solutions to date have not fully leveraged IoT data streams to build highly personalized recommendations that adapt dynamically to customer behavior. Moreover, scalability, integration complexity, and latency contribute to the low adoption rate of IoT-enabled e-commerce systems.

This work aims to bridge this gap by assessing how effectively integrated IoT and machine learning technologies improve decision-making in e-commerce. Using machine learning models, we analyze IoT data about customer behavior, buying habits, and reviews to create more personalized recommendations. The study also assesses the benefits of this approach in terms of sales figures, customer satisfaction, and operational effectiveness. Finally, we compare the performance of machine learning and deep learning models.

Motivation

The online business world is constantly changing due to advancements that reshape the industry landscape. E-commerce (online shopping) provides the working environment and technology for such businesses; it has become an essential part of daily life and an important industry in which businesses can grow and prosper. Understanding customer behavior and needs has therefore become critical for businesses to remain competitive. This study seeks to narrow the gap between emerging technologies and the constantly evolving field of e-commerce. Machine learning and IoT were selected as the central technologies for this research because of their important roles in contemporary business transformation. Machine learning allows businesses to make data-driven decisions and improve customer satisfaction. Similarly, IoT plays a pivotal role by providing real-time insights into customer actions and supporting product suggestions in online retail settings. We are convinced that embracing these technologies is necessary for companies to prosper in the digital age.

This research introduces a novel and thorough approach to e-commerce that integrates IoT-collected data with machine learning algorithms and compares conventional and cutting-edge approaches in depth. Although machine learning has been used in e-commerce for some time, incorporating IoT data is an emerging area that significantly differentiates this work. Using data about women's clothing, our study shows how these methods apply directly to industry challenges such as improving customer satisfaction and refining sales strategies and recommendations.

Main contributions

Businesses must understand the needs and behavior of their customers to remain competitive in the ever-evolving world of e-commerce, where technology permeates every aspect of everyday life.

Because this industry evolves daily, using technology in it is essential. The goal is to turn technology into a valuable instrument by bridging the gap between these technologies and the constantly changing field of e-commerce. Because IoT and machine learning are spearheading this shift, we chose them as the focus of this work.

Machine learning was selected because it helps companies make data-driven decisions and enhance the customer experience, and IoT because it is a crucial data source that offers real-time insights into user behavior and product recommendations in e-commerce settings. Businesses that want to thrive in the digital age ought to embrace these technologies rather than treat them as optional.

In summary, this paper makes the following contributions.

This work provides a new and comprehensive view of the e-commerce sector by fusing machine learning techniques with IoT data.

In particular, we conducted a comprehensive comparative analysis encompassing both traditional and innovative methodologies.

Although e-commerce has long employed machine learning, the incorporation of IoT data is a recent development that significantly distinguishes our work.

This study demonstrates innovative applications of basic ML techniques within e-commerce, bridging the gap between e-commerce, machine intelligence, and IoT and providing actionable insights for businesses in the e-commerce sector.

More specifically, the key contributions of this article are as follows:

  1. This study represents a significant advancement in business intelligence and customer-centric strategies.
  2. Machine Learning and IoT Integration in e-commerce: We aim to transform online shopping from a purely functional task into a genuinely useful and delightful experience by using IoT and ML technologies to create a dynamic and engaging shopping environment for online consumers. Machine learning algorithms analyze the data received from IoT devices together with customer preferences, leading to better product recommendation systems and helping businesses make better use of real-time data to gain more accurate and personalized insights into their customers.
  3. Comparison and Model Selection: We conducted a comparative study of several ML and DL models: Logistic Regression, Naïve Bayes, SVM, Random Forest, GRU, LSTM, and AdaBoost. From the results, we conclude that AdaBoost combined with TF-IDF vectorization is the best of the tested models for customer recommendation prediction, giving accurate and consistent performance on an imbalanced dataset.
  4. Business Impact: Combining IoT and machine learning for data-driven decision-making makes businesses better able to understand market trends and opportunities as well as customer behaviors and interests, so they can adapt their operations and strategies to meet demand. This supports efficient inventory management, enhanced customer satisfaction, and improved overall sales strategies.
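Because model selection here rests on per-class metrics for an imbalanced dataset (the abstract reports recall-0 of 0.569 against recall-1 of 0.947 for AdaBoost), the short Python sketch below shows how accuracy, precision, recall, and F1 are computed separately for each class. The labels are hypothetical toy data, not values from this study, and the encoding (1 = "recommended", 0 = "not recommended") is an assumption.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> Dict[str, float]:
    """Precision, recall and F1 for one class treated as the 'positive' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 1 = recommended (8 reviews), 0 = not recommended (2).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 0.9
for cls in (0, 1):
    print(cls, per_class_metrics(y_true, y_pred, cls))
```

On this toy data, accuracy is 0.9 while recall for class 0 is only 0.5: the classifier misses half of the minority class, which overall accuracy alone would hide. This is why per-class recall (recall-0, recall-1) is worth reporting alongside accuracy.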

Organization

The remainder of the paper is organized as follows: Section 2 reviews the related literature. Section 3 presents the system model and problem definition. Section 4 presents the research methodology. Section 5 introduces the machine learning-based approaches. Section 6 presents the simulation results and discussion. Finally, Section 7 concludes the paper and suggests directions for future work.

2. Related work

New technology, the expansion of e-commerce, and market globalization have posed difficulties in establishing business models. Supply chain management and logistics are crucial to competition between industries. Machine learning, IoT, and blockchains can drastically change the supply chain. The integration of these technologies increases control over supply chain networks. The idea of a circular economy has become essential for enterprises to continue operating. Circularity, digitization, industry, green, and lean techniques are only a few examples of how sustainability can be achieved. Optimization models that make good business sense are needed because of the challenges of implementing machine learning, IoT, and blockchain facilities in the supply chain, sustainability, and the circular economy.

This literature provides insights into incorporating technologies like ML, IoT, and blockchain into supply chain management for e-commerce and sustainability efforts. It provides a holistic approach with extensive and varied methodologies to compare research literature in this field. As a result, it offers a detailed summary of the best way to use technologies to positively influence e-commerce activities.

The presented work demonstrates how technologies such as ML, IoT, and blockchain are employed in supply chain management, specifically to enhance e-commerce and sustainability efforts. It examines each technology individually, comparing evolving methods and studies in the field. What differentiates this review from others is the way it groups studies, which yields a concise summary of how these technologies are being used and how they influence e-commerce.

The rapid growth of global markets and the growing need for sustainable practices make this review timely, and it extends beyond other relevant research. It highlights how blockchain, IoT, and machine learning are used to overcome the hurdles facing e-commerce. These technologies can enrich consumer experiences, improve supply chain operations, and advance sustainability objectives. The review therefore offers a new perspective on the advantages of these technologies by presenting a substantial and explicit picture of their interactions.

This review is particularly valuable because it provides a holistic examination of the combined use of blockchain, IoT, and ML in e-commerce. It accounts for the way these technologies work together to solve problems that none of them can handle alone. For instance, it discusses studies that use blockchain to improve supply chain transparency or leverage IoT for smarter inventory management, approaches that have received comparatively little research attention.

The review also highlights the growing significance of economic and sustainability trends in today's business environment.

It also connects technological advancements with business priorities, emphasizing that modern commerce prioritizes responsibility and sustainability, which mirrors the broader call for greener and more ethical practices.

In conclusion, this review affirms that IoT, ML, and blockchain are increasingly interconnected in e-commerce. It also maps the current state of research and identifies opportunities for future exploration, especially into how these technologies could enable new supply chain management models with sustainability at the forefront.

Internet of things (IoT)-based approach

In [1], the authors explored IoT data fusion, which combines data from numerous sources to provide more precise and insightful information than a single source could. The authors developed a system that combines data from different IoT devices, such as sensors and beacons, to gather insights and improve decision-making in e-commerce. A major challenge they tackle is inconsistent or poor-quality data, and they offer several ways to overcome these problems. In [2], a trust-based service ranking system (TBSRS) is introduced to help identify reliable service providers in e-commerce. It includes a similarity algorithm, a method for evaluating trust, and a mechanism for ranking services. Article [3] describes an IoT-based warehouse designed to increase inventory reliability and responsiveness. Its intelligent weight-based picking regions enable real-time inventory tracking in the retail space and reduce the risk of excess or insufficient inventory. On the e-commerce side, it aims to improve efficiency and fulfill client requests.

The work in [4] implements a real-time IoT-based automated packaging system suitable for a small e-commerce operation and then proposes prototype improvements to make it compatible with large-scale systems. Article [5] offers new suggestions for innovation and reform in household electrical appliance enterprises so that they can take advantage of IoT opportunities in the e-commerce era. The authors in [6] examined various technologies for integrating OWSN services and IoT in e-commerce and discussed current integration issues and challenges. Based on an e-commerce scenario, they also proposed a mathematical formulation of the problem of safe delivery between seller and receiver. The study in [7] introduces the concept of IoT, elaborates on IoT technologies such as sensing, communications, and cloud computing, and then analyzes how IoT affects the advancement and innovation of e-commerce. In [8], the concept of IoT, its characteristics, security challenges, and adoption trends are discussed, and a reference architecture for e-commerce enterprises is proposed. Article [9] demonstrates how IoT and RFID technology can be incorporated into an e-commerce system to automate sales transactions. In [10], multiplex IoT scheduling technology was combined with descriptive statistical analysis to handle information fusion and packet sample detection for data entering and leaving an e-commerce warehouse. By facilitating access to information, IoT can also reduce barriers for people with disabilities: in [11], IoT and cloud computing were combined to support users with sensory (vision and hearing), motor (limited hand use), and cognitive (language-learning difficulties) impairments.

Machine learning (ML)-based approach

Machine learning (ML) is one of the primary techniques for mining massive datasets. ML can help e-commerce systems develop and improve autonomously by learning from historical data. The big data generated by interactions, transactions, and observations in e-commerce businesses can contribute considerably to decision-making about marketing strategies. Machine learning was employed in [12] to predict user attitudes, expressed in Bangla text, toward goods offered on e-commerce websites. The paper [13] offers a prediction approach that identifies positive and negative Bangla reviews through a comparative study, using a large dataset of Bangla e-commerce reviews and various machine learning algorithms. The work in [14] used linear regression, a flexible forecasting method, to predict which platform is most popular and earns the most revenue. Various machine learning libraries were used to examine each parameter, such as the number of registrants and the frequency of visits, to predict a value that may help e-commerce companies promote their platforms and increase their revenue.

The study [15] examines essential issues in how e-commerce relates to sustainable development. It employs a positivist research design and uses quantitative research tools to gather data and conduct in-depth analyses of the subject. The work in [16] analyzes the limitations and difficulties of conventional online shopping behavior analysis and prediction systems using the XGBoost model, decision tree-based models, logistic regression (a linear model), and online shopping behavior prediction approaches. The work in [17] proposes an online system using various ML and DL algorithms that make real-time recommendations to help users build intelligent weekly grocery lists based on their past purchases, current promotions, and other data. The study promotes low-carbon consumption by actively encouraging consumers to switch to energy-saving appliances by incorporating ML models into the online review analysis process.

The study in [18] developed a predictive system that anticipates near-real-time arrivals of e-commerce orders at distribution centers, helping third-party logistics providers manage the hourly arrival rates of e-commerce orders. Using supervised ML algorithms such as K-Nearest Neighbor (KNN), Decision Tree (DT), and Support Vector Machine (SVM), the work in [19] predicts consumer preferences for online shopping. Paper [20] used e-commerce data from a tea-device company as an example and applied the FP-growth algorithm to obtain frequent itemsets, from which user behavior association rules were mined and evaluated to derive feature vectors for user classification. The construction of a chatbot for e-commerce sales that offers customer service and boosts sales is discussed in [21]. The system is built on a modular chatbot framework and employs machine learning for natural language understanding.

The work in [22] offers a framework for preventing the online sale of counterfeit goods. Based on information gathered from IoT devices, the authors propose a decision-making system that recognizes and categorizes counterfeit goods. To track and monitor the circulation of goods through the supply chain, the framework uses tracking technologies such as RFID tags and barcodes, and machine learning algorithms are applied to the gathered data to determine whether a product is authentic. The research paper [23] focuses on using machine learning methods to enhance e-commerce procedures, covering how machine learning algorithms can evaluate consumer behavior, forecast demand, adjust pricing policies, provide personalized recommendations, and improve all aspects of the customer experience. Statistical and machine-learning approaches that may be used to forecast customer churn are discussed in the work of Klimantaviciute [24].

Paper [25] presents a model for assessing users' attitudes from online reviews on an e-commerce platform, using machine-learning classifiers, specifically Naive Bayes, logistic regression, support vector machine, and neural networks. The study in [26] selects the five most representative e-commerce enterprises in China and the United States, gathers their stock price data for the previous five years, uses machine learning (LSTM and GRU neural networks) to predict stock price trends, and evaluates the results. Because e-commerce reviews and comments about products also reveal consumers' insights and attitudes, [27] proposes a hybrid ML approach that combines K-nearest neighbor (KNN) with a Support Vector Machine (SVM).

The work in [28] focuses on detecting online fraud using machine learning and compares various currently available machine learning techniques. False reviews on the Web can seriously harm e-commerce companies and their customers. Customers rely on reviews to make informed purchase decisions, and false reviews may mislead people into buying goods that fall short of their expectations, which can result in lost revenue and declining customer confidence in the company and its goods. The research paper [29] compares the effectiveness of three distinct methods for building a machine learning model for false review detection. Sentiment analysis is frequently used in the e-commerce sector to improve efficiency and to give a clearer basis for business decisions in today's highly competitive environment.

In [30], the authors built a machine learning model that analyzed reviews in three languages (Bangla, English, and Romanized Bangla) using six machine learning methods, and then compared the methods in terms of ROC area, accuracy, precision, recall, and F1-score. Paper [31] focuses on the key factors that motivate businesses to build data analytics infrastructure in the cloud and demonstrates how to incorporate machine learning models into data analytics workflows to produce more complex analyses of e-commerce activities. Sentiment analysis can identify positive or negative polarity in text at different levels of granularity: entire documents, paragraphs, sentences, and subsections. Through a systematic review and assessment of corporate and community white papers, academic papers, magazines, and reports, [32] discusses sentiment analysis and presents common techniques for analyzing sentiment from a machine learning standpoint. The authors aimed to classify, examine, and implement ML methods for sentiment analysis in different applications, and then presented a research study applying machine learning algorithms to sentiment analysis in e-commerce environments.

Blockchain-based approach

Because blockchain technology primarily addresses issues such as intermediaries, transparency, decentralization, data security, accuracy, and transactional freedom, the study [33] seeks to build a network of related approaches and techniques for cross-border e-commerce, together with a blockchain-based architecture for securing transactions and tracing goods. The article [34] examines why the traditional “centralized” supply chain model can no longer satisfy cross-border e-commerce needs and how blockchain technology can offer a more effective, decentralized approach to capital flow, logistics, and information flow. The article [35] proposes a blockchain-based framework for product traceability in cross-border e-commerce supply chain management. A blockchain-enabled logistics finance execution platform (BcLFEP) is proposed in [36] to facilitate logistics finance (LF) for e-commerce retail.

Paper [37] proposes credibility modeling for e-commerce networks using blockchain technology and big data mining. It introduces a unified TCB management and scheduling security system that links the host's multiple security systems. The corresponding event interface is then located by analyzing the source code of the PBFT algorithm to understand its common code-writing conventions. According to [38], blockchain and machine learning technologies can be combined to create secure, effective, and long-lasting real-time systems; the paper offers an overview of the theory and state-of-the-art methods used to address big data and security issues in various application domains, including healthcare, intelligent transportation, e-commerce, and IoT. Using intelligence analysis to predict and prevent financial risks, and advancing information science, is indispensable in today's financial and digital world. Blockchain is a powerful tool for this purpose because of its structure, in which data blocks are linked chronologically into a sequential chain. It also operates as a distributed ledger that uses cryptographic techniques to keep the data reliable and tamper-proof. An investment and financing model for smart rural e-commerce based on blockchain and data mining is presented in [39]. The work in [40] proposes a token-maintained student management model that integrates e-commerce and is based on blockchain technology. This approach uses blockchain to store student data and award prizes in Bitcoin, which speeds up and improves the accuracy of verifying students' accomplishments.
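The chained, tamper-evident structure described above can be illustrated with a minimal Python sketch in which each block stores the SHA-256 hash of its predecessor. This is a simplified hash chain for illustration only, not the design of any cited work: it omits the consensus, networking, and signature mechanisms of a real distributed ledger, and the record strings are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str  # digest of the previous block

    def digest(self) -> str:
        """SHA-256 over this block's contents, including the previous hash."""
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: List[str]) -> List[Block]:
    """Link records chronologically; each block points to its predecessor's hash."""
    chain: List[Block] = []
    prev = "0" * 64  # placeholder hash for the first (genesis) block
    for i, rec in enumerate(records):
        block = Block(i, rec, prev)
        chain.append(block)
        prev = block.digest()
    return chain


def is_valid(chain: List[Block]) -> bool:
    """Every block must store the current digest of the block before it."""
    for prev_block, block in zip(chain, chain[1:]):
        if block.prev_hash != prev_block.digest():
            return False
    return True


# Hypothetical order-tracking records.
chain = build_chain(["order#1 shipped", "order#1 delivered", "payment settled"])
print(is_valid(chain))  # True
chain[0].data = "order#1 cancelled"  # tamper with an earlier record
print(is_valid(chain))  # False
```

Editing an earlier record changes its hash, so the next block's stored `prev_hash` no longer matches. This check alone cannot detect a change to the last block; real ledgers rely on replication across nodes and consensus for that.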

The paper [41] proposes a blockchain-based system for e-commerce platforms that protects user privacy, permits cross-platform reputation access, and supports anonymous, private ratings to protect users' identities and fend off multiple-rating attacks. The work [42] offers a methodology intended to inspire further research into the possible effects of blockchain on e-commerce, showing its potential impact on various e-commerce components in four main areas of interest: technology, law, organizational and quality obstacles, and customer concerns. Owing to blockchain's decentralized nature, product grading can be verified more quickly. Accordingly, paper [43] proposes a trustworthy and secure blockchain-based product grading system (BPGS) that handles big data for this business model, such that attacks cannot succeed unless more than half of the merchants and e-commerce companies in the alliance are compromised simultaneously.

Discussing the impact of various AI techniques

E-commerce has seen increasing use of natural language processing (NLP), ML, and artificial intelligence (AI). AI-based technologies known as recommender systems (RSs) propose products that a potential customer would find interesting or useful. The authors of [44] examine the separate effects of focused immersion, temporal dissociation, and curiosity on the continuous use intention of recommender systems. Article [45] addresses AI applications in e-commerce, including chatbots and AI assistants, smart logistics, and recommendation engines, and offers case studies of how Amazon, JD.com, and Rakuten employ AI to advance their businesses. The paper [46] offers a theoretical basis for establishing reasonable government subsidy policies, formulating workable production and operational plans for businesses, and maximizing profits in e-commerce environments. The main focus of [47] is the factors that contribute to quality of service in fifth-generation (5G) communication networks for e-commerce purposes.

To improve the service standards of e-commerce applications operating over 5G networks, the authors propose enhanced data communication methods. The study in [48] outlines how AI is leveraged in business administration, online commerce, and financial services, describing how AI technologies are applied in these sectors to improve decision-making and optimize business operations. The establishment of a supply chain framework for B2C e-commerce businesses engaged in trade is addressed by [49]; by improving the distribution process within the e-commerce supply chain structure, the authors increased the efficiency of distribution logistics. The OECD report cited in [50] discusses the growing adoption of artificial intelligence (AI) in the financial sector, especially in credit assessment, asset management, algorithmic trading, and blockchain-based finance. An automated technique for categorizing HTML forms is presented in [51]; it is significant for search-engine applications such as Yahoo Shopping and Google's Froogle because it can improve index quality and the precision of search results. Based on data from the Alibaba mobile e-commerce platform, the work in [52] proposes an improved deep forest model that adds the interactive behavior characteristics of users and commodities to the original feature model to forecast customer repurchase behavior. An approach for predicting users' purchase behavior based on information fusion and ensemble learning is proposed in [53]. To prevent the prediction outcomes from deviating, the authors chose ten distinct model types as base learners and tuned the relevant parameters of each.

3. Problem statement and dataset

E-commerce is now a vital component of the global economy, allowing consumers to shop conveniently and businesses to access new markets. However, the huge scale of online transactions has created new challenges. For instance, although e-commerce platforms have access to vast amounts of data about customers’ behaviors and sentiments, businesses are often unable to analyze this information to make better decisions. Offering a broad range of products is not enough, and neither is presenting basic recommendations to customers. Many businesses still rely on conventional methods that examine past customer behavior without effectively predicting future preferences or behaviors. Understanding customer behaviors and sentiments through comprehensive analysis gives businesses valuable insights, making it crucial for business growth and for improving recommendation systems. Current models appear to fall short in predicting whether a product is recommended based solely on the sentiment expressed in reviews.

The problem is that this data, despite its huge volume, is underutilized, wasting opportunities to improve many domains, including decision-making in e-commerce. To address this gap, this research examines the relationship between customer sentiment and product recommendations. We employed several ML algorithms, namely Logistic Regression, Naive Bayes, SVM, Random Forest, and AdaBoost, to predict the recommendation status of products from review sentiment with greater accuracy. We then compared the performance of these models with that of the more advanced GRU and Bidirectional LSTM deep learning architectures.

For our experiments and analysis of customer sentiment toward products, we used Kaggle’s Women’s E-Commerce Clothing Reviews dataset. The dataset is rich in features and information relevant to e-commerce research, including product details and descriptions, purchase history, and customer reviews, as well as item attributes such as category, type, and brand. It also contains numerical ratings based on size, fit, and quality. The textual information on customer opinions, recommendations, and sentiments in the reviews, together with the numerical ratings, forms the foundation of our sentiment analysis and provides a better understanding of the overall customer experience. With these features, we were able to investigate different aspects of sentiment analysis, recommendation systems, and machine learning approaches in the context of e-commerce systems powered by real-time IoT data.

This study aims to create a comprehensive system that goes beyond conventional recommendation-based approaches. By combining ML algorithms with IoT data and analyzing the sentiment expressed in customer reviews, our approach not only examines reviews but also predicts customers’ future behaviors and preferences with greater accuracy. It also allows businesses to make more informed decisions and improve overall performance, which in turn enhances personalized customer experiences and ultimately leads to sustainable advantages in a competitive market.

4. Methodology

This section outlines the methodology used in this research. It begins with the research design and then presents a novel approach to the problem. The working group is then introduced, and finally the process of model training and evaluation is detailed.

Research design

This subsection introduces the system model and defines the problem precisely before proceeding with the methodological details.

The research aims to develop a system model that integrates machine learning algorithms and IoT data into a single combined model for intelligent decision-making in e-commerce. The model performs sentiment analysis of customer reviews to give businesses actionable insights. In this work, we analyze the suitability of the ML algorithms logistic regression, Naïve Bayes, SVM, Random Forest, and AdaBoost, using the “Women’s E-Commerce Clothing Reviews” dataset to uncover the sentiment behind customers’ recommendations. The process is straightforward: IoT devices track real-time customer interactions that, once integrated with historical data, feed the machine learning models. This integration enables businesses to forecast trends, optimize inventory, and personalize product recommendations. Performance was measured using precision, recall, F1-score, and accuracy, with a focus on AdaBoost, which generally outperformed the other techniques. This study thus integrates IoT-driven data collection and ML algorithms into an effective framework for real-time decision-making in e-commerce.

The system model therefore involves several components that make the sentiment analysis process robust and effective. First, the raw dataset must be cleaned and prepared using various data preprocessing techniques. The textual content must then be transformed into numerical representations using suitable feature extraction methods so that it can serve as input to ML or DL models. The system then proceeds to the model selection, training, and testing phase to determine the most effective algorithm for sentiment analysis. Through this system model, we aim to advance the increasingly important area of sentiment analysis and enable businesses to obtain useful insights into customers’ opinions and recommendations. By combining ML and DL, we hope to offer businesses a complete and precise view of product sentiment, enabling appropriate data-driven decisions that in turn lead to better product offerings and improved customer satisfaction and loyalty.

Proposed approach

To carry out sentiment analysis and set the stage for this study, our approach involved several systematic stages applied to the “Women’s E-Commerce Clothing Reviews” dataset and the system model. These stages are data preprocessing, feature extraction, model selection, model training, and evaluation. We then compare the performance of machine learning techniques with that of deep learning approaches to assess their suitability for sentiment analysis tasks. Each stage is described in the following subsections: data preprocessing and feature extraction; model selection; model training and evaluation; and predictions and recommendations. Fig 1 depicts the proposed approach and methodology.

Data preprocessing and feature extraction

Data preprocessing is a data preparation step that involves cleaning the data and handling missing values. It is required to ensure the integrity and accuracy of the data used in machine learning models. Accordingly, the review text underwent several steps, such as removing punctuation, converting text to lowercase, and removing stop words. To handle missing values, we employed complementary imputation methods to maintain the integrity and completeness of the data. These steps were necessary to reduce noise and improve the quality of the input data, because textual data are inherently unstructured and contain forms of noise that must be eliminated before analysis.

We first standardized the column names to ensure consistency. Analysis of the class proportions of the target variable “recommended_ind” showed that it was imbalanced, with “Recommended” instances outnumbering “Not Recommended” ones.

We selected recall as the metric for evaluating model performance because it is more suitable for imbalanced datasets than accuracy, which is appropriate mainly when the target classes are balanced. Recall measures sensitivity and gives better insight into a model’s ability to identify instances of the minority class, which is typically the class of interest in imbalanced datasets.
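A minimal sketch of why accuracy can mislead on an imbalanced target: a hypothetical classifier that always predicts the majority class “Recommended” (1) still scores well on accuracy, while recall on the minority class exposes the failure. The 82/18 class split below is invented for illustration and is not the dataset’s actual ratio; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Share of the true instances of `cls` that the model also labels as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total if total else 0.0


# Hypothetical imbalanced labels: 82 "Recommended" (1) vs 18 "Not Recommended" (0).
y_true = [1] * 82 + [0] * 18
# A degenerate model that ignores the review and always predicts the majority class.
y_pred = [1] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.82
print(f"recall-0 = {recall(y_true, y_pred, 0):.2f}")  # recall-0 = 0.00
print(f"recall-1 = {recall(y_true, y_pred, 1):.2f}")  # recall-1 = 1.00
```

The per-class recall computed here corresponds to the recall-0 and recall-1 figures reported in the results section.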

Based on the examination of the target class variable “recommended_ind” and other features, unnecessary columns and irrelevant features were removed.

The preprocessing of text consisted of three main steps: tokenization, noise removal, and lexicon normalization.

Tokenization: It is the process of breaking up a written text/document or a phrase into smaller components, which could be words or phrases. Each of these smaller units is known as a token.

Noise Removal: Any piece of text that is irrelevant to the context of the text and to the end output can be considered noise, such as language stopwords (is, am, the, of, in, etc.), URLs or links, differences between upper and lower case, and punctuation.

Lexicon Normalization: This step converts the different forms of a word into a single normalized form (for example, through stemming or lemmatization).
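The three preprocessing steps can be sketched in plain Python as follows. The stopword list and the tiny lemma table are hypothetical stand-ins: the study does not say which stopword list or normalization method it used, and a real pipeline would use a full stemmer or lemmatizer rather than a lookup table.

```python
import re

# Hypothetical, deliberately small stopword list (the study's actual list is not specified).
STOPWORDS = {"i", "is", "am", "are", "the", "of", "in", "a", "an", "and", "it", "this", "to"}
# Tiny lookup table standing in for a real lemmatizer or stemmer (assumption).
LEMMAS = {"loved": "love", "loves": "love", "colors": "color", "dresses": "dress", "fits": "fit"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text):
    """Tokenize, remove noise, and normalize a single review."""
    text = URL_RE.sub(" ", text.lower())                # noise removal: case folding and URLs
    tokens = TOKEN_RE.findall(text)                     # tokenization; punctuation and digits are dropped
    tokens = [t for t in tokens if t not in STOPWORDS]  # noise removal: stopwords
    return [LEMMAS.get(t, t) for t in tokens]           # lexicon normalization


print(preprocess("I LOVED the colors of this dress! See https://example.com"))
# ['love', 'color', 'dress', 'see']
```

Case folding and URL removal run before tokenization here so that the token pattern only has to match lowercase letters; stopword filtering runs afterwards because it operates on individual tokens.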

Several user-defined functions were available for tokenization, noise removal, and lexicon normalization; of these, we implemented the first (the tokenization function). Word clouds were then generated from the reviews to show the most common words in each target class. A word cloud is a data visualization method for text data in which each word’s size corresponds to its frequency or importance, so that significant patterns in the text stand out.

Our objective was to generate separate word clouds for positive and negative reviews, classifying each review as positive or negative according to its recommendation status. This process involved three steps: 1) identifying positive and negative reviews, 2) collecting positive and negative words separately, and 3) creating the word clouds.
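Steps 1 and 2 reduce to counting words separately per class; the resulting frequency tables are what a word-cloud renderer uses to size the words. This sketch assumes the reviews are already cleaned and tokenized, uses hypothetical toy reviews, and leaves the rendering itself to a plotting library (not shown).

```python
from collections import Counter


def class_word_counts(reviews, labels):
    """Count words separately for positive (recommended) and negative reviews.

    reviews: list of token lists; labels: 1 = recommended (positive), 0 = not recommended (negative).
    """
    positive, negative = Counter(), Counter()
    for tokens, label in zip(reviews, labels):
        (positive if label == 1 else negative).update(tokens)
    return positive, negative


# Toy tokenized reviews (hypothetical).
reviews = [["love", "fit", "soft"], ["love", "color"], ["small", "return", "fit"]]
labels = [1, 1, 0]
pos, neg = class_word_counts(reviews, labels)
print(pos.most_common(1))  # [('love', 2)]
```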

Before proceeding to the modeling stage, we performed vectorization and a train-test split as part of data preprocessing. Vectorization converts text documents into numerical formats (vectors), which machine learning algorithms typically require as input to process and analyze the data. Each review, already reduced to tokens without punctuation or stopwords, was therefore converted into a numerical vector of word counts (a feature vector).

To extract numerical features, we used the CountVectorizer tool to convert the tokenized review texts into a 2D matrix of token counts, which represents the frequency of each word across all reviews. Each row of the matrix represents a review, and each column represents a unique word (token).

To convert the text into numerical feature vectors, we employed the Bag-of-Words (BoW) model, a common approach to text analysis in natural language processing (NLP) and text-based machine learning. The model represents text as an unordered collection of words, emphasizing word frequencies while ignoring grammar, word order, syntax, and context, which may limit its suitability for tasks requiring semantic understanding. To address some of these limitations, we also used Term Frequency-Inverse Document Frequency (TF-IDF), a more sophisticated technique that weights words by their frequency within a document and across documents.

How CountVectorizer works.

Count vectorization is a simple representation of word frequencies that does not differentiate words by their importance.

  • Tokenization: The text is divided into individual words or tokens.
  • Vocabulary Creation: A vocabulary of all the unique words in the text corpus is established.
  • Counting Word Occurrences: Each document is then represented as a vector, with each element corresponding to a word in the vocabulary. Each element’s value indicates how frequently that word appears in the text.
  • Normalization: Word frequencies can optionally be normalized, for example by document length, to account for variations in document size.
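A from-scratch sketch of these four steps follows. It mirrors the idea behind scikit-learn’s CountVectorizer (one row per document, one column per vocabulary token) but is not that class; the function names, the sorted vocabulary, and the length-normalization option are illustrative choices made here.

```python
def build_vocabulary(docs):
    """Map each unique token in the corpus to a column index (sorted for determinism)."""
    return {token: i for i, token in enumerate(sorted({t for doc in docs for t in doc}))}


def count_vectorize(docs, vocab, normalize=False):
    """Return a document-term matrix: one row per document, one column per vocabulary token."""
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for token in doc:
            if token in vocab:                    # tokens unseen when the vocabulary was built are ignored
                row[vocab[token]] += 1
        if normalize and doc:
            row = [c / len(doc) for c in row]     # optional normalization by document length
        matrix.append(row)
    return matrix


docs = [["love", "dress", "love"], ["dress", "small"]]
vocab = build_vocabulary(docs)       # {'dress': 0, 'love': 1, 'small': 2}
print(count_vectorize(docs, vocab))  # [[1, 2, 0], [1, 0, 1]]
```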

How TF-IDF works.

TF-IDF combines local and global information: it calculates how important a term is in a document relative to how common it is across the collection of documents, which helps identify important and discriminative words or phrases. It is therefore more informative than count vectorization and better suited to tasks requiring a nuanced understanding of text, such as information retrieval or document classification. In many NLP applications, TF-IDF is preferred over simple count vectorization because it provides more informative features.
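The weighting can be written as tfidf(t, d) = tf(t, d) × ln(N / df(t)), where tf(t, d) is the count of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. The sketch below uses this unsmoothed textbook form as an assumption, since the paper does not state its variant; scikit-learn’s TfidfVectorizer by default uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1, followed by L2 normalization of each row, so its numbers differ.

```python
import math


def tfidf(docs):
    """TF-IDF weights per document using raw term counts and idf = ln(N / df)."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):                    # count each term once per document
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights


docs = [["love", "dress"], ["love", "fit"], ["love", "small", "fit"]]
for w in tfidf(docs):
    print({t: round(v, 3) for t, v in w.items()})
# {'love': 0.0, 'dress': 1.099}
# {'love': 0.0, 'fit': 0.405}
# {'love': 0.0, 'small': 1.099, 'fit': 0.405}
```

A word that occurs in every review, such as “love” in this toy corpus, receives weight 0; this is how TF-IDF down-weights terms that do not discriminate between documents.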

After splitting the data into a training set (80%) and a test set (20%) and creating a numerical feature vector for each document, and before moving on to modeling, we created a user-defined function for comparing the models at the end.
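A minimal sketch of the 80/20 split under stated assumptions: a seeded shuffle without stratification, since the paper does not say whether its split was stratified or which random seed it used. The helper name `split_train_test` is hypothetical; it echoes the idea of scikit-learn’s `train_test_split`, which additionally offers a `stratify` argument.

```python
import random


def split_train_test(items, labels, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out `test_size` of the samples for testing."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(items) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, items), pick(test_idx, items),
            pick(train_idx, labels), pick(test_idx, labels))


reviews = [f"review {i}" for i in range(10)]   # placeholder documents
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(reviews, labels)
print(len(X_train), len(X_test))  # 8 2
```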

Finally, we trained all the models with both the TF-IDF and count vectorizers.

5. Model selection

This section outlines the model selection process for sentiment analysis in our study. We explored several ML-based models, including Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest, and AdaBoosting, as well as a deep learning approach using Gated Recurrent Units (GRU). We considered the performance of these algorithms on binary classification tasks, particularly with our dataset, and examined their strengths and weaknesses.

Logistic regression

For our recommendation-status prediction task, logistic regression was chosen for its ease of use, interpretability, and computational efficiency. The technique is commonly used for binary classification problems, and its intuitive assumptions and mathematical tractability make it simple to use. The logistic function maps any real-valued input to a probability between 0 and 1, linking the independent variables (features) to the binary outcome; this probability indicates how likely an instance is to belong to a specific class. The model’s coefficients show how each independent variable influences the outcome, which makes the model easy to explain and allows practical knowledge to be extracted. This transparency is particularly helpful when investors need to know exactly how decisions are made.
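The sketch below fits a logistic regression by batch gradient descent on toy count features, to show the logistic mapping and how coefficient signs can be read. The two features (counts of a hypothetical positive word and a hypothetical negative word), the learning rate, and the epoch count are assumptions; the paper does not report its solver or hyperparameters, and library implementations typically add regularization and use faster solvers.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    if z >= 0:                      # the two branches avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """P(class 1 | x) under weights w and bias b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.5, epochs=500):
    """Minimize the mean log-loss with batch gradient descent (no regularization)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi      # derivative of the log-loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy features: [count of "love", count of "return"]; label 1 = recommended.
X = [[2, 0], [1, 0], [3, 0], [0, 1], [0, 2], [1, 2]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic_regression(X, y)
print([round(c, 2) for c in w], round(b, 2))
print(round(predict_proba(w, b, [2, 0]), 3), round(predict_proba(w, b, [0, 2]), 3))
```

A positive weight on the “love” count and a negative weight on the “return” count is the kind of direct, per-feature reading of coefficients that the paragraph refers to.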

Naïve bayes

Naive Bayes is a probabilistic classification algorithm based on Bayes’ theorem and the assumption of feature independence. It was selected because it is known for its simplicity, efficiency, robustness, interpretability, and effectiveness on text classification problems. The independence assumption states that, within a given class, the presence or absence of one feature has no effect on the presence or absence of any other feature. In text classification, this means that Naive Bayes treats each word or term in a document as independent of all the others when making predictions. Although this simplification may not accurately reflect the dependencies among features, it greatly reduces model complexity, and the algorithm has often performed impressively on text classification problems.
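A compact multinomial Naive Bayes with Laplace (add-one) smoothing shows where the independence assumption enters: a document’s score is simply the class log-prior plus a sum of per-word log-likelihoods. The multinomial variant, the smoothing value, and the toy documents are assumptions; the paper does not say which Naive Bayes variant it used.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing over tokenized documents."""
    vocab = {t for doc in docs for t in doc}
    class_sizes = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_sizes.items()}
    log_lik = {}
    for c in class_sizes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((word_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_naive_bayes(model, doc):
    """Return the class maximizing log P(c) + sum over words of log P(word | c).

    Summing independent per-word terms is exactly the independence assumption;
    words never seen in training are skipped.
    """
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)


docs = [["love", "soft", "fit"], ["love", "perfect"], ["small", "return"], ["return", "cheap", "small"]]
labels = [1, 1, 0, 0]
model = fit_naive_bayes(docs, labels)
print(predict_naive_bayes(model, ["love", "fit"]))      # 1
print(predict_naive_bayes(model, ["return", "small"]))  # 0
```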

Support vector machine (SVM)

SVM, a supervised machine learning algorithm, was added to our selected algorithms because of its ability to handle high-dimensional data and find complex patterns that other models may miss. It is well known for its strength in solving classification and regression problems, its precision and effectiveness, and its relative resistance to overfitting. SVM identifies the data points that lie closest to the boundary between classes (the support vectors). These points are key because they define the best possible dividing line (hyperplane) between the categories. By focusing on these crucial examples, SVM maximizes the margin between classes, which reduces the likelihood of overfitting and promotes generalization to new, unseen data. This ability is especially useful when the relationships in the data are intricate and not immediately obvious.
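To make the margin idea concrete, here is a linear SVM trained with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This is a technique swapped in for brevity: the paper does not state its SVM solver or kernel, and library SVMs use dedicated solvers. The regularization strength, epoch count, and toy data are assumptions, and the bias is handled (and also regularized) by appending a constant feature.

```python
import random


def fit_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style SGD on lam/2 * ||w||^2 + mean hinge loss; labels must be -1 or +1."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]   # constant feature acts as a bias
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                                 # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]              # shrink: gradient of the L2 term
            if margin < 1.0:                                       # inside the margin or misclassified
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 0], [1, 0], [3, 0], [0, 1], [0, 2], [1, 2]]
y = [1, 1, 1, -1, -1, -1]
w = fit_linear_svm(X, y)
print([round(v, 2) for v in w])
print(svm_predict(w, [3, 0]), svm_predict(w, [0, 2]))
```

Only points with a margin below 1 change the weights in the update step; points far from the boundary leave them untouched, which is the role the support vectors play in the paragraph above.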

Random forest

Complex patterns with nonlinear behavior can be difficult to model with conventional linear models, especially when the data contain intricate nonlinear interactions. For that reason, and to improve the overall precision of the results, we used Random Forest, a widely used ensemble learning method, to address nonlinear correlations. Random Forest overcomes this limitation by building a collection of decision trees, each trained on a random subset of the data and features so that it captures a particular aspect of the nonlinear interactions. The predictions of the individual trees are then aggregated (by majority vote for classification), which captures complicated relationships and yields more accurate predictions.
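A minimal sketch of the two ingredients that make a random forest: each tree is fit on a bootstrap sample of the training data using a random subset of features, and the trees’ predictions are combined by majority vote. To stay short, every “tree” here is a depth-one decision stump, so this sketch cannot itself model the nonlinear interactions described above; real random forests (for example, scikit-learn’s RandomForestClassifier) grow much deeper trees and draw a fresh feature subset at every split. The tree count, feature-subset size, and toy data are assumptions.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Best single-threshold split on the allowed features, by number of misclassified samples."""
    default = majority(y, 0)
    best, best_err = None, float("inf")
    for f in features:
        for thr in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= thr]
            right = [lab for x, lab in zip(X, y) if x[f] > thr]
            l_lab, r_lab = majority(left, default), majority(right, default)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best, best_err = (f, thr, l_lab, r_lab), err
    return best


def predict_stump(stump, x):
    f, thr, l_lab, r_lab = stump
    return l_lab if x[f] <= thr else r_lab


def fit_random_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample (with replacement)
        features = rng.sample(range(d), max_features)    # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, x):
    """Majority vote over the individual trees."""
    return majority([predict_stump(s, x) for s in forest], 0)


X = [[2, 0], [1, 0], [3, 0], [0, 1], [0, 2], [1, 2]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_random_forest(X, y)
print(predict_forest(forest, [3, 0]), predict_forest(forest, [0, 2]))
```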

Ada boosting

AdaBoosting, another ensemble learning technique, assigns more weight to misclassified samples so that subsequent weak learners focus on those instances, which yields more precise predictions and can help with imbalanced datasets by reducing bias. With the same goal, we also examined the effectiveness of DL algorithms such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks for sentiment analysis. DL models can capture complex patterns and dependencies in textual data, which can potentially improve the accuracy of sentiment analysis predictions.
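The reweighting idea can be seen in a small from-scratch sketch of discrete AdaBoost with decision stumps as weak learners. Labels are encoded as -1/+1; the stump learner and the number of rounds are illustrative assumptions (the paper does not report its base learner or number of estimators), and the one-dimensional toy data are chosen so that no single stump fits them but a weighted vote of three stumps does.

```python
import math


def fit_weighted_stump(X, y, w):
    """Feature, threshold, and polarity with the lowest weighted error (labels are -1/+1)."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (polarity if x[f] <= thr else -polarity) != yi)
                if err < best_err:
                    best, best_err = (f, thr, polarity), err
    return best, best_err


def stump_predict(stump, x):
    f, thr, polarity = stump
    return polarity if x[f] <= thr else -polarity


def fit_adaboost(X, y, n_rounds=10):
    w = [1.0 / len(X)] * len(X)                        # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)          # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)        # this learner's weight in the final vote
        ensemble.append((alpha, stump))
        # Misclassified samples (y * h = -1) are up-weighted, correct ones down-weighted.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, x)) for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]        # no single threshold separates these labels
ensemble = fit_adaboost(X, y, n_rounds=3)
print([adaboost_predict(ensemble, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

After the first stump misclassifies the two rightmost samples, their weights double relative to the others, which steers the second round toward a stump that gets them right.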

Deep learning with gated recurrent unit

The Gated Recurrent Unit (GRU) is a type of RNN architecture used in deep learning to process sequential data. Although GRUs address several shortcomings of standard RNNs, they were designed chiefly to mitigate the vanishing gradient problem. We selected GRUs because they train efficiently and perform well in tasks such as NLP and time series prediction. By using GRUs, we aim to improve text classification accuracy within the complex domain of e-commerce.

Model training and evaluation

This section explains the methodology for training the machine learning models and evaluating their performance in the following steps:

  1. Dataset Splitting:
    The dataset was divided into a training set (80%) and a testing set (20%) to train our machine-learning models.
    The training data were used to train the models to learn the relationships between the text features and target variable (recommendation status).
    The testing data were used to evaluate the models’ performance in terms of accuracy, precision, recall, and F1-score, which are defined in Equations (1), (2), (3), and (4), respectively.
  2. Performance Metrics:

The following equations were used to compute the performance metrics:

Accuracy measures the proportion of correct predictions (true positives and true negatives) to the total predictions:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.

Precision quantifies the proportion of true positive predictions among all positive predictions:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

Recall (or Sensitivity) indicates the proportion of actual positives correctly identified:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

F1-Score is the harmonic mean of precision and recall, balancing their contributions:

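$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$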
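Equations (1) to (4) translate directly into code. The sketch computes them per class, treating each class in turn as the positive one; this is how per-class figures such as recall-0 and recall-1 are usually obtained, although the paper does not show its evaluation code. The labels and function names below are hypothetical.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 (Eqs 2, 3, 4) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Eq (1): (TP + TN) / (TP + TN + FP + FN), i.e. the share of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 1, 1, 0, 0]   # hypothetical labels (1 = recommended)
y_pred = [1, 1, 1, 0, 0, 1]
print(per_class_metrics(y_true, y_pred, 1))  # (0.75, 0.75, 0.75)
print(per_class_metrics(y_true, y_pred, 0))  # (0.5, 0.5, 0.5)
print(round(accuracy(y_true, y_pred), 3))    # 0.667
```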
  3. Cross-Validation for Robustness:

To make the models robust and minimize overfitting, we used k-fold cross-validation. In k-fold cross-validation, the training data are split into k subsets (folds); the model is trained iteratively on k - 1 folds and its performance is assessed on the remaining fold, so that each fold serves once as the validation set.
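A minimal k-fold sketch: the indices are split into k contiguous folds (shuffle the data beforehand if it is ordered), each fold serves once as the validation set, and the k validation scores are averaged. The paper does not state k; k = 3 and the trivial majority-class “model” below are placeholders, and all function names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample lands in exactly one validation fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(fit, score, X, y, k):
    """Average validation score over k folds; `fit` and `score` are caller-supplied callables."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Placeholder model: always predict the training folds' majority label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def score_accuracy(label, X, y):
    return sum(t == label for t in y) / len(y)


X = list(range(10))
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print([len(v) for _, v in k_fold_indices(len(X), 3)])  # [4, 3, 3]
print(round(cross_validate(fit_majority, score_accuracy, X, y, k=3), 3))
```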

With our approach, we aim to improve prediction accuracy and determine the recommendation status of products using sentiment analysis of customer reviews from the Women’s E-Commerce Clothing Reviews dataset. We evaluate the performance of different ML and DL models for sentiment analysis to identify the most effective method for e-commerce clothing reviews.

Prediction and recommendations

This section presents our predictions and recommendation system and its evaluation.

Once trained and evaluated, the models can predict the recommendation status of new or unseen reviews.

The models analyze the sentiment expressed in the review text and provide recommendations based on the patterns learned during training.

The system can generate recommendations for individual products, allowing users to make well-informed choices based on the sentiment analysis results.

The system model described here provides a framework for sentiment analysis on the Women’s E-Commerce Clothing Reviews dataset, incorporating machine learning algorithms and deep learning techniques. Through the integration of data preprocessing, feature extraction, model selection, training, evaluation, and prediction, the model enables the identification of the recommendation status based on the sentiment expressed in the reviews, empowering users with valuable insights for their decision-making.

6. Numerical results

In this section, we evaluate the performance of Logistic Regression, Naive Bayes, SVM, Random Forest, AdaBoosting, GRU, and LSTM.

Logistic regression

With count vectorizer.

This subsection examines the performance of Logistic Regression. Tables 1 and 2 give a statistical account of the recommendation classification using Logistic Regression with the count vectorizer, and Fig 2 shows the corresponding confusion matrix. These metrics show how well the model predicts the recommendation status of the products.

Table 2. Statistical report on recommendation classification using logistic regression with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t002

Table 1 summarizes the results on the training and test sets.

On the test set

Precision: The model achieved a precision of 0.61 for class 0 (not recommended) and 0.95 for class 1 (recommended). When the model predicts a product as not recommended (class 0), it is correct 61% of the time; when it predicts a product as recommended (class 1), it is correct 95% of the time.

Recall: The model achieved a recall of 0.80 for class 0 and 0.89 for class 1, meaning that it correctly identified 80% of the products that were not recommended (class 0) and 89% of the products that were recommended (class 1).

F1-score: The F1-score is the harmonic mean of precision and recall. The model achieved an F1-score of 0.69 for class 0 and 0.92 for class 1. Because it accounts for both precision and recall, the F1-score gives a balanced evaluation of how effectively the model captures the relationships between the features and the target variable. The model’s overall accuracy on the test set is 0.87, meaning that it correctly predicted the recommendation status of 87% of the products.
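As a quick consistency check, substituting the reported test-set precision and recall for each class into the F1 definition reproduces the reported F1-scores:

$$F_1^{(0)} = \frac{2 \times 0.61 \times 0.80}{0.61 + 0.80} = \frac{0.976}{1.41} \approx 0.69, \qquad F_1^{(1)} = \frac{2 \times 0.95 \times 0.89}{0.95 + 0.89} = \frac{1.691}{1.84} \approx 0.92$$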

On the training set

Precision: On the training set, the model’s precision is 0.72 for class 0 and 0.99 for class 1. This suggests that the model is relatively precise in predicting not recommended products (class 0) and highly precise in predicting recommended products (class 1).

Recall: On the training set, the model’s recall is 0.96 for class 0 and 0.92 for class 1, indicating that it captures a high percentage of both not recommended (class 0) and recommended (class 1) products.

F1-score: The F1-score is 0.82 for class 0 and 0.95 for class 1, indicating a good balance between precision and recall for both classes.

In conclusion, the model performed strongly on both the training and test sets. On the test set, it achieved high precision and recall for the recommended class, while for the not recommended class its recall (0.80) was considerably higher than its precision (0.61). Its overall test accuracy of 0.87 shows that it predicts product recommendations with a high degree of accuracy.

Logistic regression with TF-IDF vectorizer

The Logistic Regression model trained with TF-IDF vectorization, whose results are presented in Table 3 and whose confusion matrix is depicted in Fig 3, classifies text data well. Its accuracy score of approximately 0.860 indicates that the model correctly predicts the class labels for a significant portion of the dataset. Recall-0, the model’s ability to correctly identify true negatives, is approximately 0.846, suggesting effective detection of negative instances. Recall-1, the model’s ability to capture true positives, is approximately 0.863, indicating good performance in identifying positive cases.

Table 3. Statistical Report on Recommendation Classification using Logistic Regression with TF-IDF Vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t003

Fig 3. The confusion matrix of LR with TF-IDF vectorizer.

https://doi.org/10.1371/journal.pone.0326744.g003

Naïve bayes

With count vectorizer.

This section presents the performance of Naive Bayes. A statistical analysis of the recommendation classification using Naive Bayes is presented in Table 4. Fig 4 shows the confusion matrix of Naive Bayes.

Table 4. Statistical report on recommendation classification using naive Bayes with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t004

Fig 4. The confusion matrix of Naive Bayes with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.g004

The Naive Bayes model, when applied with Count Vectorization, demonstrates a strong performance in classifying text data. The accuracy score, standing at approximately 0.883, indicates that the model correctly predicts class labels for a significant portion of the dataset.

Recall-0, showing the model’s ability to correctly identify true negatives, is approximately 0.748, suggesting effective detection of negative instances. Recall-1, indicating the model’s ability to capture true positives, is about 0.912, showing good performance in identifying positive cases.

With TF-IDF vectorizer.

The Naive Bayes model, as shown in Table 5 with the confusion matrix in Fig 5, when used in conjunction with TF-IDF Vectorization, demonstrates commendable performance in text classification tasks. The accuracy score of nearly 0.877 signifies that the model correctly predicts the class labels for a sizable percentage of the dataset.

Table 5. Statistical report on recommendation classification using Naive Bayes with TF-IDF vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t005

Fig 5. The confusion matrix of Naive Bayes with TF-IDF vectorization.

https://doi.org/10.1371/journal.pone.0326744.g005

Recall-0, showing the ability of the model to correctly identify true negatives, is approximately 0.717, suggesting the effective detection of negative cases. Recall-1, indicating the model’s ability to capture true positives, is approximately 0.912, showing good performance in identifying positive cases.

Support vector machine (SVM)

With count vectorizer.

This subsection presents the performance of the SVM. A statistical analysis of the recommendation classification using SVM is presented in Table 6. Fig 6 shows the confusion matrix of the SVM.

Table 6. Statistical Report on Recommendation Classification using SVM with Count Vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t006

Fig 6. The confusion matrix of SVM with count vectorization.

https://doi.org/10.1371/journal.pone.0326744.g006

The SVM model, when used with count vectorization, delivers strong performance in text classification tasks. Its accuracy score is approximately 0.865. Recall-0 is approximately 0.831, suggesting effective detection of negative cases, and Recall-1 is around 0.873, demonstrating excellent performance in identifying positive cases. In summary, the SVM model with count vectorization proves to be a robust choice for text classification tasks.

With TF-IDF vectorizer.

The SVM model, represented by the confusion matrix in Fig 7 and performance in Table 7, when employed with TF-IDF Vectorization, demonstrates notable performance in text classification tasks. The accuracy score of approximately 0.843 reflects the model’s ability to correctly predict class labels. Recall-0 is approximately 0.858, suggesting effective identification of negative cases. Recall-1 is approximately 0.840, demonstrating a good performance in identifying positive cases.

Table 7. Statistical report on recommendation classification using SVM with TF-IDF vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t007

Fig 7. The confusion matrix of SVM with TF-IDF vectorization.

https://doi.org/10.1371/journal.pone.0326744.g007

Random forest (RF)

With count vectorizer.

The Random Forest model achieves a commendable accuracy of approximately 0.85 (Table 8, with the confusion matrix in Fig 8), a competitive performance similar to that of the SVM and Logistic Regression models. Notably, it may exhibit marginally lower precision and F1-scores when predicting products that are not recommended, while showing higher precision when predicting recommended products. The optimal model therefore depends on the particular objectives and requirements of the application, which calls for careful consideration of the trade-offs among precision, recall, and accuracy.

Table 8. Statistical report on recommendation classification by random forest with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t008

With TF-IDF vectorizer.

The accuracy score of approximately 0.836 demonstrates the model’s proficiency in correctly classifying a significant portion of the dataset. Table 9 and Fig 9 show the results and the confusion matrix of RF with the TF-IDF vectorizer, respectively.

Table 9. Statistical report on recommendation classification by RF with TF-IDF vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t009

Recall-0 is approximately 0.792, highlighting the model’s effectiveness in capturing actual negative cases. Recall-1 is approximately 0.846, indicating good performance in identifying positive cases. The Random Forest model with TF-IDF Vectorization offers a strong combination of precision and recall for both negative and positive classes.

Ada boosting

With count vectorizer.

The AdaBoosting model (Table 10 and Fig 10) performed solidly, reaching an accuracy of 88%, in line with some of the other models we tested. Beyond overall accuracy, it also handled precision, recall, and F1-scores well. Its recall-0 is 0.569, meaning that it correctly identifies only about 57% of the true negative (not recommended) cases. By contrast, it performed very well in identifying positive cases, with a recall-1 of 0.947, indicating its strength in capturing true positive examples.

Table 10. Statistical report on recommendation classification using ada boosting with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t010

Fig 10. The confusion matrix of ada boosting with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.g010

With TF-IDF vectorizer.

With the TF-IDF vectorizer, the AdaBoosting model remained strong but had a slightly lower accuracy of 87.2% (Table 11 and Fig 11). Its recall for negative cases (0.568) was almost unchanged, indicating that there is still room for improvement in identifying negatives. As before, the model performed well on positive cases, with a recall-1 of 0.939. Overall, AdaBoosting with TF-IDF proved to be a well-rounded option for text classification, with particularly strong precision and recall for the positive class, although its handling of negative cases could benefit from further tuning.

Table 11. Statistical report on recommendation classification via ada boosting with TF-IDF vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t011

Fig 11. The confusion matrix of ada boosting with TF-IDF vectorization.

https://doi.org/10.1371/journal.pone.0326744.g011

Gated recurrent unit (GRU)

The Gated Recurrent Unit (GRU) is a recurrent neural network (RNN) architecture that has gained popularity for its ability to capture sequential patterns effectively while mitigating the vanishing gradient problem associated with traditional RNNs. In this research, the GRU model is used for text classification tasks in the e-commerce domain. Its key features and characteristics are as follows:

  1. Sequential Data Processing: GRUs are well suited for processing sequential data, such as text. They can capture the dependencies and relationships between words/tokens in a sentence, which is critical for sentiment analysis and product-review classification.
  2. Gating Mechanism: The distinguishing feature of the GRU is its gating mechanism. It utilizes two gates, the reset gate and the update gate, which control the information flow within the network. This gating mechanism enables the model to selectively retain or discard information from previous time steps, allowing for the capture of long-range dependencies.
  3. Less Prone to Vanishing Gradient: Compared to traditional RNNs, GRUs are less prone to the vanishing gradient problem. This means that they can effectively capture information from distant time steps in a sequence without losing too much information during training.
  4. Parallelization: GRUs have fewer parameters than LSTMs and are therefore cheaper to train. Computation can be parallelized across the samples in a batch and across hidden units, although the time steps of a sequence must still be processed in order. This makes GRUs practical for training on large datasets.
  5. Text Classification: The GRU model is applied to text classification tasks, such as sentiment analysis and product review classification. It learns representations from textual data and makes predictions based on the learned features.
  6. Hyperparameter Tuning: As with any DL model, the GRU model's performance depends on hyperparameters such as the batch size, number of hidden units, and learning rate. Tuning these hyperparameters is a crucial part of optimizing the model's performance.
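
The gating mechanism listed above can be illustrated with a minimal sketch of a single GRU time step in plain Python. It follows one common textbook formulation (update gate z, reset gate r, candidate state, then interpolation); frameworks such as PyTorch and Keras differ in details such as bias placement and which side of the interpolation the update gate weights. The random weights, dimensions, and toy token vectors are hypothetical and do not reflect the model trained in this study.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vectors)]


def make_gru_params(input_dim, hidden_dim, seed=0):
    # Hypothetical random weights for the update (z), reset (r) and
    # candidate (h) transformations; a trained model would learn these.
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for g in ("z", "r", "h"):
        params["W_" + g] = rand_matrix(hidden_dim, input_dim)   # input weights
        params["U_" + g] = rand_matrix(hidden_dim, hidden_dim)  # recurrent weights
        params["b_" + g] = [0.0] * hidden_dim                   # bias
    return params


def gru_step(x, h_prev, p):
    # Update gate z: how much of the candidate state replaces the old state.
    z = [sigmoid(v) for v in vec_add(matvec(p["W_z"], x), matvec(p["U_z"], h_prev), p["b_z"])]
    # Reset gate r: how much of the old state feeds into the candidate.
    r = [sigmoid(v) for v in vec_add(matvec(p["W_r"], x), matvec(p["U_r"], h_prev), p["b_r"])]
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate state from the current input and the reset-gated old state.
    h_cand = [math.tanh(v) for v in vec_add(matvec(p["W_h"], x), matvec(p["U_h"], gated_h), p["b_h"])]
    # New state: element-wise interpolation between old and candidate states.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def gru_encode(sequence, p, hidden_dim):
    # Run the cell over a sequence of token vectors (e.g. word embeddings) and
    # return the final hidden state, which a classifier layer could consume.
    h = [0.0] * hidden_dim
    for x in sequence:
        h = gru_step(x, h, p)
    return h


params = make_gru_params(input_dim=3, hidden_dim=2)
review = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.3, 0.3, 0.3]]  # three toy token vectors
print(gru_encode(review, params, hidden_dim=2))
```

Because each new state is an element-wise blend of the previous state and a tanh-bounded candidate, the hidden values stay strictly between -1 and 1 when the state starts at zero; the reset gate decides how much history enters the candidate, and the update gate decides how much of that candidate is kept.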

Overall, the GRU model reflects this study's use of deep learning to address problems in e-commerce, particularly in natural language processing and text classification. Its ability to interpret textual data can provide valuable insights for businesses seeking to automate processes and make data-driven decisions.

This section evaluates the performance of the GRU deep learning model. Table 12 reports the statistics for recommendation classification using the GRU model, and Fig 12 depicts a precision-recall curve along with the model's accurate and inaccurate predictions. Precision-recall curves are useful for evaluating classification models, especially on imbalanced datasets. In the confusion matrix, diagonal elements represent correct predictions and off-diagonal elements represent incorrect ones. In this case, the model correctly predicted 3252 instances of class 0 (not recommended) and 14190 instances of class 1 (recommended). It made 29 false-positive predictions (predicted as recommended but actually not recommended) and 641 false-negative predictions (predicted as not recommended but actually recommended).
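
This section does not state how the curve in Fig 12 was produced. As an illustration only, a precision-recall curve is typically built from predicted probabilities by sweeping the decision threshold and recomputing precision and recall for the positive class (here class 1, recommended) at each threshold. The labels and scores below are made-up toy values; libraries such as scikit-learn provide an optimized equivalent in sklearn.metrics.precision_recall_curve.

```python
def precision_recall_points(y_true, scores):
    """Precision and recall of class 1 at every distinct score threshold.

    A review is predicted "recommended" (1) when its score >= threshold.
    Returns (threshold, precision, recall) tuples from strictest to loosest.
    """
    total_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points


# Toy labels (1 = recommended) and hypothetical predicted probabilities.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
for t, p, r in precision_recall_points(y_true, scores):
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold can only raise recall, while precision moves up or down depending on whether the newly admitted reviews are true or false positives; plotting these pairs gives the curve. This loop is quadratic in the number of samples for clarity, whereas library implementations sort the scores once.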

Table 12. Statistical report on recommendation classification using deep learning GRU.

https://doi.org/10.1371/journal.pone.0326744.t012

Fig 12. Precision-recall curve for the deep learning model GRU.

https://doi.org/10.1371/journal.pone.0326744.g012

Precision, recall, and F1-score are used to further evaluate the performance of the model. Precision: Class 0 (not recommended) has a precision of 0.84, meaning that the model is correct 84% of the time when it predicts that a product is not recommended. Class 1 (recommended) has a precision of 1.00 (rounded to two decimals), meaning that the model's "recommended" predictions are almost always correct. Recall: The recall for class 0 is 0.99, meaning that 99% of the not-recommended products are correctly identified by the model, and the recall for class 1 is 0.96, meaning that 96% of the recommended products are correctly identified. F1-score (the harmonic mean of precision and recall) is 0.91 for class 0 and 0.98 for class 1, indicating a good balance of precision and recall for both classes. The overall accuracy of the model is 0.87, i.e., it predicts the recommendation status correctly for 87% of the products.
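
As a check on the figures above, the per-class metrics can be recomputed from the four confusion-matrix counts in a short Python sketch, reading the 29 and 641 errors as false positives and false negatives with class 1 (recommended) as the positive class. Under that reading, the sketch reproduces the reported per-class values (precision, recall, and F1 of 0.84, 0.99, and 0.91 for class 0 and 1.00, 0.96, and 0.98 for class 1). The same counts, however, imply an overall accuracy of about 0.963 rather than the 0.87 stated in the text.

```python
def per_class_metrics(tn, fp, fn, tp):
    """Per-class precision, recall and F1, plus accuracy, for a binary
    confusion matrix with class 1 ("recommended") as the positive class."""
    # For class 0 the roles swap: predicted 0 = tn + fn, actually 0 = tn + fp.
    p0, r0 = tn / (tn + fn), tn / (tn + fp)
    # For class 1: predicted 1 = tp + fp, actually 1 = tp + fn.
    p1, r1 = tp / (tp + fp), tp / (tp + fn)
    metrics = {}
    for label, (p, r) in {0: (p0, r0), 1: (p1, r1)}.items():
        # F1 is the harmonic mean of precision and recall.
        metrics[label] = {"precision": p, "recall": r, "f1": 2 * p * r / (p + r)}
    metrics["accuracy"] = (tn + tp) / (tn + fp + fn + tp)
    return metrics


# Counts reported for the GRU model.
m = per_class_metrics(tn=3252, fp=29, fn=641, tp=14190)
for label in (0, 1):
    print(label, {k: round(v, 2) for k, v in m[label].items()})
print("accuracy", round(m["accuracy"], 3))
```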

Overall, the GRU model achieved high precision, recall, and F1-scores for both classes (0 and 1), highlighting its strong ability to predict product recommendation status.

LSTM model

This section gives an overview of the LSTM model's performance. Table 13 presents the statistical report on recommendation classification using the LSTM model, and Fig 13 shows the model's confusion matrix. The results indicate how well the LSTM model predicts the recommendation status of products. For a detailed analysis, we again compare it with the machine learning algorithms using the same metrics: precision, recall, and F1-score.

Table 13. Statistical report on recommendation classification using bidirectional LSTM.

https://doi.org/10.1371/journal.pone.0326744.t013

Precision: The LSTM model achieved a precision of 0.70 for class 0 (not recommended) and a higher 0.92 for class 1 (recommended). In other words, it is correct 70% of the time when predicting “not recommended” products and 92% of the time when predicting “recommended” products.

Recall: The model's recall is 0.65 for class 0 and 0.94 for class 1. That is, it correctly captures a moderate share (65%) of the “not recommended” products and a high 94% of the “recommended” products.

F1-score: Class 0 has an F1-score of 0.68, whereas class 1 has an F1-score of 0.93, indicating a reasonable balance between precision and recall for class 0 and a good balance for class 1.

We included the LSTM, a recurrent neural network (RNN), among our selected models as a benchmark for the performance of our proposed approaches, because it is a popular deep learning tool for NLP and speech recognition applications and has played a vital role in advances in artificial intelligence and machine learning.

The LSTM model's precision, recall, and F1-scores demonstrate its effectiveness in predicting product recommendation status, particularly for the recommended class.

Summary of results and discussion

With count vectorizer.

The performance of the machine learning classifiers was evaluated using multiple metrics: accuracy, and precision, recall, and F1 scores for both the positive (class 1) and negative (class 0) classes. Table 14 summarizes the results obtained with the Count Vectorizer.

Table 14. Performance metrics of classifiers with count vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t014

AdaBoosting achieved the highest recall-1 (0.947) and a strong F1-1 (0.927), indicating a superior ability to identify true positives accurately. This is very important for e-commerce customer satisfaction, because it makes personalized recommendations and demand forecasts more effective at meeting customer expectations. Although its recall-0 of 0.568 is relatively low compared with the other models, AdaBoost otherwise showed balanced precision and recall, which suggests it is reliable for real-world applications that focus on the positive class.

Naïve Bayes also performed competitively, achieving the highest F1-0 (0.697) for the negative class. This reflects its ability to identify true negatives accurately, which could help detect irrelevant recommendations or customer dissatisfaction. However, its slightly lower recall-1 (0.912) compared with AdaBoost may limit its usefulness for increasing customer satisfaction.

Logistic Regression and SVM showed steady performance. SVM achieved the highest recall-0 (0.830), which is important for not missing negative cases, such as dissatisfied customers. However, their slightly lower recall-1 values compared with AdaBoost suggest that they may be less useful for capturing positive customer interactions.

Although its metrics were balanced, the Random Forest classifier lagged behind AdaBoost and Naïve Bayes in accuracy and recall. This suggests that it is somewhat less suitable for tasks that require high sensitivity, such as customer satisfaction prediction.

Results in the context of E-commerce

These findings align with the study's objective of improving e-commerce decision-making through machine learning and IoT integration. AdaBoost's high accuracy, precision, and recall for class 1 (positive customer interactions) make it suitable for tasks such as personalized recommendations and demand forecasting. Its emphasis on true positives helps businesses meet customer needs effectively, which in turn enhances customer satisfaction and operational efficiency.

Other models certainly have their strengths; however, our comparative analysis shows that AdaBoost provides the most well-rounded performance for supporting customer-centric strategies, in line with the study's focus on improving customer satisfaction and operational outcomes.

With TF-IDF vectorizer.

The classifiers were also evaluated with the TF-IDF vectorizer using the same metrics: accuracy, precision, recall, and F1 scores. Table 15 summarizes the results for easy comparison.

Table 15. Performance metrics of classifiers with TF-IDF vectorizer.

https://doi.org/10.1371/journal.pone.0326744.t015

Analysis of TF-IDF vectorizer results.

  • With the TF-IDF vectorizer, AdaBoost again performed strongly, with an accuracy of 87.1%, a strong F1-1 (0.923), and a recall-1 of 0.939. This shows its persistent effectiveness in detecting true positive cases, which makes it an excellent choice for customer satisfaction tasks such as personalized recommendations.
  • Naïve Bayes closely matched AdaBoost, with a slightly higher accuracy of 87.6% and the same F1-1 (0.923), reflecting its stable balance of accuracy and recall. However, its recall-0 (0.716) implies that it may struggle slightly with detecting true negatives compared with its performance with the Count Vectorizer.
  • Logistic Regression showed strong recall-0 (0.846) and F1-1 (0.910), indicating that it handles both classes efficiently.
  • While still performing reasonably well, Support Vector Machine (SVM) and Random Forest (RF) showed a slight decline in overall metrics compared with their performance with the Count Vectorizer, especially in recall-1.

Performance with Count and TF-IDF Vectorizers: A Comparison

Switching from the Count Vectorizer to the TF-IDF Vectorizer produced variations in model performance. In particular:

  1. AdaBoost: AdaBoost showed strong and consistent performance with both vectorizers, although its recall-0 decreased slightly with TF-IDF (from 0.568 to 0.567). Its recall-1 also decreased slightly (from 0.947 to 0.939) but remained high, so it stays well suited to tasks that require sensitivity to true positives, such as recommendations.
  2. Naïve Bayes: Naïve Bayes performed marginally better with the Count Vectorizer in terms of F1-0 and recall-0, suggesting that the Count Vectorizer better captured features useful for identifying true negatives. Its precision-1 and recall-1, however, remained strong with both vectorizers, which underlines its effectiveness for customer satisfaction insights.
  3. SVM and RF: Both classifiers performed worse with TF-IDF than with the Count Vectorizer, especially in recall-1 and F1-1. This suggests that the Count Vectorizer's simpler frequency-based representation of text suited these algorithms better for this dataset.
  4. Logistic Regression: Logistic Regression showed fairly stable performance across vectorizers, with a slight trade-off between recall-0 (higher with TF-IDF) and F1-1 (lower with TF-IDF), making it a consistent baseline model for general tasks.

Vectorizer impact.

The contrast in performance between the Count Vectorizer and TF-IDF Vectorizer stems from the way these vectorizers deal with feature representation:

  • The Count Vectorizer uses raw term frequencies, so it gives more weight to frequently occurring words and to longer reviews, whose counts are larger. This may suit models such as SVM and Random Forest, which rely on more straightforward feature distributions.
  • The TF-IDF Vectorizer weights each term's frequency by its inverse document frequency, which reduces the weight of words that are common across the dataset. This is expected to benefit algorithms such as AdaBoost and Naïve Bayes by emphasizing more discriminative terms, although in these experiments neither showed a clear gain over the Count Vectorizer.
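
The difference between the two representations can be seen on a toy corpus. The sketch below builds raw count vectors and TF-IDF vectors in plain Python; the smoothed IDF formula and L2 normalization mirror the defaults of scikit-learn's TfidfVectorizer, which is an assumption about the study's setup, and the whitespace tokenizer and example reviews are simplifications. A word that appears in every review (here “dress”) receives the minimum IDF of 1, while rarer words are weighted up.

```python
import math
from collections import Counter


def count_vectors(docs):
    # Raw term counts per document over a shared, sorted vocabulary.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counters = [Counter(toks) for toks in tokenized]
    return vocab, [[c[term] for term in vocab] for c in counters]


def smoothed_idf(vocab, count_rows):
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1, where df(t) is the number of
    # documents containing t (the smooth_idf=True formula in scikit-learn).
    n = len(count_rows)
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(len(vocab))]
    return [math.log((1 + n) / (1 + d)) + 1 for d in df]


def tfidf_vectors(count_rows, idf):
    # Weight each count by its idf, then L2-normalize each document vector.
    rows = []
    for row in count_rows:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return rows


reviews = ["great dress love it", "dress too small", "love this dress"]
vocab, counts = count_vectors(reviews)
idf = smoothed_idf(vocab, counts)
weights = tfidf_vectors(counts, idf)
for term, w in zip(vocab, idf):
    print(f"{term:>6}  idf={w:.3f}")
```

In the count vectors every occurrence of “dress” weighs as much as an occurrence of “great”; after TF-IDF weighting, “great” (found in one review) outweighs “love” (two reviews), which in turn outweighs “dress” (all three).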

In summary, the study's findings indicate that AdaBoost performs most consistently across the two vectorizers (Count and TF-IDF), which supports the study's goals of improving e-commerce decision-making and customer satisfaction. These performance differences highlight how crucial it is to choose feature extraction methods suited to the particular objectives and dataset properties.

Time complexity of the models.

An important factor in the applicability of ML and DL models to real-time e-commerce applications is their time complexity. Choosing an appropriate model for a given business need therefore involves a trade-off between predictive performance and computational cost.

AdaBoosting:

AdaBoosting is relatively lightweight computationally. Its training time complexity can generally be expressed as O(T × N × D), where T is the number of boosting iterations, N is the number of training samples, and D is the dimensionality of the feature space. This makes it appropriate for small datasets or for situations where rapid retraining and deployment are crucial, such as updating recommendations during flash sales or promotional events. However, because boosting rounds run sequentially, training can still become computationally costly on very large datasets.
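
The O(T × N × D) estimate can be read directly off a minimal sketch of discrete AdaBoost with one-feature decision stumps over binary word-presence features. This is a hypothetical simplification for illustration, not the configuration used in the study (the base learner and number of rounds are not given in this section). With real-valued features such as TF-IDF weights, each stump must also search over thresholds, which adds a sorting cost per feature.

```python
import math


def stump_predict(x, feature, polarity):
    # One-feature decision stump over binary (0/1) features:
    # predicts `polarity` if the word is present, otherwise `-polarity`.
    return polarity if x[feature] else -polarity


def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with decision stumps; labels in y are -1 or +1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):                                  # T rounds
        best = None
        for j in range(d):                                   # D features
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)  # N samples
                          if stump_predict(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Up-weight misclassified reviews, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    # Weighted vote of all stumps.
    score = sum(alpha * stump_predict(x, j, pol) for alpha, j, pol in model)
    return 1 if score >= 0 else -1


# Toy word-presence features ["love", "great", "return"]; +1 = recommended.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]]
y = [1, 1, 1, -1, -1, 1]
model = train_adaboost(X, y, rounds=2)
print(model)
print([predict(model, x) for x in X])
```

After two rounds the toy ensemble misclassifies one of the six reviews. Each additional round repeats the full pass over all N samples and D features, and it cannot start until the previous round has updated the sample weights, so training cost grows linearly with T and the rounds cannot be run in parallel.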

LSTM and GRU:

Deep learning models such as LSTM and GRU can achieve high accuracy on sequential data, but they incur a much higher computational cost. Their time complexity is roughly O(S × H²) per sequence, where S is the sequence length and H is the number of hidden units; more precisely, each time step costs O(H × (H + D)) for input dimension D, which is dominated by the H² term when H is large. Because time steps are processed sequentially, these models add latency, so they are better suited to settings where real-time processing is not required or where powerful computational resources are available.

GRU vs. LSTM:

GRU has a simpler architecture than LSTM (three gate blocks instead of four), so it is somewhat faster in both training and inference. However, this can come at the cost of slightly lower accuracy on complex datasets. Businesses with tight latency constraints may therefore prefer GRU over LSTM for real-time applications such as live chat recommendations or streaming services.
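
The cost gap between GRU and LSTM follows from their structure: an LSTM computes four gate blocks per time step (input, forget, and output gates plus the cell candidate), whereas a GRU computes three (update and reset gates plus the candidate state). The sketch below counts parameters and per-sequence multiply-accumulates under the single-bias textbook formulation. The dimensions are hypothetical, and frameworks such as PyTorch add a second bias vector per block, so their exact counts differ slightly. The per-step cost H × (H + D) is the term behind the O(S × H²) estimate given earlier.

```python
def recurrent_params(input_dim, hidden_dim, gate_blocks):
    # Each gate block has an input matrix (H x D), a recurrent matrix (H x H)
    # and one bias vector (H), assuming the single-bias formulation.
    return gate_blocks * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)


def recurrent_step_macs(input_dim, hidden_dim, gate_blocks):
    # Multiply-accumulate operations per time step (matrix products only).
    return gate_blocks * hidden_dim * (input_dim + hidden_dim)


LSTM_BLOCKS = 4  # input, forget and output gates + cell candidate
GRU_BLOCKS = 3   # update and reset gates + candidate state

D, H, S = 100, 128, 50  # hypothetical embedding size, hidden units, sequence length
for name, blocks in (("LSTM", LSTM_BLOCKS), ("GRU", GRU_BLOCKS)):
    params = recurrent_params(D, H, blocks)
    macs = S * recurrent_step_macs(D, H, blocks)
    print(f"{name}: {params:,} parameters, ~{macs:,} MACs per sequence")
```

With these settings the GRU needs exactly three quarters of the LSTM's parameters and arithmetic. A bidirectional LSTM, such as the one in Table 13, runs two such layers and therefore roughly doubles the LSTM figures.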

Other ML Models:

Traditional machine learning models such as Logistic Regression, Naïve Bayes, and Support Vector Machines are usually cheaper to compute than deep learning models. Of these three, SVM has the highest cost, because kernel-based training can scale as O(N²) or worse with nonlinear kernels. Random Forest is an ensemble model with a training cost of roughly O(M × K × N log N), where M is the number of trees and K is the number of features evaluated at each split. RF is well suited to medium-sized datasets but may lag behind AdaBoosting on datasets that require frequent updates.

In this work, AdaBoosting offered a good trade-off between accuracy and computational efficiency. LSTM and GRU have higher time complexity; although they can perform well on complex sequential data, they are better suited to batch processing than to latency-sensitive real-time applications. In contrast, AdaBoosting's low computational complexity makes it a candidate for scenarios that require frequent retraining or deployment on less powerful systems.

Ethical considerations in real-time IoT and ML applications

Incorporating ML and IoT into e-commerce has greatly improved businesses' ability to make informed decisions. Today, large volumes of consumer data are available online. Collected and real-time data have helped many businesses gain insights into customers' shopping habits, buying preferences, and even their locations and browsing patterns. This information can then be used to provide recommendations tailored to each individual customer, in addition to improving business growth, operational efficiency, and practice. However, this comes at a cost. The continuous aggregation, real-time processing, and analysis of personal data raise significant ethical challenges that must be addressed. Businesses must uphold sound ethical practices to maintain customer trust, comply with legal and regulatory frameworks, and promote the well-being of society. Companies must clearly and transparently inform their customers about their data collection practices and obtain informed consent [54]. Moreover, adhering to the principle of data minimization, that is, collecting only the data strictly necessary for a specific application, mitigates privacy risks and demonstrates a commitment to ethical data use [55].

Another critical ethical consideration is fairness in data analysis. If ML algorithms are trained on biased or unrepresentative datasets, they can produce discriminatory outcomes and may unintentionally perpetuate social injustices. For instance, a recommendation system that is overly biased toward the tastes of one customer segment might not adequately account for the needs of other segments, which would be unfair to them [56]. For this reason, businesses should routinely audit their machine learning models for such biases and correct them. In addition, automated decision-making needs to be transparent to earn customers' trust. For example, explaining to consumers why a product was suggested can help them understand the system's logic and reduce the risk of misunderstanding or distrust [57].

The real-time nature of IoT applications adds to these challenges. Continuous data collection and transmission expose systems to cybersecurity threats, including hacking and unauthorized access. Countering these threats requires strong encryption protocols and compliance with established privacy regulations, such as GDPR and CCPA, to protect sensitive data [58]. For example, end-to-end encryption can protect data transferred between IoT devices and from the point of collection to the point of storage. Furthermore, companies must refrain from falsifying results or unfairly influencing consumer behavior by distorting insights obtained from ML and IoT data, since this could damage their brand's reputation and long-term customer confidence [59].

Beyond privacy and fairness, the wider ramifications of IoT and ML adoption include environmental and socioeconomic concerns. The proliferation of IoT devices and the computational intensity of ML models tend to increase energy consumption and electronic waste. Companies can address these issues by adopting sustainable practices, such as investing in energy-efficient hardware and using green data centers [60]. The cost of these technologies could also create unequal access, widening the gap between large enterprises and smaller businesses and further constraining market opportunities for less-resourced firms. Policies that encourage equitable access to IoT and ML tools would foster inclusivity and competition in the marketplace.

Ultimately, businesses must balance innovation with responsibility, addressing ethical considerations so that real-time IoT and ML applications respect privacy, foster fairness, and contribute to sustainable development. Such proactive steps support regulatory compliance and build consumer trust, helping to create a resilient and ethically sound e-commerce ecosystem.

7. Conclusions and future work

In this work, we showed how IoT and ML can be used together to enhance decision-making in e-commerce. We demonstrated that incorporating these technologies into e-commerce can produce valuable information that positively affects how businesses conduct their operations, deliver better customer experiences, and improve their business in general. We first analyzed customer data collected by IoT devices using several machine learning algorithms, with the goal of providing businesses with actionable insights into customer behaviors and preferences. Ultimately, this enables businesses to optimize operations, enhance customer experience and personalization, and boost revenue.

We focused on analyzing customer sentiments in reviews using ML and deep learning models and on understanding their effect on consumer recommendations. After extensive experimentation, we found that AdaBoost performed best overall among the ML and deep learning models in classifying customer recommendations. This advantage may be attributed to the algorithm's effectiveness in handling imbalanced datasets, as is the case in our sentiment analysis. Using the TF-IDF vectorizer further increased the precision of predictions based on customer sentiment.

To conclude, this study shows how IoT and machine learning, particularly when combined, can fundamentally change the way e-commerce is conducted. As a result, businesses can develop new strategies that improve customer experience, streamline processes and operations, and ultimately achieve long-term success in a highly competitive global market.

Although this study illustrated how IoT and machine learning can be leveraged to improve e-commerce decision-making, several challenges remain. An important avenue for future investigation is data heterogeneity. IoT devices produce large volumes of data from many nodes in various formats, and these data sources are often difficult to integrate. Future work needs more robust data cleaning, normalization, and fusion methods to maintain data consistency and validity.

Furthermore, privacy and security concerns specific to IoT data should be examined further. Businesses must ensure that customer data collected in real time, especially sensitive information, is safe from breaches and misuse. Federated learning, a privacy-preserving machine learning technique in which models are trained on decentralized data without pooling the raw data, may be investigated to mitigate these risks. More sophisticated deep learning models to boost predictive accuracy will also be considered in future work; for example, recent deep learning architectures such as transformer-based models have opened a wealth of possibilities for improving recommendation systems in e-commerce.

References

  1. Mahesh Kulkarni P, Nautiyal B, Kumar S, Medidha R, Rameshbhai Savaliya R, Eknath M. IOT data Fusion framework for e-commerce. Meas: Sens. 2022;24:100507.
  2. Wu X, Liang J. Study on trust evaluation and service selection for Service-Oriented E-Commerce systems in IoT environments. Egypt Inform J. 2023;24(2):257–63.
  3. Kalkha H, Khiat A, Bahnasse A, Ouajji H. Toward a reliable and responsive E-commerce with IoT. Procedia Comput Sci. 2022;198:614–9.
  4. Shouborno S a. I, Mahmud TI, Ishraq N, Ali R, Joy TH, Fattah SA, et al. Complete automation of an E-commerce system with internet of things. 2019.
  5. Liu F, Lv Y, Yang P, Liu Y, Xu Z, Luo J. Innovation of business model for electrical household appliance enterprises to deploy IoT+AI and IoT+5G. In: 2020 International Conference on E-Commerce and Internet Technology (ECIT), Zhangjiajie, China, 2020. 245–7. https://doi.org/10.1109/ecit50008.2020.00063
  6. Kushwaha N, Mahule R, Singh AP, Vyas OP, Singh B. Integration of service oriented WSN and IoT for E-commerce. In: 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 2015. 1731–6.
  7. Guo P, Han M, Cao N, Shen Y. The research on innovative application of E-commerce in IoT Era. In: 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 2017. 410–3. https://doi.org/10.1109/cse-euc.2017.263
  8. Singh S, Singh N. Internet of things (IoT): Security challenges, business opportunities & reference architecture for E-commerce. In: 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), Greater Noida, India, 2015. https://doi.org/10.1109/icgciot.2015.7380718
  9. Mohamed S, Sethom K. Incorporating RFID technology and IOT into a E-commerce: A step toward intelligent purchasing processes. In: 2020 20th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), Monastir, Tunisia, 2020. 328–32. https://doi.org/10.1109/sta50679.2020.9329337
  10. Shi S. Research on automatic statistics of electronic commerce data based on internet of things. In: 2019 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Changsha, China, 2019. 360–4. https://doi.org/10.1109/icitbs.2019.00095
  11. Sohaib O, Lu H, Hussain W. Internet of things (IoT) in E-commerce: For people with disabilities. In: 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), Siem Reap, Cambodia, 2017. 419–23. https://doi.org/10.1109/iciea.2017.8282881
  12. Zulfiker S, Chowdhury A, Roy D, Datta S, Momen S. Bangla E-commerce sentiment analysis using machine learning approach. In: 2022 4th International Conference on Sustainable Technologies for Industry 4.0 (STI), Dhaka, Bangladesh, 2022. https://doi.org/10.1109/sti56238.2022.10103350
  13. Alam MM, Shome A, Saha S, Mridha MF. BRevML: Classifying bangla reviews for E-commerce using machine learning. In: 2021 International Conference on Science & Contemporary Technologies (ICSCT), Dhaka, Bangladesh, 2021. 1–6. https://doi.org/10.1109/icsct53883.2021.9642682
  14. Kamal R, Karan A, Arungalai VS. Investigations on E-commerce data for forecasting the efficient promotional platform using supervised machine learning. In: 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), Bangalore, India, 2019. 939–43. https://doi.org/10.1109/rteict46194.2019.9016688
  15. Wisetsri W, Syam E, Alanya-Beltran J, Kulkarni GR, Vardhan Reddy RK, Alam Sheikh MF. Assessing and comparing the role of machine learning (ML) and supply chain management (SCM) towards enhancing E-commerce. In: 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 2022. 2574–7. https://doi.org/10.1109/icacite53722.2022.9823586
  16. Liu C-J, Huang T-S, Ho P-T, Huang J-C, Hsieh C-T. Machine learning-based e-commerce platform repurchase customer prediction model. PLoS One. 2020;15(12):e0243105. pmid:33270714
  17. Chabane N, Bouaoune A, Tighilt R, Abdar M, Boc A, Lord E, et al. Intelligent personalized shopping recommendation using clustering and supervised machine learning algorithms. PLoS One. 2022;17(12):e0278364. pmid:36454766
  18. Leung KH, Mo DY, Ho GTS, Wu CH, Huang GQ. Modelling near-real-time order arrival demand in e-commerce context: a machine learning predictive methodology. IMDS. 2020;120(6):1149–74.
  19. Akter S, Islam MdK, Hossain MdN, Rahman M, Boshra SJ. People thoughts prediction using machine learning on online shopping in bangladesh during COVID-19 pandemic. In: 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2022. 1–7. https://doi.org/10.1109/icccnt54827.2022.9984625
  20. Rao H, Zeng Z, Liu A. Research on personalized referral service and big data mining for e-commerce with machine learning. In: 2018 4th International Conference on Computer and Technology Applications (ICCTA), Istanbul, Turkey, 2018. 35–8. https://doi.org/10.1109/cata.2018.8398652
  21. Khan MM. Development of an e-commerce sales chatbot. In: 2020 IEEE 17th International Conference on Smart Communities: Improving Quality of Life Using ICT, IoT and AI (HONET), Charlotte, NC, USA, 2020. 173–6. https://doi.org/10.1109/honet50430.2020.9322667
  22. Chin S-H, Lu C, Ho P-T, Shiao Y-F, Wu T-J. Commodity anti-counterfeiting decision in e-commerce trade based on machine learning and Internet of Things. Comput Stand Interfaces. 2021;76:103504.
  23. Appendix S: Machine learning–driven e-commerce. Elsevier eBooks. 2022. p. e247–58.
  24. Klimantaviciute G. Customer churn prediction in e-commerce industry. J Mach Learn Res. 2021;1(1):1–14.
  25. Alquhtani SA, Muniasamy A. Analytics in support of e-commerce systems using machine learning. In: 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), Prague, Czech Republic, 2022. 1–5. https://doi.org/10.1109/icecet55527.2022.9872592
  26. Hu Y, Zhou Z, Wang T. Stock price prediction of e-commerce platforms under COVID-19’s influence based on machine learning. In: 2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE), Guangzhou, China, 2022. 431–6. https://doi.org/10.1109/mlise57402.2022.00092
  27. Sarowar MdG, Rahman M, Yousuf Ali MdN, Rakib OF. An automated machine learning approach for sentiment classification of bengali e-commerce sites. In: 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay, India, 2019. 1–5. https://doi.org/10.1109/i2ct45611.2019.9033741
  28. K A, Pani AK, M M, Kumar P. An approach for detecting frauds in e-commerce transactions using machine learning techniques. In: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 2021. 826–31. https://doi.org/10.1109/icosec51865.2021.9591720
  29. Sumathi VP, Pudhiyavan SM, Saran M, Kumar VN. Fake review detection of e-commerce electronic products using machine learning techniques. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), Coimbatore, India, 2021. 1–5. https://doi.org/10.1109/icaeca52838.2021.9675684
  30. Hossain MdJ, Joy DD, Das S, Mustafa R. Sentiment analysis on reviews of e-commerce sites using machine learning algorithms. In: 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), Chittagong, Bangladesh, 2022. 522–7. https://doi.org/10.1109/iciset54810.2022.9775846
  31. Yeung J, Wong S, Tam A, So J. Integrating machine learning technology to data analytics for e-commerce on cloud. In: 2019 Third World Conference on Smart Trends in Systems Security and Sustainablity (WorldS4), London, UK, 2019. https://doi.org/10.1109/worlds4.2019.8904026
  32. Anvar Shathik R, Prasad K. A literature review on application of sentiment analysis using machine learning techniques. IJAEML. 2020;4(2):41–77.
  33. Verma N, Kulkarni P. Impact of blockchain technology on e-commerce. In: 2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA), Uttarakhand, India, 2023. 668–72. https://doi.org/10.1109/icidca56705.2023.10100222
  34. Lai J. Research on cross-border e-commerce logistics supply under block chain. In: 2019 International Conference on Computer Network, Electronic and Automation (ICCNEA), Xi'an, China, 2019. 214–8. https://doi.org/10.1109/iccnea.2019.00049
  35. Liu Z, Li Z. A blockchain-based framework of cross-border e-commerce supply chain. Int J Inf Manage. 2020;52:102059.
  36. Li M, Shao S, Ye Q, Xu G, Huang GQ. Blockchain-enabled logistics finance execution platform for capital-constrained E-commerce retail. Robot Comput-Integr Manuf. 2020;65:101962.
  37. Jiang L, Dong K. Credibility modelling of e-commerce networks based on block-chain and massive data mining. In: 2020 Fourth International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 2020. 441–4. https://doi.org/10.1109/icisc47916.2020.9171184
  38. Jebamikyous H, Li M, Suhas Y, Kashef R. Leveraging machine learning and blockchain in E-commerce and beyond: benefits, models, and application. Discov Artif Intell. 2023;3(1).
  39. Huang F. Rural E-commerce investment and financing model based on blockchain and data mining. In: 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 2021. 515–8. https://doi.org/10.1109/iccmc51019.2021.9418423
  40. Vo KT, Nguyen T, Ta T-T, Nguyen-Hoang T-A, Dinh N-T. Student management model integrating e-commerce based on blockchain technology. In: 2023 15th International Conference on Computer and Automation Engineering (ICCAE), Sydney, Australia, 2023. 45–9. https://doi.org/10.1109/iccae56788.2023.10111175
  41. 41. Li M, Zhu L, Zhang Z, Lal C, Conti M, Alazab M. Anonymous and verifiable reputation system for e-commerce platforms based on blockchain. IEEE Trans Netw Serv Manage. 2021;18(4):4434–49.
  42. 42. K N, Panduro-Ramirez J, Gehlot A, Barve A, P SC, Ponnusamy R. Blockchain effect on E-commerce: A framework for key research areas. In: 2023 International Conference on Artificial Intelligence and Smart Communication (AISC), Greater Noida, India, 2023. 624–8. https://doi.org/10.1109/aisc56616.2023.10085137
  43. 43. Yang C-N, Chen Y-C, Chen S-Y, Wu S-Y. A reliable e-commerce business model using blockchain based product grading system. In: 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), Suzhou, China, 2019. 341–4. https://doi.org/10.1109/icbda.2019.8713204
  44. 44. Acharya N, Sassenberg A-M, Soar J. Effects of cognitive absorption on continuous use intention of AI-driven recommender systems in e-commerce. FS. 2022;25(2):194–208.
  45. 45. Li Y, Sun Y. The application of artificial intelligence in electronic commerce. In: 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), IEEE, 2019. 60–5. https://doi.org/10.1109/aiam48774.2019.00019
  46. 46. Guo J, Yu H, Gen M. Research on green closed-loop supply chain with the consideration of double subsidy in e-commerce environment. Comput Ind Eng. 2020;149:106779.
  47. Pethuraj MS, Aboobaider BbM, Salahuddin LB. Analyzing QoS factor in 5G communication using optimized data communication techniques for E-commerce applications. Optik. 2023;272:170333.
  48. Pallathadka H, Ramirez-Asis EH, Loli-Poma TP, Kaliyaperumal K, Ventayen RJM, Naved M. Applications of artificial intelligence in business management, e-commerce and finance. Mater Today: Proc. 2023;80:2610–3.
  49. Qi B, Shen Y, Xu T. An artificial-intelligence-enabled sustainable supply chain model for B2C E-commerce business in the international trade. Technol Forecast Soc Change. 2023;191:122491.
  50. OECD. Artificial intelligence, machine learning and big data in finance. 2021.
  51. Ru Y, Horowitz E. Automated classification of HTML forms on e-commerce web sites. Online Inf Rev. 2007;31(4):451–66.
  52. Zhang W, Wang M. An improved deep forest model for prediction of e-commerce consumers’ repurchase behavior. PLoS One. 2021;16(9):e0255906. pmid:34543319
  53. Xu J, Wang J, Tian Y, Yan J, Li X, Gao X. SE-stacking: Improving user purchase behavior prediction by information fusion and ensemble learning. PLoS One. 2020;15(11):e0242629. pmid:33237926
  54. European Union. General Data Protection Regulation (GDPR). https://gdpr-info.eu/. 2016.
  55. California Consumer Privacy Act (CCPA). Consumer Privacy Rights. https://oag.ca.gov/privacy/ccpa. 2018.
  56. Wachter S, Mittelstadt B, Russell C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv J Law Technol. 2018;31(2):841–87.
  57. Boyd D, Crawford K. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Inf Commun Soc. 2012;15(5):662–79.
  58. Selbst AD, et al. Fairness and abstraction in sociotechnical systems. In: Proc. ACM Conf. Fairness, Accountability, and Transparency, Atlanta, GA, USA, 2019. 59–68.
  59. O’Neil C. Weapons of math destruction: How big data increases inequality and threatens democracy. New York, NY, USA: Crown. 2016.
  60. Sheth A, Henson C, Sahoo SS. Data semantics for internet of things: Towards intelligent internet-scale applications. In: Proc. IEEE Int. Conf. Data Eng. Workshops, Washington, DC, USA, 2008. 1–9.