
Multi-feature fusion-based consumer perceived risk prediction and its interpretability study

  • Lin Qi,

    Roles Conceptualization, Formal analysis, Writing – review & editing

    qilin@bistu.edu.cn

    Affiliations School of Economics & Management, Beijing Information Science & Technology University, Beijing, China, Beijing Key Lab of Green Development Decision Based on Big Data, Beijing, China

  • Yunjie Xie,

    Roles Writing – original draft

    Affiliations School of Economics & Management, Beijing Information Science & Technology University, Beijing, China, Beijing World Urban Circular Economy System (Industry) Collaborative Innovation Center, Beijing, China

  • Qianqian Zhang,

    Roles Investigation, Methodology

    Affiliations School of Economics & Management, Beijing Information Science & Technology University, Beijing, China, Beijing World Urban Circular Economy System (Industry) Collaborative Innovation Center, Beijing, China

  • Jian Zhang,

    Roles Validation, Visualization

    Affiliations School of Economics & Management, Beijing Information Science & Technology University, Beijing, China, Beijing World Urban Circular Economy System (Industry) Collaborative Innovation Center, Beijing, China

  • Yanhong Ma

    Roles Formal analysis, Funding acquisition, Project administration, Software, Supervision

    Affiliations School of Economics & Management, Beijing Information Science & Technology University, Beijing, China, Beijing Key Lab of Green Development Decision Based on Big Data, Beijing, China

Abstract

E-commerce faces challenges such as content homogenization and high perceived risk among users. This paper aims to predict perceived risk in different contexts by analyzing review content and website information. Based on a dataset containing 262,752 online reviews, we employ the KeyBERT-TextCNN model to extract thematic features from the review content. Subsequently, we combine these thematic features with product and merchant characteristics. Using the PCA-K-medoids-XGBoost algorithm, we developed a predictive model for perceived risk. In the feature extraction phase, we identified 11 key features that influence perceived risk in online shopping. During the prediction phase, the model performs excellently across different sample types in the test set, achieving a precision (P) of 84%, a recall (R) of 86%, and an F1 score of 85%. Through the model’s interpretability analysis, we find that quality, functionality, and price are key features affecting perceived risk for electronic products. In the case of skincare products, skin safety is the most critical feature. Additionally, there are significant differences in feature characteristics between high-risk samples and normal samples.

1 Introduction

With the rapid development of Internet technology and the increasing prevalence of e-commerce, online shopping has become an indispensable part of people’s lives. Statistics indicate that global retail e-commerce sales are projected to exceed 6.3 trillion USD by 2024 [1]. While consumers benefit from the convenience of online shopping, they simultaneously face perceived risks stemming from various uncertainties inherent in the e-commerce environment [2]. Excessively high perceived risk not only affects consumers’ purchasing decisions but may also lead to customer attrition, thereby impeding the sustainable development of e-commerce [3].

The perceived risk in online shopping stems from the information asymmetry between offline and online shopping experiences [4]. Unlike traditional offline shopping, consumers in e-commerce environments cannot directly interact with or experience products, making it challenging to comprehensively understand product quality and performance. Furthermore, the recurrence of issues such as online fraud and privacy breaches exacerbates consumers’ perception of uncertainty, leading to an increase in perceived risk [5]. Moreover, the overwhelming volume of product information and complex promotional strategies on e-commerce platforms often lead to confusion and anxiety among consumers during their decision-making process [6]. Consequently, the development of innovative measurement models using big data analytics and machine learning techniques to achieve a comprehensive quantitative assessment of perceived risk has become an urgent issue in e-commerce research.

However, the subjective, multidimensional, and dynamic nature of perceived risk poses significant challenges for accurate prediction in the context of online shopping environments [7]. In recent years, machine learning has gained widespread application in the domain of risk prediction. Through the analysis and learning of massive datasets, machine learning algorithms have demonstrated the capability to automatically identify risk patterns, enabling quantitative risk assessment and early warning mechanisms [8]. However, extant research has predominantly focused on financial domains such as financial risk and credit risk, with relatively limited exploration in the realm of perceived risk prediction [9]. Indeed, the vast repositories of user behavior data and online review content accumulated by e-commerce platforms encapsulate rich consumer perception information. Leveraging machine learning techniques to conduct in-depth mining of these datasets holds promise for achieving precise prediction of perceived risks in online shopping contexts [10]. Furthermore, prior research has employed techniques such as topic modeling and deep learning to analyze online review data, aiming to uncover the dimensions of consumer risk perception across diverse contextual scenarios [11, 12].

Considering the multidimensionality and timeliness of perceived risk, as well as the limited predictive capability of traditional surveys in e-commerce scenarios, this study proposes a comprehensive machine learning approach to predict perceived risk. This method not only effectively combines objective factors and subjective reviews but also addresses the interpretability challenges in practical predictive applications.

2 Literature review

2.1 Dimensional categorization of perceived risk in online shopping

According to Cox, the fundamental assumption underlying perceived risk theory is that consumer behavior is goal-oriented. Perceived risk arises when consumers are subjectively uncertain about which consumption choice (product, brand, etc.) will best satisfy their objectives [13]. Cox posited that perceived risk originates from multiple factors, a perspective that catalyzed academic exploration into the dimensions of perceived risk. Building on Cox’s research, Kaplan proposed a five-dimensional model of perceived risk, encompassing financial risk, performance risk, physical risk, psychological risk, and social risk [14]. Mitchell notes that existing literature mainly categorises perceived risks from the perspectives of physical risk, financial risk, functional risk, and social risk, which lacks consistency. He believes that the dimensions of perceived risk should be dynamically adjusted based on factors such as product type and purchasing context [15].

Compared to traditional brick-and-mortar shopping, online purchasing is characterized by information asymmetry and non-face-to-face transactions [16–18]. Crespo et al. developed an extended e-commerce acceptance model and proposed six dimensions of perceived risk in online shopping: financial, social, time, psychological, and privacy risks. Through a comparative analysis of two sample groups, they discovered that the financial dimension exhibited higher significance [19]. Kamalul employed quantitative analysis to test hypotheses through an online survey of 350 respondents. The findings revealed that financial risk, product risk, and security risk exerted significant negative influences on consumers’ online purchase intentions. In contrast, social risk was found to have no significant impact [20]. Considering the emotional dynamics of online shopping, Zhang introduced the importance-Kano model into the fresh produce e-commerce sector, proposing new dimensions such as product quality, delivery service, customer service, and discrepancies in descriptions [21]. These new dimensions reflect the uniqueness and complexity of perceived risk in the context of online shopping. The dimensions of perceived risk are summarized in Table 1.

2.2 Factors influencing perceived risk

In the context of online shopping, the factors influencing consumers’ perceived risk are characterized by diversity and complexity. These factors encompass traditional commercial elements such as product attributes and vendor characteristics, as well as e-commerce-specific elements like online reviews.

At the product level, Zheng et al. employed an ordered logit model to examine the frequency of consumers’ online food purchases. Their study considered various factors, including product attributes and consumer perceptions. The results revealed that product attributes exerted the most significant influence on perceived risk [22]. Chen et al. utilized a panel data regression model to analyze a large-scale dataset of product reviews from Amazon. Their findings indicated that product type is a crucial factor influencing customers’ perceived risk [23]. Wu integrated perceived risk theory, signaling theory, and equity theory to examine the impact of price variations on perceived risk. By manipulating product prices, Wu found that price dispersion positively influences perceived risk. Specifically, larger price differences were associated with higher levels of perceived risk among consumers [24].

At the merchant level, Hong integrated trust theory with the Technology Acceptance Model (TAM) to examine the antecedents of trust, including perceived risk and information quality. The study revealed an inverse relationship between vendor reputation and consumers’ perceived risk: the better the overall reputation of the vendor, the lower the perceived risk among consumers [25]. Chopdar combines signal theory with the stimulus-organism-response (S-O-R) framework and finds through a questionnaire survey that higher merchant transparency correlates with lower consumer perceived risk. This transparency is reflected in merchants’ proactive disclosure of information such as their background and qualifications [26]. Matute et al. investigated the mediating role of trust in the relationship between electronic word-of-mouth (eWOM) and purchase intention. Their findings revealed that enhancing the marketing effectiveness of eWOM can significantly reduce consumers’ perceived risk [27, 28].

At the level of online reviews, Roy et al. demonstrated that the information embedded within reviews exerts a significant influence on perceived risk [29]. Yadav et al. employed the Stimulus-Organism-Response framework to empirically examine the mediating role of perceived risk between online reviews and behavioral intentions. Their findings revealed that online reviews serve as a crucial reference for consumers in making purchase decisions and can effectively mitigate consumers’ perceived risk [30]. Furthermore, Moliner et al. demonstrated that in the context of online shopping, online reviews and word-of-mouth (WOM) emerge as critical factors influencing perceived risk. Their research indicated that positive reviews and WOM can effectively reduce consumers’ perceived risk [31]. To gain a more profound understanding of the impact of review characteristics on perceived risk, Yang et al. integrated Signaling Theory and the Heuristic-Systematic Model (HSM) to analyze various dimensions of reviews, including quantity, valence, and depth. Their findings indicated that a large number of positive and detailed reviews can effectively mitigate perceived risk [32].

In summary, the factors influencing perceived risk in online shopping exhibit complex characteristics across multiple levels and pathways. Factors such as insufficient product information, lack of merchant transparency, and an abundance of negative reviews tend to exacerbate risk perception. The factors influencing perceived risk are summarized in Table 2.

Table 2. Synthesis of research on factors influencing perceived risk in consumer behavior.

https://doi.org/10.1371/journal.pone.0316277.t002

2.3 Measurement of perceived risk

Methods for measuring perceived risk can be categorized into traditional approaches and machine learning based techniques. Traditional methods primarily employ a two-factor model, collecting respondent data through designed scales, questionnaire surveys, or interviews. These methods calculate perceived risk values by computing the two-dimensional relationship between risk dimensions and uncertainty [33]. In the context of e-commerce, the majority of scholars employ multi-item scales to measure consumers’ psychological perception dimensions. The design of these scales primarily draws upon existing literature while adapting to current research objectives. Typically, Likert scales are utilized for measurement [34, 35]. For instance, Bashir et al. developed a perceived risk scale for online shoppers comprising eight risk dimensions and 26 items. The scale underwent rigorous reliability and validity testing, demonstrating high internal consistency and robust construct validity. Results indicated that the scale effectively measures the perceived risk levels of online shoppers [36].

With the advancement of big data and artificial intelligence technologies, the methodologies for measuring perceived risk are evolving from traditional questionnaire-based surveys towards machine learning-driven multivariate data analysis. Lee et al. developed and validated a supervised machine learning model, GSVM (Generalized Support Vector Machine), utilizing physiological data collected from construction workers via wearable devices. The model achieved a prediction accuracy of 81.2% in distinguishing between low and high perceived risk levels, effectively addressing the subjectivity-objectivity issue in perceived risk measurement [37]. Trivedi et al. proposed an enhanced machine learning model based on stacked renowned classifiers, employing feature selection techniques to identify the most significant predictors. Through the analysis of consumers’ online shopping behavior data, they discovered that concerns related to privacy, security, and product quality, among others, contribute to increased perceived risk [38]. Rausch et al. employed various machine learning algorithms to predict online shopping cart abandonment behavior. Their findings indicate that higher levels of financial and time risks significantly increase the likelihood of consumers abandoning their shopping carts. Moreover, the authors discovered that most tree-based methods demonstrate superior predictive power compared to other machine learning approaches [39]. Furthermore, researchers have explored extracting perceived risk information from the vast corpus of consumer-generated reviews on social media platforms. By applying text mining techniques to analyze this extensive user-generated content, they have effectively complemented traditional questionnaire-based surveys, enabling more precise measurement of perceived risk [40, 41]. Lin et al. leveraged social media review data and applied the BERT pre-trained language model to learn distributed representations of review texts. This approach enabled precise multi-dimensional prediction and sentiment analysis of perceived risk [42].

Existing studies show that online reviews have an important effect on prediction performance, but two gaps remain. First, the mining of perceived risk features is insufficient: existing studies do not account for the fragmented and colloquial nature of review texts, which limits the applicability of traditional measurement models. Second, although machine learning models can learn features adaptively and have improved prediction performance, the resulting black-box models still lack interpretability.

3 Research methodology

The model comprises three stages: feature extraction, ensemble training, and interpretability analysis. (1) Feature extraction stage: product and merchant data are mapped to numerical values, and KeyBERT-TextCNN is used for keyword extraction and topic classification of the review texts. (2) Ensemble training stage: the PCA-K-medoids clustering algorithm generates risk category labels for the samples, and an XGBoost model is then trained for risk prediction. (3) Interpretability analysis stage: SHAP values are computed on the output of the trained XGBoost model. The detailed research framework is illustrated in Fig 1.

3.1 Risk feature extraction based on text topic analysis

3.1.1 Topic classification based on KeyBERT-TextCNN.

In recent years, the BERT model, based on the Transformer architecture, has achieved remarkable success in various natural language processing (NLP) tasks. KeyBERT combines an attention mechanism to identify keywords that are highly relevant and informative to the document’s theme by calculating the semantic similarity between candidate keywords and the document [43]. As shown in Fig 2, KeyBERT maintains stable high performance when handling different languages and types of text.

Fig 2. Comparison of keyword extraction effects.

Extract the top-5 keywords (N = 5) from the keyword sequences output by each algorithm to form the benchmark parameter set.

https://doi.org/10.1371/journal.pone.0316277.g002
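
The similarity-based keyword ranking described in this subsection can be illustrated with a minimal sketch. It assumes that the document and each candidate word have already been embedded as vectors (the three-dimensional toy vectors below are invented for illustration) and applies a Maximal Marginal Relevance rule with a diversity weight. The function names `cosine` and `mmr_keywords` are hypothetical and are not part of the KeyBERT package.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mmr_keywords(doc_vec, candidates, top_n=5, diversity=0.15):
    """Pick keywords that are relevant to the document but not redundant.

    candidates maps each candidate word to its embedding vector.
    """
    relevance = {w: cosine(v, doc_vec) for w, v in candidates.items()}
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < top_n:
        def score(word):
            # Redundancy is the highest similarity to any keyword already chosen.
            redundancy = max(
                (cosine(candidates[word], candidates[s]) for s in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance[word] - diversity * redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy embeddings (assumed values, not taken from the dataset).
doc_vec = [1.0, 0.2, 0.0]
candidates = {
    "sound": [1.0, 0.2, 0.0],
    "audio": [0.9, 0.3, 0.0],
    "battery": [0.3, 0.9, 0.1],
    "delivery": [0.0, 0.1, 1.0],
}
low_diversity = mmr_keywords(doc_vec, candidates, top_n=2, diversity=0.15)
high_diversity = mmr_keywords(doc_vec, candidates, top_n=2, diversity=0.9)
```

With a low diversity weight the second keyword is a near-synonym of the first, whereas a high weight favors a keyword from a different theme. A small weight therefore keeps the ranking close to pure relevance.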

3.1.2 Text topic sentiment analysis.

After obtaining the topic keywords from the review texts, the next step is to perform label classification on these texts to identify the thematic categories to which the reviews belong. TextCNN utilizes convolutional layers to extract features from text and achieves text classification through pooling layers and fully connected layers [44]. The specific network architecture is illustrated in Fig 3.
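
As a rough illustration of the convolution and pooling steps, the sketch below slides a single filter over a short sequence of token embeddings and keeps the strongest response. The embeddings, the filter weights, and the function names are all assumptions made for this example; an actual TextCNN learns many filters of several widths and feeds the pooled values into fully connected layers.

```python
def conv1d_feature_map(embeddings, kernel, bias=0.0):
    """Apply one filter spanning len(kernel) consecutive tokens, followed by ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        z = bias + sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        feature_map.append(max(0.0, z))
    return feature_map

def global_max_pool(feature_map):
    """Keep the single strongest activation of the filter over the text."""
    return max(feature_map) if feature_map else 0.0

# Four tokens with two-dimensional embeddings (toy values).
embeddings = [[1, 0], [0, 1], [1, 1], [0, 0]]
# One filter covering two consecutive tokens.
kernel = [[1, 0], [0, 1]]

feature_map = conv1d_feature_map(embeddings, kernel)
pooled = global_max_pool(feature_map)
```

Global max pooling makes the pooled feature independent of where in the review the matching phrase occurs, which suits short and fragmented review texts.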

The SnowNLP library offers a pre-trained sentiment analysis model that has been trained on a large-scale Chinese corpus. SnowNLP employs a comprehensive sentiment analysis approach by calculating the sentiment score of each word in the text while considering semantic and contextual information [45]. This method enables SnowNLP to determine not only the overall sentiment polarity of the text but also to provide specific sentiment scores.

3.2 Perceived risk prediction model

3.2.1 Label generation based on PCA-K-medoids clustering.

In risk prediction, obtaining high-quality sample labels is crucial for model training. However, manual labeling of samples is not only time-consuming and subjective, but it may also introduce labeling bias, which can further affect model performance. To address this issue, this study proposes an automated sample labeling generation method based on the PCA-K-medoids clustering algorithm. PCA projects high-dimensional data into a lower-dimensional space through linear transformation, while retaining as much of the data’s informational content as possible, which helps solve issues of correlation and redundancy between indicators [46]. K-medoids clusters data by selecting actual sample points as cluster centers, offering robust performance and high interpretability [47]. The training process is presented in Table 3.

Table 3. Pseudocode for the PCA-K-medoids clustering algorithm.

https://doi.org/10.1371/journal.pone.0316277.t003
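
The clustering step can be sketched as follows. This is a simplified alternating K-medoids procedure on toy two-dimensional points, which are assumed to be samples already projected by PCA; the PCA step itself is omitted. The function names and the choice of initial medoids are illustrative and may differ from the pseudocode in Table 3.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: euclidean(p, points[medoids[c]]))
        for p in points
    ]

def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate between assignment and medoid update until the medoids stop changing.

    Medoids are stored as indices into points, so every cluster center
    is an actual sample.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda i: sum(euclidean(points[i], points[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

# Two well-separated groups of toy points (assumed to be PCA scores).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, initial_medoids=[0, 1])
```

Because each center is a real sample, the resulting clusters can be described by inspecting the medoid store directly, which is the interpretability advantage mentioned above.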

3.2.2 Perceived risk prediction using XGBoost algorithm.

The core principle of XGBoost is to iteratively build a series of decision trees to minimize the loss function. Each new decision tree is trained on the residuals of the previous tree, thereby progressively approximating the optimal solution [48]. During the model training process, we employed grid search and cross-validation techniques to optimize the hyperparameters of XGBoost, including the number of trees, tree depth, and learning rate, thereby enhancing the model’s performance and generalization capability. The mathematical model is represented by Eqs (1) and (2) as follows:

$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i), \quad f_k \in F \tag{1}$$

$$F = \left\{ f(x) = w_{q(x)} \right\} \tag{2}$$

In Eqs (1) and (2), $i \in \{1, 2, \ldots, n\}$ is the sample index, where n is the total number of samples. $\hat{y}_i$ denotes the predicted value of the model, and $x_i$ represents the perceived risk evaluation indicator. The variable t signifies the number of sub-models. $w_{q(x)}$ is the weight of the leaf to which the tree structure q assigns x, where w is the weight vector of all leaf nodes in the XGBoost model. $f_k$ represents the weight of the leaf node in the k-th regression tree, and F denotes the ensemble of all regression trees.
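
To make the additive structure concrete, the sketch below fits a sequence of one-split regression trees, each on the residuals left by the trees before it, and predicts with their weighted sum. It is plain gradient boosting under squared error on a single toy feature. It leaves out the regularization terms and second-order approximation that XGBoost uses, and all names and data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        w_left = sum(left) / len(left)
        w_right = sum(right) / len(right)
        sse = sum((r - w_left) ** 2 for r in left) + sum((r - w_right) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, w_left, w_right)
    if best is None:
        # All feature values are identical: fall back to a constant leaf.
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, w_left, w_right = best
    return lambda x: w_left if x <= threshold else w_right

def boost(xs, ys, n_trees=20, learning_rate=0.1):
    """Build trees sequentially; each one is trained on the current residuals."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return trees

def predict(trees, x, learning_rate=0.1):
    """Prediction is the (shrunken) sum of all tree outputs, as in Eq (1)."""
    return sum(learning_rate * tree(x) for tree in trees)

xs = [1, 2, 3, 4]
ys = [0.0, 0.0, 1.0, 1.0]
trees = boost(xs, ys, n_trees=20, learning_rate=0.1)
```

Each added tree closes a fixed fraction of the remaining gap, so a smaller learning rate needs more trees to reach the same fit. This is why the number of trees and the learning rate are tuned together.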

4 Results and discussion

4.1 Data sources and preprocessing

JD.com (JD) is one of the most popular Business-to-Consumer (B2C) e-commerce platforms in China. To evaluate the comprehensive service characteristics of merchants, JD assigns an overall score to each merchant based on their recent transaction records. These characteristics encompass various aspects, including transaction disputes, logistics fulfillment, after-sales service, customer service consultation, and store ratings. Given the diversity of product types on the platform and the need to ensure data representativeness, this study selected Bluetooth headphones, mobile phones, air conditioners, and facial creams as research subjects, for three reasons: (1) the four categories cover a wide price range from entry-level to high-end, reflecting the purchasing behaviors of different consumer groups; (2) they are widely used in modern life, representing the daily needs and preferences of most consumers; and (3) they include both international and local brands, which adds a degree of randomness to the sample.

For each category of products, we employed web scraping technology to collect online consumer reviews, product details, and seller characteristic data from the top 1,000 merchants ranked by overall performance. To ensure the validity of the reviews, comments with fewer than 5 characters and invalid reviews were removed. In terms of product information, price and sales volume were extracted as key features. Missing values in merchant characteristics were filled using the mode. After data cleaning and preprocessing, the final dataset includes product information and merchant characteristics from 3474 stores, along with 262,752 online reviews. The relevant data mentioned in the manuscript can be found in S1 Table. Crawling technology and analytical methods comply with the terms and conditions of the data source.

4.2 Analysis results of online review topic features

4.2.1 Results of topic feature extraction from online reviews.

In this study, BERT Embeddings were generated using the Sentence-Transformers package, with the distilbert-base-nli-mean-tokens model selected as the pre-trained model. The model parameters were configured as follows: MMR (Maximal Marginal Relevance) was set to 0.15, N-Gram range was (1,1), and Top_n was set to 15. All other parameters were left at their default values. Using the KeyBERT algorithm, this study identified 13 representative topics, which were subsequently categorized into four main dimensions of perceived risk: functional risk, quality risk, service risk, and appearance risk. The results are presented in Table 4. The results of the thematic feature extraction of online reviews for other products can be found in S2 Table.

Table 4. Topic clustering and dimensional classification of online reviews.

A Case Study of Bluetooth Headphones.

https://doi.org/10.1371/journal.pone.0316277.t004

In terms of functional risk, consumers are primarily concerned about the sound quality of headphones (T1), including audio fidelity and background noise. Following that, the fit and stability of the headphones (T2), as well as Bluetooth connectivity and call experience (T3), are also important. This indicates that sound quality and connection stability are the main factors consumers consider when evaluating functional risk. Under the quality risk dimension, consumers pay significant attention to the overall quality and craftsmanship of the headphones (T4), reflecting that product quality and perceived value play a crucial role in their perception of quality risk. Regarding service risk, consumers place considerable importance on the service quality (T8) and after-sales support (T9) provided by suppliers, including return and exchange policies and logistics efficiency. Additionally, consumers are attentive to the reputation and word-of-mouth of the suppliers (T10), which shows that the credibility of suppliers also affects consumers’ perception of risk.

4.2.2 Sentiment analysis based on topic classification results.

A rule-based approach utilizing keyword matching was employed to generate a labeled dataset comprising 103,545 entries. Subsequently, a TextCNN model was trained on this annotated corpus of review data. The model architecture comprises a 768-dimensional word embedding layer, multiple convolutional layers, a global max pooling layer, and two fully connected layers. The model was trained using sparse categorical cross-entropy as the loss function, Adam optimizer, and accuracy as the evaluation metric. Four distinct parameter configurations were selected for experimental evaluation. The specific parameter settings are presented in Table 5.

The model’s performance on the validation set is illustrated in Fig 4, where one epoch is defined as a complete pass through the entire sample set. The TextCNN model based on Configuration 4 exhibited superior performance, demonstrating high accuracy and low loss on the validation set. This configuration effectively balances the model’s generalization capability while mitigating overfitting. Consequently, this optimized model was selected for the thematic classification of all review texts in the corpus.

Based on the classification results of the model, the sentiment of the review text is quantified: for each shop, the sentiment intensity of all its review data is calculated using the SnowNLP library. The formula is $S = \frac{1}{n}\sum_{i=1}^{n} sent_i$, where $sent_i$ is the sentiment polarity of the i-th comment, with positive sentiment tending towards 1 and negative sentiment tending towards 0, and n is the number of comments. The sentiment analysis results in Fig 5 show that T7, T2, and T1 have the highest proportions of negative sentiment; T7 belongs to the quality dimension, while T2 and T1 belong to the functionality dimension.
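
A minimal sketch of the store-level aggregation is given below. It assumes that each review already carries a sentiment polarity between 0 and 1 and that the store-level intensity is the arithmetic mean of those polarities. The store identifiers and scores are invented, and the function name is hypothetical.

```python
from collections import defaultdict

def store_sentiment(scored_reviews):
    """Average the per-review polarity (0 = negative, 1 = positive) for each store.

    scored_reviews is an iterable of (store_id, polarity) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for store_id, polarity in scored_reviews:
        totals[store_id] += polarity
        counts[store_id] += 1
    return {store_id: totals[store_id] / counts[store_id] for store_id in totals}

# Toy polarities (assumed values, not drawn from the dataset).
scored_reviews = [("store_A", 0.9), ("store_A", 0.7), ("store_B", 0.2)]
intensity = store_sentiment(scored_reviews)
```

Averaging keeps stores with very different review counts on the same 0 to 1 scale, so the resulting value can be used directly as a feature alongside product and merchant indicators.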

Fig 5. Topic-based sentiment analysis: A case study of Bluetooth headphones.

https://doi.org/10.1371/journal.pone.0316277.g005

4.3 Sample clustering results

Before clustering using PCA-K-medoids, it is necessary to standardize the indicators for easier comparison. In this study, we adopted a reverse transformation method, where higher indicator values correspond to lower risk levels. The visualization of the clustering results is presented in Fig 6.

Fig 6. Clustering results based on PCA-K-medoids algorithm.

(a)-(d) represent the perceived risk clustering results for headphones, mobile phones, air conditioners, and facial cream, respectively.

https://doi.org/10.1371/journal.pone.0316277.g006
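
One possible reading of the reverse transformation applied before clustering is a reversed min-max scaling, sketched below. The assumption here is that the raw indicators are of the "higher is better" kind (for example, a store rating) and that they are mapped to a 0 to 1 scale on which larger values mean higher risk. The exact transformation used in the study is not specified, so the function is illustrative only.

```python
def reverse_min_max(values):
    """Scale values to [0, 1] so that the largest raw value maps to 0 (lowest risk)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant indicator carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(highest - v) / (highest - lowest) for v in values]

# Toy store ratings (assumed values).
ratings = [1, 3, 5]
risk_scores = reverse_min_max(ratings)
```

Putting all indicators on a common scale and direction before PCA prevents features with large numeric ranges, such as sales volume, from dominating the principal components.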

4.4 Risk prediction results of XGBoost

This article uses multi-class logarithmic loss (mlogloss) as the optimization objective and employs grid search (GR) for hyperparameter tuning. The dataset is divided into a training set and a test set in an 8:2 ratio. We set the learning rate to 0.1, subsample to 0.8, and colsample_bytree to 0.8. This article conducts an in-depth validation of two important parameters for XGBoost: Max_depth and N_estimators, as illustrated in Fig 7. As the grid search progresses, the value of the loss function continues to decrease. The optimal parameter combination yields a Loss of 0.32, with Max_depth = 5 and N_estimators = 160.
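
The tuning procedure can be sketched in two parts: the multi-class logarithmic loss and an exhaustive search over a parameter grid. The loss function follows the standard definition. The search is generic and takes any evaluation function. The `toy_validation_loss` function is a made-up bowl-shaped surface used only so that the example has a known minimum; it is not the loss surface observed in this study, and in practice it would be replaced by training the model and scoring it on held-out data.

```python
import math
from itertools import product

def mlogloss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, row in zip(y_true, probabilities):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination and return the one with the lowest loss."""
    names = sorted(param_grid)
    best_params, best_loss = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        loss = evaluate(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

def toy_validation_loss(max_depth, n_estimators):
    # Hypothetical stand-in for "train the model, then compute mlogloss on validation data".
    return (max_depth - 5) ** 2 + ((n_estimators - 160) / 40) ** 2

param_grid = {"max_depth": [3, 4, 5, 6, 7], "n_estimators": [80, 120, 160, 200]}
best_params, best_loss = grid_search(param_grid, toy_validation_loss)
uniform_loss = mlogloss([0, 1, 2], [[1 / 3, 1 / 3, 1 / 3]] * 3)
```

For three classes, a model that always predicts uniform probabilities has a log loss of ln 3, roughly 1.10, which gives a reference point for judging a reported loss value.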

The model’s performance on the test set, including precision, recall, and F1 score, is presented in Table 6. The prediction results indicate that the model achieved an accuracy of 0.87, suggesting a relatively precise prediction of overall perceived risk in online shopping. This demonstrates the model’s effectiveness in distinguishing between different cluster types.

Table 6. Performance evaluation of models on test dataset.

https://doi.org/10.1371/journal.pone.0316277.t006
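
For reference, the evaluation metrics can be derived from a confusion matrix as sketched below. The matrix is a toy example with rows as true classes and columns as predicted classes; it is not the matrix obtained in this study. The last line checks that the precision and recall reported in the abstract (84% and 86%) are consistent with the reported F1 score of 85%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def per_class_metrics(cm):
    """Return (precision, recall, F1) for each class of a square confusion matrix."""
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics.append((precision, recall, f1_score(precision, recall)))
    return metrics

def accuracy(cm):
    """Share of all samples that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Toy three-class confusion matrix (assumed counts).
cm = [
    [5, 1, 0],
    [1, 8, 1],
    [0, 0, 4],
]
metrics = per_class_metrics(cm)
overall_accuracy = accuracy(cm)
abstract_f1 = round(f1_score(0.84, 0.86), 2)
```

Accuracy summarizes all classes at once, whereas per-class precision and recall show which risk category is being confused with which, something a single accuracy figure can hide when classes are imbalanced.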

In order to explore the misclassification of risk types, this paper uses a confusion matrix visualization, as shown in Fig 8. The model exhibits high accuracy along the diagonal, with particularly good performance for categories 1 and 2. However, category 0 is harder to classify, which may reflect data imbalance or insufficient feature extraction in the dataset.

Fig 8. Confusion matrix.

(a)-(d) represent the perceived risk prediction results for headphones, mobile phones, air conditioners, and facial cream, respectively.

https://doi.org/10.1371/journal.pone.0316277.g008

4.4.1 Comparative analysis of models.

To further assess the performance of the models, this paper selects several representative models and benchmark models for comparison. The parameter settings for each model are as follows. For the BSVM, a Gaussian kernel function is used, with the penalty coefficient C set to 10 and the kernel parameter gamma set to 0.1; the prior probability distribution is defined as a Gaussian distribution. LightGBM uses default parameters except for a subsampling ratio of 0.8, a feature sampling ratio of 0.8, and early stopping after 100 rounds. The CNN architecture consists of three convolutional layers and two fully connected layers, with the number of convolutional kernels set to 32, 64, and 128, respectively, and max pooling applied in the pooling layers. The benchmark models are logistic regression (LR), decision trees (DT), random forests (RF), and K-nearest neighbors (KNN). These benchmark models mainly use default parameter values, adjusted where necessary for the multi-class classification task.

The predictive performance of each model is shown in Table 7. As indicated in the table, the risk prediction model based on GR-XGBoost demonstrates more stable performance and higher accuracy, consistently outperforming those that use other algorithms.

4.4.2 Model interpretability analysis.

To further identify the key factors influencing consumers’ overall perceived risk, we adjusted the clustering labels, combining Class 1 and Class 2 as the normal perceived risk cluster for comparison with the high perceived risk cluster of Class 0. SHAP was integrated into the output layer of the XGBoost model. SHAP assigns a Shapley value to each feature and simulates different orders of feature inclusion to measure their contribution to the model output [49]. Fig 9 presents the model interpretability results based on SHAP values.

Fig 9. Explainability analysis.

The y-axis represents various features sorted by their importance; the x-axis indicates SHAP values, with the positive direction representing normal clustering and the negative direction indicating high-risk clustering. Each point represents a sample in the dataset, with high values (red) corresponding to positive influence and low values (blue) indicating negative influence. (a)-(d) represent the interpretability analysis of perceived risk prediction results for headphones, mobile phones, air conditioners, and facial cream, respectively.

https://doi.org/10.1371/journal.pone.0316277.g009
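
The idea of averaging a feature's contribution over different orders of inclusion can be shown with an exact, brute-force Shapley computation on a toy model. The scoring function, the feature names, and the baseline below are invented for illustration. For tree ensembles, SHAP implementations use far more efficient algorithms than enumerating every ordering.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values obtained by averaging marginal contributions over all orderings.

    Features not yet included take their baseline value.
    """
    names = list(instance)
    contributions = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = instance[name]
            output = model(current)
            contributions[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in contributions.items()}

def toy_model(features):
    # Hypothetical score with one interaction term between quality and service.
    return (
        2 * features["quality"]
        + features["price"]
        + features["quality"] * features["service"]
    )

instance = {"quality": 1, "price": 1, "service": 1}
baseline = {"quality": 0, "price": 0, "service": 0}
phi = shapley_values(toy_model, instance, baseline)
```

The contributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is shared equally between the two features involved. This additivity is what allows SHAP values to be read as per-feature shares of a single prediction.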

Fig 9 indicates that perceived risk exhibits significant heterogeneity across different product categories. In subfigures (a) and (b), the effects of quality, price, and functional features on the model output are the most pronounced, with a wide distribution of SHAP values. This suggests that improving product quality and highlighting product features are key factors in reducing perceived risk for consumers. Our research reveals an intriguing inverse relationship between price points across different product categories and perceived risk. Specifically, when consumers encounter products positioned at a high price point, their perceived risk significantly decreases. This relationship indicates that consumers use price as a heuristic tool to reduce risk, with higher prices serving as a quality assurance mechanism, which aligns with the price-quality inference theory proposed by Guizzardi et al [50]. Furthermore, this inverse relationship is moderated by the characteristics of product categories. In high-involvement products, such as electronics, where functional risk is greater, this effect is more pronounced, while in experiential goods, such as creams, other attributes like safety and efficacy have a stronger influence on risk perception. Sales exhibit a clear bimodal distribution, where both low and high sales volumes may increase perceived risk, while moderate sales volumes tend to reduce perceived risk. This may reflect the dual influence of the “scarcity effect” and the “herding effect” [51]. It is noteworthy that service-related characteristics, such as customer inquiries and transaction disputes, consistently show relatively low importance across all product categories. Although certain specific product attributes dominate perceived risk (for example, the cooling performance of air conditioners and the skin safety of creams), some common factors like quality and price remain significantly important across multiple product categories [52, 53].

5 Conclusions and limitations

This study constructs a fine-grained model for predicting perceived risk in online shopping based on multi-dimensional feature fusion. The model is highly interpretable and offers practical guidance. The empirical results indicate that it can provide data-driven decision support for reducing users’ perceived risk and optimizing merchants’ risk management strategies. For product developers and marketing practitioners, the findings emphasize the significant impact of intrinsic product attributes on perceived risk. Recommendations include: (1) prioritizing the optimization of product quality and functionality as the core strategy for reducing users’ perceived risk; (2) continuously improving product functionality indicators and highlighting differentiated features to meet consumers’ heterogeneous needs; (3) conducting precise market segmentation and targeted customer positioning based on sales data.

Practical recommendations for e-commerce platforms include: (1) optimizing the product information display system by providing standardized product specifications and high-resolution multimedia content to reduce information asymmetry; (2) developing user experience-based feature comparison tools to lessen the cognitive burden on consumers during the product evaluation process; (3) implementing a transparent price protection mechanism that enhances the perception of price fairness through visualized price fluctuation data, thereby boosting consumer purchasing confidence.

Despite these findings, the current model has several limitations, including issues related to data representativeness and the comprehensiveness of feature selection. Future research should further expand the data dimensions by incorporating consumer purchase behavior data, product images, videos, and other unstructured data to dynamically characterize perceived risk from multidimensional perspectives.

Supporting information

S1 Table. Relevant data underlying the findings described in the manuscript.

https://doi.org/10.1371/journal.pone.0316277.s001

(XLSX)

S2 Table. Topic clustering and dimensional classification of online reviews.

https://doi.org/10.1371/journal.pone.0316277.s002

(XLSX)

Acknowledgments

The authors are grateful to the anonymous reviewers and the editor for their valuable comments and suggestions that have greatly improved the quality of this paper.

References

  1. E-Commerce Statistics of 2024. Available from: https://www.forbes.com/advisor/business/ecommerce-statistics/ (accessed on 1 March 2024).
  2. Kim DJ, Ferrin DL, Rao HR. A trust-based consumer decision-making model in electronic commerce: The role of trust, perceived risk, and their antecedents. Decision Support Systems. 2008;44(2):544–64. https://doi.org/10.1016/j.dss.2007.07.001
  3. Ou C, Chen K, Tseng W, Lin Y. A study on the influence of conformity behaviors, perceived risks, and customer engagement on group buying intention: A case study of community e-commerce platforms. Sustainability. 2022;14(4):1941. https://doi.org/10.3390/su14041941
  4. Zhuang H, Leszczyc PTP, Lin Y. Why is price dispersion higher online than offline? The impact of retailer type and shopping risk on price dispersion. Journal of Retailing. 2018;94(2):136–53. https://doi.org/10.1016/j.jretai.2018.01.003
  5. Rehman ZU, Baharun R, Salleh NZM. Antecedents, consequences, and reducers of perceived risk in social media: A systematic literature review and directions for further research. Psychology & Marketing. 2020;37(1):74–86. https://doi.org/10.1002/mar.21281
  6. Yu Y, Liu BQ, Hao J, Wang C. Complicating or simplifying? Investigating the mixed impacts of online product information on consumers’ purchase decisions. Internet Research. 2020;30(1):263–87. https://doi.org/10.1108/INTR-05-2018-0247
  7. Carvache-Franco O, Loaiza-Torres J, Soto-Montenegro C, Carvache-Franco M, Carvache-Franco W. The risks perceived by the consumer in the acceptance of electronic commerce. A study of Bolivia. PLOS ONE. 2022;17(11):e0276853. pmid:36441731
  8. Lin S, Shen S, Zhou A, Xu Y. Risk assessment and management of excavation system based on fuzzy set theory and machine learning methods. Automation in Construction. 2021;122:103490. https://doi.org/10.1016/j.autcon.2020.103490
  9. Lappas PZ, Yannacopoulos AN. A machine learning approach combining expert knowledge with genetic algorithms in feature selection for credit risk assessment. Applied Soft Computing. 2021;107:107391. https://doi.org/10.1016/j.asoc.2021.107391
  10. Qi J, Zhang Z, Jeon S, Zhou Y. Mining customer requirements from online reviews: A product improvement perspective. Information & Management. 2016;53(8):951–63. https://doi.org/10.1016/j.im.2016.06.002
  11. Calheiros AC, Moro S, Rita P. Sentiment classification of consumer-generated online reviews using topic modeling. Journal of Hospitality Marketing & Management. 2017;26(7):675–93. https://doi.org/10.1080/19368623.2017.1310075
  12. Shu T, Wang Z, Lin L, Jia H, Zhou J. Customer perceived risk measurement with NLP method in electric vehicles consumption market: empirical study from China. Energies. 2022;15(5):1637. https://doi.org/10.3390/en15051637
  13. Cox DF, Rich SU. Perceived risk and consumer decision-making—the case of telephone shopping. Journal of Marketing Research. 1964;1(4):32–9. https://doi.org/10.1177/002224376400100405
  14. Kaplan LB, Szybillo GJ, Jacoby J. Components of perceived risk in product purchase: A cross-validation. Journal of Applied Psychology. 1974;59(3):287. https://doi.org/10.1037/h0036657
  15. Mitchell VW. Consumer perceived risk: conceptualisations and models. European Journal of Marketing. 1999;33(1/2):163–95. https://doi.org/10.1108/03090569910249229
  16. Phamthi VA, Nagy Á, Ngo TM. The influence of perceived risk on purchase intention in e‐commerce—Systematic review and research agenda. International Journal of Consumer Studies. 2024;48(4):e13067. https://doi.org/10.1111/ijcs.13067
  17. Hong IB. Understanding the consumer’s online merchant selection process: The roles of product involvement, perceived risk, and trust expectation. International Journal of Information Management. 2015;35(3):322–36. https://doi.org/10.1016/j.ijinfomgt.2015.01.003
  18. Fürst A, Pecornik N, Hoyer WD. How product complexity affects consumer adoption of new products: The role of feature heterogeneity and interrelatedness. Journal of the Academy of Marketing Science. 2024;52(2):329–48. https://doi.org/10.1007/s11747-023-00933-7
  19. Crespo ÁH, Del Bosque IR, de los Salmones Sánchez MG. The influence of perceived risk on Internet shopping behavior: a multidimensional perspective. Journal of Risk Research. 2009;12(2):259–77. https://doi.org/10.1080/13669870802497744
  20. Kamalul Ariffin S, Mohan T, Goh Y. Influence of consumers’ perceived risk on consumers’ online purchase intention. Journal of Research in Interactive Marketing. 2018;12(3):309–27. https://doi.org/10.1108/JRIM-11-2017-0100
  21. Zhang D, Shen Z, Li Y. Requirement analysis and service optimization of multiple category fresh products in online retailing using importance-Kano analysis. Journal of Retailing and Consumer Services. 2023;72:103253. https://doi.org/10.1016/j.jretconser.2022.103253
  22. Zheng Q, Chen J, Zhang R, Wang HH. What factors affect Chinese consumers’ online grocery shopping? Product attributes, e-vendor characteristics and consumer perceptions. China Agricultural Economic Review. 2020;12(2):193–213. https://doi.org/10.1108/CAER-09-2018-0201
  23. Chen P, Hitt LM, Hong Y, Wu S. Measuring product type and purchase uncertainty with online product ratings: a theoretical model and empirical application. Information Systems Research. 2021;32(4):1470–89. https://doi.org/10.1287/isre.2021.1041
  24. Wu K, Vassileva J, Noorian Z, Zhao Y. How do you feel when you see a list of prices? The interplay among price dispersion, perceived risk and initial trust in Chinese C2C market. Journal of Retailing and Consumer Services. 2015;25:36–46. https://doi.org/10.1016/j.jretconser.2015.03.007
  25. Hong IB, Cha HS. The mediating role of consumer trust in an online merchant in predicting purchase intention. International Journal of Information Management. 2013;33(6):927–39. https://doi.org/10.1016/j.ijinfomgt.2013.08.007
  26. Chopdar PK, Paul J. The impact of brand transparency of food delivery apps in interactive brand communication. Journal of Research in Interactive Marketing. 2024;18(2):238–56. https://doi.org/10.1108/JRIM-12-2022-0368
  27. Matute J, Polo-Redondo Y, Utrillas A. The influence of EWOM characteristics on online repurchase intention: Mediating roles of trust and perceived usefulness. Online Information Review. 2016;40(7):1090–110. https://doi.org/10.1108/OIR-11-2015-0373
  28. Wang Q, Zhu X, Wang M, Zhou F, Cheng S. A theoretical model of factors influencing online consumer purchasing behavior through electronic word of mouth data mining and analysis. PLOS ONE. 2023;18(5):e0286034. pmid:37200302
  29. Roy R, Shaikh A. The impact of online consumer review confusion on online shopping cart abandonment: A mediating role of perceived risk and moderating role of mindfulness. Journal of Retailing and Consumer Services. 2024;81:103941. https://doi.org/10.1016/j.jretconser.2024.103941
  30. Yadav N, Verma S, Chikhalkar R. Online reviews towards reducing risk. Journal of Tourism Futures. 2024;10(2):299–316. https://doi.org/10.1108/JTF-01-2022-0016
  31. Moliner Velázquez B, Fuentes-Blasco M, Gil Saura I. Antecedents of online word-of-mouth reviews on hotels. Journal of Hospitality and Tourism Insights. 2022;5(2):377–93. https://doi.org/10.1108/JHTI-10-2020-0184
  32. Yang J, Sarathy R, Lee J. The effect of product review balance and volume on online Shoppers’ risk perception and purchase intention. Decision Support Systems. 2016;89:66–76. https://doi.org/10.1016/j.dss.2016.06.009
  33. Mitchell V, Greatorex M. Risk perception and reduction in the purchase of consumer services. Service Industries Journal. 1993;13(4):179–200. https://doi.org/10.1080/02642069300000068
  34. Bettman JR. Perceived risk and its components: A model and empirical test. Journal of Marketing Research. 1973;10(2):184–90. https://doi.org/10.1177/002224377301000209
  35. Jia J, Dyer JS, Butler JC. Measures of perceived risk. Management Science. 1999;45(4):519–32. https://doi.org/10.1287/mnsc.45.4.519
  36. Bashir S, Khwaja MG, Mahmood A, Turi JA, Latif KF. Refining e-shoppers’ perceived risks: Development and validation of new measurement scale. Journal of Retailing and Consumer Services. 2021;58:102285. https://doi.org/10.1016/j.jretconser.2020.102285
  37. Lee G, Choi B, Jebelli H, Lee S. Assessment of construction workers’ perceived risk using physiological data from wearable sensors: A machine learning approach. Journal of Building Engineering. 2021;42:102824. https://doi.org/10.1016/j.jobe.2021.102824
  38. Trivedi SK, Patra P, Srivastava PR, Zhang JZ, Zheng LJ. What prompts consumers to purchase online? A machine learning approach. Electronic Commerce Research. 2022:1–37. https://doi.org/10.1007/s10660-022-09624-x
  39. Rausch TM, Derra ND, Wolf L. Predicting online shopping cart abandonment with machine learning approaches. International Journal of Market Research. 2022;64(1):89–112. https://doi.org/10.1177/1470785320972526
  40. Ahmad F, Abbasi A, Li J, Dobolyi DG, Netemeyer RG, Clifford GD, et al. A deep learning architecture for psychometric natural language processing. ACM Transactions on Information Systems. 2020;38(1):1–29. https://doi.org/10.1145/3365211
  41. Humphreys A, Wang RJ-H. Automated text analysis for consumer research. Journal of Consumer Research. 2018;44(6):1274–306. https://doi.org/10.1093/jcr/ucx104
  42. Lin L, Shu T, Yang H, Wang J, Zhou J, Wang Y. Consumer-Perceived Risks and Sustainable Development of China’s Online Gaming Market: Analysis Based on Social Media Comments. Sustainability. 2023;15(17):12798. https://doi.org/10.3390/su151712798
  43. Anwar Z, Afzal H, Altaf N, Kadry S, Kim J. Fuzzy ensemble of fined tuned BERT models for domain-specific sentiment analysis of software engineering dataset. PLOS ONE. 2024;19(5):e0300279. pmid:38805433
  44. Guo B, Zhang C, Liu J, Ma X. Improving text classification with weighted word embeddings via a multi-channel TextCNN model. Neurocomputing. 2019;363:366–74. https://doi.org/10.1016/j.neucom.2019.07.052
  45. Ye T, Zhao S, Lau CKM, Chau F. Social media sentiment of hydrogen fuel cell vehicles in China: Evidence from artificial intelligence algorithms. Energy Economics. 2024;133:107564. https://doi.org/10.1016/j.eneco.2024.107564
  46. Jin Y, Liu B, Li C, Shi S. Origin identification of Cornus officinalis based on PCA-SVM combined model. PLOS ONE. 2023;18(2):e0282429. pmid:36854014
  47. Pinheiro DN, Aloise D, Blanchard SJ. Convex fuzzy k-medoids clustering. Fuzzy Sets and Systems. 2020;389:66–92. https://doi.org/10.1016/j.fss.2020.01.001
  48. Chen H. Enterprise marketing strategy using big data mining technology combined with XGBoost model in the new economic era. PLOS ONE. 2023;18(6):e0285506. pmid:37276212
  49. Bifarin O. Interpretable machine learning with tree-based shapley additive explanations: Application to metabolomics datasets for binary classification. PLOS ONE. 2023;18(5):e0284315. pmid:37141218
  50. Guizzardi A, Mariani MM, Stacchini A. A temporal construal theory explanation of the price-quality relationship in online dynamic pricing. Journal of Business Research. 2022;146:32–44. https://doi.org/10.1016/j.jbusres.2022.03.058
  51. Sun H, Guo Z, Qian H. The self on display: The impact of self‐objectification on luxury consumption. Psychology & Marketing. 2024;41(10):2412–30. https://doi.org/10.1002/mar.22061
  52. Loureiro F, Garcia-Marques T, Wegener DT. Norms for 150 consumer products: Perceived complexity, quality objectivity, material/experiential nature, perceived price, familiarity and attitude. PLOS ONE. 2020;15(9):e0238848. pmid:32956402
  53. Cao J, Jiang H, Ren X, Shi J. Consumers’ risk perception, market demand, and firm innovation: Evidence from China. PLOS ONE. 2024;19(5):e0301802. pmid:38758805