
Dynamic pricing modeling and inventory management in omnichannel retail using Quantum Decision Theory and reinforcement learning

Abstract

In the world of omnichannel retail, where customers seamlessly switch between online and offline channels, pricing and inventory management decisions have become more complex than ever. Customer purchasing behavior is influenced by uncertainty, market fluctuations, and competitive interactions, which traditional models fail to accurately predict. In such conditions, the need for intelligent and adaptive decision-making frameworks is more critical than ever. For the first time, this study presents a novel approach combining Quantum Decision Theory, Quantum Markov Chains (QMC), Quantum Dynamic Games, and Reinforcement Learning to optimize dynamic pricing and inventory management. By leveraging concepts such as superposition, observer effect, and quantum interference, the proposed model overcomes the limitations of classical models and provides a deeper understanding of customer behavior in uncertain environments. Additionally, a Quantum Multi-Level Markov Process (QMLMP) is employed to model market variations and enhance predictions. The results of this study demonstrate that the innovative model improves the accuracy of purchase behavior predictions, optimizes pricing and inventory management strategies, and helps retailers make more competitive and profitable decisions. This research introduces a transformative approach to tackling retail challenges in the digital age and paves the way for future studies in this domain.

1. Introduction

The retail industry is one of the most important and active sectors of the global economy. In the United States, it contributes $3.9 trillion to annual Gross Domestic Product (GDP) and supports one-quarter of American jobs (Rios and Vera, 2023). E-commerce has transformed omnichannel retailing: many traditional retailers are entering the digital world and, by adopting dual-channel and multi-channel strategies, seek to increase their market share. Customers, for their part, no longer differentiate between channels and expect to switch seamlessly between online and offline platforms. As a result, the boundaries between physical and digital channels are fading, and retail is moving towards a seamless omnichannel experience [1]. For example, Walmart has not only launched online stores but has also partnered with e-commerce platforms, while many online retailers are expanding their physical presence: in October 2016, The Idle Man, a UK-based online menswear retailer, opened its first physical store in London [1]. From the customer’s perspective, the omnichannel model enables shopping through various methods, such as buying online and picking up in-store, buying online and shipping from a store, reserving online and purchasing in person, and in-store shopping with home delivery. From the retailer’s perspective, however, these changes present both opportunities and challenges, as they require a reevaluation of sales channel structures, strategies, and operational decisions [1]. Omnichannel retailers must make a wide range of decisions that impact their profitability and competitiveness.

Demand is one of the key factors in sales for any organization or company. Researchers have proposed dependent demand functions for modeling demand, such as the price-dependent demand function [2,3] and the inventory-dependent demand function [4,5]. Therefore, the joint management of pricing and inventory is considered a fundamental issue in supply chain management [6]. Numerous studies have simultaneously addressed the importance of these two factors, and interested readers can refer to them [7–12].

Pricing of products is one of the most important decisions for companies and organizations. Setting the right price for products and services significantly impacts profitability and business success. According to Utility Theory, customers seek to maximize utility in their purchasing decisions; therefore, pricing should align with their perceived value of the product. In other words, the price acceptance level represents the maximum amount a customer is willing to pay for a product or service, and price acceptance is one of the consequences of customer satisfaction [13,14]. Furthermore, the Price Elasticity of Demand theory indicates that customers respond differently to price changes. For some products, a price reduction leads to increased demand, while for others, price changes have a minimal impact. For this reason, multichannel retailers should adjust their pricing strategies based on the demand elasticity in each channel [15].

According to Reference Price Theory, customers have a reference price in mind for each product, which is shaped by past experiences and advertisements. If a product’s price exceeds the customer’s mental reference price, demand may decrease. Therefore, maintaining price consistency across different channels and using targeted discounts can be effective in attracting customers [16]. The impact of price on customers’ purchasing decisions is crucial, as appropriate pricing can either encourage customers to buy or deter them from making a purchase. Customers generally seek good value for their money—if the price of a product does not match its perceived value and quality, they may look for alternatives. Moreover, discounts and price increases can directly influence customer decisions. Proper pricing strategies help improve business profitability: if the price is too high, sales volume may decrease, reducing overall profit; conversely, if the price is too low, profit margins may shrink. Therefore, setting a balanced and strategic price is essential for sustained profitability.

Unlike traditional retail, where pricing is determined independently for each store, online customer interactions have blurred physical boundaries, making pricing strategies more complex. Multichannel retailers must consider inventory and demand across the entire network to determine optimal pricing for each channel and geographic region. Implementing unified pricing and joint order fulfillment policies can reduce the average order processing cost compared to separate pricing and fulfillment strategies [7]. E-commerce retailers employ various marketing strategies to boost sales. One common strategy is offering price discounts by temporarily lowering the regular price of a product [14]. Additionally, most retailers adopt dynamic pricing policies, where the initial price changes based on sales performance over time. The main pricing challenges in retail include:

  • Monitoring and analyzing sales behavior over time and adjusting prices on a daily, weekly, or monthly basis.
  • Coordinating prices across different store locations to prevent arbitrage, where resellers exploit price differences between stores.
  • Managing inventory while considering product substitutes and complements.
  • Optimizing final prices for remaining stock at the end of a sales period [17].

Inventory management in omnichannel retailing is a constant challenge. The efficiency and profitability of retail businesses largely depend on proper inventory management. Inventory-related issues play a crucial role in supply chain and logistics, as they directly impact a company’s financial performance. This area involves various challenges, such as reducing lost sales due to stock shortages, managing perishable products, and controlling inventory with random replenishment times [17,18]. Inventory levels also influence customer purchasing behavior—particularly when stock is low, customers may feel a greater sense of urgency to buy. One of the key challenges in pricing and inventory management in omnichannel retailing is complex customer behavior, market fluctuations, and uncertainty in purchase decisions. Customers expect consistent pricing across all channels, yet intense competition and rapid market changes make dynamic pricing essential.

Additionally, accurate demand forecasting is difficult due to fast-changing trends, seasonal shopping patterns, and diverse product options. As a result, inventory management becomes increasingly complex. To mitigate rising costs, prevent customer dissatisfaction, and maintain profitability, retailers must leverage data analytics and advanced technologies to optimize their pricing and inventory strategies.

Selecting the optimal pricing strategy and inventory management approach requires accurate forecasting of customer behavior. Real customer behavior, however, is often irrational and unpredictable. Traditional models, such as Markov models and classical reinforcement learning, assume that customers always make rational decisions, yet recent research has shown that this assumption is not always valid.

Recent research in cognitive psychology focuses on probabilistic theories that best describe human decision-making under uncertainty. Three main theories have been proposed in this field:

  • Classical Probability Theory (CPT): assumes rational and logical decision-making.
  • Heuristics and Biases: describes human behavior based on intuitive and fast decision-making rules.
  • Quantum Probability Theory (QPT): a novel approach for analyzing uncertain decision-making, which can address the limitations of CPT in explaining customer cognition.

Traditional pricing and inventory models, such as those based on classical Markov processes or reinforcement learning, often assume that customers make decisions rationally, consistently, and based on utility maximization principles. However, empirical observations in omnichannel retail environments reveal that real-world customers often behave unpredictably. For example, a customer may view a product multiple times, abandon it, and then suddenly decide to purchase after seeing a social media post or a temporary discount. These irregularities — driven by emotions, external stimuli, and mental hesitation — cannot be adequately modeled using classical deterministic frameworks. Such models treat indecisive behavior as disinterest and ignore the dynamic, context-sensitive nature of customer cognition. A more flexible and realistic approach is therefore needed. Quantum Decision Theory (QDT) provides this by allowing for probabilistic states of decision-making, in which a customer can simultaneously exist in both “buy” and “not buy” cognitive states until an external interaction collapses this uncertainty into a final decision. This quantum perspective better captures the ambiguity, non-linearity, and interference effects inherent in modern consumer behavior.

Quantum Probability Theory (QPT) has emerged as a new paradigm in recent years, gaining widespread application in psychology, cognitive science, and decision-making analysis [19]. Cognitive quantum research suggests that QPT can better explain cognitive biases and decision-making deviations in customers. While CPT assumes that a customer always exists in one specific state, QPT allows a customer to exist in multiple cognitive states simultaneously. QPT provides a mathematical probabilistic model where a customer can be in an infinite number of potential states at any given moment, accessing all possible options [20]. In recent years, quasi-quantum decision theories have demonstrated their ability to explain inconsistencies in human behavior using quantum probabilities [21]. Quantum Decision Theory (QDT) introduces a new framework for modeling customer behavior, naturally incorporating cognitive uncertainty. In this framework, customers remain in multiple probabilistic states until they make a final decision. Their final choice is influenced by quantum superposition and interference effects. Unlike classical models, which assume that a customer either purchases or does not purchase, the quantum model allows a customer to be in both states (buying and not buying) simultaneously—until an external interaction (such as a price change or advertisement) triggers their final decision.

Customer Lifetime Value (CLV) is an important concept in marketing management and customer relationship management, representing the profit a customer generates for a business over their lifetime. This metric is crucial for organizations because retaining existing customers is less costly than acquiring new ones. Considering CLV is particularly important when delivering goods to various regions. If only current demand is considered, businesses may meet short-term needs but miss out on long-term growth opportunities. By focusing on CLV, companies can look to the future and make decisions that help retain customers and increase profitability in the long run. Customers with high CLV tend to make more purchases and generate higher profits for the company. Therefore, even if current demand is low in some regions, paying attention to CLV can help businesses optimally allocate resources and provide better services to these customers, keeping them loyal [22–24].

Customer satisfaction and loyalty are key factors that lead to business success in competitive markets [25,26]. Customer loyalty refers to the disposition that sustains ongoing, long-term purchasing from the same business. Because the cost of acquiring new customers is higher than that of retaining existing ones, service organizations must find ways to retain their customers. Loyal customers buy from a business and may recommend it to others, but this does not mean they are always satisfied; conversely, satisfied customers may simply continue to buy from their current suppliers, while disloyal customers will change their purchasing location as soon as a better option is available. The consumer is the core around which an organization’s marketing activities revolve, and retaining consumer loyalty is what organizations aspire to achieve. In today’s highly competitive market crowded with brands, consumer loyalty is the most desirable outcome for companies: commitment leads to repeat buying behavior and positive word-of-mouth advertising, resulting in a larger market share and superior financial performance [27].

Dynamic games are a powerful tool for modeling interactions between economic agents over time. This framework is particularly useful in scenarios where past decisions influence the current state and future choices. In the field of retail and dynamic pricing, numerous studies have utilized dynamic game theory to analyze optimal pricing strategies, inventory management, and customer responses [8,28,29].

The Markov Decision Process (MDP) is a mathematical model used for decision-making in stochastic or uncertain environments. This model is specifically designed for problems where the decision-maker cannot predict future outcomes with certainty. MDP has significant applications in fields such as reinforcement learning, robotics, game design, and resource management. Many studies have applied MDP-based approaches to dynamic pricing and inventory control [30,31]. In general, MDP provides a mathematical framework for decision-making in uncertain environments, where state transitions occur probabilistically based on the actions taken. This model is particularly useful for analyzing dynamic games, where state changes occur in a probabilistic manner.

Common dynamic pricing methods include Markov chains, reinforcement learning (PPO, Q-learning), and dynamic game theory. These methods have been successful in various economic problems but have limitations. They assume that customers always make rational and deterministic decisions, whereas in reality, purchase decisions are influenced by emotions, social factors, and random interactions. This scientific gap creates the need for a new approach that can account for these complexities in modeling. Traditional pricing and inventory management models, based on Markov chains and classical reinforcement learning, assume that customers always make logical and definite decisions. However, recent research in quantum cognition suggests that customer decision-making can exist in multiple cognitive states simultaneously and be influenced by quantum superposition. Despite this breakthrough, few studies have applied this concept to omnichannel retailing and dynamic pricing. Recent findings in psychology and behavioral economics show that customer decisions are affected by irrational factors, emotions, advertisements, and even discounts. These unexpected and fluctuating behaviors cannot be accurately modeled using classical methods. Thus, there is a growing need for a new framework to analyze and predict customer behavior in omnichannel retailing.

This scientific gap led us to explore Quantum Markov Chains (QMC) and Reinforcement Learning (RL) to better model customer purchasing behavior and optimize pricing strategies and inventory management. In this study, we use Quantum Decision Theory (QDT) to model customer purchasing behavior. Unlike classical models, quantum theory enables more accurate predictions of customer behavior. In this framework, decisions are modeled using quantum concepts such as superposition and the observer effect, rather than traditional probabilities. In this model, customers can exist in multiple decision states simultaneously until an external factor, such as an advertisement or price change, collapses the state and determines their final decision. This research aims to answer: How can Quantum Decision Theory, combined with Reinforcement Learning, optimize pricing and inventory management in omnichannel retailing? Specifically, we focus on:

  1. Quantum Markov Chains (QMC) for predicting customer purchasing behavior.
  2. Quantum probabilities in pricing models, integrating price dynamics with market conditions.
  3. Reinforcement Learning (RL) to optimize pricing and inventory management policies.
  4. Pricing formulation using Schrödinger-like quantum equations.
  5. Modeling customer-seller interactions as a Quantum Game Theory framework.
  6. Using Quantum Markov Chains (QMC) to enhance purchasing behavior predictions.

This study introduces a new framework for dynamic pricing and inventory management in omnichannel retail by integrating quantum-based models with machine learning techniques. Our proposed model, leveraging quantum concepts, overcomes the limitations of traditional methods such as classical Markov chains and classical reinforcement learning. This approach better models the cognitive uncertainty of customers, leading to more accurate predictions of purchasing behavior. Furthermore, by integrating quantum probabilities and reinforcement learning (RL), our model enables the simultaneous optimization of pricing and inventory management, ultimately enhancing the profitability of retailers in an omnichannel environment. The remaining structure of the paper is as follows: In Section 2, the literature review is presented. In Section 3, the mathematical modeling is introduced. In Section 4, the solution method is examined. In Section 5, the implementation and execution process of the model is discussed. In Section 6, the results of the model are presented. In Section 7, a comparison of the model with traditional models is made. Section 8 covers sensitivity analysis, and Section 9 provides the conclusion and outlines future research directions.

2. Literature review

Joint pricing and inventory management is a fundamental issue in supply chain management [6]. With the expansion of supply chains and the rise of e-commerce and omnichannel retailing, this area has garnered increasing attention, leading to numerous studies. Managing pricing and inventory in omnichannel retailing presents significant challenges due to the complex interactions between sales channels, the uncertain behavior of customers, and intense competition. Many studies have attempted to model these challenges using classical approaches, such as Markov chains, reinforcement learning (RL), and game theory. To address the complexities of pricing and inventory management, researchers have increasingly relied on RL and advanced algorithms like Proximal Policy Optimization (PPO) and Q-learning [11,30]. These models can learn optimal policies in dynamic environments, outperforming traditional models. However, reinforcement learning models still assume rational customer behavior. Recent research has shown that customers exhibit unpredictable behaviors under uncertainty, which cannot be fully captured by traditional RL models.

Dynamic pricing has emerged as a key strategy for increasing profitability in retail. Many studies have utilized Markov chains and the Markov Decision Process (MDP) to determine optimal pricing policies [7,8]. These models assume that the decision-maker can observe the market conditions and, based on this information, adopt optimal pricing strategies. However, these models have a fundamental limitation: they assume that customers always make rational decisions based on economic utility, whereas behavioral psychology studies have shown that purchase decisions are influenced by irrational factors, such as emotions, mental perceptions, and social effects [32,33].

Cognitive psychology research has shown that customer decision-making is not always rational and is often influenced by factors such as intuition, emotions, and social effects [34,35]. One of the emerging theories proposed to explain these behaviors is Quantum Probability Theory (QPT), which, instead of deterministic models, uses superposition and probabilistic interference to predict decisions [20]. While QPT has been successful in cognitive sciences and psychology, few studies have applied this approach to retail and dynamic pricing. This paper aims to bridge the gap in customer behavior modeling by utilizing quantum modeling, dynamic games with quantum concepts, and Quantum Markov Chains (QMC). Table 1 presents a summary of research on dynamic pricing and inventory management.

By reviewing the literature, it became evident that traditional models, such as Markov processes and reinforcement learning in dynamic pricing and inventory management, assume that customers act rationally. However, recent research in cognitive sciences has shown that customer decisions can be irrational and uncertain. In this paper, for the first time, quantum modeling, quantum dynamic games, and Quantum Markov Chains (QMC) are utilized to model customer purchasing behavior and optimize pricing and inventory policies in omnichannel retail. This approach can enhance prediction accuracy, improve decision-making efficiency, and provide innovative solutions for sales management under uncertainty.

3. Modeling the omnichannel retail problem using quantum decision theory

This study focuses on modeling an omnichannel retail system that integrates both physical stores and online sales. The primary goal of this modeling approach is to optimize inventory management and pricing strategies in a way that not only maximizes retailer profit but also increases the likelihood of customer purchases. To achieve this objective, the proposed model follows a bi-objective approach, where a dynamic mathematical model is designed to simultaneously analyze multiple parameters, including costs, demand, customer lifetime value (CLV), and purchasing behavior. This model particularly emphasizes both online and physical demand and considers two types of demand [1]:

  1. Physical Demand – Demand generated at physical stores, each corresponding to a specific market region.
  2. Online Demand – Customers can make purchases through the retailer’s website and choose from various delivery options, such as standard delivery, in-store pickup, or instant delivery.

With the expansion of omnichannel retail, pricing and inventory management have become challenging due to the complex and uncertain behavior of customers. Traditional models, such as Markov chains and classical reinforcement learning, assume that customers always make rational and deterministic decisions. However, recent studies have shown that purchase decisions are influenced by uncertainty, emotions, and environmental factors. Quantum Decision Theory (QDT) is a novel mathematical framework that can model human decisions in a non-classical manner. Unlike classical models, which assume that decisions are made deterministically and rationally, QDT demonstrates that human decisions, especially under uncertainty, follow quantum principles. Additionally, in the study by [45], it is argued that mathematical structures derived from quantum theory provide a much better explanation of human thinking than traditional models. Moreover, in the research by [46], the preference reversal phenomenon in Quantum Decision Theory (QDT) is examined, showing how quantum interference effects can explain cognitive biases, particularly in cases where classical decision-making theories fail. To address this issue, we utilize Quantum Decision Theory (QDT), which incorporates concepts such as superposition, the observer effect, and Quantum Markov Chains (QMC) to more accurately model customer behavior. In the proposed model, customer purchase behavior is represented as a probabilistic state rather than a fixed decision. According to the principle of superposition, a customer remains in multiple possible states (buying or not buying) until an external factor—such as an advertisement or price change—collapses the state into a definite decision (observer effect). Additionally, the interference effect shows that customer choices influence each other in unexpected ways.

Superposition: In quantum mechanics, a system can exist in multiple possible states simultaneously until an observation or measurement forces it into one definite state. In the customer purchase model, this concept suggests that a customer may simultaneously consider both buying and not buying a product. Only after seeing an advertisement or a price change does the customer finalize their decision. To better grasp our proposed model and how we apply quantum concepts to analyze customer behavior, let’s simplify it. Take superposition as an example: we assume that until a customer makes their final decision, they exist in a state of both buying and not buying simultaneously. This means the customer is in both mental states at once, and their final decision only becomes clear after they’re exposed to an external factor, like seeing a new price or an advertisement.

Observer Effect: This principle states that observing or interacting with a system can alter its state. In retail, seeing advertisements, customer reviews, or special offers can shift the customer’s mental state, influencing them toward making a purchase decision [47]. To better understand, we can explain the Observer Effect in quantum theory like this: observation or interaction with a system changes its state. For instance, every time a customer sees a new advertisement, user review, or price change, their mental state shifts, and the probability of them buying or not buying undergoes a transformation.

Interference Effect: In quantum physics, quantum waves can interfere with each other, altering the final outcome. In the proposed model, if a customer is undecided between multiple options, advertisements and interactions can unexpectedly shift their final decision. For example, if a customer is considering two similar products, a discount or an advertisement on one product can increase its likelihood of being chosen [48]. To better understand, we can say that in Interference, a customer’s different choices (like choosing between two brands or two products) can quantum mechanically interfere with each other. This means if a customer is thinking about two options simultaneously, these two probabilities can either strengthen or weaken each other. For example, an advertisement for one product might cause the probability of buying a rival product to decrease.
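The interference contribution can be made explicit with standard quantum probability (the amplitude symbols here are generic): if two decision paths lead to the same outcome with complex amplitudes $a$ and $b$, the resulting probability is

$$ P = \lvert a + b \rvert^{2} = \lvert a \rvert^{2} + \lvert b \rvert^{2} + 2\,\mathrm{Re}\!\left( a \overline{b} \right) , $$

where the cross term $2\,\mathrm{Re}(a\overline{b})$ is the interference: positive (constructive) when the amplitudes are in phase and negative (destructive) when they are out of phase. This is precisely the mechanism by which an advertisement for one product can suppress the purchase probability of a rival product.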

Wave Function: In quantum mechanics, the wave function represents all possible states of a system. In the proposed model, the customer’s wave function reflects the probabilities of different purchasing decisions. When the customer sees an ad or experiences a price change, their wave function collapses, resulting in a definite purchasing decision [49].

Considering the discussions presented in the introduction, not all customers hold the same level of importance, and they do not generate equal profit for the organization. Therefore, taking into account Customer Lifetime Value (CLV) and customer loyalty is crucial when determining pricing and inventory for retail stores, which will be considered in this study. Accordingly, the pricing and inventory model incorporating quantum characteristics, along with CLV and customer loyalty, will be structured as follows:

General Wave Function: The general wave function is defined as a combination of classical and quantum variables. This wave function dynamically evolves based on customer purchasing decisions:
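In a minimal generic form (the decision-state labels and amplitude symbols here are our notation, assuming a finite set of decision states), this can be written as

$$ \lvert \psi(t) \rangle = \sum_{i} c_i(t)\, \lvert s_i \rangle , $$

where each basis state $\lvert s_i \rangle$ encodes one possible purchasing decision (for example, buying brand $i$ or not buying) and the complex amplitude $c_i(t)$ aggregates the classical and quantum influences on that decision.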

Wave Function Normalization Condition:
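In the same notation, the normalization condition reads

$$ \sum_{i} \lvert c_i(t) \rvert^{2} = 1 , $$

so that the squared amplitudes can be interpreted as probabilities.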

This formulation is chosen to model customer decision-making in uncertain environments. The combination of the wave function and conditional probabilities allows for the simultaneous consideration of multiple influencing factors, and the model dynamically represents the customer’s state transitions. If the amplitude associated with brand $i$ increases (for example, if a significant discount is applied to brand $i$), the probability of selecting that brand increases. Viewing advertisements can alter the wave function and cause its collapse into a specific decision.

Customer Behavior Change Equation: To model customer behavior changes over time, we use the Schrödinger equation, where the wave function is influenced by price, inventory, demand, advertising, and external factors.
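A Schrödinger-type form consistent with this description (the decomposition labels are our notation; compare Constraints 7 and 8 below) is

$$ i\hbar \frac{\partial}{\partial t} \lvert \psi(t) \rangle = \hat{H}(t)\, \lvert \psi(t) \rangle , \qquad \hat{H} = \hat{H}_{\text{price}} + \hat{H}_{\text{inventory}} + \hat{H}_{\text{demand}} + \hat{H}_{\text{ads}} + \hat{H}_{\text{ext}} , $$

where each Hamiltonian component models the influence of one decision driver on the evolution of the customer’s decision state.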

This equation is one of the fundamental equations of quantum mechanics, used to model system changes over time. In this model, customer decisions evolve over time, and a final decision is made only when an observation (such as an advertisement or a price change) occurs. Price, inventory, advertising, and demand are among the most influential factors in customer purchases. Research has shown that these factors play a crucial role in customer decision-making [13] and [15]. This model is proposed based on the concepts of Quantum Decision Theory; however, it still requires empirical testing on real-world data. Some studies, such as [45] and [46], have explored modeling human cognitive behavior using quantum theory.

Numerical Example for Better Understanding of the Model: Suppose a store sells two types of products, A and B, in two regions, C and D, each with a given advertising level and price.

Examining Changes in Customer Wave Function Over Time with Advertising

Before Advertising:

The customer is more inclined to purchase Product B.

After Intensive Advertising for Product A:

Increased advertising has changed the wave function, thereby altering the customer’s choice.

Price Change Analysis

If the price of Product A decreases:

This results in an increase in the amplitude associated with Product A, leading to a higher probability of purchasing Product A.
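To make the example concrete with assumed numbers (the amplitudes below are illustrative, chosen only to satisfy normalization): before advertising, take $\lvert \psi \rangle = 0.6\,\lvert A \rangle + 0.8\,\lvert B \rangle$, giving $P(A) = 0.36$ and $P(B) = 0.64$, so the customer leans toward Product B. Intensive advertising for Product A shifts the amplitudes to, say, $\lvert \psi' \rangle \approx 0.85\,\lvert A \rangle + 0.53\,\lvert B \rangle$, giving $P(A) \approx 0.72$, and the preferred choice flips to A. A price decrease for Product A acts on the same amplitude and raises $P(A)$ further.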

Quantum and Classical Interactions: To incorporate quantum and classical interactions, the purchase probability blends a classical term with a quantum correction; a form consistent with the definitions below (the symbol names are assigned here for readability) is

$$ P^{\text{final}}_{m} = \alpha\,P_{m} + \beta\,\bigl(1 - P_{m}\bigr)\,Q_{m}\,e^{-\gamma E_{m}} , $$

where $Q_m$ represents the quantum effects on customer behavior. This function captures how quantum concepts such as superposition, observer effect, and interference influence customer decision-making, and:

  • $P_m$ is the probability of customer $m$ making a purchase, calculated from historical data or a logistic model.
  • $1 - P_m$ represents the probability of the customer not purchasing.
  • $E_m$ is an environmental variable that accounts for factors such as advertising impact, pricing, and special offers.
  • $e^{-\gamma E_m}$ is an exponential term that moderates the effect of discounts or advertising: when the impact of advertising or pricing is high, this factor decreases.
  • $\alpha$ and $\beta$ are coefficients that regulate the classical and quantum effects, respectively.
  • $\gamma$ controls the sensitivity of the customer to environmental variables.
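A minimal Python sketch of this blended probability follows (the functional form mirrors the reconstruction above; the function name and all parameter values are illustrative assumptions, not calibrated values):

```python
import numpy as np

def purchase_probability(p_classical: float, q_effect: float, env: float,
                         alpha: float = 0.7, beta: float = 0.3,
                         gamma: float = 0.5) -> float:
    """Blend a classical purchase probability with a quantum correction.

    p_classical : P_m, from historical data or a logistic model
    q_effect    : Q_m, quantum term capturing superposition/interference
    env         : E_m, environmental variable (advertising, pricing, offers)
    alpha, beta : weights of the classical and quantum contributions
    gamma       : customer sensitivity to the environmental variable
    """
    p = alpha * p_classical \
        + beta * (1.0 - p_classical) * q_effect * np.exp(-gamma * env)
    return float(np.clip(p, 0.0, 1.0))  # keep the result a valid probability

# Example: a hesitant customer (P_m = 0.4) exposed to a strong promotion.
print(purchase_probability(p_classical=0.4, q_effect=0.8, env=1.5))
```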

Mathematical Model:

[Equations (1)–(18) of the mathematical model: the two objective functions and Constraints (3)–(18), described below.]

The first objective function represents revenue maximization. The second objective function expresses the maximization of the probability of purchase. Constraint 3 is a logistic function that calculates the probability of purchasing product i from brand j in region k through purchasing method r at time t. Constraint 4 shows how quantum effects (superposition, interference, and cognitive uncertainty) influence customer purchase decisions. Constraint 5 describes the quantum wave function of customer decision-making, meaning that the customer’s decision is a combination of all possible purchasing probabilities and is influenced by both classical probabilities and quantum effects. Constraint 6 ensures that the total probability of all customer decision states equals 1, meaning that the customer must ultimately choose between purchasing or not purchasing. Constraint 7 is a version of the Schrödinger equation from quantum mechanics, used to model changes in customer behavior over time. Here, Ĥ is the Hamiltonian operator applied to the quantum decision-making wave function. Constraint 8 states that the overall Hamiltonian of the system consists of multiple components representing the impact of various variables on customer decision-making. Constraint 9 further specifies how each Hamiltonian component affects customer decisions. Constraint 10 ensures that the number of products sent to region k does not exceed the warehouse capacity. Constraint 11 states that the inventory level at time t equals the previous inventory level plus the amount of incoming stock. Constraint 12 guarantees that the warehouse has sufficient inventory to fulfill customer orders. Constraint 13 defines the initial inventory level, ensuring it does not drop below a specified threshold at the start of the model. Constraint 14 ensures that demand must be at least equal to the amount of goods shipped to different regions. Constraint 15 demonstrates how price, advertising, and Customer Lifetime Value (CLV) influence the probability of purchase. Constraint 16 guarantees that the number of items shipped does not exceed demand. Constraint 17 determines how goods should be distributed across different regions to maximize profit. Constraint 18 ensures that all these variables have valid, non-negative values.

Linearization: Additionally, the purchase probability is nonlinear and is linearized using a logit approximation [50]. A first-order expansion of the logistic function around its midpoint gives

$$ P \approx \tfrac{1}{2} + \kappa\, z , $$

where $z$ denotes the linear utility term inside the logistic function and $\kappa$ is a constant coefficient that can be adjusted to provide a better approximation over the operating range.

4. Approach to solving the problem

4.1. Combining the quantum model with reinforcement learning

To optimize pricing policies and inventory management, reinforcement learning (RL) has been utilized. This approach, leveraging quantum dynamic games and quantum Markov chains, enables learning optimal decisions in an uncertain environment. In this model, demand levels, discounts, inventory levels, and market competition are simultaneously analyzed within a quantum wave function.

Due to the mathematical complexity of the proposed model—incorporating quantum Markov chains, superposition effects, and quantum interference in customer behavior—an exact analytical solution is highly challenging. The model consists of nonlinear equations and complex probability distributions, which traditional mathematical methods struggle to solve. Moreover, random factors such as advertising, market fluctuations, and the uncertain behavior of customers necessitate the use of numerical methods and simulations. Therefore, to optimize pricing and inventory management, reinforcement learning methods such as the PPO (Proximal Policy Optimization) algorithm will be used. This allows the model to learn from empirical data and market interactions to determine the most effective decision-making policies.

The advanced PPO reinforcement learning algorithm is applied to optimize decision-making in inventory and pricing management. In this framework, dynamic games are employed to more accurately simulate competitor reactions and market interactions, leading to improved system dynamics and decision-making. Additionally, adaptive memory plays a crucial role in the learning process and decision optimization: it enables the system to retrieve past experiences over different time periods and make better decisions based on that information. In other words, the system can dynamically update current conditions and learn from past events, resulting in continuous performance improvement.

In this model, Random Search is used to fine-tune key parameters such as the learning rate, gamma, lambda, clipping parameter (clip_param), entropy coefficient, and the number of episodes. Instead of performing an exhaustive search across the entire parameter space, this method randomly selects a set of parameter values and evaluates the model accordingly.
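The following is a minimal sketch of such a random search (the search ranges and the `evaluate` hook are our assumptions; any routine that trains the PPO agent with a given configuration and returns its mean reward can be plugged in):

```python
import math
import random

# Illustrative search ranges for the hyperparameters named above.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-3),   # sampled on a log scale
    "gamma":         (0.90, 0.999),  # discount factor
    "lambda":        (0.90, 0.99),   # GAE lambda
    "clip_param":    (0.10, 0.30),   # PPO clipping parameter
    "entropy_coef":  (0.00, 0.05),
    "episodes":      (500, 5000),    # integer-valued
}

def sample_config(space):
    """Draw one random configuration from the search space."""
    cfg = {}
    for name, (low, high) in space.items():
        if name == "episodes":
            cfg[name] = random.randint(int(low), int(high))
        elif name == "learning_rate":
            cfg[name] = 10 ** random.uniform(math.log10(low), math.log10(high))
        else:
            cfg[name] = random.uniform(low, high)
    return cfg

def random_search(evaluate, n_trials=30):
    """Keep the best of n_trials randomly sampled configurations."""
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(SEARCH_SPACE)
        score = evaluate(cfg)           # train PPO with cfg, return mean reward
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Random search of this kind typically locates good configurations with far fewer evaluations than an exhaustive grid over the same space.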

4.2. Quantum dynamic game for market interactions

In a competitive environment, retailers and customers participate as players in a Quantum Dynamic Game. In this game, prices, discounts, and competitor decisions interact simultaneously and are analyzed using quantum models. The proposed model integrates Quantum Decision Theory, Quantum Markov Chains, Reinforcement Learning, and Quantum Dynamic Games to more accurately model customer behavior and retailer decisions in an uncertain environment. Compared to traditional methods, this model offers higher accuracy in predicting purchasing behavior and optimizing pricing and inventory management. By adopting this approach, retailers can develop more competitive and profitable strategies in complex markets. The dynamic game includes intricate interactions between customers, sellers, and competitors, aiming to simulate strategic decision-making in a competitive market. All components of the dynamic game are detailed in Table 2.

This quantum game model introduces uncertainty, entanglement, and probabilistic decision-making to dynamic market interactions, allowing for more flexible and unpredictable outcomes.

To clarify the practical implementation of Quantum Dynamic Games in this study, we provide a step-by-step explanation of how the model was operationalized:

Definition of Players and Actions: The retailer, customers, and competitors were modeled as agents in the quantum dynamic game.

  • Retailer actions: Pricing, discounting, advertising strategies.
  • Customer actions: Purchase decision modeled as a quantum superposition state (buy / not buy).
  • Competitor actions: Dynamic pricing and discounting modeled as entangled quantum states influencing customer decisions.

Quantum State Representation: Each agent’s decision state (e.g., retailer price levels or customer purchase intent) was represented by a quantum wave function.

The quantum dynamic game allowed these wave functions to interfere, creating constructive or destructive effects on final outcomes.

Integration with Reinforcement Learning (PPO): The quantum dynamic game was integrated with Proximal Policy Optimization (PPO) so that:

  • The PPO agent (retailer) learns optimal strategies based on the quantum probability distributions of customer and competitor behaviors.
  • Actions were selected not deterministically but based on evolving quantum probability amplitudes, updated in each learning episode.

Markov Quantum State Transitions: The game incorporated Quantum Multi-Level Markov Processes (QMLMP) to model state transitions between different market conditions (e.g., normal, competitive discount war, boom/recession).

  • Transition probabilities were determined by quantum amplitudes rather than classical fixed probabilities.

Practical Computation: The game was simulated using numerical solvers to evolve the quantum states over time (Schrödinger-like equations) and PPO for learning.

Key practical steps included:

  • Initial parameter estimation from Kaggle data (e.g., price sensitivity, loyalty).
  • Running simulations with different competitor strategies to capture entanglement effects.
  • Using random search to tune PPO hyperparameters (e.g., learning rate, entropy coefficient).

Outputs:

The quantum dynamic game produced pricing and inventory policies that adaptively responded to competitor moves and customer quantum states, validated through simulations showing improved profit and reduced stockouts.
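As a toy illustration of the constructive and destructive interference described in the steps above (the amplitudes and phases are assumed values, not quantities estimated from the data):

```python
import numpy as np

def buy_probability(stimuli_amplitudes):
    """Coherently combine complex amplitudes of simultaneous market stimuli.

    Each stimulus (e.g., one competitor's discount) contributes a complex
    amplitude toward the customer's "buy" state; the probability is the
    squared magnitude of the coherent sum, so relative phases matter.
    """
    total = np.sum(stimuli_amplitudes)
    return min(abs(total) ** 2, 1.0)

# Two discounts in phase reinforce each other (constructive)...
in_phase = [0.4 + 0.0j, 0.4 + 0.0j]
# ...while out-of-phase stimuli cancel (destructive), as when two
# simultaneous price cuts neutralize each other for some customers.
out_of_phase = [0.4 + 0.0j, 0.4 * np.exp(1j * np.pi)]

print(buy_probability(in_phase))      # 0.64, more than 0.16 + 0.16
print(buy_probability(out_of_phase))  # ~0.0, less than 0.16 + 0.16
```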

4.3. Utilizing Quantum Markov Chains (QMC)

In many systems and models, especially in reinforcement learning and dynamic games, uncertainty and randomness play an important role. These uncertainties usually stem from the unpredictability of future states and behaviors. One of the most powerful tools for modeling such systems is the Markov process. Quantum Markov Chains (QMC), compared to classical Markov models, enable probabilistic and dynamic modeling of customer decisions: the model considers different customer decision-making states simultaneously and helps optimize pricing.

Simply put, a Markov process means that the system is in one state at any given moment, and its transition to the next state depends only on the current state, not on previous states. This feature is known as the Markov property: the system’s future depends only on its present state and not on its history. The property is very useful in many games and decision-making models, as optimal decisions can be made using the probabilities of transition from one state to another.

In the real world, and especially in dynamic games, players (such as sellers, competitors, and customers) usually cannot precisely predict each other’s future behaviors. For example, a seller does not know what decisions competitors will make in the future or where the market is heading (towards growth or into a recession). These situations change randomly, and decision-making in such environments is accompanied by uncertainty. The Markov process can model these probabilistic changes: each player or decision-making agent (such as a seller) examines its current state and, based on it, decides which state (and decision) to transition to, with the transition probabilities determined by probabilistic models.

In reinforcement learning methods such as PPO, which are used in dynamic games and complex decision-making, an agent (e.g., a seller) must use past decisions and the rewards received from them to optimize future decisions. In other words, past rewards and decisions guide the agent towards optimal strategies, even if it cannot precisely predict future states. In general, Markov processes allow transitions between probable states to be modeled more accurately under uncertainty and thus support better decisions, especially in complex reinforcement learning models aimed at improving long-term strategies. Algorithm 1 provides an overview of the problem-solving approach.

The Proximal Policy Optimization (PPO) algorithm is a reinforcement learning method that helps the agent (here, the seller) find optimal decision-making policies through interaction with the environment. At each step, the seller observes the market conditions (such as competitor prices, demand levels, and economic conditions) and selects an action (e.g., changing prices, offering discounts, or adjusting inventory). After taking the action, a reward is received, indicating the success or failure of that decision (e.g., increased sales or reduced profit). Then, PPO uses these experiences to update the decision-making policy (Policy Update). However, unlike traditional methods, it clips changes to ensure more stable and effective learning. In this model, a Quantum Multi-Level Markov Process (QMLMP) is used to predict future states. PPO processes this data and dynamically and competitively optimizes the seller’s strategies.
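A compact sketch of the clipped policy update referred to above (this is the standard PPO surrogate; the tensors in the toy check are illustrative):

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old : log-probabilities of the taken actions under the
                          current and the behavior policy, respectively
    advantages          : advantage estimates (e.g., from GAE)
    clip_eps            : PPO clipping parameter
    """
    ratio = torch.exp(logp_new - logp_old)                 # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()           # maximize surrogate

# Toy check (illustrative values only):
lp_new = torch.tensor([-0.9, -1.2, -0.5])
lp_old = torch.tensor([-1.0, -1.0, -1.0])
adv    = torch.tensor([ 1.0, -0.5,  2.0])
print(ppo_clipped_loss(lp_new, lp_old, adv))
```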

5. Implementation and result analysis

5.1. Data and related information

Several reputable studies have utilized Kaggle data for analyses in areas such as customer behavior, churn rate prediction, purchasing behavior, and more [51–58]. In this study, Kaggle data has also been used to predict customer loyalty and to extract the behavioral features required for the modeling process. The data was collected from the Kaggle website and relates to a multi-channel retailer: the dataset contains 302,011 records of customer activity in 2023 and 2024, each with the features listed in Table 3. The variables used for the learning techniques are explained in Table 4.

In this study, model parameterization has been conducted based on the following methods: (1) extraction from real data, (2) utilization of previous studies, and (3) calibration through simulation. The following sections explain how each category of parameters has been assigned values. The dataset of retail customers and transactions introduced in the paper has played a key role in parameter estimation. Table 5 presents the method of obtaining the parameters.

Table 5. Parameter Extraction and Assignment Methods for Model Implementation.


5.2. Data-driven identification of quantum superposition and decision collapse

In order to empirically validate the assumptions of our quantum decision-making framework, we utilized specific features from the real-world dataset to identify customer cognitive states and behavioral transitions. The process involved a multi-step interpretation of behavioral data to detect quantum phenomena such as superposition, the observer effect, and wave function collapse.

  1. Identifying Cognitive Superposition: We analyzed customers who exhibited repeated interest in a product but delayed the purchase decision. For example, the variables Product_Category, Product_Brand, Purchase_Channel, Date, and Total_Purchases allowed us to detect patterns where customers revisited the same product multiple times across different dates without making a transaction. This behavior — frequent but inconclusive engagement — was interpreted as a cognitive superposition, where the customer simultaneously considers both buying and not buying. Unlike classical models, which assume non-purchase indicates disinterest, our quantum model captures the uncertainty and hesitation inherent in such behavior.
  2. Detecting External Stimulus (Observer Effect): To determine when a customer’s decision wave function collapses into a final state (e.g., a purchase), we observed changes in Amount, Order_Status, Shipping_Method, and Date. A sudden drop in purchase price (recorded in the Amount column), especially when deviating from the average price range for that product, was taken as a sign of promotional activity or discount. Simultaneously, a change in Shipping_Method to “Same-Day” or “Express” implied urgency in decision-making, often triggered by a campaign or push notification. When such changes were followed by a purchase event (Order_Status = Completed), it indicated that the customer had received an external stimulus that collapsed their probabilistic state into a definite decision — consistent with the quantum observer effect.
  3. Confirming Wave Function Collapse (Final Decision): The final confirmation of a decision collapse was based on the simultaneous presence of the following: a non-zero Amount, a valid Payment_Method, a completed Order_Status, and an updated Purchase_Channel. These fields together confirm that the customer transitioned from indecision to a completed transaction, supporting the quantum model’s prediction. Additionally, positive feedback recorded in the Feedback or Ratings columns post-purchase helped reinforce the presence of satisfaction and loyalty shifts as modeled through quantum interference effects. By integrating these behavioral indicators, the quantum model was able to interpret indecisive customer behavior more effectively than classical deterministic frameworks. This approach not only validated key theoretical assumptions (such as superposition and observer effect) but also demonstrated how real-world behavioral data can be used to inform quantum-based decision modeling in omnichannel retail contexts.

To demonstrate how the proposed quantum decision model interprets customer behavior more accurately than classical models, we analyze a real instance from the Kaggle dataset. Customer ID 94823, a 28-year-old female with low income, repeatedly visited a Zara shirt product via the mobile app across five different days over a two-week period. Despite this engagement, she made no purchase initially. In classical models, this pattern is interpreted as low intent to buy, and the probability of purchase is reduced accordingly. However, under our quantum decision framework, such repeated yet inconclusive interactions indicate that the customer is in a cognitive superposition — simultaneously inclined to both buy and not buy.

Shortly after the last visit, a significant price drop was recorded (from $350 to $320), alongside a “Completed” order status within 24 hours. This behavioral shift suggests an external stimulus — likely a push notification or promotional discount — which, according to the quantum model, acts as an observation event, collapsing the customer’s wave function into a definite decision state (purchase). Our model predicted a 68% likelihood of purchase after the intervention, compared to only 12% under the classical model. This case clearly illustrates the applicability of the quantum observer effect and superposition in modeling real customer behavior. The customer’s transition from indecision to purchase, triggered by a contextual stimulus (e.g., discount or targeted ad), cannot be effectively captured by deterministic models. In contrast, the quantum model inherently accommodates cognitive uncertainty and external influences, enabling more accurate behavioral prediction and strategic decision-making.
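The behavioral indicators of Section 5.2 can be operationalized directly on the transaction table. The following pandas sketch uses the column names given in the text; `Customer_ID`, the thresholds, and the toy records are our assumptions for illustration:

```python
import pandas as pd

# Toy frame with the Section 5.2 column names; values are fabricated
# purely for illustration.
df = pd.DataFrame({
    "Customer_ID":      [94823, 94823, 94823, 94823, 101],
    "Product_Brand":    ["Zara", "Zara", "Zara", "Zara", "Nike"],
    "Product_Category": ["Shirt"] * 4 + ["Shoes"],
    "Date": pd.to_datetime(["2024-03-01", "2024-03-04", "2024-03-08",
                            "2024-03-12", "2024-03-02"]),
    "Amount":           [0.0, 0.0, 0.0, 320.0, 90.0],
    "Order_Status":     ["Viewed", "Viewed", "Viewed", "Completed", "Completed"],
    "Shipping_Method":  [None, None, None, "Express", "Standard"],
})

# 1) Cognitive superposition: repeated engagement with the same product on
#    several distinct dates before any completed transaction.
visits = (df.groupby(["Customer_ID", "Product_Brand", "Product_Category"])
            .agg(n_days=("Date", "nunique")))
superposed = visits[visits["n_days"] >= 3]          # threshold is illustrative

# 2) Observer effect / collapse: a below-average recorded Amount or urgent
#    shipping, coinciding with a completed order.
completed = df[df["Order_Status"] == "Completed"]
avg_amount = completed.groupby("Product_Brand")["Amount"].transform("mean")
collapse = completed[(completed["Amount"] <= avg_amount)
                     & completed["Shipping_Method"].isin(["Same-Day", "Express"])]

print(superposed)
print(collapse[["Customer_ID", "Product_Brand", "Amount", "Shipping_Method"]])
```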

5.3. Integrating quantum concepts into the Dynamic Game Model

The dynamic game in this model incorporates quantum decision-making principles alongside traditional economic interactions among customers, vendors, and competitors. In this framework, customer decisions are not strictly deterministic but rather exist in a quantum superposition of choices until an interaction (such as a price change or advertisement) collapses the decision into a final state.

Quantum Dynamic Game & PPO Integration

  1. Quantum Superposition in Customer Decisions:
    • Unlike classical models where a customer either buys or does not buy, the Quantum Decision Theory (QDT) allows customers to be in multiple potential states simultaneously (e.g., “interested in buying” and “not interested”).
    • A purchase decision collapses to a specific choice only after external factors (like promotions or competitor strategies) influence their probability wave function.
  2. Quantum Interference in Competitor Reactions:
    • Competitors’ pricing and discount strategies interfere constructively or destructively with the seller’s decisions.
    • For example, if two competitors reduce prices simultaneously, their influence may cancel each other out for a segment of customers, akin to quantum wave interference.
  3. Quantum Markov Chain for State Transitions:
    • The transition between market states is modeled using a Quantum Markov Process (QMP), where a seller’s pricing strategy probabilistically influences the next state.
    • Instead of transitioning deterministically, state changes are influenced by a probability amplitude, accounting for market uncertainty.
  4. Quantum Game Theory in Dynamic Market Competition:
    • The interaction between the seller and competitors follows Quantum Game Theory, where each player’s strategy exists in a mixed quantum state rather than a fixed classical choice.
    • The seller’s pricing adjustments and advertising strategies affect the probability distribution of competitor reactions, rather than a single deterministic response.

Quantum-Enhanced PPO for Seller Decision-Making

In the Proximal Policy Optimization (PPO) algorithm, the seller acts as a quantum-learning agent, dynamically adapting strategies based on probabilistic interactions in the market.

  1. Quantum Market State (Quantum State Representation):
    • The seller does not exist in a single discrete market state but rather in a quantum superposition of states, representing multiple possible outcomes based on competitors’ actions.
    • Observing competitor actions or customer preferences collapses the quantum state, leading to an updated pricing or inventory strategy.
  2. Quantum Action Selection:
    • The seller selects pricing, discounting, and advertising actions, not from a fixed set of choices, but from a probability distribution of optimal strategies (a minimal sampling sketch follows this list).
    • The PPO model refines this distribution over multiple iterations using reinforcement learning.
  3. Quantum Reward Function with Entanglement Effects:
    • Rewards are influenced by entangled relationships between customer behavior, pricing decisions, and competitor responses.
    • For example, if a competitor raises prices, the probability of a seller’s sales increasing also rises due to entanglement in the competitive pricing space.
  4. Quantum Policy Updates:
    • Instead of deterministic policy updates, quantum-inspired policy gradients guide the seller’s adaptation.
    • PPO’s clipped objective function is adjusted to include quantum uncertainty corrections, ensuring smoother convergence toward optimal strategies.
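A minimal sketch of the Born-rule action sampling described in item 2 above (the action labels and amplitude values are illustrative assumptions):

```python
import numpy as np

def select_action(amplitudes, rng=None):
    """Sample an action index via the Born rule from complex amplitudes."""
    rng = rng or np.random.default_rng()
    probs = np.abs(np.asarray(amplitudes)) ** 2   # squared magnitudes
    probs = probs / probs.sum()                   # renormalize defensively
    return rng.choice(len(probs), p=probs)

# Three pricing actions (discount / hold / raise). The phases do not affect
# single-action sampling, but they matter once amplitudes from different
# market stimuli are summed coherently before squaring.
amps = [0.7, 0.5 * np.exp(1j * 0.3), 0.5j]
print(select_action(amps))
```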

An example of a dynamic game is shown in Table 6.

5.4. Integrating quantum concepts into multi-level Markov process

In a quantum-enhanced dynamic game, the seller must make decisions based on different market conditions (such as boom or recession) and the behavior of competitors. Unlike classical models, where decisions are deterministic and history-independent, the Quantum Multi-Level Markov Process (QMLMP) incorporates superposition, interference, and entanglement, allowing the seller to account for non-classical decision-making patterns in an uncertain market.

The Quantum Multi-Level Markov Process (QMLMP) enables the seller to predict future market conditions in a probabilistic yet non-deterministic manner, ensuring that quantum state transitions capture complex interdependencies. Furthermore, reinforcement learning techniques such as Proximal Policy Optimization (PPO) leverage these quantum transitions to develop dynamic strategies that adapt to competitive behaviors and market fluctuations.

In this system, the Quantum Multi-Level Markov Process, dynamic games, and reinforcement learning (PPO) interact seamlessly, enabling the seller to make optimal decisions under uncertainty and competitive pressures.

How the Quantum Multi-Level Markov Process Helps the Seller in Dynamic Games:

  1. Quantum Modeling of Future Market States:
    • In a dynamic game, the seller is in a quantum state composed of multiple potential market conditions (e.g., boom, recession, or stagnation).
    • Unlike classical models where the seller is in a single discrete state, quantum superposition allows them to exist in a combination of different market states simultaneously.
    • The measurement of the quantum state (such as new competitor pricing strategies or economic shifts) collapses the seller’s state into a definite market condition, influencing their next decision.
  2. Quantum Probabilistic State Transitions with Interference Effects:
    • In the classical Markov process, a transition from “boom” to “recession” is modeled with fixed probabilities.
    • In the quantum Markov model, state transitions do not occur independently but interfere with each other due to probability amplitude effects.
    • This means that some transitions are enhanced while others are suppressed, allowing the seller to leverage interference effects to predict shifts in the market more accurately.
  3. Reducing Complexity and Enhancing Decision-Making with Quantum Memory:
    • Unlike classical Markov models, which assume memoryless state transitions, quantum Markov processes allow quantum memory effects where entangled historical states influence future transitions.
    • This means that, rather than assuming past states do not affect future outcomes, the seller can utilize quantum correlations to retain past influences on decision-making.
    • As a result, the seller does not need to track all past market states but instead relies on entangled probability distributions, which encode past decision-making dynamics.
  4. Enhancing Reinforcement Learning with Quantum Transitions:
    • The Quantum Multi-Level Markov Process provides the state transition probabilities for the PPO reinforcement learning model.
    • PPO then updates policies based on quantum-derived state transitions, ensuring that market fluctuations, competitive reactions, and customer behaviors are dynamically integrated into pricing and inventory decisions.
    • Quantum-inspired PPO optimization adapts dynamically to new market conditions by refining quantum probability distributions, allowing for faster convergence and more accurate strategy formulation.
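To make the transition mechanics above concrete, the following Python sketch (an illustration, not the authors’ implementation) represents market states as complex probability amplitudes, evolves them with a unitary transition operator, and converts amplitudes to probabilities via the Born rule, so transition paths can interfere constructively or destructively. The rotation angle and the initial superposition are assumed values.

```python
# Illustrative quantum-style market-state transitions with interference.
import numpy as np

states = ["boom", "stagnation", "recession"]

# Assumed unitary transition operator (any 3x3 unitary would do for the sketch).
theta = np.pi / 8
U = np.array([
    [np.cos(theta), -np.sin(theta), 0],
    [np.sin(theta),  np.cos(theta), 0],
    [0,              0,             1],
], dtype=complex)

# Start in a superposition of "boom" and "recession" (amplitudes, not probabilities).
psi = np.array([1 / np.sqrt(2), 0, 1 / np.sqrt(2)], dtype=complex)

for step in range(3):
    psi = U @ psi                  # amplitudes evolve; transition paths interfere
    probs = np.abs(psi) ** 2       # Born rule: |amplitude|^2 = probability
    print(f"step {step}: " + ", ".join(f"P({s})={p:.3f}" for s, p in zip(states, probs)))

# "Measurement" (e.g., observing a competitor's new price) collapses the state.
observed = np.random.default_rng(0).choice(3, p=np.abs(psi) ** 2)
psi = np.zeros(3, dtype=complex)
psi[observed] = 1.0
print("collapsed to:", states[observed])
```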

Key Advantages of Quantum Multi-Level Markov Process in Dynamic Markets:

  • More Accurate Market State Predictions: By leveraging quantum probability amplitudes and interference effects, state transitions are predicted with greater precision than classical models.
  • Accounts for Non-Deterministic Decision-Making: Customers, competitors, and the overall market do not follow purely rational, deterministic choices, making quantum models more realistic.
  • Enhances PPO Learning Efficiency: By incorporating quantum state transitions, the reinforcement learning process adapts more dynamically to sudden market shifts and uncertainty.
  • Encodes Historical Dependencies Without Tracking Long Histories: Unlike classical Markov models, which require tracking long state histories, quantum models store past influences through entanglement, avoiding explicit enumeration of past states.

In summary, by integrating Quantum Multi-Level Markov Processes (QMLMP) into PPO-based dynamic decision-making, the seller can optimize pricing and inventory strategies with unprecedented adaptability, precision, and robustness to uncertainty.

6. Optimization results of the proposed model using PPO

In this section, the optimization results of the proposed model using the Proximal Policy Optimization (PPO) reinforcement learning algorithm are examined. Given the complexities of customer behavior, market competition, and retailer interactions, the proposed model requires an optimization approach capable of adjusting pricing strategies and inventory management under uncertain and dynamic market conditions. The PPO algorithm, one of the advanced reinforcement learning methods, has been selected due to its high stability, flexibility in complex environments, and ability to learn from market interactions. In this study, PPO is integrated with Quantum Dynamic Game and Quantum Multi-Level Markov Process (QMLMP) to not only optimize decision-making policies but also improve the accuracy of predicting customer behavior and market fluctuations. By employing this approach, the model can progressively learn from market data and enhance pricing and inventory management over time. The charts presented in this section illustrate how the model refines its policies and achieves more optimal solutions. These visualizations demonstrate the model’s learning and optimization process over time.
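The following minimal sketch shows how a PPO agent of this kind could be trained, using the open-source stable-baselines3 implementation of PPO on a toy pricing-and-inventory environment. The environment dynamics, the reward shape, and every numeric parameter are illustrative assumptions standing in for the full quantum-enhanced model.

```python
# A toy pricing environment and a PPO agent (sketch, not the authors' code).
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ToyPricingEnv(gym.Env):
    """State: [current price, inventory level]; action: lower/hold/raise price."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.price, self.inventory, self.t = 0.5, 1.0, 0
        return np.array([self.price, self.inventory], dtype=np.float32), {}

    def step(self, action):
        self.price = float(np.clip(self.price + 0.05 * (action - 1), 0.1, 1.0))
        # Assumed demand: decreasing in price, with random market noise.
        demand = max(0.0, 0.6 - 0.5 * self.price + self.np_random.normal(0, 0.05))
        sales = min(demand, self.inventory)
        self.inventory = float(np.clip(self.inventory - sales + 0.1, 0.0, 1.0))  # restock 0.1/step
        reward = sales * self.price - 0.02 * self.inventory                       # revenue minus holding cost
        self.t += 1
        obs = np.array([self.price, self.inventory], dtype=np.float32)
        return obs, float(reward), self.t >= 50, False, {}

model = PPO("MlpPolicy", ToyPricingEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```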

Cumulative Reward Over Time Chart: This chart, shown in Fig 1, depicts how the model learns and improves over time. The upward trend in cumulative rewards indicates that the model gradually adopts better strategies and becomes more efficient at optimal decision-making within the environment.

Policy Updates Over Episodes Chart: This chart, shown in Fig 2, represents the degree of policy change over time. Initially, the variations are large because the model is still learning and adapting; as training progresses, the fluctuations decrease, indicating that the model requires fewer adjustments and is gradually converging toward a stable, near-optimal decision-making policy.

Inventory Efficiency Over Time Chart: This chart measures inventory management efficiency. If the fluctuations decrease and the values stabilize at an appropriate level, it indicates that the model has reached a balanced state in inventory management. Fig 3 effectively illustrates this trend.

Action Distribution Over Time Chart: This chart, shown in Fig 4, illustrates how the model distributes its decisions among the available actions during learning. If the model initially selects actions almost at random but gradually concentrates on a specific subset of actions, the learning process has been successful; persistent random selection would indicate incomplete learning.

Value Function Convergence Chart: This chart, shown in Fig 5, represents the improvement in the model’s estimate of future value. Gradual stabilization of the value function indicates that the model has learned to optimize long-term rewards, whereas persistent large fluctuations would suggest a lack of convergence and a need for better hyperparameter tuning.
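As a sketch of how the per-episode series behind Figs 1–5 could be collected, the callback below (assuming the stable-baselines3 setup sketched earlier and a single training environment) records episode rewards; the other diagnostics, such as policy-update magnitudes or action counts, would be logged analogously.

```python
# Sketch: logging per-episode rewards during PPO training (single env assumed).
import numpy as np
from stable_baselines3.common.callbacks import BaseCallback

class EpisodeRewardLogger(BaseCallback):
    """Accumulates total reward per episode during training."""
    def __init__(self):
        super().__init__()
        self.episode_rewards = []
        self._running = 0.0

    def _on_step(self) -> bool:
        # `rewards` and `dones` are placed in self.locals by the rollout loop.
        self._running += float(self.locals["rewards"][0])
        if self.locals["dones"][0]:
            self.episode_rewards.append(self._running)
            self._running = 0.0
        return True  # returning False would abort training

logger = EpisodeRewardLogger()
# model.learn(total_timesteps=10_000, callback=logger)   # reuse the PPO model above
# cumulative = np.cumsum(logger.episode_rewards)         # series for Fig 1
```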

Comparison of Pricing Strategies in Different Market Conditions: This visualization illustrates the impact of market conditions (boom and recession) on pricing decisions. The horizontal axis represents the months of the year (1–12), while the vertical axis shows the pricing strategy, which may indicate either the proposed price or the competitive price level. Two different scenarios are analyzed:

  • Boom Market Prices: In this case, prices experience significant fluctuations but generally follow an upward trend. Price increases during economic booms typically occur due to rising demand and increased consumer purchasing power.
  • Recession Market Prices: In this scenario, prices fluctuate initially but begin to decline after the eighth month. This price reduction is usually driven by decreasing demand and intensified competition among sellers during economic downturns.

Overall, Fig 6 demonstrates that pricing strategies behave differently in boom and recession conditions.

Fig 6. Comparison of Pricing Strategies in Different Market Conditions.

https://doi.org/10.1371/journal.pone.0333068.g006

  • During booms, prices increase due to higher demand.
  • During recessions, prices decline as sellers reduce prices and offer competitive discounts to maintain market share.

This analysis helps managers optimize their pricing strategies based on economic conditions and adapt their sales policies to market fluctuations.

In this section, we present a detailed analysis of the results obtained from the proposed quantum-based pricing and inventory optimization model. To assess the effectiveness of our approach, we compare the initial and optimized values for product prices and inventory levels in both physical and online sales channels. Table 7 and the accompanying visualizations illustrate:

  • Changes in product prices before and after optimization.
  • Adjusted inventory levels in physical stores and online channels.
  • The impact of optimization on stock distribution and cost efficiency.

Table 7 provides a comparison of product prices and inventory levels before and after implementing the proposed model:

Table 7. Comparison of Product Prices and Inventory Levels Before and After Implementing the Proposed Model.

https://doi.org/10.1371/journal.pone.0333068.t007

As seen in Table 7, the model adjusted product prices based on demand elasticity. For instance, the price of the Laptop (Lenovo) increased by 4% due to low price sensitivity, whereas the price of Dress shirt (Zara) decreased by 8% to attract more price-sensitive customers.
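As a rough illustration of the elasticity logic (the paper does not report the underlying elasticity values, so $\varepsilon = -0.3$ below is an assumed example for the laptop): with inelastic demand, the 4% price increase changes quantity and revenue approximately as

$$\frac{\Delta Q}{Q} \approx \varepsilon\,\frac{\Delta p}{p} = (-0.3)(+4\%) = -1.2\%, \qquad \frac{\Delta R}{R} \approx \frac{\Delta p}{p} + \frac{\Delta Q}{Q} \approx +2.8\%,$$

so revenue rises despite the slightly lower volume. For a price-sensitive item such as the dress shirt ($|\varepsilon| > 1$), the same first-order logic runs in reverse, and a price cut raises revenue.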

Fig 7 demonstrates how the proposed model adjusts pricing dynamically, while Fig 8 highlights the redistribution of stock across different sales channels. These results confirm that the quantum-based reinforcement learning approach improves decision-making by optimizing both price and inventory management.

7. Comparison of the model with traditional methods

Traditional inventory management and pricing models typically rely on classical Markov chains and classical reinforcement learning. These models assume that customers make rational and deterministic decisions, predicting their behavior solely based on classical probability theory. However, behavioral psychology and cognitive economics research suggests that customer decisions are influenced by uncertainty, emotions, and complex social interactions. The proposed model overcomes the limitations of classical models by integrating Quantum Decision Theory (QDT), Quantum Markov Chains (QMC), and Reinforcement Learning (RL) to better capture cognitive uncertainty in customer behavior.

Key Advantages of the Quantum Model Over Traditional Methods:

  1. Quantum Superposition: Customers can exist in both a purchasing and non-purchasing state simultaneously until an external factor (e.g., an advertisement or price change) collapses the decision into a final state (a numerical sketch of this effect follows the list).
  2. Observer Effect: Observing new prices or advertisements alters the cognitive state of customers, thereby changing their purchase behavior.
  3. Interference Effect: Unexpected interactions between purchase options dynamically influence customer decision-making.
  4. Quantum Reinforcement Learning (QRL) Optimization: Unlike traditional methods, the model learns from market feedback and adaptively optimizes pricing and inventory policies.
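As referenced in item 1, the small numpy sketch below shows how superposition and interference modify a purchase probability: “buy” is reached through two cognitive paths represented by complex amplitudes, and the interference (cross) term shifts the probability away from the classical sum. All amplitude values are assumed for illustration.

```python
# Superposition/interference in a single customer decision (illustrative values).
import numpy as np

# Amplitudes for two cognitive paths to purchase (e.g., ad exposure vs. price cut).
a = 0.5 * np.exp(1j * 0.0)
b = 0.5 * np.exp(1j * np.pi / 3)   # the relative phase controls interference

classical = abs(a)**2 + abs(b)**2          # classical total-probability rule
quantum = abs(a + b)**2                    # quantum rule with cross term
interference = 2 * (a * np.conj(b)).real   # 2*Re(a b*) = quantum - classical

print(f"classical P(buy) = {classical:.3f}")
print(f"quantum   P(buy) = {quantum:.3f} (interference term = {interference:+.3f})")
```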

Numerical results and simulations indicate that the proposed model, compared to classical methods, leads to higher accuracy in predicting customer behavior, improved profitability, and reduced inventory management costs. Therefore, integrating quantum methods with reinforcement learning not only enhances decision-making performance in omnichannel retail but also provides a new framework for analyzing and managing customer behavior in uncertain markets. A comparison of the performance of traditional and quantum methods is presented in Tables 8 and 9.

Table 9. Performance Comparison of Quantum and Traditional Models.

https://doi.org/10.1371/journal.pone.0333068.t009

Key Findings from this Comparison:

  1. Higher Accuracy in Customer Behavior Prediction: The quantum model’s error in predicting customer purchases is 10 percentage points lower (8% versus 18% for the traditional model).
  2. Increased Profitability: The quantum model generates 23% more profit compared to the traditional model.
  3. Reduced Inventory Holding Costs: Due to better demand forecasting and storage decisions, inventory management costs are reduced by 27%.
  4. Faster Convergence: The quantum model reaches optimal policies 37% faster than the traditional model, reducing training time.
  5. Lower Stockout Rate (58% Reduction): Customers face fewer stock shortages, leading to increased sales and higher customer satisfaction.
  6. Faster Response to Competitors: The quantum model reacts 70% faster to competitor price changes, demonstrating its high adaptability in competitive markets.

Compared to traditional models, the proposed quantum model shows significantly better performance: customer behavior prediction error falls from 18% to 8%, profit increases by 23%, inventory costs drop by 27%, and convergence is 37% faster. These results demonstrate the advantage of integrating quantum decision theory with reinforcement learning: higher accuracy in predicting customer behavior, increased profitability, reduced inventory management costs, faster convergence, and quicker adaptation to market changes. In a dynamic retail environment characterized by uncertainty and complex retailer interactions, adopting a quantum-based approach enhances both decision-making and profitability. Fig 9 visually illustrates the key improvement percentages of the quantum model compared to the traditional model.

Fig 9. Performance improvement of proposed quantum model compared to traditional models.

https://doi.org/10.1371/journal.pone.0333068.g009

8. Sensitivity analysis

Sensitivity Analysis is a key component in evaluating decision-making models, showing how changes in key parameters affect pricing policies, inventory levels, and customer behavior. In this study, sensitivity analysis was conducted based on the following parameters (a minimal sweep sketch follows the list):

  1. Price Elasticity of Demand: Examining customer sensitivity to price changes and its impact on sales volume.
  2. Advertising Intensity: Analyzing the effect of increasing or decreasing advertising efforts on customer purchase behavior and decision-making probability.
  3. Market Volatility: Assessing sudden demand fluctuations and their impact on model performance.
  4. Initial Inventory Level: Evaluating how changes in the initial inventory affect profitability and inventory management costs.
  5. Competitive Interactions: Studying the impact of competitor pricing changes on retailer pricing strategies.
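A minimal sketch of how such a sweep could be implemented: one parameter (here, market volatility, item 3) is varied while the others are held fixed, and mean profit is recorded. `evaluate_profit` is a hypothetical stand-in for a full simulation run of the model, with an assumed degradation curve rather than measured results.

```python
# One-parameter sensitivity sweep (illustrative; evaluate_profit is hypothetical).
import numpy as np

def evaluate_profit(volatility, n_episodes=200, seed=0):
    """Hypothetical placeholder: noisy profit that degrades with volatility."""
    rng = np.random.default_rng(seed)
    base = 100.0
    profits = base * (1 - 0.75 * volatility) + rng.normal(0, 5, n_episodes)
    return profits.mean()

for vol in [0.10, 0.20, 0.30, 0.40]:
    print(f"volatility={vol:.0%}: mean profit = {evaluate_profit(vol):.1f}")
```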

The sensitivity analysis results indicate that the proposed model maintains optimal performance even with increasing market volatility and changes in price elasticity. Through adaptive learning, it dynamically adjusts its strategies under varying conditions. This is a key advantage of the quantum model over traditional methods, as conventional models typically require manual adjustments and redesigns when facing sudden market changes. However, the proposed model can automatically and intelligently optimize its strategies, making more dynamic decisions. Table 10 demonstrates the stability and effectiveness of the quantum model.

Table 10. Adaptive Analysis of Traditional and Quantum Models in Response to Market Changes.

https://doi.org/10.1371/journal.pone.0333068.t010

Fig 10 shows that as market volatility increases, profits decrease for both models, but the quantum model remains more stable. In the traditional model, profit drops sharply with rising volatility, declining by up to 65% at 40% volatility. In contrast, the quantum model exhibits a much milder decline, with only a 30% profit reduction at the same volatility level. This demonstrates that the quantum model offers greater resilience and stability in highly uncertain market conditions than the traditional model.

Fig 10. Profit Comparison in Traditional and Quantum Financial Models Under Market Volatility.

https://doi.org/10.1371/journal.pone.0333068.g010

9. Discussion and conclusion

This study introduces an innovative and advanced framework for optimizing dynamic pricing and inventory management in omnichannel retail, leveraging Quantum Decision Theory (QDT), Quantum Dynamic Games, Quantum Markov Chains (QMC), and Reinforcement Learning (RL). This approach marks a fundamental transformation in modeling customer decision-making and strategic management in business environments, enabling intelligent and adaptive decision-making in dynamic markets. Unlike classical models that assume deterministic customer behavior and competition, this model incorporates quantum superposition and interference effects in purchase decisions, while market state transitions are modeled using Quantum Markov Chains. The use of PPO reinforcement learning allows the retailer to continuously optimize pricing and inventory strategies based on market feedback, while Quantum Markov Chains enhance predictive accuracy by preserving historical dependencies through quantum entanglement. Additionally, Quantum Dynamic Game Theory enables retailers to strategically respond to competitors and increase market share.

Key Advantages of the Proposed Model:

  • Highly Accurate Customer Behavior Prediction: The quantum model effectively simulates irrational and fluctuating customer behaviors, reducing demand forecasting errors by 10 percentage points (from 18% to 8%) compared to traditional methods.
  • Significant Profitability Boost: Through optimized pricing and inventory management, this model has increased retailer profits by 23%, providing a major competitive advantage.
  • Advanced Inventory Management and Cost Reduction: More precise demand forecasting has led to a 27% reduction in inventory holding costs, preventing excess stock accumulation.
  • Unmatched Flexibility in Volatile Markets: The quantum intelligence system reacts to competitor price changes 70% faster, enabling optimal strategy adjustments in uncertain market conditions.
  • Integration of Quantum Models with Machine Learning: For the first time, this research combines quantum decision theories with reinforcement learning, creating a powerful analytical framework for advanced management strategies.

This novel approach outperforms classical methods by offering superior predictive accuracy, increased profitability, optimized inventory control, faster convergence, and enhanced adaptability in uncertain and competitive market environments.

The results of this study indicate that the combination of reinforcement learning, dynamic game theory, and quantum models can provide more accurate, adaptable, and optimized decision-making for inventory management and pricing in competitive markets compared to classical methods. The findings highlight the significant superiority of the proposed model over traditional approaches. The hybrid model, integrating Quantum Decision Theory (QDT), Quantum Markov Chains (QMC), and Quantum Reinforcement Learning (QRL), has demonstrated higher accuracy in predicting customer behavior, increased profitability, reduced inventory management costs, and improved responsiveness to market fluctuations. Data analysis and simulations show that this model:

  • Reduces customer behavior prediction errors by 10 percentage points (from 18% to 8%),
  • Increases profitability by 23%,
  • Decreases inventory holding costs by 27%, and
  • Responds 70% faster to market fluctuations.

These results confirm that applying quantum mechanics to decision-making can play a crucial role in optimizing pricing strategies and inventory management. Furthermore, comparing the proposed model with traditional Markov Chain-based models and conventional reinforcement learning reveals that the quantum model incorporates superposition, observer effects, and quantum interference in customer decision-making, whereas traditional models rely solely on classical probabilities and assume rational customer behavior. This distinction enables the quantum model to achieve higher prediction accuracy and greater stability in response to market fluctuations. Thus, this study demonstrates that integrating reinforcement learning with quantum models offers an innovative and effective approach to omnichannel retail management.

In today’s world, where businesses face a massive volume of dynamic data, market uncertainties, and unpredictable customer behavior, traditional methods are insufficient and unreliable. In this context, Quantum Decision Theory (QDT)—with its features of superposition, observer effect, and quantum interference—has emerged as a groundbreaking advancement in management science, marketing, and digital economics.

  • Pricing Management and Digital Marketing: Quantum models enhance demand forecasting accuracy and optimize pricing strategies, making them essential for both online and brick-and-mortar retailers.
  • Supply Chain and Logistics Optimization: Quantum theory enables companies to minimize risks related to supply and demand fluctuations and adopt more efficient inventory management strategies.
  • Sustainable Competitive Advantage in Global Markets: Businesses leveraging quantum models in decision-making can gain a competitive edge in the digital economy and respond to market changes faster than their rivals.

There are also limitations in implementing the model:

  1. High Computational Cost: Quantum models are more complex to execute compared to classical models.
  2. Need for High-Quality Data: The model requires precise and extensive data for effective learning.
  3. Challenges in Real-World Implementation: Deploying the model in commercial environments requires integration with existing systems.

Generalization capability:

The model performs well in markets with complete and omnichannel data, but in smaller markets or where data is insufficient, simplification may be necessary. Combining classical and quantum models (hybrid models) could enhance applicability across different industries.

To address these challenges, several solutions can be proposed, including:

  • Using dimensionality reduction and approximation techniques to lower computational costs.
  • Implementing data preprocessing methods and improving data collection to ensure access to high-quality information.
  • Developing hybrid models that combine traditional and quantum approaches for more practical and efficient implementation.

Utilizing Quantum Reinforcement Learning (QRL):

  • Integrating the proposed model with quantum reinforcement learning algorithms can enhance prediction accuracy, reduce decision-making risks, and optimize multiple economic variables simultaneously.

Modeling Competitive Interactions Using Quantum Games:

  • In competitive environments, retailers need to optimize their strategies. Applying quantum game theory can better model competitor behavior and provide more precise strategic decisions.

Leveraging Quantum Artificial Intelligence in Business Management:

  • The future of management research and the digital economy trends toward integrating artificial intelligence with quantum computing. This combination can enable advanced optimization in data management, information processing, and large-scale organizational decision-making.

Further directions for future work include:

  • Developing lighter and faster versions of the model using approximation methods and dimensionality reduction.
  • Employing hybrid models to reduce computational costs.
  • Enhancing algorithms to operate effectively in environments with incomplete or noisy data.

References

  1. Mou Y, Guan Z, Zhang J. Integrated optimization of assortment, inventory and pricing considering omnichannel retailer’s risk aversion and customer’s time preference. Expert Systems with Applications. 2024;237:121479.
  2. Fang F, Nguyen T-D, Currie CSM. Joint pricing and inventory decisions for substitutable and perishable products under demand uncertainty. European Journal of Operational Research. 2021;293(2):594–602.
  3. Sepehri A, Mishra U, Tseng M-L, Sarkar B. Joint pricing and inventory model for deteriorating items with maximum lifetime and controllable carbon emissions under permissible delay in payments. Mathematics. 2021;9(5):470.
  4. Bardhan S, Pal H, Giri BC. Optimal replenishment policy and preservation technology investment for a non-instantaneous deteriorating item with stock-dependent demand. Oper Res Int J. 2017;19(2):347–68.
  5. Tiwari S, Cárdenas-Barrón LE, Goh M, Shaikh AA. Joint pricing and inventory model for deteriorating items with expiration dates and partial backlogging under two-level partial trade credits in supply chain. International Journal of Production Economics. 2018;200:16–36.
  6. Liu S, Wang J, Wang R, Zhang Y, Song Y, Xing L. Data-driven dynamic pricing and inventory management of an omni-channel retailer in an uncertain demand environment. Expert Systems with Applications. 2024;244:122948.
  7. Zhou Y-W, Zhang X, Zhong Y, Cao B, Cheng TCE. Dynamic pricing and cross-channel fulfillment for omnichannel retailing industry: An approximation policy and implications. Transportation Research Part E: Logistics and Transportation Review. 2021;156:102524.
  8. Ma B, Mao B, Liu S, Meng F, Liu J. Managing physical inventory and return policies for omnichannel retailing. Computers & Industrial Engineering. 2024;190:109986.
  9. Guchhait R, Bhattacharya S, Sarkar B, Gunasekaran A. Pricing strategy based on a stochastic problem with barter exchange under variable promotional effort for a retail channel. Journal of Retailing and Consumer Services. 2024;81:103954.
  10. Qiu R, Ma L, Sun M. A robust omnichannel pricing and ordering optimization approach with return policies based on data-driven support vector clustering. European Journal of Operational Research. 2023;305(3):1337–54.
  11. Xu J, Lyu G, Bai Q, Luo Q. Robust pricing and inventory strategies for an omnichannel retailer under carbon tax and cap-and-offset regulations. Computers & Industrial Engineering. 2023;185:109615.
  12. Gupta VK, Ting QU, Tiwari MK. Multi-period price optimization problem for omnichannel retailers accounting for customer heterogeneity. International Journal of Production Economics. 2019;212:155–67.
  13. Monroe KB. Pricing: Making Profitable Decisions. 1990.
  14. Nouri-Harzvili M, Hosseini-Motlagh S-M. Dynamic discount pricing in online retail systems: Effects of post-discount dynamic forces. Expert Systems with Applications. 2023;232:120864.
  15. Anderson PL, McLellan RD, Overton JP, Wolfram GL. Price elasticity of demand. Mackinac Center for Public Policy; 1997. Accessed October, 13(2).
  16. Wenner LM. Expected prices as reference points—Theory and experiments. European Economic Review. 2015;75:60–79.
  17. Rios JH, Vera JR. Dynamic pricing and inventory control for multiple products in a retail chain. Computers & Industrial Engineering. 2023;177:109065.
  18. Temizöz T, Imdahl C, Dijkman R, Lamghari-Idrissi D, van Jaarsveld W. Deep Controlled Learning for Inventory Control. European Journal of Operational Research. 2025;324(1):104–17.
  19. Wu Q, Liu X, Qin J, Wang W, Zhou L. A linguistic distribution behavioral multi-criteria group decision making model integrating extended generalized TODIM and quantum decision theory. Applied Soft Computing. 2021;98:106757.
  20. Rika H, Aviv I, Weitzfeld R. Unleashing the Potentials of Quantum Probability Theory for Customer Experience Analytics. BDCC. 2022;6(4):135.
  21. Meghdadi A, Akbarzadeh-T M-R, Javidan K. A Quantum-Like Model for Predicting Human Decisions in the Entangled Social Systems. IEEE Trans Cybern. 2022;52(7):5778–88. pmid:35044924
  22. Tukel OI, Dixit A. Application of customer lifetime value model in make‐to‐order manufacturing. Journal of Business & Industrial Marketing. 2013;28(6):468–74.
  23. AboElHamd E, Shamma HM, Saleh M. Maximizing customer lifetime value using dynamic programming: Theoretical and practical implications. Academy of Marketing Studies Journal. 2020;24(1):1–25.
  24. Norouzi V. Predicting e-commerce CLV with neural networks: The role of NPS, ATV, and CES. Journal of Economy and Technology. 2024;2:174–89.
  25. Nan D, Shin E, Barnett GA, Cheah S, Kim JH. Will coolness factors predict user satisfaction and loyalty? Evidence from an artificial neural network–structural equation model approach. Information Processing & Management. 2022;59(6):103108.
  26. Lee C-H, Zhao X. Data collection, data mining and transfer of learning based on customer temperament-centered complaint handling system and one-of-a-kind complaint handling dataset. Advanced Engineering Informatics. 2024;60:102520.
  27. Singh B. Predicting airline passengers’ loyalty using artificial neural network theory. Journal of Air Transport Management. 2021;94:102080.
  28. den Boer AV, Keskin NB. Dynamic Pricing with Demand Learning and Reference Effects. Management Science. 2022;68(10):7112–30.
  29. Srinivasan D, Rajgarhia S, Radhakrishnan BM, Sharma A, Khincha HP. Game-theory based dynamic pricing strategies for demand side management in smart grids. Energy. 2017;126:132–43.
  30. Zhou Q, Yang Y, Fu S. Deep reinforcement learning approach for solving joint pricing and inventory problem with reference price effects. Expert Systems with Applications. 2022;195:116564.
  31. Goedhart J, Haijema R, Akkerman R. Modelling the influence of returns for an omni-channel retailer. European Journal of Operational Research. 2023;306(3):1248–63.
  32. White CJ, Tong E. On linking socioeconomic status to consumer loyalty behaviour. Journal of Retailing and Consumer Services. 2019;50:60–5.
  33. Di Domenico SI, Fournier MA. Socioeconomic Status, Income Inequality, and Health Complaints: A Basic Psychological Needs Perspective. Soc Indic Res. 2014;119(3):1679–97.
  34. Oliver RL. Whence Consumer Loyalty? Journal of Marketing. 1999;63:33.
  35. Ajzen I. The theory of planned behavior. Organizational Behavior and Human Decision Processes. 1991.
  36. Chintapalli P. Simultaneous pricing and inventory management of deteriorating perishable products. Ann Oper Res. 2014;229(1):287–301.
  37. Alfares HK, Ghaithan AM. Inventory and pricing model with price-dependent demand, time-varying holding cost, and quantity discounts. Computers & Industrial Engineering. 2016;94:170–7.
  38. Huang H, He Y, Li D. Pricing and inventory decisions in the food supply chain with production disruption and controllable deterioration. Journal of Cleaner Production. 2018;180:280–96.
  39. Chen B, Chao X, Ahn H-S. Coordinating Pricing and Inventory Replenishment with Nonparametric Demand Learning. Operations Research. 2019.
  40. Barman A, Das R, De PK. An analysis of optimal pricing strategy and inventory scheduling policy for a non-instantaneous deteriorating item in a two-layer supply chain. Appl Intell. 2021;52(4):4626–50.
  41. Sun Y, Qiu R, Shao S, Sun M, Fan Z-P. Omnichannel strategies and data-driven robust inventory policies with demand uncertainties. Computers & Operations Research. 2025;173:106830.
  42. Choudhury M, Das S, Weber G-W, Mahata GC. Carbon-regulated inventory pricing and ordering policies with marketing initiatives under order volume discounting scheme. Ann Oper Res. 2025.
  43. Maihami R, Kannan D, Fattahi M, Bai C. Supply disruptions and the problem of pricing, advertising, and sourcing strategies in a retail duopoly market. Ann Oper Res. 2024;344(1):217–66.
  44. Du Z, Li X, Fan Z-P. Inventory and pricing decisions of brand owners and streamers under demand uncertainty. IMDS. 2025;125(3):1052–77.
  45. Busemeyer JR, Bruza PD. Quantum models of cognition and decision. Cambridge University Press; 2012.
  46. Yukalov VI, Sornette D. Preference reversal in quantum decision theory. Front Psychol. 2015;6:1538. pmid:26500592
  47. Zurek WH. Decoherence, einselection, and the quantum origins of the classical. Rev Mod Phys. 2003;75(3):715–75.
  48. Feynman RP, Leighton RB, Sands M, Hafner EM. The Feynman Lectures on Physics, Vol. I. American Journal of Physics. 1965;33(9):750–2.
  49. Griffiths DJ, Schroeter DF. Introduction to Quantum Mechanics. 3rd ed. 2018.
  50. McFadden D. Conditional logit analysis of qualitative choice behavior. 1972.
  51. Pangarkar A, Arora V, Shukla Y. Exploring phygital omnichannel luxury retailing for immersive customer experience: The role of rapport and social engagement. Journal of Retailing and Consumer Services. 2022;68:103001.
  52. Karunakaran AR. A data-driven approach for optimizing omni-channel pricing strategies through machine learning. Journal of Artificial Intelligence Research and Applications. 2023;3(2):588–630.
  53. Zhang X, Park Y, Park J, Zhang H. Demonstrating the influencing factors and outcomes of customer experience in omnichannel retail. Journal of Retailing and Consumer Services. 2024;77:103622.
  54. Malik V, Mittal R, Chaudhry R, Yadav SA. Predicting purchases and personalizing the customer journey with artificial intelligence. In 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE; 2024. p. 1–5.
  55. Brahma PR, Revi KN. Dynamic modeling of brand loyalty in retail: A semi-supervised approach incorporating temporal effects and purchase behavior sequences. In 2024 IEEE 9th International Conference for Convergence in Technology (I2CT). IEEE; 2024. p. 1–7.
  56. Zheng L, Li Y. Customer journey design in omnichannel retailing: Examining the effect of autonomy-competence-relatedness in brand relationship building. Journal of Retailing and Consumer Services. 2024;78:103776.
  57. Grewal D, Roggeveen AL, Benoit S, Lucila Osorio Andrade M, Wetzels R, Wetzels M. A new era of technology-infused retailing. Journal of Business Research. 2025;188:115095.
  58. Flacandji M, Vlad M, Lunardo R. The effects of retail apps on shopping well-being and loyalty intention: A matter of competence more than autonomy. Journal of Retailing and Consumer Services. 2024;78:103762.