Impact of incentive and selection strength on green technology innovation in Moran process

Previous methods for studying green technology innovation have difficulty handling finite populations. One solution is a stochastic evolutionary game dynamic, the Moran process. In this paper we study stochastic dynamic games of green technology innovation with a two-stage free riding problem. The results show that incentive strength and selection strength both play positive roles in steering participants toward socially beneficial strategies, but with a threshold effect: too weak a strength has no effect because of the randomness of the evolutionary process in a finite population. The two-stage free riding problem can be solved with inequality incentives; however, higher inequality makes a policy take effect faster but less stably, so there is an optimal range. We provide the key variables of green technology innovation incentives and principles for environmental regulation policy making. We also note that it is difficult to formulate policies reasonably and to make them achieve the expected results.


Introduction
Since the publication of Theory of Games and Economic Behavior by Von Neumann & Morgenstern [1] in 1944, people have analyzed conflict and competition in politics, the economy and society using game-theoretic methods. How to promote cooperation and altruistic behavior has always been a focus of game theory research [2] [3].
However, many phenomena cannot be explained under the assumption of the rational man, and some, such as the tragedy of the commons, should not occur at all under that assumption. In 1957, Simon [4] put forward the concept of bounded rationality. In 1982, Smith [5] developed evolutionary game theory and proposed a practical tool for studying the dynamics of natural selection, the evolutionarily stable strategy, which represents the stable state of an evolutionary game. In 1978, Taylor & Jonker [6] used replicator dynamics to represent the dynamic convergence to a stable state. Evolutionary game theory is of great practical significance and has been widely applied in biology [7] and in various social sciences, especially economics [8].
Evolutionary game dynamics comprises deterministic and stochastic evolutionary game dynamics [9] [10] [11] [12]. Deterministic evolutionary game dynamics studies a fully mixed infinite population, in which a participant's attributes usually determine its game strategy and successful strategies spread through the group [13]; this can be described by the replicator dynamics equation [14] [15] [16].
Green technology innovation refers to the process by which enterprises replace their original products with more environmentally friendly ones by updating processes and reforming equipment, under the pressure of external institutions, market demand and other inducing factors [17]. It promotes sustainable development by reducing pollutant emissions, improving fuel combustion efficiency and changing enterprises' production modes [18] [19] [20] [21]. However, as suppliers of products, enterprises lack enthusiasm and initiative for green technology innovation when faced with the high risks of technological innovation, high R&D investment and market uncertainty [22]. Moreover, the production activities of enterprises are often locked into a high-carbon mode because of technological lock-in [23]. A reasonable environmental regulation policy is conducive to the innovation compensation effect [24] and stimulates the innovation behavior of enterprises. This theory has been shown to be effective in empirical research [25]: Testa [26] found that environmental protection standards stimulate enterprises' technology research and development, and Rassier [27] found that a regulatory system can promote enterprises' R&D of new technology and have a positive impact on business performance.
In recent years, many scholars have used evolutionary game methods to study the relationship between environmental regulation and enterprise technological innovation, mainly with respect to economic incentives and institutional design [28] [29] [30]. Estalaki [31] used a heuristic game optimization method to study river water quality management; the analysis shows that the reward and punishment system has a certain impact on water quality. Huang [32] studied supply chain models in a duopoly environment under a government subsidy mechanism and under a centralized control mechanism without subsidy, and found that incentives are more effective than centralized control in promoting technological innovation and protecting the environment. Wang [33] set up a dynamic game model of government and enterprise to study the impact of different government subsidies on green technology innovation decisions at different stages of innovation. Krass [34] built a Stackelberg game model between the government and enterprises and found that a combination of environmental taxes and subsidies can encourage enterprises to adopt low-carbon emission reduction technology. Enterprises choose different green innovation modes mainly under the influence of government green innovation subsidies, carbon taxes and other environmental policies [35]. However, a single environmental regulation method cannot stimulate more green innovation; only a combination of methods produces better results [36].
Existing studies in this field have mainly focused on the replicator dynamics equation. Although it has good mathematical properties, it can only describe deterministic evolutionary dynamics in an infinite population [37] [38]. Randomness plays an important role in a finite population, so the replicator dynamics equation is disturbed by noise there and cannot capture the long-term stable state. Results for an infinite population cannot simply be carried over to a finite population; stochastic evolutionary game models, which are more realistic for exploring finite populations, have therefore been developed [39] [40] [41] [42] [43]. Evolutionary game dynamics in a finite population is described as a stochastic process under the assumptions of bounded rationality and incomplete information, and stochastic process methods are used to analyze how equilibrium is reached [44].
Previous studies of stochastic evolutionary games in finite populations have mainly focused on the strategy update mechanism, which is either asynchronous or synchronous [45]. In an asynchronous update process, which mainly includes the Moran process and the Fermi process, one participant is selected at each time step to produce a replica according to its fitness, and the replica randomly replaces a participant in the population, so that the population size remains unchanged [46] [47]. In a synchronous update process, which mainly includes the Wright-Fisher process, all participants produce replicas at the same time and the next generation is then selected from the replicas, so that the population size remains unchanged [48] [49].
In the setting of this paper, each participant can make decisions independently at any time, so an asynchronous update process is more appropriate. Furthermore, replacement probabilities in the Fermi process are obtained by comparing participants in pairs [50], whereas those in the Moran process are obtained by comparison with the other $N - 1$ participants [51]. The Moran process therefore better fits the fully mixed finite population considered in this paper.
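The difference between the two families of update rules described above can be sketched in a few lines of Python (not the authors' R implementation). The function names, the constant fitness values and the use of Python's `random` module are illustrative assumptions; only the update logic follows the description in the text.

```python
import random

def moran_step(i, n, fit_v, fit_d, rng):
    """Moran-type update: one replica is produced and one participant is replaced.

    i is the current number of participants using the first strategy;
    fit_v and fit_d are (hypothetical) constant fitness values.
    """
    total = i * fit_v + (n - i) * fit_d
    replica_is_v = rng.random() < i * fit_v / total  # parent drawn in proportion to fitness
    replaced_is_v = rng.random() < i / n             # replaced participant drawn uniformly
    return i + int(replica_is_v) - int(replaced_is_v)

def wright_fisher_step(i, n, fit_v, fit_d, rng):
    """Wright-Fisher-type update: the whole next generation is drawn at once."""
    p_v = i * fit_v / (i * fit_v + (n - i) * fit_d)
    return sum(rng.random() < p_v for _ in range(n))

rng = random.Random(1)
i = 50
after_moran = moran_step(i, 100, 1.1, 1.0, rng)
after_wf = wright_fisher_step(i, 100, 1.1, 1.0, rng)
print(after_moran, after_wf)
```

In the Moran-type step the count can change by at most one per time step, whereas the Wright-Fisher-type step can move it by many at once.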
The Moran process describes a dynamic process governed by a Markov state transition matrix [52]. Taylor showed the difference between evolution in the Moran process and the replicator dynamics equation in an infinite population. Nowak [53] gave the expressions for the expected returns of all types of participants and deduced how the replacement probabilities change before and after the process. Traulsen [54] explored the expected return functions under strong and weak selection. The Moran process is widely used not only in sociology and theoretical biology [55] but also in behavioral economics: Chai [56] studied the evolution of the gross income maximization strategy and the net profit maximization strategy in manufacturing strategy selection, and Wang [57] applied the Moran process to the evolution of consumers' crowdfunding strategy selection.
To address the existing knowledge gaps in the evolutionary game of green technology innovation, this paper pursues two aims, using a number of different parameters and combinations in a Moran process model: (1) to summarize the dynamic relationship between the benefits of different strategies and the probabilities of selecting them; (2) to determine how the strength of natural selection, the strength of incentives and the differential coefficient relate to variations in participants' strategies over time. These aims highlight the importance of properly designed policies, that is, policies under which participants' benefits are consistent with their inputs rather than promoting free riding.
In Chapter 2 we introduce the basic settings and quantify the benefits of the different participants in each period. The Moran process method, the simulations and the key conclusions with results follow in Chapters 3.1, 3.2 and 4, respectively. The theoretical and practical implications and the future research directions are discussed in Chapter 5.

Model
In this study, we establish a two-stage stochastic evolutionary game model in a finite population. In the first stage, each participant chooses the strategy "innovate" or "do not innovate". Once all participants choose to innovate, each chooses the strategy "leading" or "following" in the second stage.

The first stage ($t_0$-$t_1$)
We assume that the reward for green technology innovation takes the form of a reduction in the marginal cost of production. The marginal cost is $C_1$ ($C_1 > 1$) for a non-innovator and falls to 1 for innovators. Innovators have absolute dominance and form a monopoly during this period, and obtain the innovation reward by setting the price to $C_1$. Here $R$ denotes the total reward that green technology brings to society, and $a$ ($0 < a < 1$) is the innovators' share of the reward $R$.
Ecological damage can be regarded as a public good with negative externalities, so if there is no innovator, each participant bears a loss $-R^{*}$ because no one promotes green progress [58].

The second stage ($t_1$-$t_2$)
By analogy with the assumptions of Chapter 2.1, the marginal cost for both leading innovators and following innovators is 1, and the price is still set to $C_1$. The reward is divided equally between the leading innovator and the following innovator, where $b$ is the proportion of the two kinds of innovators' total share of the reward $R$ after the market changes from absolute monopoly to duopoly. Since the entry of following innovators has a crowding-out effect on the profits of the leading innovators, $0 < ab < 2a$.
The total discounted utility of the leading innovators follows the expression for the discounted utility of technology patents defined by An [59]. It is assumed that $t_0 = 0$, that all types of participants are risk-neutral, and that the preference function is linear with coefficient 1.
Here $r$ is the discount rate of future utility. A corresponding total discounted utility is defined for following innovators.

Computational method and simulation

The Moran process
To encourage more participants to choose the "innovate" strategy, the model includes an incentive reward that a participant receives after innovating successfully. We consider a $2 \times 2$ symmetric game in a finite population of size $N$. The payoff matrix of the game between "innovate" (V) and "do not innovate" (D) is given in Table 1, where $C$ is the R&D input in green technology innovation and $P(C)$ is the probability that the innovation succeeds.

Table 1. Payoff matrix of "V" and "D".

According to Table 1, when the number of "innovate" participants is $i$ and the number of "do not innovate" participants is $N - i$ in a fully mixed population of size $N$, the expected returns of "V" participants and "D" participants are obtained. In evolutionary game algorithms, the fitness of a strategy depends on its expected return, and the diffusion rate of a strategy is positively related to its expected return. At present there are two ways to express fitness: linear mapping and exponential mapping. Although exponential mapping is more advantageous under strong selection [60], more evidence shows that participants make decisions under weak selection [61] [62]. Moreover, the frequency-dependent Moran process [63] has convenient properties for the analysis of weak selection, and different fitness mappings do not change the qualitative results but only the diffusion speed of the process, so linear mapping remains valid and is widely used [64].
Therefore, we choose the linear mapping: the fitness of strategy V and of strategy D is a linear function of its expected return [65], where $u \in (0,1)$ is the strength of natural selection, which indicates how sensitive a participant is to the payoffs of the different strategies.
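The linear mapping can be illustrated with a short Python sketch. It assumes the form $f = 1 - u + u\pi$ that is standard for frequency-dependent Moran processes, and a generic $2 \times 2$ payoff matrix in which each participant meets the other $N - 1$ participants. The names `pay_vv`, `pay_vd`, `pay_dv`, `pay_dd` and their values are hypothetical placeholders, not the payoffs of Table 1.

```python
def expected_returns(i, n, pay_vv, pay_vd, pay_dv, pay_dd):
    """Average payoff of a V and a D participant against the other n - 1 participants.

    i is the number of V participants; self-interaction is excluded.
    """
    ret_v = (pay_vv * (i - 1) + pay_vd * (n - i)) / (n - 1)
    ret_d = (pay_dv * i + pay_dd * (n - i - 1)) / (n - 1)
    return ret_v, ret_d

def linear_fitness(expected_return, u):
    """Assumed linear mapping: baseline fitness 1, weighted by selection strength u."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie in (0, 1)")
    return 1.0 - u + u * expected_return

# Hypothetical payoffs, for illustration only.
ret_v, ret_d = expected_returns(i=30, n=100, pay_vv=3.0, pay_vd=1.0, pay_dv=2.0, pay_dd=0.5)
for u in (0.1, 0.5, 0.9):
    print(u, linear_fitness(ret_v, u), linear_fitness(ret_d, u))
```

As $u$ approaches 0, both fitness values approach 1 and the payoff difference matters less and less, which is the weak-selection regime discussed in the text.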
At every time step, a new participant is generated with probability proportional to fitness and randomly replaces an existing participant. As pointed out above, the probability that the new participant chooses strategy "V" is $\frac{i f_V}{i f_V + (N - i) f_D}$, and the probability that it chooses strategy "D" is $\frac{(N - i) f_D}{i f_V + (N - i) f_D}$; in addition, there is a 1% mutation rate. So at every time step of the frequency-dependent Moran process, the number of "innovate" participants increases by 1, decreases by 1 or stays the same, while the population size is always $N$. Therefore, the Markov transition probability matrix of the Moran process is a tridiagonal matrix: only the diagonal elements and the two adjacent off-diagonal elements can be nonzero, and all other elements are 0.
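The tridiagonal structure can be made concrete with a small Python sketch. It assumes that the replaced participant is drawn uniformly from the population and, for brevity, omits the 1% mutation; the function name and the fitness functions passed in are hypothetical.

```python
def moran_transition_matrix(n, fit_v, fit_d):
    """Return the (n + 1) x (n + 1) transition matrix over states i = 0..n.

    fit_v(i) and fit_d(i) give the fitness of V and D when i participants play V.
    Mutation is omitted, so i = 0 and i = n are absorbing in this sketch.
    """
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        if i in (0, n):
            matrix[i][i] = 1.0
            continue
        total = i * fit_v(i) + (n - i) * fit_d(i)
        birth_v = i * fit_v(i) / total          # the replica plays V
        up = birth_v * (n - i) / n              # a V replica replaces a D participant
        down = (1.0 - birth_v) * i / n          # a D replica replaces a V participant
        matrix[i][i + 1] = up
        matrix[i][i - 1] = down
        matrix[i][i] = 1.0 - up - down
    return matrix

# Hypothetical constant fitness values, for illustration only.
P = moran_transition_matrix(5, lambda i: 1.2, lambda i: 1.0)
for row in P:
    print([round(x, 3) for x in row])
```

Each row sums to 1, and entries more than one step away from the diagonal stay at 0.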
The Moran process has two stable states, $i = 0$ and $i = N$, in which all participants choose strategy "D" or all choose strategy "V", respectively. Hence, from the total probability formula, we can obtain $F_{i,V}$, the probability of going from the initial state $i$ to the state in which all $N$ participants choose the strategy "innovate" (V).
If the initial state is $i = 1$, we obtain the replacement probability of strategy "V", $Q_V$; if the initial state is $i = N - 1$, we obtain the replacement probability of strategy "D", $Q_D$. A ratio of fixation probabilities $Q_V / Q_D > 1$ means that strategy "V" has a stronger tendency to invade strategy "D" than the reverse, so strategy "V" is more likely to defeat strategy "D" and become the evolutionarily stable strategy. As a result, the disadvantaged position of strategy "D" in the evolutionary process will eventually lead to its complete replacement by strategy "V", and all participants will become innovators. Moreover, more realistically, non-objective factors such as decision makers' emotions, preferences and sense of social responsibility affect their decisions, which are therefore not based entirely on expected return. In this evolutionary game model we therefore consider the evolutionary trend of strategy choice under weak selection ($u \to 0$), using the Taylor expansion of the solution above at $u \to 0$.
Conclusion 1: the invasion of strategy "V" strengthens as the strength of natural selection increases.
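For a birth-death chain of this kind without mutation, the probability that a single invader takes over the population has a well-known closed form, $1 / \left(1 + \sum_{k=1}^{N-1} \prod_{j=1}^{k} g_j / f_j\right)$, where $f_j$ and $g_j$ are the fitness of the invading and the resident strategy when $j$ invaders are present. The Python sketch below evaluates it; the constant fitness values are hypothetical, and the names `q_v` and `q_d` simply mirror $Q_V$ and $Q_D$.

```python
def fixation_probability(n, resident_over_invader):
    """Probability that one invader takes over a population of size n.

    resident_over_invader(j) is the fitness ratio resident / invader
    when j invaders are present (j = 1..n-1). Mutation is ignored.
    """
    total, product = 1.0, 1.0
    for j in range(1, n):
        product *= resident_over_invader(j)
        total += product
    return 1.0 / total

n = 100
fit_v = lambda i: 1.05   # hypothetical fitness of V when i participants play V
fit_d = lambda i: 1.00   # hypothetical fitness of D when i participants play V

q_v = fixation_probability(n, lambda j: fit_d(j) / fit_v(j))          # one V among D
q_d = fixation_probability(n, lambda j: fit_v(n - j) / fit_d(n - j))  # one D among V
print(q_v, q_d, q_v / q_d, 1.0 / n)
```

Comparing each value with the neutral benchmark $1/N$ shows whether selection favours or opposes the invader, and the ratio `q_v / q_d` corresponds to the comparison of $Q_V$ and $Q_D$ in the text.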
Extracting the common factor containing $T$ gives the following.
Conclusion 2: the invasion of strategy "V" strengthens as the strength of incentives increases.
Furthermore, even when strategy "V" becomes the evolutionarily stable strategy and all participants choose to innovate, a second-order free riding problem remains in the second stage: following innovators are unwilling to pay R&D costs but still want to benefit by following the leading innovators at little cost. The central argument of free riding theory is that once a public good exists, every member of society can enjoy its benefits whether or not they have contributed to it. This characteristic of public goods means that everyone in a group of rational people may want others to work hard to achieve the goal while enjoying the result themselves.
However, if all participants want to be following innovators, they will all end up as non-innovators. An additional incentive method is therefore needed. Olson [66] put forward a series of ways to solve the free riding dilemma. The basic idea is that although a public good provides a collective incentive, this is not enough for a rational person to strive for it, so a selective incentive is necessary. A selective incentive means that those who do not participate in an action lose something or are not entitled to it. Of the several kinds of selective incentive, the most workable in this situation is the "principle of inequality" [67]: if an individual or a small group can obtain a greater reward from making a direct contribution independently, it may contribute to a cause alone. We set the differential coefficient $\mu \in [1,2]$: the incentive given to leading innovators is $T_1 = \mu T$, and the incentive given to following innovators is $T_2 = (2 - \mu) T$. We again consider a $2 \times 2$ symmetric game in a population of size $N$; the payoff matrix of the game between "leading" (L) and "following" (F) is given in Table 2. However, followers may misreport themselves as leading innovators when information is asymmetric, so the cost and accuracy of verifying such reports will also seriously affect the final results. This issue should be taken seriously in practice, although it is outside the scope of this article.
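The differential coefficient defined above redistributes a fixed incentive budget between the two roles. A minimal Python illustration follows; the function name is hypothetical, and the values $T = 85$ and $\mu = 1.297$ are the ones quoted in the simulation section.

```python
def split_incentive(t, mu):
    """Return (T1, T2) = (mu * T, (2 - mu) * T) for mu in [1, 2]."""
    if not 1.0 <= mu <= 2.0:
        raise ValueError("mu must lie in [1, 2]")
    return mu * t, (2.0 - mu) * t

t1, t2 = split_incentive(85.0, 1.297)
print(t1, t2, t1 + t2)  # the sum stays at 2 * T whatever mu is
```

With $\mu = 1$ both roles receive the same incentive, and with $\mu = 2$ the whole budget goes to leading innovators.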
According to Table 2, when the number of "leading" participants is $i$ and the number of "following" participants is $N - i$, the expected returns of "L" participants and "F" participants are obtained for $i = 1, 2, 3, \ldots, N - 1$. After calculations similar to (7)-(17), a ratio of fixation probabilities $Q_L / Q_F > 1$ means that strategy "L" has a stronger tendency to invade strategy "F" than the reverse, so strategy "L" is more likely to defeat strategy "F" and become the evolutionarily stable strategy. As a result, the disadvantaged position of strategy "F" in the evolutionary process will eventually lead to its complete replacement by strategy "L", and all participants will become leading innovators. As pointed out above, the more realistic case is weak selection ($u \to 0$), so we take the Taylor expansion of the solution above at $u \to 0$ and extract the common factor containing $T$ or $\mu$. All the previous conclusions still hold.
Conclusion 3: the invasion of strategy "L" strengthens as the differential coefficient increases.

Table 2. Payoff matrix of "L" and "F".

Simulation
The model was implemented in RStudio (version 3.6.3) with cascading style sheets (CSS); all the R code, server code and dataset are attached. The relationship between the evolutionarily stable strategy (ESS) and the strength of incentive $T$ can be simulated according to Table 1 ($N = 100$, $C_1 = 2$, $a = 0.4$, $b = 0.7$, $R = 1000$, $C = 100$, $P(C) = 0.4$, $r = 0.015$, $t_1 = 100$, $t_2 = 300$, $-R^{*} = -100$, mutation rate = 0.01). Fig 1 shows that the data have a negative intercept, which means that the incentive has a threshold effect: when the advantage of innovating is small enough, it may be offset by the randomness of the evolutionary process in a finite population [68], as discussed in Chapter 1.
Conclusion 4: the strength of incentive T has a threshold effect. Further, the relationship between the evolutionarily stable strategy and the coefficient μ under different strengths of incentive T can be simulated according to Table 2. We take [T = 85, μ = 1.297, ESS = (0, 0.6774, 1)] in Fig 2 as an example: different values of u (0.5 and 0.9) lead to the different invasion dynamics and replacement probabilities shown in Table 3 (the strength of natural selection does not change the ESS).
A higher u leads to a higher replacement probability and a stronger invasion dynamic than a lower one in Table 3, which confirms Conclusion 1.

PLOS ONE

Conclusion 5: with the same ESS, a combination with a higher differential coefficient μ makes strategy "L" invade faster but less stably than the others.
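The simulation procedure of this section can be sketched as a Monte Carlo loop. The version below is written in Python rather than the authors' R code, and several details are assumptions: the payoff entries are hypothetical placeholders (not the values of Table 1 or Table 2), fitness uses the linear form $1 - u + u\pi$, and mutation is modelled as the new participant switching strategy with probability 0.01.

```python
import random

def simulate_moran(n, i0, payoffs, u, steps, mutation=0.01, seed=0):
    """Simulate the number of V participants in a frequency-dependent Moran process.

    payoffs = (vv, vd, dv, dd) are hypothetical payoff-matrix entries.
    """
    vv, vd, dv, dd = payoffs
    rng = random.Random(seed)
    i, path = i0, [i0]
    for _ in range(steps):
        ret_v = (vv * (i - 1) + vd * (n - i)) / (n - 1)
        ret_d = (dv * i + dd * (n - i - 1)) / (n - 1)
        fit_v = 1.0 - u + u * ret_v
        fit_d = 1.0 - u + u * ret_d
        total = i * fit_v + (n - i) * fit_d
        new_is_v = rng.random() < i * fit_v / total
        if rng.random() < mutation:          # assumed mutation rule: strategy flips
            new_is_v = not new_is_v
        old_is_v = rng.random() < i / n      # replaced participant drawn uniformly
        i += int(new_is_v) - int(old_is_v)
        path.append(i)
    return path

path = simulate_moran(n=100, i0=1, payoffs=(3.0, 1.5, 2.0, 1.0), u=0.5, steps=20000)
print(path[-1], max(path))
```

Repeating such runs over a grid of incentive values and averaging the outcomes is one way to look for a threshold of the kind reported in Fig 1; the grid and the averaging scheme would have to be chosen to match the original experiment.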

Results
With the equations derived in Chapter 3.1, we can summarize the dynamic relationship between the benefits of different strategies and the probabilities of selecting them. Furthermore, the strength of natural selection, the strength of incentives and the differential coefficient are related to variations in participants' strategies over time.
In the first-order free riding problem, the strength of incentive T and the strength of natural selection u both play a positive role in promoting strategy "V" to become the evolutionarily stable strategy. The simulation results are consistent with these conclusions; more specifically, strategies with a higher T or u have a stronger invasion dynamic and a higher replacement probability.
The previous conclusions still hold in the second-order free riding problem, where we also find that the invasion of strategy "L" strengthens as the differential coefficient increases. The strength of incentive T, the strength of natural selection u and the differential coefficient μ therefore all play a positive role in promoting strategy "L" to become the evolutionarily stable strategy. The simulation results are again consistent with these conclusions: a higher T, u or μ may give a stronger invasion dynamic and a higher replacement probability.
The strength of incentive T has a threshold effect: too small a T has no effect because of the randomness of the evolutionary process in a finite population. A higher T often leads to better results; however, it is unwise to increase T blindly, since that would waste limited public resources. More micro-level observation and more complex research methods are needed for deeper analysis.
We also find that different combinations of incentive methods μT may have the same evolutionarily stable strategy. However, the combination with a higher μ has a stronger invasion dynamic but a lower replacement probability, so the policy reaches the evolutionarily stable strategy faster but less stably. This is consistent with the theory of public goods: a collective incentive is not enough to make people strive for a public good; in contrast, although it deviates from the principle of fairness and may even threaten social stability, the reward obtained by an individual in an organization can act as a selective incentive mechanism that induces greater contributions to the organization [69] [70].

Discussion
This paper presents a two-stage stochastic evolutionary game model in a finite population, in which each participant chooses strategy "V" or "D" in the first stage and strategy "L" or "F" in the second. Following the birth-death algorithm of the Moran process, we constructed linear fitness equations for the strategies based on their expected payoffs, and used the Markov transition probability matrix of the stochastic process to calculate the probability that strategy "V" and strategy "L" become the final outcome under different combinations of T, u and μ. The simulation results are consistent with the theoretical conclusions and further indicate that there is a threshold effect and an optimal range in the selection of policy combinations.
In terms of practical implications, this paper helps close the research gap in the analysis of green technology innovation by using a stochastic evolutionary game, and provides the key variables of green technology innovation incentives and principles for environmental regulation policy making.
In a broader sense, this paper illustrates that the economy and society form a non-linear complex system, which means that policies aimed at promoting a particular aspect often produce unexpected results [71] [72], so it is very difficult to formulate policies reasonably and make them achieve the expected results [73]. Professionalism and foresight are therefore needed, and any policy should be chosen carefully.
Although this paper uses a complicated interdisciplinary computing method in order to be as close to reality as possible, some aspects have not been addressed and will be our future research directions. In reality, preferences change: participants can acquire new preferences or modify existing ones after learning the relationship between their strategy and their reward [74]. Several assumptions about preference change have been incorporated into models of decision-making in psychology and behavioral economics [75] [80]. Besides, in large-scale group decision making it is more realistic to assume that each participant has only a limited number of contacts, which are updated according to some rules, so that participants are divided into smaller subgroups, some of which may be clustered. Social network relations have been taken into consideration in some studies [81] [82] [83] [84] [85]. If we can simulate the group decision-making process using fuzzy or variable preferences and a social network model that includes hierarchy and aggregation structure [86], it will help us understand the evolutionary process and the promotion mechanism of behavior in a more practical and intuitive way.