Maintaining vs. milking good reputation when customer feedback is inaccurate

In Internet transactions, customers and service providers often interact once and anonymously. To prevent deceptive behavior, a reputation system is particularly important to reduce information asymmetries about the quality of the offered product or service. In this study we examine the effectiveness of a reputation system in reducing information asymmetries when customers may make mistakes in judging the provided service quality. In our model, a service provider makes strategic quality choices, and short-lived customers are asked to evaluate the observed quality by submitting ratings to a reputation system. A customer is not always able to evaluate the service quality correctly and, with a predefined probability, submits an erroneous rating. Considering reputation profiles built from the last three sales, we derive within the theoretical model that the service provider's dichotomous quality decisions are independent of the reputation profile and depend only on the probabilities of receiving positive and negative ratings when providing low or high quality. Thus, a service provider optimally either maintains a good reputation or completely refrains from any reputation building. However, when mapping our theoretical model to an experimental design, we find that a significant share of subjects in the role of the service provider deviate from optimal behavior and choose actions conditional on the current reputation profile. With respect to these individual quality choices, we see that subjects use milking strategies, that is, they exploit a good reputation: if the sales price is high, low quality is delivered until the price drops below a certain threshold, after which high quality is chosen until the price rises again.


S1 Appendix
Formal description of the Markovian Decision Process and k-responsive Strategies

We give rigorous definitions of the stationary MDP that describes the service provider's repeated decision problem and of strategic behavior therein. Generally, a stationary MDP is described by a tuple (S, A, q, π, δ), where S is a finite set of states and A a finite set of actions.
Transition probabilities are described by the function q : S × S × A → [0, 1] with Σ_{s′∈S} q(s′, s, a) = 1: i.e., q(s′, s, a) is the probability that the process moves to state s′ when the current state is s and action a is taken. Phrased differently, q(·, s, a) is a probability distribution over states conditioned on the current state s and action a. To emphasize conditional probabilities, we write q(s′ | s, a) instead of q(s′, s, a). The function π : S × A → R maps each pair of (current) state and action to a payoff of the decision maker, describing immediate rewards. That means, π(s, a) is the payoff from choosing action a ∈ A in current state s ∈ S.
Finally, δ ∈ (0, 1) denotes the decision maker's discount factor. In what follows, t = 0, 1, 2, . . . denotes the time index. Stationarity of the MDP means in particular that all components of the MDP are independent of time and need not be indexed by t.
We now specify the general model for our purposes. A state is a vector of the r most recent customer ratings, i.e., the state space is S = {+, −}^r. At each time t the service provider decides on producing the service in either high (H) or low (L) quality at a cost of cH or cL, respectively. Hence, the set of actions is A = {H, L}. Transition probabilities q depend on the rating probabilities α and β. For any state s ∈ S there are exactly two possible next states, depending on whether the customer submits a positive or a negative rating (see Fig 1).
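As an illustration, the transition structure for r = 3 can be sketched as follows (a hypothetical helper, not part of the paper's materials; the probabilities α = 0.95 and β = 0.05 for a positive rating after high and low quality are taken from the first bracketed treatment in the experimental instructions):

```python
from itertools import product

STATES = list(product("+-", repeat=3))  # S = {+, -}^3, eight reputation profiles


def transitions(state, action, alpha=0.95, beta=0.05):
    """Return {next_state: probability} for the current reputation profile.

    A state is a tuple of the r = 3 most recent ratings ('+' or '-').
    alpha/beta are the probabilities of a positive rating after producing
    high/low quality.
    """
    p_pos = alpha if action == "H" else beta
    window = state[1:]  # the oldest rating falls out of the profile
    return {
        window + ("+",): p_pos,      # customer submits a positive rating
        window + ("-",): 1 - p_pos,  # customer submits a negative rating
    }
```

For every state there are exactly two successor profiles, obtained by shifting the window and appending the new rating, mirroring the two outgoing arrows per state in Fig 1.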
To specify immediate rewards π, we first formalize the dependency of the sales price on reputation profiles. Let si for i = 1, 2, 3 denote the ith entry of the reputation profile s, and let σ+(s) := |{i : si = +}| and σ−(s) := |{i : si = −}| be the numbers of positive and negative ratings in state s, respectively. Further, define σ(s) := σ+(s) − σ−(s). Note that σ(s) can only take values in {3, 1, −1, −3}. The sales price P(s) in state s is given by P(s) = P̄ + γσ(s) with a fixed base price P̄ and a positive constant γ (price increment). Then the immediate reward (from sales) of the service provider depends on the current state and the chosen action and is given by π(s, H) = P(s) − cH and π(s, L) = P(s) − cL.
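For concreteness, the price and reward functions can be sketched in a few lines (hypothetical helpers; the base price P̄ = 60 and increment γ = 5 are inferred from the example prices in the experimental instructions, 65 Taler for two positive ratings and 55 Taler for one, and the costs cH = 50, cL = 35 are those stated there):

```python
def price(state, base_price=60, gamma=5):
    """Sales price P(s) = base_price + gamma * sigma(s), where sigma(s) is
    the number of positive minus the number of negative ratings in s."""
    sigma = state.count("+") - state.count("-")
    return base_price + gamma * sigma


def reward(state, action, costs={"H": 50, "L": 35}):
    """Immediate reward pi(s, a): reputation-dependent price minus production cost."""
    return price(state) - costs[action]
```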
A pure Markovian strategy (or policy) F = (ft)t∈N specifies for each period t and each state s an action ft(s) that the service provider takes when s is the current state in period t, i.e., ft : S → A. A Markovian strategy is stationary if ft = ft+1 for all t, i.e., if the chosen actions depend only on the state but not on the time. For a pure Markovian strategy F = (ft)t∈N define Q(ft) to be the |S| × |S| matrix containing the transition probabilities and π(ft) to be the vector in R^|S| of immediate rewards induced by the actions in ft, for all periods t. More precisely, the entry of Q(ft) at (s, s′) is given by q(s′ | s, ft(s)), while the coordinate corresponding to state s in π(ft) is π(s, ft(s)). Denote by Qt(F) = Q(f1) · · · Q(ft) the product of transition matrices up to period t, with Q0(F) := I. The vector in R^|S| of discounted expected profits from using the pure Markovian strategy F = (ft)t∈N is given by

Π(F) = Σ_{t≥0} δ^t Qt(F) π(ft+1).

From [20, Corollary of Theorem 3] we know that there exists an optimal pure Markovian strategy that is stationary, and which can be determined by means of an algorithm. To verify whether a stationary strategy F* = (f*)t∈N is optimal, we only need to ensure that for all g : S → A the inequality

Π(F*) ≥ Π((g, F*)) = π(g) + δQ(g)Π(F*)

holds, where (g, F*) is the strategy in which the decision maker deviates from the actions specified in f* to g in the first period and uses f* in all subsequent periods. The inequality says that such one-shot deviations do not pay off. This finding is used in the proof of Proposition 1.
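The one-shot-deviation test described above can be checked numerically. The sketch below (hypothetical helpers using numpy, not part of the paper's materials) computes the value of a stationary policy by solving Π = (I − δQ(f))⁻¹ π(f) and then tests every one-period deviation g; α, β, prices, and costs follow the experimental parameters, while δ is a free example parameter:

```python
import numpy as np
from itertools import product

STATES = list(product("+-", repeat=3))  # S = {+, -}^3


def Q_matrix(policy, alpha, beta):
    """Transition matrix Q(f) for a stationary policy given as dict state -> 'H'/'L'."""
    Q = np.zeros((len(STATES), len(STATES)))
    for i, s in enumerate(STATES):
        p = alpha if policy[s] == "H" else beta
        Q[i, STATES.index(s[1:] + ("+",))] = p
        Q[i, STATES.index(s[1:] + ("-",))] = 1 - p
    return Q


def pi_vector(policy, base=60, gamma=5, costs={"H": 50, "L": 35}):
    """Immediate-reward vector pi(f): price at each profile minus production cost."""
    return np.array([base + gamma * (s.count("+") - s.count("-")) - costs[policy[s]]
                     for s in STATES])


def value(policy, delta, alpha, beta):
    """Discounted value of a stationary policy: solve (I - delta Q) V = pi."""
    Q, pi = Q_matrix(policy, alpha, beta), pi_vector(policy)
    return np.linalg.solve(np.eye(len(STATES)) - delta * Q, pi)


def is_optimal(policy, delta, alpha, beta):
    """One-shot deviation check: no g may satisfy pi(g) + delta Q(g) V > V."""
    V = value(policy, delta, alpha, beta)
    for actions in product("HL", repeat=len(STATES)):
        g = dict(zip(STATES, actions))
        if np.any(pi_vector(g) + delta * Q_matrix(g, alpha, beta) @ V > V + 1e-9):
            return False
    return True
```

With patient providers (e.g. δ = 0.9) always choosing H survives all one-shot deviations, while for impatient providers (e.g. δ = 0.3) always choosing L does, in line with the state-independence result of Proposition 1.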

Proof of Proposition 1
Let F = (f)t∈N denote a stationary pure Markovian strategy that is independent of the state: i.e., the values f(s) are either all "H" or all "L". Hence, the probability of obtaining a positive rating is given by pf and the probability of a negative rating by 1 − pf. Precisely,

pf = α if f ≡ H, and pf = β if f ≡ L.

Then, the transition matrix has, in the row corresponding to state s, the entry pf in the column of state (s2, s3, +), the entry 1 − pf in the column of state (s2, s3, −), and zeros elsewhere.

Lemma 1 (Powers of the Transition Matrix). The sequence of powers of the transition matrix for a state-independent stationary pure Markovian strategy is constant after three periods.
Proof of Lemma 1. The entries of the transition matrix are the probabilities of moving from one state to another within exactly one time step. Similarly, if we consider n time steps, then the entries of the nth power of the transition matrix are the probabilities of moving from one state to another in exactly n time steps. As the considered strategy is independent of the state, by construction the probability of obtaining a positive rating is identical in every state.
Therefore, in exactly n time steps for n ≥ 3 we are able to reach every reputation profile s′ containing three specified ratings, independently of the starting state, with probability

pf^σ+(s′) (1 − pf)^σ−(s′).

Proof of Proposition 1. Let FH = (fH)t∈N denote the stationary pure Markovian strategy where the service provider always chooses quality H. Then, the immediate reward and the transition matrix are given by

π(fH)(s) = P(s) − cH = P̄ + γσ(s) − cH, and q(s′ | s, fH) = α if s′ = (s2, s3, +), 1 − α if s′ = (s2, s3, −), and 0 otherwise.

Applying Lemma 1, the infinite-horizon payoff for strategy FH is given by

Π(FH) = (I + δQ(fH) + δ²(Q(fH))²) π(fH) + (δ³/(1 − δ)) (Q(fH))³ π(fH),

since (Q(fH))^t = (Q(fH))³ for all t ≥ 3. For a stationary pure Markovian strategy G = (g)t∈N let Q(g) denote the transition matrix, where pg,k is the probability of a positive evaluation when s is the state corresponding to row k and action g(s) ∈ {H, L} is played, and let π(g) be the corresponding vector of immediate rewards. In the following, we show that for all g : S → A the inequality

Π(FH) ≥ Π((g, FH)) = π(g) + δQ(g)Π(FH)   (3)

holds if and only if the condition in Proposition 1 is satisfied. Here (g, fH) is the strategy in which g is played in the first period and fH is played in all subsequent periods.

Note that the entries in each row of the matrix Q(fH) sum up to one and those in each row of Q(fH) − Q(g) to zero. By writing π(fH) = (P̄ − cH)e + γσ, where e denotes the all-ones vector and σ the vector with entries σ(s), we obtain

δ (Q(fH) − Q(g)) (I + δQ(fH) + δ²(Q(fH))²) (P̄ − cH)e = δ(P̄ − cH)(1 + δ + δ²)(Q(fH) − Q(g))e = 0.   (4)

Therefore, we look at

δ (Q(fH) − Q(g)) (I + δQ(fH) + δ²(Q(fH))²) γσ.   (5)

Thus, using (4) and (5) to simplify (3), we have for g = fL

Π(FH) − Π((fL, FH)) = (cL − cH + 2γδ(α − β)(1 + δ + δ²)) e,

which means that, independent of the current state, switching to action L does not strictly pay off if and only if

cL − cH + 2γδ(α − β)(1 + δ + δ²) ≥ 0

holds, which is equivalent to the condition in Proposition 1.
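As a numerical sanity check of Lemma 1 and of the deviation condition, one can run the following sketch (a hypothetical script, not part of the paper's materials; α = 0.95 and β = 0.05 follow the first bracketed treatment of the experiment, while δ = 0.9 is an example value):

```python
import numpy as np
from itertools import product

STATES = list(product("+-", repeat=3))


def Q_const(p_pos):
    """Transition matrix for a state-independent strategy with positive-rating prob p_pos."""
    Q = np.zeros((8, 8))
    for i, s in enumerate(STATES):
        Q[i, STATES.index(s[1:] + ("+",))] = p_pos
        Q[i, STATES.index(s[1:] + ("-",))] = 1 - p_pos
    return Q


alpha, beta, delta, gamma, cH, cL = 0.95, 0.05, 0.9, 5, 50, 35
Q = Q_const(alpha)

# Lemma 1: powers of Q are constant from the third power on, and every row of
# Q^3 equals the product form p^{sigma+(s')} (1 - p)^{sigma-(s')}.
Q3 = np.linalg.matrix_power(Q, 3)
assert np.allclose(Q3, Q @ Q3)
expected_row = [alpha ** s.count("+") * (1 - alpha) ** s.count("-") for s in STATES]
assert np.allclose(Q3, np.tile(expected_row, (8, 1)))

# Deviation condition: always-H survives a one-period switch to L exactly when
# cL - cH + 2*gamma*delta*(alpha - beta)*(1 + delta + delta^2) >= 0.
threshold = cL - cH + 2 * gamma * delta * (alpha - beta) * (1 + delta + delta ** 2)
print(f"deviation condition value: {threshold:.3f} (>= 0 means always-H is optimal)")
```

With these parameters the condition value is positive, so maintaining a good reputation is optimal; lowering δ or the rating accuracy α − β eventually flips the sign.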

Responsive stationary pure Markovian strategies
The k-responsive strategies for reputation profiles consisting of the last three ratings are illustrated graphically in Fig A and analytically in Table A.

Besides the regressions for the analysis of the complete sample, we also run them for the sub-samples which, according to the Brier score, play the reported strategy. When focusing on the sub-sample of subjects who pursued the milking strategies, the estimates are even more pronounced.

Note: Odds ratios were calculated; robust standard errors are reported in parentheses. Missing results due to low sample size are denoted by "-". Significance at the 1%, 5%, and 10% level is denoted by ***, **, and *, respectively.

Instructions (English version)
The instructions for all treatments and the questionnaire were originally in German. The English instructions presented here are a translation of the originally used ones. In each session the experiment was conducted twice with two different treatments. The sessions were organized in four different orderings, namely T1/T3, T3/T1, T2/T4, and T4/T2. This procedure implies that we have observations for each treatment in both the first and the second position.

General instructions
Today, two experiments will be carried out. The experiments are independent of each other.
Hence, your decisions in experiment 1 do not influence your payments in experiment 2, and vice versa. Only one of the experiments will be paid. First, you will receive only the instructions for experiment 1. After its completion we will distribute the instructions for experiment 2. After both experiments are finished, we would like to ask you to fill out a questionnaire. You will receive a short instruction for this as soon as both experiments have ended. There are no "right" or "wrong" responses. The answers to the questionnaire have no influence on your payment.

Payments:
After both experiments are finished, a random draw will decide which of the experiments is paid. One participant in the room will be asked to roll a die on behalf of all participants. If the roll is a 1, 2, or 3, all decisions from experiment 1 will be paid for all participants; if it is a 4, 5, or 6, all decisions from experiment 2 will be paid for all participants. The corresponding payments are explained in the related instructions.
Please note:
• During the entire experiment, no form of communication is permitted.
• All mobile phones must be switched off for the entire duration of the experiment.
• The decisions you make within this experiment are anonymous: i.e., none of the other participants gets to know the identity of a person who has made a specific decision.
• Please remain seated until the end of the experiment.
• You will be called forward for your payment by your seat number.
Good luck and thank you for your participation in this experiment!

Instructions for experiment [1/2]
• During the experiment all payments are stated in the fictitious currency "Taler".
• Your initial credit is 225 Taler.
• The experiment consists of at least 40 and at most 45 periods. At the beginning of each period from 41 to 45, the experiment either continues with a probability of 50% or is terminated with a probability of 50%. This is decided by a fair die roll: in case of a 1, 2, or 3 the experiment continues, and in case of a 4, 5, or 6 the experiment ends. The number rolled will be displayed on the screen.
• In none of the periods does your payment depend on the decisions of the other participants.

Course of the experiment
Imagine you are a service provider and in each period you sell a service that you produce yourself. In each period you decide whether to produce your service in high or low quality.
Producing high quality costs you 50 Taler; producing low quality costs you 35 Taler. In each period there is a customer who wants to purchase a service of high quality. Your payment in a period is computed as follows: the price that you receive from your customer, less the cost of producing your service in either high or low quality. Since your customer cannot evaluate the quality of your service in advance, your price depends on the ratings of your previous customers. Your customer in the current period considers the ratings that you received in the previous three periods. Depending on the number of positive ratings (between 0 and 3), your customer is willing to pay a certain price. The more positive ratings you have, the higher this price is.

Prices
Number of positive ratings     Price that your customer
(within the last 3 periods)    is willing to pay
0                              45 Taler
1                              55 Taler
2                              65 Taler
3                              75 Taler

Therefore, your first customer in period 1 is willing to pay a price of 65 Taler. After your decision to produce your service in high or low quality, your customer receives your service and rates it as either positive or negative. The customer's rating is subject to errors. If you sell a service of high quality to your customer, you receive a positive rating with a probability of [95%/95%/70%/70%]. With a probability of [5%/5%/30%/30%] the rating is negative even though the service's quality was high. If you sell a service of low quality to your customer, you receive a negative rating with a probability of [95%/70%/95%/70%]. With a probability of [5%/30%/5%/30%] the rating is positive even though the service's quality was low. The price that you receive from your customer in the next period again depends on the ratings of the previous three periods.

Example:
Suppose you are in period 4 and you received the following ratings in period 1 to 3: (period 1, period 2, period 3)=(positive, negative, positive).
In this case the current customer in period 4 is willing to pay a price of 65 Taler for your service. If you receive a positive rating afterwards, the price for the next period is again 65 Taler, because the ratings of the last three periods are then (period 2, period 3, period 4) = (negative, positive, positive).
If you instead receive a negative rating from your customer, the price for the next period is 55 Taler, based on the last three periods (period 2, period 3, period 4) = (negative, positive, negative).
In each period the ratings of the last three periods are displayed.