Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour

doi:10.1371/journal.pcbi.1013302

Fig 1.

Extensive-form game tree: Sir Philip Sidney game.

Table 1.

Matrix of inclusive fitness given p = 1.

B (Beneficiary) is the row player and D (Donor) is the column player.

Table 2.

Matrix of inclusive fitness given p = 0.

B (Beneficiary) is the row player and D (Donor) is the column player.
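
Tables 1 and 2 report inclusive fitness. As a rough illustration only, assuming the standard definition (an individual's own fitness plus the relative's fitness weighted by relatedness r), the quantity could be computed as below. The function name and the numeric values are hypothetical and are not taken from the tables.

```python
def inclusive_fitness(own_fitness: float, relative_fitness: float, r: float) -> float:
    """Own fitness plus the relative's fitness weighted by relatedness r.

    Assumption: this is the textbook definition; the article's payoff
    entries may include further terms (for example a signal cost).
    """
    return own_fitness + r * relative_fitness


# Hypothetical example values, not taken from Table 1 or Table 2.
example = inclusive_fitness(own_fitness=1.0, relative_fitness=0.5, r=0.5)
print(example)
```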

Table 3.

Alternative categorisation of example parameter values that satisfy each set of inequalities.

In the expectation column, the pair (Not signal, Keep) means the Beneficiary does not signal and the Donor keeps the resource. Valid for any value of p.

Fig 2.

Visual representation of threshold values where the evolutionarily stable strategies presented in this article would hold.

The intersections of r = 0.8 with both c = 0.25 and c = 0.75 lie within the shaded area of the two inequalities, so both thresholds are satisfied and we would expect the resulting strategy combination to be Not signal and Keep. For r = 0.2, however, the evolutionarily stable strategies of Not signal and Keep hold only for low signal costs, because the intersection with c = 0.75 lies outside the shaded area.

Table 4.

Resulting strategies of learning agents, showing the most frequently learned strategy and the proportion of runs in which it was the resulting strategy, for the parameters LR = 0.9, DR = 0.1.

In the expectation column, the pair (Not signal, Keep) means the Beneficiary does not signal and the Donor keeps the resource. Valid for any value of p.
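
The captions refer to Q-values and to the parameters LR and DR. As a minimal sketch, assuming LR is a learning rate and DR a discount rate in a standard tabular Q-learning update (the captions do not spell this out), a single update step could look like the following. The state and action names and the reward are hypothetical.

```python
def q_update(q, state, action, reward, next_state=None, lr=0.9, dr=0.1):
    """Apply one tabular Q-learning update in place and return the new value.

    Assumption: lr and dr play the roles of LR and DR from the caption.
    If next_state is None the episode is treated as finished, so no
    future value is added to the target.
    """
    best_next = max(q[next_state].values()) if next_state is not None else 0.0
    target = reward + dr * best_next
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]


# Hypothetical Q-table for a Beneficiary with one state and two actions.
q_table = {"thirsty": {"signal": 0.0, "not_signal": 0.0}}
first = q_update(q_table, "thirsty", "signal", reward=1.0)
second = q_update(q_table, "thirsty", "signal", reward=1.0)
print(first, second)
```

With a learning rate as high as 0.9, each update moves the stored value most of the way toward the latest target, so recent rewards dominate the estimate.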

Fig 3.

Beneficiary resulting strategies.

Percentage of runs in which the Beneficiary learns each strategy. Not thirsty state. Strategies are described in Table 5.

Fig 4.

Donor resulting strategies.

Percentage of runs in which the Donor learns each strategy. Not thirsty state. Strategies are described in Table 6.

Table 5.

Strategies most often learned by the Beneficiary with parameters S = 0.2, V = 0.2, r = 0.5 (Case 1), see Fig 3.

Table 6.

Strategies most often learned by the Donor with parameters S = 0.2, V = 0.2, r = 0.5 (Case 1), see Fig 4.

Fig 5.

Q-values for Case 2. The darker line is the average across all states; the faint lines are the averages for each state.

Fig 6.

Q-values for Case 3. The darker line is the average across all states; the faint lines are the averages for each state.
