Reinforcement learning of altruistic punishment differs between cultures and across the lifespan

doi:10.1371/journal.pcbi.1012274

Fig 1.

Reinforcement Learning Altruistic Punishment Task.

Participants selected a team color (yellow) and were assigned as Player 3; Players 1 and 2 and the observers were computer-generated. As shown in the left panel, during the punishment decision-making stage, Player 1 chose to share 0 RMB with Player 2. Endowed with 1 RMB, Player 3 then chose to either accept or reject the allocation, with rejection equating to a costly punishment. As shown in the right panel, in the feedback stage, Player 3 received social feedback from the observers in the waiting room, conveyed by a thumbs-up for positive feedback or a thumbs-up overlaid with a red cross for negative feedback. The feedback was based on a 60% approval threshold. Participants were assigned to either the ‘punishment-encouragement’ or the ‘acceptance-encouragement’ condition, in which the encouraged choice (punishment or acceptance, respectively) had an 80% chance of positive feedback and a 20% chance of negative feedback.
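The feedback schedule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' task code: the function name `social_feedback`, the condition labels as strings, and the rule that the non-encouraged choice receives positive feedback with the complementary probability (20%) are all assumptions made here for the sake of the example.

```python
import random

def social_feedback(choice, condition, rng, p_positive=0.8):
    """Return True for a thumbs-up, False for negative feedback.

    Assumption (not stated in the caption): the non-encouraged choice
    receives positive feedback with probability 1 - p_positive.
    """
    encouraged = "punish" if condition == "punishment-encouragement" else "accept"
    p = p_positive if choice == encouraged else 1.0 - p_positive
    return rng.random() < p

# Empirical check: punishing in the punishment-encouragement condition
# should draw positive feedback on roughly 80% of trials.
rng = random.Random(0)
n = 10_000
rate = sum(
    social_feedback("punish", "punishment-encouragement", rng) for _ in range(n)
) / n
```

Passing the random generator in explicitly keeps the sketch reproducible and makes the probabilistic schedule easy to test.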


Fig 2.

Cultural and group influences on altruistic punishment behavior in pre-test and reinforcement learning stages.

A Pre-test stage: Chinese participants exhibited stronger ingroup bias than American participants. Overall, participants punished ingroup (green) dividers more severely than outgroup (orange) dividers. However, Chinese participants exhibited an enhanced group bias (b = -0.82, SE = 0.29, z = -2.94, p = .003). B Reinforcement learning stage: when acceptance was rewarded, Chinese participants punished ingroup dividers (green) less severely than outgroup dividers (b = -0.91, SE = 0.23, z = -4.06, p < .001); no significant difference between punishment of ingroup and outgroup dividers was found among American participants (b = -0.20, SE = 0.26, z = -0.76, p = .871). C Reinforcement learning stage: when punishment was rewarded, Chinese participants punished ingroup dividers (green) less severely than outgroup dividers (b = -0.87, SE = 0.22, z = -4.01, p < .001); no significant difference between punishment of ingroup and outgroup dividers was found among American participants (b = -0.61, SE = 0.26, z = -2.37, p = .084). D Group-level trial-by-trial punishment rates in the two divider conditions (ingroup: green, outgroup: orange) for each culture group (Chinese: squares, American: circles). Points represent group means; error bars are standard errors. E Regardless of whether punishment or acceptance was rewarded, Chinese participants had lower learning rates for ingroup norms (b = -0.07, SE = 0.01, t = -5.51, p < .001), while American participants showed no difference (b = -0.02, SE = 0.01, t = -1.46, p = .460). F Regardless of whether punishment or acceptance was rewarded, Chinese participants had lower β when learning ingroup norms (b = -0.23, SE = 0.04, t = -5.85, p < .001), while no difference was observed for American participants (b = -0.08, SE = 0.04, t = -1.90, p = .231). G Chinese participants showed a stronger bias than American participants, especially when punishment was rewarded (M_Diff = 0.06, SE = .01, t = 5.11, p < .001). Error bars represent standard errors. * p < .05, *** p < .001.


Fig 3.

Computational modeling reveals distinct norm learning and decision-making processes in punishing ingroups and outgroups.

A Parameter recovery analysis for the winning 4α2β + bias model demonstrates the recoverability of the parameters. Data from 15,625 simulated participants were used, with a confusion matrix depicting correlations between simulated and fitted parameters. Enhanced colors represent higher values. See S6 Table for the source data. B When punishment was rewarded, a simulation experiment illustrated the quadratic relationship between α and the rate of punishment. An α of 0.64 was identified as the point at which the rate of punishment was maximized. C Quadratic relationship between β and the rate of punishment. A β of 1.51 was identified as the point at which the rate of punishment was minimized. D When acceptance was rewarded, a simulation experiment illustrated the quadratic relationship between α and the rate of punishment. An α of 0.61 was identified as the point at which the rate of punishment was minimized. E Quadratic relationship between β and punishment rates. A β of 1.66 was identified as the point at which the rate of punishment was maximized.
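To make the kind of simulation experiment in panels B to E concrete, the following Python sketch simulates a generic Rescorla-Wagner learner with a logistic (two-option softmax) choice rule and reports its punishment rate for given α and β. It is not the authors' 4α2β + bias model, whose exact equations are not given in this caption. The function name, the +1/-1 coding of feedback, the 40-trial length, the treatment of β as an inverse-temperature multiplier, and the 20% positive-feedback rate for the non-encouraged choice are all assumptions chosen for illustration.

```python
import math
import random

def simulate_punishment_rate(alpha, beta, bias=0.0, n_trials=40,
                             p_positive=0.8, punish_rewarded=True,
                             n_agents=2000, seed=0):
    """Mean punishment rate of simulated learners (hypothetical sketch)."""
    rng = random.Random(seed)
    encouraged = "punish" if punish_rewarded else "accept"
    punished = 0
    for _ in range(n_agents):
        q = {"punish": 0.0, "accept": 0.0}  # learned action values
        for _ in range(n_trials):
            # With two options, softmax reduces to a logistic of the
            # value difference; bias shifts the baseline toward punishing.
            logit = beta * (q["punish"] - q["accept"]) + bias
            p_punish = 1.0 / (1.0 + math.exp(-logit))
            choice = "punish" if rng.random() < p_punish else "accept"
            # Assumed schedule: encouraged choice gets positive feedback
            # with p_positive, the other choice with 1 - p_positive.
            p_pos = p_positive if choice == encouraged else 1.0 - p_positive
            reward = 1.0 if rng.random() < p_pos else -1.0
            # Rescorla-Wagner update of the chosen action's value.
            q[choice] += alpha * (reward - q[choice])
            punished += int(choice == "punish")
    return punished / (n_agents * n_trials)

# Sweep alpha at a fixed beta to see how punishment rate changes.
sweep = {a: simulate_punishment_rate(a, beta=1.5, n_agents=200)
         for a in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

Sweeping one parameter while holding the other fixed, as in the last lines, is one simple way to trace out curves like those in panels B to E; the specific optima reported in the caption depend on the authors' model and are not expected to be reproduced by this sketch.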


Fig 4.

Age-related differences in ingroup bias and learning mechanisms among Chinese individuals.

A Pre-test stage: the interaction effect between age and dividers’ group reveals a decrease in punishing unfair ingroup dividers (blue) with increasing age (b = -0.03, SE = 0.02, z = -2.05, p = .040), while punishment rates for outgroup dividers (purple) remain unchanged (b = -0.01, SE = 0.02, z = -0.34, p = .734). B Reinforcement learning stage: younger participants’ behavioral responses toward ingroups were more consistent with the norms than were those of older participants: when acceptance was rewarded, younger participants exhibited less punishment than older participants (b = 0.03, SE = 0.02, z = 2.32, p = .020); when punishment was rewarded, younger participants exhibited more punishment than older participants (b = -0.03, SE = 0.02, z = -1.51, p = .131). By contrast, the age differences were not significant for unfair outgroup dividers under either social feedback condition. C Trial-by-trial punishment rates for ingroup (blue) and outgroup (purple) dividers across age groups (adolescents: squares, adults: circles). Points represent group means, and error bars are standard errors. D Learning rate (α): the α for punishing unfair ingroups decreased with age (b = -0.14, SE = 0.03, t = -4.21, p < .001), while the α for punishing outgroups was not associated with age (b = -0.02, SE = 0.03, t = -0.68, p = .500). E Temperature (β): β was lower for learning ingroup norms than outgroup norms regardless of age (b = -0.25, SE = 0.05, t = -5.36, p < .001); β decreased with age (b = -0.12, SE = 0.04, t = -2.74, p = .006). F The bias term significantly increased with age regardless of punishment norms (b = 0.13, SE = 0.05, t = 2.65, p = .008). Error bars represent standard errors. * p < .05. ** p < .01. *** p < .001.


Fig 5.

Indirect effect of culture on group bias in learning (α-out minus α-in) via group identity.

Participants’ self-reported group identity mediated the relationship between culture and group bias in learning rates (standardized indirect effect = 0.07, 95% CI [0.003, 0.140], p = .039).
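As a rough illustration of what an indirect effect is, the Python sketch below computes the product-of-coefficients estimate a × b on synthetic data, where a is the slope of the mediator on the predictor and b is the slope of the outcome on the mediator, controlling for the predictor. This is a generic textbook construction, not the authors' analysis pipeline: the helper names, the synthetic data, and the true path values (0.5 and 0.4) are invented here, and the standardization and confidence-interval procedure behind the reported estimate are not reproduced.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a * b for x -> m -> y."""
    # Path a: simple regression slope of the mediator on the predictor.
    a = cov(x, m) / cov(x, x)
    # Path b: partial slope of the outcome on the mediator,
    # from the two-predictor regression y ~ x + m.
    denom = cov(x, x) * cov(m, m) - cov(x, m) ** 2
    b = (cov(m, y) * cov(x, x) - cov(x, m) * cov(x, y)) / denom
    return a * b

# Synthetic data only: x stands in for a binary group indicator,
# m for a mediator score, y for an outcome. True a = 0.5, true b = 0.4.
rng = random.Random(1)
x = [rng.choice([0.0, 1.0]) for _ in range(20_000)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
estimate = indirect_effect(x, m, y)  # should land near 0.5 * 0.4
```

In practice a confidence interval for such an estimate is commonly obtained by resampling; that step is left out here to keep the sketch short.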


Table 1.

Demographic information of Chinese and American adults in Study 1 and Chinese adolescents in Study 2.
