Reinforcement learning to develop policies for fair and productive employment: A case study on wage theft within the day-laborer community

doi:10.1371/journal.pcsy.0000079

Fig 1.

Proposed behavioral and decision science framework that includes artifactual data collection, synthetic data generation, and reinforcement learning to optimize the impact of interventions.

Fig 2.

Employment journey of a day laborer.

The employment journey of a day laborer as they accept work offered by an employer and move into a working state (fair or unfair) that is partly under that employer's unilateral control. Adapted from [2].

Fig 3.

The batch Q-learning environment.

The batch Q-learning (QL) environment may encompass various data generation strategies, including the simplified simulation used in this study as a proof of concept, a more realistic agent-based model (ABM; see [2] for an example from our study of precarious labor and wage theft), or the real-world context (as we examined in [1]).

Fig 4.

State/action diagram for the single agent formulation.

The states, actions, and rewards for the single-agent formulation, including the events that precipitate state changes.

Fig 5.

Game tree of the 2-player state space.

In the multiagent formulation, orange denotes the states where the employer makes decisions and the actions available to them, black lines represent the states where the day laborer makes decisions and the actions they can take, and blue denotes the environment and the end results. The game is immediately replayed if the job does not begin, and rewards are updated for both players at each state transition.

Table 1.

Empirically estimated system parameters.

Fig 6.

Extended Q-learning environment.

A simulation environment generates states, actions, transitions, and rewards, from which the reinforcement learning system learns. The resulting optimal policy is used to direct a second round of data generation. The system then learns from the second round and establishes a new policy, building on the previous one, that indicates the optimal action at each state for a particular environment.
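
The two-round loop described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in S1 File: the one-step dynamics, the reward values (+1 for a successful report, -1 for a failed one, 0 for no action), the batch sizes, and every function name are hypothetical. Only the state name "WorkingTheft" and the actions "Report" and "NoAction" come from the article.

```python
import random

ACTIONS = ["Report", "NoAction"]
STATE = "WorkingTheft"
TERMINAL = "End"  # hypothetical terminal state


def step(action, p_success, rng):
    """Hypothetical one-step dynamics: reward values are assumptions."""
    if action == "Report":
        reward = 1.0 if rng.random() < p_success else -1.0
    else:
        reward = 0.0
    return TERMINAL, reward


def generate_batch(policy, n, p_success, rng, epsilon):
    """Simulate n transitions; explore with probability epsilon."""
    batch = []
    for _ in range(n):
        if policy is None or rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = policy[STATE]
        next_state, reward = step(action, p_success, rng)
        batch.append((STATE, action, reward, next_state))
    return batch


def batch_q_learning(batch, q, alpha=0.05, gamma=0.9, sweeps=20):
    """Repeatedly apply the Q-learning update over a stored batch."""
    for _ in range(sweeps):
        for state, action, reward, next_state in batch:
            if next_state == TERMINAL:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_policy(q):
    """Pick the action with the highest learned Q-value."""
    return {STATE: max(ACTIONS, key=lambda a: q[(STATE, a)])}


def two_round_learning(p_success, seed=0):
    rng = random.Random(seed)
    q = {(STATE, a): 0.0 for a in ACTIONS}
    # Round 1: purely exploratory data, then learn a first policy.
    q = batch_q_learning(generate_batch(None, 2000, p_success, rng, 1.0), q)
    first_policy = greedy_policy(q)
    # Round 2: data generation directed by the first policy, then relearn.
    q = batch_q_learning(generate_batch(first_policy, 2000, p_success, rng, 0.2), q)
    return q, greedy_policy(q)


if __name__ == "__main__":
    for p in (0.1, 0.9):
        q_values, policy = two_round_learning(p)
        print(p, policy[STATE])
```

The second round starts from the Q-table learned in the first, which is one simple way to make the new policy depend on the previous one; other designs, such as relearning from the pooled batches, would fit the caption equally well.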

Fig 7.

Single Agent Q-Values.

The variation in Q-values for the two actions, “Report” and “NoAction,” within the “WorkingTheft” state across different values of the probability of report success. These results are generated by the Python code provided in the Supplement (S1 File).
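
To see why the two Q-values can cross as the probability of report success changes, consider a deliberately simplified one-step calculation. The sketch below is an assumption-laden illustration rather than the model in S1 File: it treats "WorkingTheft" as a single decision with no future value, and the reward values and function names are hypothetical.

```python
def one_step_q_values(p_success, r_success=1.0, r_failure=-1.0, r_no_action=0.0):
    """Expected one-step value of each action (all rewards are assumed)."""
    q_report = p_success * r_success + (1.0 - p_success) * r_failure
    return {"Report": q_report, "NoAction": r_no_action}


def break_even_probability(r_success=1.0, r_failure=-1.0, r_no_action=0.0):
    """Success probability at which Report and NoAction have equal value."""
    return (r_no_action - r_failure) / (r_success - r_failure)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, one_step_q_values(p))
    print("break-even:", break_even_probability())
```

Under these assumed rewards, reporting is preferred only when the success probability exceeds the break-even value; with different reward magnitudes or a multi-step horizon the crossover point would move.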

Fig 8.

Multi-Agent Q-Values.

The employee’s Q-values for the decision on whether to report a theft (left) and the employer’s Q-values for the decision on whether to steal (right), for various values of the probability of reporting success.

Table 2.

Q-values and convergence statistics by agent action and reporting probability.

Fig 9.

Event Supplementation of Macro States.

Illustration of how macro states can be supplemented with events associated with microstate processes and associated reactive or proactive actions.
