Fig 1.
In the LazyAct architecture, each time a decision is made, the agent returns multiple consecutive actions to achieve state skipping.
The actor determines the actions and skip durations for the multiple agents in the environment and stores them in the Action Queue.
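The queueing behaviour described in this caption can be illustrated with a minimal sketch. It assumes that the "multiple consecutive actions" are one action repeated for the skip duration, which the caption does not state; the names `ActionQueue`, `act`, and `policy` are hypothetical and are not taken from the LazyAct implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, Tuple


class ActionQueue:
    """Per-agent queue of actions to execute while states are skipped."""

    def __init__(self) -> None:
        self._queues: Dict[Hashable, Deque[int]] = {}

    def needs_decision(self, agent_id: Hashable) -> bool:
        # A new decision is needed when the agent has no queued actions left.
        return not self._queues.get(agent_id)

    def store(self, agent_id: Hashable, action: int, skip: int) -> None:
        # Assumption: the action is executed now and repeated for `skip`
        # further steps, so skip + 1 copies are queued.
        self._queues[agent_id] = deque([action] * (skip + 1))

    def pop(self, agent_id: Hashable) -> int:
        return self._queues[agent_id].popleft()


def act(
    queue: ActionQueue,
    agent_id: Hashable,
    obs: object,
    policy: Callable[[object], Tuple[int, int]],
) -> int:
    """Return the next action, running the policy only when the queue is empty."""
    if queue.needs_decision(agent_id):
        action, skip = policy(obs)  # hypothetical policy: obs -> (action, skip)
        queue.store(agent_id, action, skip)
    return queue.pop(agent_id)
```

Under these assumptions, a policy that returns a skip of 2 is evaluated once every three environment steps for that agent, and the intermediate observations never reach the network.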
Fig 2.
MAPPO adopts centralized training with decentralized execution, in which multiple agents interact independently with the environment.
The central learner uses experience data to update the Actor and Critic networks and then sends the updated Actor network back to each agent. LazyAct outputs a skip duration k_t to skip unimportant states.
Table 1.
Single-agent and multi-agent tasks for LazyAct.
Table 2.
Network architectures for single-agent tasks.
The “Conv” columns show the filter shape of each convolution as Channel(Kernel-size).
Table 3.
Network architectures for multi-agent tasks.
Fig 3.
The score curves of LazyAct, IMPALA and TempoRL.
LazyAct starts training from an unconstrained pre-trained model.
Fig 4.
The skip ratio curves of LazyAct with different ϵ in single-agent tasks.
Fig 5.
The alpha (α) curves of LazyAct with different ϵ in single-agent tasks.
Table 4.
Scores vs. skip ratios on single-agent tasks.
Each cell shows the score (skip ratio).
Fig 6.
The win rate curves of LazyAct and MAPPO.
LazyAct starts training from an unconstrained pre-trained model.
Fig 7.
The skip ratio curves of LazyAct with different ϵ in multi-agent tasks.
Fig 8.
The alpha (α) curves of LazyAct with different ϵ in multi-agent tasks.
Table 5.
Win rates vs. skip ratios on multi-agent tasks.
Each cell shows the win rate (skip ratio).
Table 6.
Time (s) and GFLOPs savings on single-agent tasks.
Each cell shows the time (GFLOPs).
Table 7.
Time (ms) and KFLOPs savings on multi-agent tasks.
Each cell shows the time (KFLOPs).
Fig 9.
Visualization on SMAC-25m based on LazyAct.