LazyAct: Lazy actor with dynamic state skip based on constrained MDP

doi:10.1371/journal.pone.0318778

Fig 1.

In the LazyAct architecture, each time a decision is made, the agent returns multiple consecutive actions to achieve state skipping.

The actor determines the actions and skip durations for the multiple agents in the environment and stores them in the Action Queue.
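
The Action Queue idea in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ActionQueue` and `run_episode` are hypothetical, and the sketch assumes the "multiple consecutive actions" are repeats of a single chosen action, which the caption does not state.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, List, Tuple


class ActionQueue:
    """Hypothetical buffer of actions that are already decided but not yet executed."""

    def __init__(self) -> None:
        self._pending: Deque[Hashable] = deque()

    def __len__(self) -> int:
        return len(self._pending)

    def push(self, action: Hashable, skip: int) -> None:
        # Assumption: the action runs once now and is repeated for `skip` skipped states.
        if skip < 0:
            raise ValueError("skip must be non-negative")
        self._pending.extend([action] * (skip + 1))

    def pop(self) -> Hashable:
        return self._pending.popleft()


def run_episode(
    policy: Callable[[object], Tuple[Hashable, int]],
    observations: Iterable[object],
) -> Tuple[List[Hashable], int]:
    """Query the policy only when the queue is empty; otherwise replay queued actions."""
    queue = ActionQueue()
    executed: List[Hashable] = []
    decisions = 0
    for obs in observations:
        if len(queue) == 0:
            action, skip = policy(obs)  # one inference yields an action and a skip length
            queue.push(action, skip)
            decisions += 1
        executed.append(queue.pop())
    return executed, decisions
```

With a toy policy that always skips two states, for example `run_episode(lambda obs: (f"a{obs}", 2), range(7))`, the policy is queried at steps 0, 3 and 6 only, and the remaining steps reuse the queued action without a forward pass.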

Fig 2.

MAPPO adopts centralized training with decentralized execution, in which multiple agents interact independently with the environment.

The central learner uses experience data to update the Actor and Critic networks and then sends the updated Actor network back to each agent. LazyAct additionally outputs a skip length k_t so that unimportant states are skipped.

Table 1.

Single-agent and multi-agent tasks for LazyAct.

Table 2.

Network architectures for single-agent tasks.

The “Conv” columns show the filter shape of each convolution as Channel(Kernel-size).

Table 3.

Network architectures for multi-agent tasks.

Fig 3.

The score curves of LazyAct, IMPALA and TempoRL.

LazyAct starts training from an unconstrained pre-trained model.

Fig 4.

The skip ratio curves of LazyAct with different ϵ in single-agent tasks.

Fig 5.

The alpha (α) curves of LazyAct with different ϵ in single-agent tasks.

Table 4.

Scores vs. skip ratios on single-agent tasks.

Each cell shows the score with the skip ratio in parentheses.

Fig 6.

The win rate curves of LazyAct and MAPPO.

LazyAct starts training from an unconstrained pre-trained model.

Fig 7.

The skip ratio curves of LazyAct with different ϵ in multi-agent tasks.

Fig 8.

The alpha (α) curves of LazyAct with different ϵ in multi-agent tasks.

Table 5.

Win rates vs. skip ratios on multi-agent tasks.

Each cell shows the win rate with the skip ratio in parentheses.

Table 6.

Time (s) and GFLOPs savings on single-agent tasks.

Each cell shows the time with the GFLOPs in parentheses.

Table 7.

Time (ms) and KFLOPs savings on multi-agent tasks.

Each cell shows the time with the KFLOPs in parentheses.

Fig 9.

Visualization of LazyAct on SMAC-25m.
