Action-factorized Rainbow deep Q-network with token Transformer for computation offloading in edge computing-enabled Internet of Ships

doi:10.1371/journal.pone.0348376

Fig 1. EC-IoS system model.

Fig 2. Training framework of AF-TT-RDQN.

Fig 3. Architecture of the state encoder.

Table 1. Three feature groups.

Fig 4. Architecture of the action encoder.

Fig 5. Architecture of evaluation and target networks.

Fig 6. Parallel training framework based on asynchronous sampling and centralized synchronous learning.

Table 2. Parameters used in the simulation.

Table 3. Network parameters of AF-TT-RDQN.

Fig 7. Training results of AF-TT-RDQN from multiple perspectives.

A: Episode reward convergence. B: Average reward per 100-episode interval. C: Frequency distribution of episode rewards. D: Bad action percentage over episodes.

Fig 8. Parallel worker performance analysis in AF-TT-RDQN.

A: Individual worker reward convergence. B: Reward distribution by worker.

Fig 9. Training convergence comparison of AF-TT-RDQN against baseline DRL methods.

Fig 10. Optimization performance comparison of AF-TT-RDQN against baseline methods.

A: Comparison of total reward. B: Comparison of bad actions. C: Comparison of average per-step latency. D: Comparison of average per-step energy consumption.

Table 4. Ablation study of key components in AF-TT-RDQN.
