Fig 1.
EC-IoS system model.
Fig 2.
Training framework of AF-TT-RDQN.
Fig 3.
Architecture of the state encoder.
Table 1.
Three feature groups.
Fig 4.
Architecture of the action encoder.
Fig 5.
Architecture of evaluation and target networks.
Fig 6.
Parallel training framework based on asynchronous sampling and centralized synchronous learning.
Table 2.
Parameters used in the simulation.
Table 3.
Network parameters of AF-TT-RDQN.
Fig 7.
Training results of AF-TT-RDQN from multiple perspectives.
A: Episode reward convergence. B: Average reward per 100-episode interval. C: Frequency distribution of episode rewards. D: Percentage of bad actions over episodes.
Fig 8.
Parallel worker performance analysis in AF-TT-RDQN.
A: Reward convergence of individual workers. B: Reward distribution by worker.
Fig 9.
Training convergence comparison of AF-TT-RDQN against baseline DRL methods.
Fig 10.
Optimization performance comparison of AF-TT-RDQN against baseline methods.
A: Comparison of total reward. B: Comparison of bad actions. C: Comparison of average per-step latency. D: Comparison of average per-step energy consumption.
Table 4.
Ablation study of key components in AF-TT-RDQN.