Fig 1.
A hybrid DeepONet-guided reinforcement learning framework for stabilizing the Kuramoto–Sivashinsky equation.
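To make the coupling in Fig 1 concrete, a minimal sketch of a branch–trunk DeepONet surrogate of the kind that could guide the TD3 agent is given below. The class name, layer widths, sensor count, and latent dimension are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DeepONetSurrogate(nn.Module):
    """Minimal branch-trunk DeepONet sketch (illustrative sizes, not the paper's exact architecture)."""
    def __init__(self, n_sensors=64, latent_dim=100):
        super().__init__()
        # Branch net: encodes the sampled state u(x_1..x_m) (plus actuation, if included) into latent coefficients.
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, 128), nn.Tanh(),
            nn.Linear(128, latent_dim),
        )
        # Trunk net: encodes the query coordinate (x, t) into latent basis functions.
        self.trunk = nn.Sequential(
            nn.Linear(2, 128), nn.Tanh(),
            nn.Linear(128, latent_dim),
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, query_xt):
        # u_sensors: (batch, n_sensors); query_xt: (batch, 2)
        b = self.branch(u_sensors)
        t = self.trunk(query_xt)
        # DeepONet output: inner product of branch and trunk features plus a bias.
        return (b * t).sum(dim=-1, keepdim=True) + self.bias
```

In the hybrid setup of Fig 1, a surrogate of this form can supply cheap approximate rollouts or an informed initialization to the TD3 agent before it interacts with the full KS solver.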
Table 1.
Hyperparameters and empirical justification.
Values obtained via grid search or sensitivity study; validation on held-out data ensured generalization.
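Table 1 states that hyperparameter values were selected by grid search or sensitivity study with held-out validation; a minimal sketch of such a selection loop follows. The candidate grids, the tuned parameters, and the `evaluate_on_validation` helper are hypothetical placeholders, not the paper's actual search space.

```python
from itertools import product

# Hypothetical candidate grids; the values reported in Table 1 come from the paper's own search.
grid = {
    "actor_lr":   [1e-4, 3e-4, 1e-3],
    "critic_lr":  [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 128, 256],
}

def evaluate_on_validation(config):
    """Placeholder: train the controller with `config`, then score it on held-out
    initial conditions (e.g., negative final energy). Returns a dummy value here."""
    return 0.0

best_config, best_score = None, float("-inf")
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = evaluate_on_validation(config)  # held-out evaluation guards against overfitting the search
    if score > best_score:
        best_config, best_score = config, score
print("selected:", best_config)
```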
Fig 2.
DeepONet training loss decreases by 90.0% from 0.000549 over 1200 epochs, ending an order of magnitude below Pure TD3's initialization threshold and enabling efficient offline learning.
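Fig 2 corresponds to the offline surrogate-training stage. A minimal sketch of such a loop (MSE regression on precollected KS snapshots, 1200 epochs as in the caption) is shown below; it reuses the `DeepONetSurrogate` sketch given after Fig 1, and the stand-in data, batch size, and optimizer settings are assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the shapes used by the Fig 1 sketch: 64 sensor values per sample, (x, t) queries.
u_sensors = torch.randn(4096, 64)
query_xt  = torch.rand(4096, 2)
target    = torch.randn(4096, 1)          # in practice: the KS solution at the queried (x, t)
loader = DataLoader(TensorDataset(u_sensors, query_xt, target), batch_size=256, shuffle=True)

model = DeepONetSurrogate()               # the sketch given after Fig 1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # illustrative settings
loss_fn = torch.nn.MSELoss()

for epoch in range(1200):                 # 1200 epochs, matching the caption
    for u_s, xt, y in loader:
        optimizer.zero_grad()
        loss_fn(model(u_s, xt), y).backward()
        optimizer.step()
```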
Fig 3.
RL reward trajectories: DeepONet RL reaches a final reward of –5.0, a 63.0% improvement over Pure TD3's –13.52, with 9.1% better sample efficiency, indicating enhanced learning efficiency.
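The 63.0% figure follows from the relative-change formula applied to the final rewards quoted in the caption (the same formula underlies the percentage comparisons in Figs 2 and 4):
\[
\frac{R_{\text{DeepONet}} - R_{\text{TD3}}}{\lvert R_{\text{TD3}} \rvert}
= \frac{-5.0 - (-13.52)}{\lvert -13.52 \rvert}
= \frac{8.52}{13.52} \approx 0.630 .
\]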
Fig 4.
TD3 actor-critic losses: DeepONet RL’s critic loss (0.02) is 33.3% lower than Pure TD3’s (0.03), with 40.0% less variance, indicating superior stability.
Fig 5.
Mean energy (± SEM): DeepONet RL achieves 0.397, a 99.1% reduction relative to LQR and 64.6% relative to Pure TD3, with minimal SEM overlap in the final steps, outperforming both baselines.
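The SEM bands in Fig 5 follow the standard definition; assuming the same five independent trials as in Table 2, with per-step sample standard deviation \(s_t\) across trials,
\[
\mathrm{SEM}_t = \frac{s_t}{\sqrt{n}}, \qquad n = 5 .
\]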
Fig 6.
Spatio-temporal state u(x, t): DeepONet RL suppresses amplitudes to 0.5 units, a 90.0% reduction relative to LQR and 66.7% relative to Pure TD3, with 80.0% lower spatial variance, demonstrating superior control.
Table 2.
Final energy (mean ± std) over the last 10 steps across five trials: DeepONet RL's mean is 64.6% lower than Pure TD3's and a factor of 107.7 lower than LQR's, with 75.4% and 92.9% less variability, respectively.
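A minimal NumPy sketch of the statistic reported in Table 2 (mean ± std of energy over the last 10 steps, aggregated over five trials) is given below. The log layout, the stand-in data, and the choice to average per trial before taking the std across trials are assumptions about how the recorded traces are processed.

```python
import numpy as np

# Hypothetical log layout: energy[trial, step] for the five independent trials of one controller.
rng = np.random.default_rng(0)
energy = rng.random((5, 200))                    # stand-in data; in practice, the recorded energy traces

final_window = energy[:, -10:]                   # last 10 steps of each trial
per_trial_mean = final_window.mean(axis=1)       # one final-energy value per trial
mean = per_trial_mean.mean()                     # mean reported in Table 2
std = per_trial_mean.std(ddof=1)                 # sample std reported as the +/- term (across trials)
print(f"final energy: {mean:.3f} +/- {std:.3f}")
```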
Table 3.
Comparative performance of ROL against state-of-the-art methods for controlling the 1D Kuramoto–Sivashinsky (KS) equation.
Our method is highlighted in bold. The final energy metric is defined as .
Table 4.
Operating envelope for the ROL framework.
Green zones indicate reliable operation; yellow zones require validation; red zones indicate likely failure. Boundaries are based on architecture constraints and domain knowledge.
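The green/yellow/red zones of Table 4 amount to a simple range lookup per operating parameter; the sketch below illustrates the idea. The function name and its range arguments are hypothetical, and the actual boundaries are those listed in the table.

```python
def envelope_zone(value, green_range, yellow_range):
    """Classify one operating parameter against envelope boundaries read from Table 4.
    Ranges are (low, high) tuples; anything outside both falls in the red zone."""
    if green_range[0] <= value <= green_range[1]:
        return "green"    # reliable operation
    if yellow_range[0] <= value <= yellow_range[1]:
        return "yellow"   # requires validation
    return "red"          # likely failure
```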