Multi-objective Big Bang Big Crunch framework for reliable rice disease and variety classification with conditional calibration

doi:10.1371/journal.pone.0340807

Fig 1.

Samples from the public PaddyDoctor dataset spanning common diseases and varieties.

All image tiles in this figure are directly composed from PaddyDoctor images [18] without any third-party sources. The montage layout and annotations were generated by the authors using Python scripts.


Fig 2.

Calibration- and uncertainty-aware multitask network.

Schematic of the proposed approach: frozen MobileNetV2 embeddings feed a two-head MLP with MC-Dropout, multi-objective optimization (MO–BBBC), and conditional temperature scaling. The diagram was drawn manually by the authors and does not reuse any third-party graphical material.


Fig 3.

Learning dynamics for both heads: accuracy and loss per epoch (train/validation).

(a) Disease accuracy (train/val) per epoch. (b) Variety accuracy (train/val) per epoch. (c) Disease loss (train/val) per epoch. (d) Variety loss (train/val) per epoch.


Fig 4.

Reliability diagrams complement Table 2.

(a) Disease reliability diagram (test). (b) Variety reliability diagram (test).


Fig 5.

Micro-averaged ROC and precision–recall curves for disease and variety heads on the test set.

(a) Disease ROC (b) Disease PR (c) Variety ROC (d) Variety PR.


Fig 6.

Confusion matrices on the test set (rows = ground truth, columns = prediction; colour intensity increases with the number of samples).

(a) Disease head (b) Variety head.


Fig 7.

(Top) Pareto fronts along key axes; the asterisk marks the knee solution.

(Bottom) Error–energy fronts for search strategies under a matched budget (, , 20-epoch candidates). (a) Error vs energy. (b) Error vs size. (c) Energy vs calibration. (d) MO_BBBC. (e) Random. (f) TPE_lite.


Fig 8.

NSGA2_lite front under the matched budget.


Table 1.

Headline test metrics (conditional temperature calibration where beneficial).


Table 2.

Calibration on the test set, before/after conditional temperature scaling (lower is better).
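
As a rough illustration of the calibration step summarized in Table 2, the sketch below fits a single softmax temperature on validation logits and keeps it only when it lowers a calibration error, which is one plausible reading of "conditional" scaling and is an assumption here. It uses a plain equal-width ECE rather than the AECE reported in the paper, and a grid search rather than whatever optimizer the authors used; all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of one logit vector at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(max(softmax(l, temperature)[y], 1e-12))
              for l, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)

def ece(logits_batch, labels, temperature=1.0, n_bins=10):
    """Equal-width expected calibration error (a simpler stand-in for AECE)."""
    # Per bin: [sample count, summed confidence, summed correctness].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for l, y in zip(logits_batch, labels):
        p = softmax(l, temperature)
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += float(pred == y)
    n = len(labels)
    return sum(abs(s_conf - s_corr) / n for cnt, s_conf, s_corr in bins if cnt)

def fit_temperature(logits_batch, labels, grid=None):
    """Pick the temperature on a grid (assumed range 0.5 to 4.0) with lowest NLL."""
    grid = grid or [0.5 + 0.05 * i for i in range(71)]
    return min(grid, key=lambda t: nll(logits_batch, labels, t))

def conditional_temperature(val_logits, val_labels):
    """Return the fitted temperature only if it reduces validation ECE, else 1.0."""
    t = fit_temperature(val_logits, val_labels)
    if ece(val_logits, val_labels, t) < ece(val_logits, val_labels, 1.0):
        return t
    return 1.0

# Toy overconfident head: always predicts class 0 with a large margin, 70% correct.
val_logits = [[5.0, 0.0]] * 10
val_labels = [0] * 7 + [1] * 3
t_star = conditional_temperature(val_logits, val_labels)
```

In this toy case the fitted temperature is greater than 1, which softens the overconfident predictions; a head that is already well calibrated would keep a temperature of 1.0 under this rule.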


Table 3.

Knee genome (rounded) and objectives at the Pareto knee. Objective values are reported in the robustly normalized space used for knee selection (see Eq (6)).
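
To make the idea of a knee on a normalized Pareto front concrete, here is a minimal sketch. Eq (6) is not reproduced in this listing, so the normalization below (clipped 5th to 95th percentile scaling) and the knee rule (smallest Euclidean distance to the ideal point) are assumptions chosen for illustration, not the authors' definition. All objectives are treated as minimized, and the function names are hypothetical.

```python
import math

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors (minimization)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

def percentile(sorted_vals, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def robust_normalize(points, lo_q=0.05, hi_q=0.95):
    """Scale each objective to [0, 1] using clipped percentiles (assumed scheme)."""
    out_cols = []
    for col in zip(*points):
        s = sorted(col)
        lo, hi = percentile(s, lo_q), percentile(s, hi_q)
        span = (hi - lo) or 1.0  # guard against a constant objective
        out_cols.append([min(max((v - lo) / span, 0.0), 1.0) for v in col])
    return [tuple(row) for row in zip(*out_cols)]

def knee_index(points):
    """Index of the point closest to the ideal (all-zero) normalized vector."""
    norm = robust_normalize(points)
    return min(range(len(points)),
               key=lambda i: math.sqrt(sum(v * v for v in norm[i])))

# Toy candidates as (error, energy) pairs; values are made up for illustration.
candidates = [(0.10, 9.0), (0.12, 4.0), (0.30, 3.5), (0.11, 9.5)]
front = pareto_front(candidates)
knee = front[knee_index(front)]
```

Here the last candidate is dominated and drops out, and the knee falls on the point that gives up a little error for a large energy saving.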


Table 4.

Runtime/resource summary (lower is better for latency/energy).


Table 5.

Ablation of curriculum strategies (test set; uncalibrated).


Table 6.

Seed stability (; mean ± std, reduced budget as described).


Table 7.

AUROC for ID vs noise-like OOD separation using uncertainty scores.
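
The kind of evaluation behind Table 7 can be sketched as follows: compute an uncertainty score per input from several stochastic (MC-Dropout) forward passes, then measure how well that score ranks OOD inputs above in-distribution ones. The two scores shown, entropy of the mean prediction and BALD (mutual information), are standard definitions; the probabilities below are made-up toy values, and the function names are hypothetical rather than taken from the authors' code.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of one probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def predictive_entropy(mc_probs):
    """Entropy of the mean prediction over the stochastic passes."""
    n_cls = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / len(mc_probs) for c in range(n_cls)]
    return entropy(mean)

def bald(mc_probs):
    """Mutual information: entropy of the mean minus the mean per-pass entropy."""
    mean_entropy = sum(entropy(p) for p in mc_probs) / len(mc_probs)
    return predictive_entropy(mc_probs) - mean_entropy

def auroc(scores_pos, scores_neg):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(scores_pos) * len(scores_neg))

# Toy data: two MC passes per input, two classes.
id_input = [[0.90, 0.10], [0.92, 0.08]]   # passes agree: low uncertainty
ood_input = [[0.90, 0.10], [0.10, 0.90]]  # passes disagree: high uncertainty

pe_auroc = auroc([predictive_entropy(ood_input)], [predictive_entropy(id_input)])
bald_auroc = auroc([bald(ood_input)], [bald(id_input)])
```

OOD inputs are treated as the positive class, so an AUROC near 1 means the uncertainty score separates them cleanly from in-distribution inputs, while 0.5 means no separation.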


Fig 9.

Runtime and uncertainty analyses.

(a) Latency by device. (b) PE: ID vs OOD. The BALD histogram is analogous (not shown).


Fig 10.

Headline radar plot comparing the disease and variety heads across Accuracy, Macro-F1, micro-AUC, micro-AP, and AECE.

For readability, the AECE axis is inverted (larger is better after inversion). Values are from the calibrated state (Tables 1 and 2): disease accuracy 0.906, macro-F1 0.902, micro-AUC 0.994, micro-AP 0.961, AECE 0.0138; variety accuracy 0.979, macro-F1 0.907, micro-AUC 0.999, micro-AP 0.994, AECE 0.0138.
