Table 1.
Identified research gaps from the literature review.
Table 2.
Class distribution of the dataset (DS1).
Table 3.
Distribution of samples in dataset (DS2) for different diseases.
Fig 1.
Sample images from the Eye_Disease_Classification dataset (DS1) and the color fundus images dataset (DS2).
Fig 2.
Complete block diagram of experiments to diagnose eye disease.
Fig 3.
Architectural overview of multi-axis vision transformer (MaxViT-T).
Fig 4.
Architectural overview of proposed MaxGRNet with GRN_MLP integration.
Fig 5.
GRN-based MLP for proposed MaxViT architecture.
Table 4.
Training configuration and hardware specifications.
Table 5.
Dataset-wise performance comparison using macro-averaged metrics.
Fig 6.
Accuracy curves for transfer learning-based models used in our study.
Fig 7.
Confusion matrices and class-wise ROC curves for ViT-B16, Swin-T, ResNet50, and MaxViT-T on DS1 and DS2.
Fig 8.
Confusion matrices and class-wise ROC curves for MaxGRNet on DS1 and DS2.
Table 6.
Cross-validation performance comparison of different models on DS1. Values are reported as mean ± standard deviation across 5 folds.
Table 7.
Paired statistical significance analysis (5-fold cross-validation) comparing MaxGRNet with baseline models on DS1. Δ denotes the mean fold-wise difference (MaxGRNet − Baseline). 95% confidence intervals (CI) are computed over paired fold-wise differences using the t-distribution (df = 4). pHolm denotes the Holm–Bonferroni corrected p-values across the four baseline comparisons (paired t-test). pWilc. denotes Wilcoxon signed-rank test p-values (two-sided).
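The paired analysis described in the Table 7 caption can be illustrated with a minimal sketch. It assumes five fold-wise scores per model and raw p-values that have already been obtained from a paired t-test; the function names `paired_ci` and `holm_adjust` and all numbers used with them are hypothetical and are not taken from the paper. The critical value 2.776445 is the two-sided 95% quantile of Student's t for df = 4.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for df = 4 (i.e. 5 folds).
T_CRIT_DF4 = 2.776445


def paired_ci(model_scores, baseline_scores, t_crit=T_CRIT_DF4):
    """Mean fold-wise difference (model minus baseline) and its 95% CI."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample std (n - 1).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    half_width = t_crit * se
    return mean_diff, (mean_diff - half_width, mean_diff + half_width)


def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjustment, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

With four baseline comparisons, `holm_adjust` would receive the four raw paired t-test p-values, and `paired_ci` would be called once per baseline with the five fold-wise scores of each model.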
Table 8.
Quantitative evaluation of explainability using insertion/deletion AUC on the test set under a GaussianBlur baseline (kernel = 51, σ = 8.0, 50 steps). We report mean AUC with 95% bootstrap confidence intervals. Higher Insertion and higher (1 − Deletion) indicate more faithful explanations. The correct-only protocol was used (target class = predicted class).
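The insertion/deletion metric named in the Table 8 caption can be sketched as follows. This is a simplified illustration and not the authors' implementation: images are flat lists of pixel values, the blurred baseline is assumed to be precomputed and passed in, `model` is any callable returning the target-class score, and the names `perturbation_curve` and `trapezoid_auc` are hypothetical.

```python
def trapezoid_auc(scores):
    """Area under a curve sampled at evenly spaced points on [0, 1]."""
    n = len(scores) - 1
    return sum((scores[i] + scores[i + 1]) / 2 for i in range(n)) / n


def perturbation_curve(image, baseline, saliency, model, steps, mode):
    """Model scores while pixels are swapped in order of decreasing saliency.

    mode="deletion": start from the image and replace pixels by the baseline.
    mode="insertion": start from the baseline and restore image pixels.
    """
    order = sorted(range(len(image)), key=lambda i: saliency[i], reverse=True)
    if mode == "deletion":
        current, source = list(image), baseline
    elif mode == "insertion":
        current, source = list(baseline), image
    else:
        raise ValueError("mode must be 'deletion' or 'insertion'")
    chunk = -(-len(order) // steps)  # ceiling division: pixels per step
    scores = [model(current)]
    for step in range(steps):
        for i in order[step * chunk:(step + 1) * chunk]:
            current[i] = source[i]
        scores.append(model(current))
    return scores
```

A faithful saliency map makes the deletion curve drop quickly (low deletion AUC) and the insertion curve rise quickly (high insertion AUC), which is why the caption reports Insertion and (1 − Deletion) on the same "higher is better" scale.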
Table 9.
Paired statistical comparison of explanation faithfulness using per-image AUCs on the common subset (intersection by index, n = 369). We report the mean difference Δ (MaxGRNet minus baseline model) and p-values from the paired t-test and the Wilcoxon signed-rank test. A positive Δ indicates that MaxGRNet is better for Insertion and (1 − Deletion). GaussianBlur baseline (k = 51, σ = 8.0; 50 steps).
Table 10.
Overall statistical significance summary for MaxGRNet vs baselines on DS1. Prediction: 5-fold paired tests with Holm correction. Explainability: per-image paired tests on common test images (GaussianBlur baseline, 50 steps; correct-only protocol), using Insertion AUC and 1–Deletion AUC.
Fig 9.
t-SNE feature separation plots for MaxViT-T and the proposed MaxGRNet model on the test sets of both datasets.
Fig 10.
Gradient map visualization of the proposed MaxGRNet model.
Fig 11.
Insertion and deletion curves (mean ± std) comparing Grad-CAM vs random mask under a GaussianBlur baseline (k = 51, σ = 8.0) for the proposed MaxGRNet.
Fig 12.
Overall insertion and deletion curves (mean) under a GaussianBlur baseline comparing MaxGRNet, MaxViT-T, and ResNet50.
Fig 13.
Examples of images passed to the model during the computation of IAUC and DAUC with Grad-CAM, per class.
Table 11.
Comparison between the proposed MaxGRNet and models from the literature in terms of F1-score.