MaxGRNet: A multi-axis vision transformer with improved generalization for eye disease classification using explainable AI with insertion-deletion operations on fundus images

doi:10.1371/journal.pone.0346329

Table 1.

Identified research gaps from the literature review.

Table 2.

Class distribution of the dataset (DS1).

Table 3.

Distribution of samples across disease classes in dataset DS2.

Fig 1.

Sample images from the Eye_Disease_Classification dataset (DS1) and the color fundus images dataset (DS2).

Fig 2.

Block diagram of the complete experimental pipeline for eye disease diagnosis.

Fig 3.

Architectural overview of multi-axis vision transformer (MaxViT-T).
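
MaxViT's multi-axis attention alternates two ways of grouping the tokens of a feature map: block attention over non-overlapping local P × P windows, and grid attention over a fixed G × G grid whose members are spaced across the whole map (a sparse, dilated, global mixing). The minimal sketch below only computes which token coordinates end up in each attention group; the function names and the toy 4 × 4 map are illustrative assumptions, and the actual attention, relative position bias and MBConv stages of MaxViT-T are omitted.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, column) of a token in the feature map


def block_partition(h: int, w: int, p: int) -> List[List[Coord]]:
    """Block attention axis: split an h x w token map into non-overlapping
    p x p local windows. Self-attention would run inside each window."""
    assert h % p == 0 and w % p == 0, "map size must be divisible by window size"
    windows = []
    for bi in range(0, h, p):
        for bj in range(0, w, p):
            windows.append([(bi + i, bj + j) for i in range(p) for j in range(p)])
    return windows


def grid_partition(h: int, w: int, g: int) -> List[List[Coord]]:
    """Grid attention axis: use a fixed g x g grid. Each group collects g x g
    tokens spaced (h // g, w // g) apart, so one group spans the whole map."""
    assert h % g == 0 and w % g == 0, "map size must be divisible by grid size"
    sh, sw = h // g, w // g  # stride between members of the same group
    groups = []
    for oi in range(sh):
        for oj in range(sw):
            groups.append([(oi + i * sh, oj + j * sw) for i in range(g) for j in range(g)])
    return groups


if __name__ == "__main__":
    print(block_partition(4, 4, 2)[0])  # a local 2 x 2 window
    print(grid_partition(4, 4, 2)[0])   # 4 tokens spread over the whole map
```

Both partitions cover every token exactly once; the difference is only whether a group is spatially compact (local context) or spread out (global context), which is what lets MaxViT mix local and global information at linear cost in the number of tokens.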

Fig 4.

Architectural overview of the proposed MaxGRNet with GRN_MLP integration.

Fig 5.

GRN-based MLP (GRN_MLP) used in the proposed MaxViT-based architecture.
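
The caption does not define the layer, so the sketch below assumes that "GRN" is Global Response Normalization as introduced in ConvNeXt V2: each channel's L2 norm over all spatial positions is divided by the mean norm across channels, and the result rescales the features with learnable gamma and beta plus a residual term. Where exactly GRN sits inside the MLP (for example fc1 → GELU → GRN → fc2, as in the ConvNeXt V2 block) is also an assumption here. The code is a dependency-free illustration on a token map stored as x[position][channel], not the authors' implementation.

```python
import math
from typing import List, Sequence


def grn(x: List[List[float]], gamma: Sequence[float], beta: Sequence[float],
        eps: float = 1e-6) -> List[List[float]]:
    """Global Response Normalization (ConvNeXt V2 style, assumed here) on a
    channels-last token map x[position][channel]."""
    channels = len(x[0])
    # 1) Global aggregation: L2 norm of each channel over all spatial positions.
    gx = [math.sqrt(sum(row[c] ** 2 for row in x)) for c in range(channels)]
    # 2) Divisive normalization: each channel's norm relative to the channel mean.
    mean_gx = sum(gx) / channels
    nx = [g / (mean_gx + eps) for g in gx]
    # 3) Calibration with learnable gamma/beta, plus a residual connection.
    return [[gamma[c] * row[c] * nx[c] + beta[c] + row[c] for c in range(channels)]
            for row in x]


if __name__ == "__main__":
    tokens = [[3.0, 0.0], [4.0, 0.0]]  # 2 positions, 2 channels (toy values)
    print(grn(tokens, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

With gamma and beta initialised to zero the layer reduces to the identity, which is why GRN can be dropped into an existing MLP without disturbing it at the start of training; channels with a larger global response than average are amplified, and weaker ones are suppressed.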

Table 4.

Training configuration and hardware specifications.

Table 5.

Dataset-wise performance comparison using macro-averaged metrics.
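
Macro averaging computes each metric per class from the confusion matrix and then takes the unweighted mean over classes, so minority disease classes count as much as majority ones. The minimal sketch below assumes the convention cm[true][predicted] and defines macro-F1 as the mean of per-class F1 scores (one of two common conventions; the table does not state which was used).

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 from a square
    confusion matrix where cm[i][j] counts true class i predicted as class j."""
    k = len(cm)
    total = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true c, predicted another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    print(macro_metrics([[8, 2], [1, 9]]))  # hypothetical 2-class counts
```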

Fig 6.

Accuracy curves for the transfer-learning-based models used in this study.

Fig 7.

Confusion matrices and class-wise ROC curves for ViT-B16, Swin-T, ResNet50, and MaxViT-T on DS1 and DS2.
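
A class-wise ROC curve treats one disease as the positive class and all others as negative (one-vs-rest), using the model's predicted probability for that class as the score. Its area equals the probability that a randomly chosen positive image is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes that quantity directly; it is an O(n²) illustration with hypothetical names, not the plotting code behind the figure.

```python
from typing import Sequence


def one_vs_rest_auc(scores: Sequence[float], labels: Sequence[int], cls: int) -> float:
    """ROC AUC for class `cls` versus all other classes.

    scores[i] is the predicted probability of `cls` for sample i and
    labels[i] is its true class index."""
    pos = [s for s, y in zip(scores, labels) if y == cls]
    neg = [s for s, y in zip(scores, labels) if y != cls]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A correctly ordered pair counts 1, a tie counts 1/2.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(one_vs_rest_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], cls=1))
```

For a multi-class model, calling this once per class with the corresponding probability column yields one AUC per curve in a class-wise ROC plot.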

Fig 8.

Confusion matrices and class-wise ROC curves for MaxGRNet on DS1 and DS2.

Table 6.

Cross-validation performance comparison of different models on DS1. Values are reported as mean ± standard deviation across 5 folds.
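
Each "mean ± standard deviation" entry summarises five per-fold scores. The table does not say whether the sample (n − 1) or population (n) standard deviation was used; the short sketch below assumes the sample version, and the fold values in the usage line are hypothetical.

```python
import statistics
from typing import Sequence


def fold_summary(values: Sequence[float], digits: int = 2) -> str:
    """Format per-fold scores as 'mean ± sd' (sample standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator; statistics.pstdev uses n
    return f"{mean:.{digits}f} ± {sd:.{digits}f}"


if __name__ == "__main__":
    print(fold_summary([90, 92, 94, 96, 98]))  # hypothetical fold accuracies (%)
```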

Table 7.

Paired statistical significance analysis (5-fold cross-validation) comparing MaxGRNet with baseline models on DS1. Δ denotes the mean fold-wise difference (MaxGRNet − Baseline). 95% confidence intervals (CI) are computed over the paired fold-wise differences using the t-distribution (df = 4). p_Holm denotes Holm–Bonferroni-corrected p-values across the four baseline comparisons (paired t-test). p_Wilc. denotes two-sided Wilcoxon signed-rank test p-values.
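
The tests in this table operate on the five fold-wise differences between MaxGRNet and one baseline: a paired t-test with a t-based 95% CI (df = 4), a Holm–Bonferroni step-down correction over the four baseline p-values, and a two-sided Wilcoxon signed-rank test. The sketch below implements all three with the standard library only, under stated assumptions: df is hard-coded to 4 so the exact closed-form t CDF for four degrees of freedom can be used, the Wilcoxon p-value is computed exactly by enumerating all 2^5 sign patterns, and the differences are assumed to have no zeros or tied magnitudes. Function names are hypothetical.

```python
import math
from itertools import product
from statistics import mean, stdev
from typing import List, Sequence, Tuple

T_CRIT_DF4 = 2.776445  # two-sided 95% critical value of Student's t, df = 4


def t_cdf_df4(t: float) -> float:
    """Exact CDF of Student's t distribution with 4 degrees of freedom."""
    s = 1 + t * t / 4
    return 0.5 + 0.375 * (t / math.sqrt(s)) * (1 - (t * t / s) / 12)


def paired_t_5fold(ours: Sequence[float], base: Sequence[float]) -> Tuple[float, Tuple[float, float], float]:
    """Mean difference, 95% CI and two-sided p-value of a paired t-test (5 folds)."""
    d = [a - b for a, b in zip(ours, base)]
    if len(d) != 5:
        raise ValueError("this sketch hard-codes df = 4, i.e. exactly 5 folds")
    md = mean(d)
    se = stdev(d) / math.sqrt(len(d))
    p = 2 * (1 - t_cdf_df4(abs(md / se)))
    return md, (md - T_CRIT_DF4 * se, md + T_CRIT_DF4 * se), p


def holm(pvals: Sequence[float]) -> List[float]:
    """Holm–Bonferroni adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])  # keep adjusted values monotone
        adjusted[i] = min(1.0, running)
    return adjusted


def wilcoxon_exact(d: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value (no zeros or ties assumed)."""
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    w_plus = sum(rank[i] for i in range(n) if d[i] > 0)
    mu = n * (n + 1) / 4
    extreme = 0
    for signs in product((0, 1), repeat=n):  # every equally likely sign pattern
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if abs(w - mu) >= abs(w_plus - mu):
            extreme += 1
    return extreme / 2 ** n
```

One consequence worth keeping in mind when reading p_Wilc.: with only five folds, the exact two-sided Wilcoxon p-value cannot fall below 2/32 = 0.0625, even when MaxGRNet wins on every fold, so the paired t-test (with Holm correction) carries most of the evidential weight at this sample size.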

Table 8.

Quantitative evaluation of explainability using insertion/deletion AUC on the test set under a GaussianBlur baseline (kernel = 51, σ = 8.0, 50 steps). We report mean AUC with 95% bootstrap confidence intervals. Higher Insertion AUC and higher (1 − Deletion AUC) indicate more faithful explanations. A correct-only protocol was used (target class = predicted class).
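
Insertion starts from the blurred image and restores the original pixels in decreasing order of saliency; deletion starts from the original image and replaces pixels with their blurred values in the same order. After each of the 50 steps the probability of the target class is recorded, and the AUC of that curve is the score (a fast rise for insertion and a fast drop for deletion indicate a faithful saliency map). The sketch below assumes a toy flattened image, a precomputed blurred copy (in practice something like torchvision's GaussianBlur with kernel_size=51 and sigma=8.0), trapezoidal integration over 51 points, and a percentile bootstrap for the 95% CI; the `predict` callable and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # toy flattened image: one value per pixel


def _trapezoid_auc(scores: Sequence[float]) -> float:
    """Trapezoidal area under scores sampled at fractions 0, 1/n, ..., 1."""
    n = len(scores) - 1
    return sum((scores[i] + scores[i + 1]) / 2 for i in range(n)) / n


def insertion_deletion_auc(image: Image, blurred: Image, saliency: Sequence[float],
                           predict: Callable[[Image], float],
                           steps: int = 50) -> Tuple[float, float]:
    """Return (insertion AUC, deletion AUC) for one image.

    predict(img) returns the probability of the target class; under a
    correct-only protocol that is the (correctly) predicted class."""
    order = sorted(range(len(image)), key=lambda i: saliency[i], reverse=True)
    per_step = math.ceil(len(image) / steps)
    inserted, deleted = list(blurred), list(image)
    ins_scores, del_scores = [predict(inserted)], [predict(deleted)]
    for s in range(steps):
        for idx in order[s * per_step:(s + 1) * per_step]:
            inserted[idx] = image[idx]   # reveal the next most salient pixels
            deleted[idx] = blurred[idx]  # remove them by substituting blurred values
        ins_scores.append(predict(inserted))
        del_scores.append(predict(deleted))
    return _trapezoid_auc(ins_scores), _trapezoid_auc(del_scores)


def bootstrap_ci(values: Sequence[float], n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Mean of per-image AUCs with a percentile bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / n, (lo, hi)
```

Per-image AUCs from `insertion_deletion_auc` would be collected over the correctly classified test images and then passed to `bootstrap_ci` once for insertion and once for (1 − deletion).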

Table 9.

Paired statistical comparison of explanation faithfulness using per-image AUCs on the common subset (intersection by index, n = 369). We report the mean difference Δ (MaxGRNet minus baseline model) and p-values from the paired t-test and the Wilcoxon signed-rank test. A positive Δ indicates that MaxGRNet is better for Insertion and (1 − Deletion). GaussianBlur baseline (k = 51, σ = 8.0; 50 steps).
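
Because the correct-only protocol keeps a different set of images for each model, the comparison first restricts both per-image AUC tables to the image indices they share, then tests the paired differences. For (1 − Deletion) the difference simplifies to baseline deletion minus MaxGRNet deletion. With several hundred pairs, the sketch below assumes normal approximations for both tests (a z-test in place of the t-distribution, and the large-sample Wilcoxon statistic with zero differences dropped and a tie-corrected variance, without continuity correction); the function names are hypothetical and exact library implementations may differ slightly.

```python
import math
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def common_subset(a: Dict[int, float], b: Dict[int, float]) -> List[int]:
    """Image indices present in both per-image AUC tables (intersection by index)."""
    return sorted(a.keys() & b.keys())


def _avg_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, averaging the ranks within groups of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_normal(d: Sequence[float]) -> float:
    """Two-sided Wilcoxon signed-rank p-value via the normal approximation."""
    d = [x for x in d if x != 0]  # zero differences carry no sign information
    n = len(d)
    if n == 0:
        return 1.0
    ranks = _avg_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    ties = Counter(abs(x) for x in d).values()
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in ties) / 48
    z = (w_plus - n * (n + 1) / 4) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def paired_delta(ours: Dict[int, float], base: Dict[int, float],
                 higher_is_better: bool = True) -> Tuple[int, float, float, float]:
    """n, mean difference, paired z-test p and Wilcoxon p on the common subset.

    Use higher_is_better=False for raw deletion AUCs: the difference then
    equals (1 - ours) - (1 - base) = base - ours."""
    idx = common_subset(ours, base)
    d = [ours[i] - base[i] if higher_is_better else base[i] - ours[i] for i in idx]
    md = mean(d)
    p_z = math.erfc(abs(md / (stdev(d) / math.sqrt(len(d)))) / math.sqrt(2))
    return len(idx), md, p_z, wilcoxon_normal(d)
```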

Table 10.

Overall statistical significance summary for MaxGRNet vs baselines on DS1. Prediction: 5-fold paired tests with Holm correction. Explainability: per-image paired tests on common test images (GaussianBlur baseline, 50 steps; correct-only protocol), using Insertion AUC and (1 − Deletion AUC).
