Table 1.
Performance metrics (on the test set) of deep learning models on Dataset 1 (Chest X-ray images for pneumonia detection).
Table 2.
Performance metrics (on the test set) of deep learning models on Dataset 2 (CT Scans for COVID-19 detection).
Fig 1.
Accuracy and loss for the deep learning models used for Dataset 1 (Chest X-ray images for pneumonia detection).
The plots illustrate the models’ performance and convergence during the training process.
Fig 2.
Accuracy and loss for the deep learning model used for Dataset 2 (CT Scans for COVID-19 detection).
The plots illustrate the model’s performance and convergence during the training process.
Fig 3.
Receiver operating characteristic (ROC) curves (green) with the area under the ROC curve (AUC) generated for the clinical case studies (Datasets 1 and 2).
ROC curves and AUC values were generated for the best-performing deep learning models for classifying chest X-rays with and without pneumonia, and CT scans with and without COVID-19. (a) ROC curve for Dataset 1 (Chest X-ray Scans for Pneumonia detection); (b) ROC curve for Dataset 2 (Chest CT Scans for COVID-19 detection).
Fig 4.
Constructed questions for questionnaire—part 1.1.
Questions aim to collect basic information concerning participants’ medical speciality, medical imaging knowledge, and overall experience in the medical field.
Fig 5.
Constructed questions for questionnaire—part 1.2.
Questions aim to explore participants’ familiarity with the concept of XAI in medical imaging.
Fig 6.
Explainable AI (XAI) visualization results for clinical case study one.
This figure illustrates XAI techniques (Grad-CAM, panels b and f; LIME, panels d and h) applied to chest X-ray images for pneumonia detection, highlighting the regions and features of the images that the deep learning model focuses on to make its predictions.
Fig 7.
Explainable AI (XAI) visualization results for clinical case study two.
This figure illustrates XAI techniques (Grad-CAM, panels b and f; LIME, panels d and h) applied to chest CT images for COVID-19 detection, highlighting the regions and features of the images that the deep learning model focuses on to make its predictions.
Fig 8.
Constructed questions for questionnaire—part 3.1.
Questions aim to assess the quality of the explanations provided by both Grad-CAM and LIME, and their effectiveness in influencing clinical decision-making within the radiology workflow.
Fig 9.
Constructed questions for questionnaire—part 3.2.
Questions aim to assess the impact of the coloring scheme on the XAI visual results.
Fig 10.
Constructed questions for questionnaire—part 4.1.
Questions aim to collect recommendations for improving the explainability of AI models in medical imaging from the users’ perspective.
Fig 11.
Distribution of participants’ total medical experience.
The figure indicates that 18 participants have more than 10 years of medical experience, reflecting the overall experience level of the participant group.
Fig 12.
Distribution of participants’ experience analyzing radiology images.
The histogram indicates that 13 participants have more than 10 years of experience, highlighting the expertise level within the group.
Fig 13.
Distribution of participants’ experience with AI-based medical imaging tools.
The figure reveals that 16 participants have zero experience with AI-based medical imaging tools, highlighting a significant portion of the group with no prior exposure to this technology.
Fig 14.
Distribution of participants’ familiarity with AI.
The figure shows that 14 participants reported being “little familiar” with AI, highlighting the varying levels of AI knowledge among the participants.
Fig 15.
Distribution of participants’ comfort with the general widespread use of AI.
The figure shows that most participants feel very comfortable with the general widespread use of AI.
Fig 16.
Distribution of participants’ comfort with medical decisions generated by AI-based tools.
The figure shows that opinions are split almost evenly between “Not sure” and “Comfortable”.
Fig 17.
Distribution of participants’ confidence in AI-based diagnostic tools.
The figure shows that most participants, fourteen in total, reported low confidence in AI-based diagnostic decisions.
Fig 18.
Distribution of participants’ support for understanding the decision-making process of AI algorithms used in medical imaging.
The figure illustrates that nineteen participants consider it crucial for medical practitioners to understand the rationale of AI decisions in medical imaging systems, while only five participants view this aspect as unimportant.
Fig 19.
Distribution of participants’ awareness of XAI.
The figure shows that most participants reported poor familiarity with XAI in medical imaging.
Fig 20.
Distribution of participants’ belief in the effectiveness of insights from XAI tools.
The figure shows that most participants did not respond to this question due to their poor familiarity with the XAI concept.
Fig 21.
Grad-CAM clinical relevance (Usefulness).
The figure shows that most participants rated the usefulness of the Grad-CAM method in explaining the AI results positively.
Fig 22.
Participants’ views on the Grad-CAM colouring scheme.
The figure shows that thirteen participants indicated that the colored heatmaps had a negative impact on the readability of the XAI results.
Fig 23.
LIME clinical relevance (Usefulness).
The figure shows that nine participants scored LIME below 2 on the usefulness criterion.
Fig 24.
The figure shows that twenty-two participants rated the heatmap visualisations positively, with scores of three or higher.
Fig 25.
The figure shows that only six participants assigned a score of 4 or 5 on the comprehensibility criterion.
Fig 26.
Participants’ preference between Grad-CAM and LIME.
The figure shows that nineteen participants favoured Grad-CAM (heatmap) over LIME visualisations.
Fig 27.
The figure shows that nine participants expressed confidence in the accuracy of the XAI visualisations, while seven participants lacked confidence in the results.
Fig 28.
Impact of XAI on improving trust in AI.
The figure shows that twelve participants were uncertain whether XAI affected their trust in AI results in medical imaging, whereas eleven participants reported that their trust in AI systems improved after reviewing the XAI visualizations.
Fig 29.
Participants’ support for textual explanations.
The figure shows that most participants support including textual explanations alongside the visual explanations to improve XAI results.
Fig 30.
Participants’ support for involvement in XAI design and development.
The figure shows that most participants support XAI co-design.
Fig 31.
Participants’ belief in XAI benefits.
The figure illustrates that most participants believe that XAI visualisations have the potential to enhance radiology practices.