Fig 1.
Representative half coronal computed tomography (CT) slices of (A) a healthy control, (B) chronic rhinosinusitis, and (C) maxillary sinus fungal ball.
Table 1.
The number of 3-D full-stacks (patients) of OMU CT images for each class in the internal and external datasets (total n = 512 and 64, respectively).
Table 2.
The number of 3-D hemi-stacks of OMU CT images for each class in the internal and external datasets (total n = 1024 and 128, respectively).
Fig 2.
Overview of the proposed network algorithm.
In the first stage (A), key slices were automatically extracted from the full series of coronal CT slices using a 2-D CNN. In the second stage (B), disease classification was performed by a 3-D CNN that takes a 3-D stack composed only of the key CT slices as input. MFB; maxillary sinus fungal ball, CRS; chronic rhinosinusitis, HC; healthy control, CNN; convolutional neural network.
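As a rough illustration of the hand-off between the two stages, the sketch below keeps only the slices that a first-stage classifier marks as key slices and passes them on as the 3-D input. The function name `select_key_slices`, the per-slice scores, and the 0.5 threshold are assumptions made for this example; the caption does not state how the selection is thresholded.

```python
def select_key_slices(stack, key_scores, threshold=0.5):
    """Return the slices whose key-slice score reaches the threshold.

    stack      : list of 2-D slices (any objects), ordered along the stack axis
    key_scores : one score in [0, 1] per slice, e.g. from a 2-D CNN (assumed)
    threshold  : hypothetical cut-off, not taken from the paper
    """
    if len(stack) != len(key_scores):
        raise ValueError("need exactly one score per slice")
    return [s for s, score in zip(stack, key_scores) if score >= threshold]


# Toy stack of five "slices" (labels stand in for image arrays).
toy_stack = ["s0", "s1", "s2", "s3", "s4"]
toy_scores = [0.1, 0.7, 0.9, 0.6, 0.2]   # invented scores for illustration
key_stack = select_key_slices(toy_stack, toy_scores)
```

The resulting `key_stack` preserves the original slice order, which matters if the second-stage network relies on spatial continuity along the stack axis.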
Fig 3.
Illustration of cropping, division, and flipping processes.
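A minimal sketch of these three steps on a single coronal slice is given here. It assumes that "division" means splitting the cropped slice at the vertical midline and that "flipping" mirrors one half so that both halves share the same orientation; the crop box, the function names, and the choice of which half is mirrored are hypothetical. Slices are plain nested lists to keep the example dependency-free.

```python
def crop(slice_2d, top, bottom, left, right):
    """Keep rows [top, bottom) and columns [left, right) of a 2-D slice."""
    return [row[left:right] for row in slice_2d[top:bottom]]


def split_and_flip(slice_2d):
    """Split a slice at the vertical midline and mirror the second half.

    Assumes an even width after cropping, so both halves have equal size.
    """
    mid = len(slice_2d[0]) // 2
    first_half = [row[:mid] for row in slice_2d]
    second_half = [row[mid:][::-1] for row in slice_2d]  # horizontal flip
    return first_half, second_half


def to_hemi_stacks(stack, box):
    """Turn one full stack into two hemi-stacks using the same crop box."""
    top, bottom, left, right = box
    halves = [split_and_flip(crop(s, top, bottom, left, right)) for s in stack]
    return [h[0] for h in halves], [h[1] for h in halves]


# One toy 2 x 6 slice; the crop box drops the outermost columns.
toy_stack = [[[0, 1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10, 11]]]
hemi_a, hemi_b = to_hemi_stacks(toy_stack, box=(0, 2, 1, 5))
```

Each full stack yields two hemi-stacks under this scheme, which is consistent with the hemi-stack counts being twice the full-stack counts.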
Table 3.
AUC and accuracy of AI in the internal validation.
The average and standard deviation were computed across the five folds of the cross-validation.
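The fold-level summary can be sketched as follows. The fold values are invented for illustration and are not results from the study, and the use of the sample standard deviation (`statistics.stdev`, n - 1 denominator) is an assumption, since the table note does not say whether the sample or population form was used.

```python
import statistics


def summarize_folds(fold_values):
    """Return (mean, sample standard deviation) of one metric across folds."""
    return statistics.mean(fold_values), statistics.stdev(fold_values)


# Made-up AUC values for five folds, for illustration only.
example_fold_aucs = [0.90, 0.92, 0.88, 0.91, 0.89]
mean_auc, sd_auc = summarize_folds(example_fold_aucs)
print(f"AUC = {mean_auc:.3f} +/- {sd_auc:.3f}")
```

Swapping in `statistics.pstdev` would give the population form, which is slightly smaller for the same five values.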
Table 4.
Sensitivity, precision, and F1 score of AI in the internal validation.
The average and standard deviation across the 5-fold cross-validation are reported.
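For reference, the three class-wise metrics can be derived from a confusion matrix as sketched here. The matrix entries are made up for the example and do not come from the study; the row and column convention (rows are true classes, columns are predicted classes) and the function name are likewise assumptions.

```python
def per_class_metrics(confusion, labels):
    """Compute sensitivity, precision, and F1 score for each class.

    confusion[i][j] = number of cases of true class i predicted as class j.
    """
    n = len(labels)
    metrics = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # missed cases
        fp = sum(confusion[i][k] for i in range(n)) - tp    # false alarms
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = sensitivity + precision
        f1 = 2 * sensitivity * precision / denom if denom else 0.0
        metrics[name] = {"sensitivity": sensitivity,
                         "precision": precision,
                         "f1": f1}
    return metrics


# Invented 3-class confusion matrix (HC, CRS, MFB), for illustration only.
example_confusion = [[8, 1, 1],
                     [2, 7, 1],
                     [0, 2, 8]]
example_metrics = per_class_metrics(example_confusion, ["HC", "CRS", "MFB"])
```

The F1 score is the harmonic mean of sensitivity and precision, so it falls toward the lower of the two when they diverge.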
Table 5.
Performance comparison between the proposed AI system and the resident test.
Confusion matrices for external validation data.
Table 6.
AUC and accuracy of AI and humans in the external validation.
External validation was performed with each of the AI models trained in the 5-fold cross-validation and with each of the six human classification tests; the average and standard deviation are given.
Table 7.
Performance comparison between human and AI in the external validation.
Sensitivity, precision, and F1 score are given for each class. The average and standard deviation were computed across the models trained in the 5-fold internal validation.
Table 8.
Performance comparison between AI models with and without considering the ensemble technique in the external validation.
For the ensemble AI, the class-wise average and its standard deviation are given for sensitivity and precision.
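The caption does not specify which ensemble technique was used. One common option, shown here purely as an assumed example, is soft voting: the class probabilities predicted by the individual fold models are averaged, and the class with the highest mean probability is chosen. The function name and the probability values are hypothetical.

```python
def soft_vote(prob_sets, labels):
    """Average class probabilities over models and pick the top class.

    prob_sets : list of per-model probability lists, one value per label
    labels    : class names in the same order as the probabilities
    """
    n_models = len(prob_sets)
    mean_probs = [sum(p[k] for p in prob_sets) / n_models
                  for k in range(len(labels))]
    best = max(range(len(labels)), key=lambda k: mean_probs[k])
    return labels[best], mean_probs


# Invented outputs of three models for one hemi-stack (HC, CRS, MFB).
example_probs = [[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.1, 0.6, 0.3]]
predicted, averaged = soft_vote(example_probs, ["HC", "CRS", "MFB"])
```

In this toy case the first model alone would predict HC, while the averaged probabilities favor CRS, which illustrates how an ensemble can override a single model's vote.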
Table 9.
2-D CNN performance comparison for detecting key-slices.
The average and standard deviation were computed across the five folds of the cross-validation.
Table 10.
3-D CNN performance comparison of 3-label classification with and without using the proposed key-slice selector.
External validation was performed with each of the five pre-trained models.
Fig 4.
Performance comparison of the proposed scheme and the other configurations.
The macro-average AUCs in the external validation are given for each of the five pre-trained models. K; the proposed key-slice detector. Average-based 2-D CNN; a 2-D CNN that takes the average of the 2-D coronal slices as input.
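A macro-average AUC is conventionally the unweighted mean of the one-vs-rest AUCs of the individual classes. The sketch below computes it directly from the pairwise-ranking definition of AUC; whether the study used exactly this one-vs-rest formulation is an assumption, and the probabilities and labels are invented for the example.

```python
def binary_auc(scores, is_positive):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(probs, true_labels, labels):
    """Unweighted mean of one-vs-rest AUCs over all classes."""
    aucs = []
    for k, name in enumerate(labels):
        scores = [p[k] for p in probs]
        is_positive = [t == name for t in true_labels]
        aucs.append(binary_auc(scores, is_positive))
    return sum(aucs) / len(aucs)


# Invented predictions for four cases (probabilities for HC, CRS, MFB).
example_probs = [[0.8, 0.1, 0.1],
                 [0.1, 0.7, 0.2],
                 [0.2, 0.2, 0.6],
                 [0.5, 0.15, 0.35]]
example_truth = ["HC", "CRS", "MFB", "CRS"]
example_macro = macro_auc(example_probs, example_truth, ["HC", "CRS", "MFB"])
```

Because each class contributes equally regardless of its size, the macro average is not dominated by the most frequent class.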
Table 11.
Performance comparison of 3-D CNN used for the second stage.
External validation was performed with each of the five pre-trained models.
Fig 5.
Grad-CAM results for patients with CRS and MFB in the internal (A and B) and external (C and D) validation.
Fig 6.
Grad-CAM examples of cases misclassified by the AI system in the external validation.
MFB; maxillary sinus fungal ball, CRS; chronic rhinosinusitis, HC; healthy control.