Deep learning detection and classification of fungal and non-fungal calcifications on paranasal sinus CT imaging

  • Zepa Yang,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Computer Engineering, Soonchunhyang University, Asan, Republic of Korea, Biomedical research center, Korea University Guro Hospital, Seoul, Republic of Korea

  • Insung Choi,

    Roles Investigation, Methodology, Software, Validation

    Affiliations Biomedical research center, Korea University Guro Hospital, Seoul, Republic of Korea, Seers Technology Co. Ltd., Gyeonggi-do, Republic of Korea

  • Hoo Yun,

    Roles Data curation, Methodology

    Affiliations Biomedical research center, Korea University Guro Hospital, Seoul, Republic of Korea, Korea University College of Medicine, Seoul, Republic of Korea

  • Siwoo Kim,

    Roles Data curation, Methodology

    Affiliations Biomedical research center, Korea University Guro Hospital, Seoul, Republic of Korea, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea

  • Hye Na Jung,

    Roles Data curation, Formal analysis

    Affiliations Korea University College of Medicine, Seoul, Republic of Korea, Department of Radiology, Korea University Guro Hospital, Seoul, Republic of Korea

  • Sangil Suh,

    Roles Data curation

    Affiliations Korea University College of Medicine, Seoul, Republic of Korea, Department of Radiology, Korea University Guro Hospital, Seoul, Republic of Korea

  • Bo Kyu Kim,

    Roles Data curation

    Affiliations Korea University College of Medicine, Seoul, Republic of Korea, Department of Radiology, Korea University Anam Hospital, Seoul, Republic of Korea

  • Byungjun Kim,

    Roles Data curation

    Affiliations Korea University College of Medicine, Seoul, Republic of Korea, Department of Radiology, Korea University Anam Hospital, Seoul, Republic of Korea

  • Sung-Hye You,

    Roles Data curation, Methodology

    Affiliations Korea University College of Medicine, Seoul, Republic of Korea, Department of Radiology, Korea University Anam Hospital, Seoul, Republic of Korea

  • Inseon Ryoo

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    isryoo@gmail.com

    Affiliations Korea University College of Medicine, Seoul, Republic of Korea, Department of Radiology, Korea University Guro Hospital, Seoul, Republic of Korea

Abstract

This study aimed to develop and evaluate a deep learning algorithm for detecting and classifying intrasinus calcifications on paranasal sinus (PNS) computed tomography (CT) for the diagnosis of fungal sinusitis and differentiation of fungal and non-fungal sinusitis. A dataset of 277 PNS CT cases from Korea University Guro Hospital, supplemented by temporal and geographic external test sets, was utilized. A 3D U-Net model was employed to segment maxillary sinus regions. YOLO v5 identified calcifications, followed by classification into three patterns: normal sinus or chronic sinusitis without calcifications, dense peripheral dystrophic calcification, and central punctate fungal calcification. A separate convolutional neural network (CNN) refined the classification to ensure accurate categorization of calcification patterns. The 3D U-Net model achieved a Dice Similarity Coefficient of 0.9674, indicating accurate segmentation. YOLO v5 demonstrated precision of 79.50% and recall of 92.14% in detecting calcifications. The CNN classification model attained F1 scores of 94.73%, 90.60%, and 94.01%, and overall accuracies of 97.48%, 86.87%, and 94.01% for internal, temporal, and geographic test sets, respectively. This study demonstrated the capability of deep learning algorithms to accurately detect and classify fungal sinusitis-related calcifications on PNS CT scans. The developed framework achieved high accuracy in segmentation of sinus area and detection/classification of intrasinus calcifications. The framework also demonstrated its potential for broader application to radiographic imaging.

Introduction

During the past decade, artificial intelligence and deep learning technologies have been applied to various industrial fields including medicine [1–4]. As a huge amount of medical imaging data are acquired and stored every day, radiology is a promising field for deep learning technologies, and many studies have shown the variety of possibilities in this field [5–12]. Furthermore, the number of medical images continues to increase explosively due to advancements in medical imaging techniques, which has tremendously increased workloads for radiologists [13–15]. Application of deep learning technologies to radiology can be very beneficial to radiologists and can serve as a novel academic field in radiology.

It is crucial to differentiate fungal sinusitis from non-fungal sinusitis to determine the appropriate treatment strategy for chronic sinusitis. In particular, early detection of fungal sinusitis is important for the prevention of complications in immunocompromised individuals, since fungal sinusitis can progress to fatal invasive fungal infections in that population [16]. Therefore, patients scheduled to receive immunosuppressive treatment usually undergo screening paranasal sinus (PNS) computed tomography (CT) prior to the treatment to detect and treat existing fungal sinusitis.

Intrasinus calcification is a characteristic feature of fungal ball, usually aspergillosis. Approximately 69–77% of patients with aspergillosis have been reported to have intrasinus calcifications on PNS CT [17–19]. Intrasinus calcifications can also occur in non-fungal inflammatory diseases of the PNS, such as mucocele or bacterial sinusitis. However, intrasinus calcification is uncommon in non-fungal inflammatory sinonasal disease; less than 3% of cases exhibit intrasinus calcifications [17,18,20]. In addition, the shape and location of intrasinus calcifications of fungal sinusitis are different from those of non-fungal sinusitis [18].

There have been several studies applying deep learning algorithms to PNS CT images. Most of the studies evaluated the presence and severity of chronic sinusitis or sinus opacifications on PNS CT using deep learning algorithms [21–24]. Recently, a study assessed the performance of the algorithm on PNS CT for distinguishing among chronic sinusitis, fungal sinusitis, and healthy controls [25]. However, internal calcifications and their patterns were not evaluated and have largely been overlooked.

In this study, we developed a deep learning algorithm for detecting calcifications in the maxillary sinuses on PNS CT and classifying these features to diagnose fungal sinusitis, especially fungal ball. We also evaluated the accuracy of the algorithm for detecting and classifying calcifications.

Materials and methods

Data acquisition

A dataset comprising PNS CT images from 277 cases (554 images of sinuses) was collected from Korea University Guro Hospital (KUGH). The dataset included patients diagnosed with fungal calcifications in the maxillary sinus from January 2015 to February 2022. The diagnosis of fungal calcifications was confirmed by pathological examination of specimens obtained from endoscopic sinus surgery. Pathologists diagnosed fungal ball with specific findings such as fungal hyphae. As PNS CT scans had bilateral maxillary sinuses, the contralateral side without calcification was also included in this study. We also collected data from 71 patients (142 sinuses) at KUGH from March 2022 to December 2023 as a temporal external test set and data from 105 patients (210 sinuses) at Korea University Anam Hospital (KUAH) from January 2021 to December 2023 as a geographic external test set. Although the images in the external test set were collected from a different branch of the same university-affiliated hospital system, the practicing physicians and radiologists, as well as the CT machines and imaging protocols, were different. Moreover, since the two hospitals are located in opposite parts of the city (northeast vs. southwest), the patient populations they serve differ significantly. We obtained approval from the Institutional Review Boards (IRBs) of KUGH and KUAH (IRB No.: 2022GR0432 and 2023AN373) and the need for informed consent was waived. The data comprised Digital Imaging and Communications in Medicine (DICOM) images, each with a resolution of 512 x 512 pixels. These images were collected under a strict protocol approved by the IRB, ensuring that all patient-related data were handled with confidentiality and integrity. The de-identified data were then used exclusively for the purpose of this study. The overall characteristics of the patient dataset are presented in Table 1.

Table 1. The overall characteristics of the patient dataset. The distribution of class labels in each dataset is shown. Left and right sinuses are counted separately.

https://doi.org/10.1371/journal.pone.0340832.t001

Development environment

The machine learning models in this study were developed and trained using the PyTorch deep learning framework (version 1.10.2). All computational experiments were conducted on a high-performance workstation equipped with an NVIDIA RTX A6000 GPU.

3D maxillary sinus area segmentation

The 3D maxillary sinus area segmentation model focused on processing PNS CT images to accurately delineate the maxillary sinus area. The model used 554 DICOM images from 277 patients. These images were subjected to a series of preprocessing steps to enhance the robustness and accuracy of the model. The initial preprocessing step involved cropping the maxillary sinus area from the images. This was achieved by bisecting the image along the central vertical line and calculating the y-axis range, ensuring the inclusion of the maxillary sinus area in all images, resulting in images of 256 x 256 pixels. To ensure consistency across the dataset, images of the left and right maxillary sinus areas were processed to maintain a standard orientation. This involved horizontally flipping the images of the left maxillary sinus, a method also serving as data augmentation. Additionally, to account for variation in patient positioning, these images were rotated within a 10–15-degree range.
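The cropping and orientation step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `crop_and_orient`, the `y_start` parameter, and the choice of which image half is mirrored (here the image-right half, assuming the radiological display convention in which the patient's left appears on the right of the image) are assumptions made for illustration. The rotation augmentation is omitted.

```python
def crop_and_orient(slice_2d, y_start, size):
    """Bisect an axial slice at its central vertical line, crop both halves
    to the same y-range, and mirror one half so both share one orientation.

    slice_2d : list of rows, each row a list of pixel values
    y_start  : first row kept by the crop (hypothetical parameter)
    size     : number of rows kept (256 for a 512 x 512 slice)
    """
    mid = len(slice_2d[0]) // 2
    rows = slice_2d[y_start:y_start + size]
    image_left = [row[:mid] for row in rows]
    # Assumption: the image-right half holds the patient's left sinus,
    # so this is the half that gets mirrored horizontally.
    image_right_flipped = [row[mid:][::-1] for row in rows]
    return image_left, image_right_flipped


# Toy 3 x 4 "slice" standing in for a 512 x 512 DICOM image.
toy = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
left_half, right_half = crop_and_orient(toy, y_start=1, size=2)
print(left_half)   # [[5, 6], [9, 10]]
print(right_half)  # [[8, 7], [12, 11]]
```

With a 512 x 512 input and `size=256`, each half is 256 pixels wide and 256 rows tall, matching the 256 x 256 crops mentioned in the text.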

The core of the segmentation model was the 3D U-Net architecture, which consisted of an encoder-decoder structure, allowing for precise localization while retaining important contextual information. The overall diagram of the proposed model is shown in Fig 1.

Fig 1. Three-dimensional U-Net (N = 4 down-sampling levels) for maxillary sinus segmentation.

The encoder comprises four levels; each level applies two 3 × 3 × 3 convolutions with ReLU activation and batch normalization, followed by 3D max-pooling. Feature channels by encoder level are 32, 64, 128, and 256. The decoder mirrors the encoder with transposed convolutions for up-sampling and skip connections to the corresponding encoder features, followed by two 3 × 3 × 3 convolutions (ReLU and batch normalization). A final 1 × 1 × 1 convolution with softmax produces voxel-wise probability maps of the maxillary sinus. Training optimized a Dice-based loss with standard spatial and intensity augmentation.

https://doi.org/10.1371/journal.pone.0340832.g001

The encoder path of the U-Net began with a series of convolutional layers, each followed by Rectified Linear Unit (ReLU) activation functions and batch normalization to stabilize the learning process. The convolutional layers progressively reduced the spatial resolution of the feature maps while increasing the number of feature channels, allowing the model to capture increasingly abstract features. Each down-sampling step was implemented with 3D max-pooling operations, which reduced the dimensions of the feature maps by half, effectively capturing spatial hierarchies at different scales.

In the decoder path, the feature maps were up-sampled using transposed convolutions, reversing the resolution reduction performed in the encoder. These up-sampled feature maps were then concatenated with the corresponding feature maps from the encoder path through skip connections, ensuring that the fine-grained details lost during down-sampling were preserved. Each up-sampling step was followed by 3D convolution layers to refine the segmented regions. The final layer of the network used a softmax activation function to generate a probability map, representing the likelihood of each voxel belonging to the maxillary sinus area.
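To make the resolution and channel bookkeeping of the encoder concrete, the following sketch traces feature-map shapes through four down-sampling levels with 32, 64, 128, and 256 channels, as stated in the Fig 1 caption. The input volume of 32 x 256 x 256 voxels, the assumption that the padded convolutions keep the spatial size, and the function name `encoder_shapes` are illustrative choices that the text does not specify.

```python
def encoder_shapes(input_shape, channels=(32, 64, 128, 256)):
    """Trace (channels, shape before pooling, shape after pooling) per level.

    Assumes the 3 x 3 x 3 convolutions are padded so they keep the spatial
    size, and that each 3D max-pooling halves every dimension.
    """
    shape = tuple(input_shape)
    trace = []
    for ch in channels:
        conv_shape = shape                    # size kept by padded convolutions
        shape = tuple(s // 2 for s in shape)  # 2 x 2 x 2 max-pooling
        trace.append((ch, conv_shape, shape))
    return trace


# Hypothetical input volume: 32 slices of 256 x 256 pixels.
for ch, before, after in encoder_shapes((32, 256, 256)):
    print(f"{ch:>3} channels: {before} -> pooled to {after}")
```

The decoder would reverse this sequence, and each skip connection concatenates encoder features with decoder features of the same spatial size.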

During training, the Dice loss function was employed to optimize the network. This function measures the similarity between predicted and actual segmentation areas, ensuring that the model’s predictions align closely with the true anatomical structures.
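As a small worked illustration of the metric, the sketch below computes the Dice coefficient and the corresponding Dice loss (one minus the coefficient) on flattened binary masks. The smoothing term `eps` and the function names are assumptions; the exact formulation used in training (for example, soft Dice on probabilities) is not given in the text.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flattened masks of 0s and 1s."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids division by zero when both masks are empty (assumed detail).
    return (2.0 * intersection + eps) / (total + eps)


def dice_loss(pred, target):
    """Loss minimized during training: 1 - Dice."""
    return 1.0 - dice_coefficient(pred, target)


pred_mask = [1, 1, 0, 0]    # toy prediction
true_mask = [1, 0, 0, 0]    # toy ground truth
print(round(dice_coefficient(pred_mask, true_mask), 4))  # 0.6667
print(round(dice_loss(pred_mask, true_mask), 4))         # 0.3333
```

Under this definition the loss and the coefficient always sum to one, which is consistent with the Dice loss of 0.0326 and DSC of 0.9674 reported in the Results.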

Detection and classification model for calcifications in fungal sinusitis

To facilitate the detection and classification of calcification patterns in fungal sinusitis, the input DICOM images were processed by applying a maxillary sinus mask, isolating the region of interest (ROI). The images retained the original Hounsfield unit values in the maxillary sinus region. Calcification patterns were labeled as ROIs by a head and neck radiologist, and this labeled data underwent a coordinate transformation process for use in model training.

The detection model employed YOLO (You Only Look Once) v5 to identify calcifications within the segmented maxillary sinus regions. Both the model design and the preprocessing pipeline were specifically adapted to accommodate 10-bit images, including modifications to image normalization parameters, dataset mean, and standard deviation, ensuring accurate handling of the full range of CT Hounsfield values. The training dataset consisted of 554 images of the sinuses, split into 70% for training, 10% for testing, and 20% for internal validation.

Pre-classification and labeling of the classes were based on interpretation by head and neck radiologists. Under the supervision and thorough examination of these radiologists, two radiology technicians performed labeling. Labeling focused on regions identified as fungus, with approximately 8,000 bounding boxes labeled across 554 sinuses.

During the labeling process, all YOLO-detected bounding boxes and their adjacent sinus areas were reviewed by two board-certified radiologists. Chronic or non-fungal sinusitis cases without visible calcification were categorized as Class 1 (no calcification) under radiologists’ supervision. The YOLO detector was trained with a relatively low confidence threshold to minimize false negatives, ensuring that subtle or ambiguous sinus regions were proposed for subsequent classification.

In our study, the classification model recognized several distinct classes based on calcification patterns: 1) Normal sinus or chronic sinusitis, representing clear sinuses or sinus opacifications without abnormal calcifications, 2) Dense peripheral calcification (dystrophic), denoting denser calcification areas often associated with non-fungal sinusitis, and 3) Central punctate calcification patterns indicative of fungal sinusitis. Each of the calcification patterns classified in our dataset is presented in Fig 2.

Fig 2. A series of axial slices from PNS CT scans demonstrating maxillary sinus anatomy and calcification patterns.

a) normal sinus or chronic sinusitis, representing clear sinuses or sinus opacifications without abnormal calcifications, b) dense peripheral calcification (dystrophic), denoting denser calcification areas often associated with non-fungal sinusitis, and c) central punctate calcification pattern seen in fungal sinusitis.

https://doi.org/10.1371/journal.pone.0340832.g002

During training, the Complete Intersection over Union (CIoU) loss function was used for region detection, considering the predicted bounding box’s center position, size, and aspect ratio. Binary Cross-Entropy (BCE) loss function was utilized to classify the areas. These adaptations and training methods were vital in developing a model capable of accurately detecting calcification patterns in fungal sinusitis.
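For readers unfamiliar with CIoU, the sketch below follows the commonly published definition: the loss equals one minus the IoU, plus a normalized center-distance penalty, plus an aspect-ratio penalty. It is a standalone illustration and not the YOLO v5 implementation; the box format `(x1, y1, x2, y2)` and the function name `ciou_loss` are assumptions.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss between a predicted and a ground-truth box (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Center-position term: squared center distance over the squared
    # diagonal of the smallest box enclosing both boxes.
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
apart = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
print(round(same, 4), round(apart, 4))
```

Identical boxes give a loss close to zero, while non-overlapping boxes give a loss above one because the center-distance penalty still applies when the IoU is zero.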

Additionally, a separate classification process was assigned for the bounding boxes produced by YOLO’s detection results. The classification model was expected to perform multi-class classification based on calcification patterns, refining its predictions following the issues identified during the above detection stage by minimizing false positives related to artifacts.

A simple Convolutional Neural Network (CNN) with five layers was designed to classify calcification patterns in the bounding box results of the YOLO v5 model, which was assumed to reflect sinusitis with calcifications. The first convolutional layer applied 32 filters 3 x 3 in size with a stride of 1 and padding to preserve the spatial dimensions. A ReLU activation function was used to introduce non-linearity. A 2 x 2 max-pooling operation with a stride of 2 was used to reduce the spatial dimensions of the feature maps. The second convolutional layer employed 64 filters 3 x 3 in size, with similar stride and padding settings, followed by another ReLU activation function. This layer included another 2 x 2 max-pooling operation with a stride of 2. The third convolutional layer utilized 128 filters 3 x 3 in size, with a stride of 1 and padding, followed by a ReLU activation function. The output from the convolutional layers was flattened and fed into a fully connected layer with 512 neurons, followed by a ReLU activation function. The final output layer was a fully connected layer with a softmax activation function designed to output the probability distribution over the classification labels. The training dataset was augmented using random rotations, shifts, and flips to improve the model’s robustness and generalization capabilities.
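The layer arithmetic of this classifier can be checked with a short shape calculation. The sketch assumes a square 64 x 64 input patch and a padding of 1 for each 3 x 3 convolution. Neither value is stated in the text, and the function names are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of a max-pooling layer along one spatial dimension."""
    return (size - kernel) // stride + 1


def classifier_shapes(input_size=64):
    """Follow the patch size through the three convolutional layers."""
    s = conv_out(input_size)   # conv 1: 32 filters, spatial size preserved
    s = pool_out(s)            # 2 x 2 max-pooling, stride 2
    s = conv_out(s)            # conv 2: 64 filters
    s = pool_out(s)            # 2 x 2 max-pooling, stride 2
    s = conv_out(s)            # conv 3: 128 filters, no pooling described
    flattened = 128 * s * s    # input length of the 512-neuron dense layer
    return s, flattened


final_size, flat_features = classifier_shapes(64)
print(final_size, flat_features)  # 16 32768
```

Under these assumptions the 512-neuron fully connected layer receives 32,768 inputs. A different patch size changes this number but not the structure.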

Model performance evaluation

The segmentation performance in the maxillary sinus region was initially evaluated using the Dice Similarity Coefficient (DSC), a machine learning metric. This was complemented by visual assessment by head and neck radiologists. For the detection and classification of fungal sinusitis calcifications, we employed a confusion matrix. Each patient’s data were assessed and validated at the patient case-wise level, with separate evaluations for the left and right maxillary sinus regions. The gold standard was defined not solely by the conventional training criteria of the deep learning model, but by clinical calcification patterns confirmed through pathologic results.

Our classification model, trained on cases with a slice thickness of 3 mm, was also adapted for cases with a 1-mm slice thickness by dividing the slices into three sets and integrating the results. A key element of classification was identifying the presence of fungal calcification patterns in the central area of the maxillary sinus and the density of the object. The classification process was refined by evaluating the intensity brightness of the target object relative to the background, ensuring that it could be differentiated but was not significantly brighter than non-fungal dense calcification patterns. Therefore, if any slice in a patient’s CT scan showed a calcification pattern in the central area and the intensity brightness met the specified criteria, the case was classified into Class 3. This classification approach was applied consistently to both 3-mm and 1-mm slice-thickness cases. Accuracy was defined as the sum of the correctly matched diagnosis and prediction results, specifically the sum of the diagonal values in the confusion matrix. The error rate was defined as the sum of all other values in the matrix, representing the total number of instances where the diagnosis and prediction results did not match. All classification metrics were reported as macro-averaged values, unless otherwise specified. Each class (normal/chronic sinusitis, peripheral calcification, and fungal punctate calcification) was equally considered in these evaluations.
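The case-level decision rule can be sketched as follows. The text states only that a case is assigned to Class 3 if any slice meets the Class 3 criteria, and that 1-mm scans are divided into three sets whose results are integrated. The priority given to Class 2 over Class 1, the interleaved way of forming the three sets, and all function names are assumptions made for illustration.

```python
def aggregate_case(slice_classes):
    """Combine per-slice classes (1, 2, or 3) into one label per sinus."""
    if 3 in slice_classes:
        return 3   # any slice with a central punctate pattern -> fungal
    if 2 in slice_classes:
        return 2   # assumed: peripheral calcification outranks Class 1
    return 1


def split_into_sets(slice_classes, n_sets=3):
    """Assumed split: set k holds slices k, k + 3, k + 6, ... so each set
    approximates a 3-mm spacing from a 1-mm acquisition."""
    return [slice_classes[k::n_sets] for k in range(n_sets)]


def aggregate_thin_case(slice_classes):
    """Classify each of the three sets, then integrate the set-level labels."""
    set_labels = [aggregate_case(s) for s in split_into_sets(slice_classes) if s]
    return aggregate_case(set_labels)


print(aggregate_case([1, 1, 3, 2]))              # 3
print(aggregate_thin_case([1, 1, 1, 2, 1, 1]))   # 2
```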

For external validation with both the temporal external test set and geographic external test set, the confusion matrix was calculated, allowing for comprehensive evaluation of the model’s performance across different datasets.

To evaluate the discriminative performance of the calcification classification model, receiver operating characteristic (ROC) curves were generated, and the area under the ROC curve (AUROC) was measured for each class.
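A per-class AUROC in a three-class problem is usually obtained in a one-vs-rest manner. The sketch below assumes that this approach was used and computes AUROC as the probability that a positive sample scores higher than a negative one, which is equivalent to the area under the ROC curve. The scores and labels are toy values and not study data.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(probabilities, true_classes, target_class):
    """Treat target_class as positive and all other classes as negative."""
    scores = [p[target_class] for p in probabilities]
    labels = [1 if c == target_class else 0 for c in true_classes]
    return auroc(scores, labels)


print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75

toy_probs = [{1: 0.7, 2: 0.2, 3: 0.1}, {1: 0.2, 2: 0.1, 3: 0.7},
             {1: 0.4, 2: 0.1, 3: 0.5}, {1: 0.1, 2: 0.8, 3: 0.1}]
toy_truth = [1, 3, 1, 2]
print(one_vs_rest_auroc(toy_probs, toy_truth, target_class=3))  # 1.0
```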

We report two complementary accuracy measures. Overall accuracy (micro) is the proportion of correctly classified sinuses out of all sinuses across all classes. Balanced accuracy (macro recall) is the simple average of the recalls computed separately for Class 1, Class 2, and Class 3; it mitigates the influence of class imbalance by giving each class equal weight. We also report per-class precision, recall, and F1, as well as AUROC.
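The difference between the two accuracy measures is easiest to see on a small example. The sketch below derives overall accuracy, balanced accuracy, and macro-averaged precision and F1 from a confusion matrix whose rows are true classes and whose columns are predicted classes. The matrix is invented for illustration and does not reproduce any result of this study.

```python
def summarize(cm):
    """Compute overall accuracy and macro-averaged metrics from a square
    confusion matrix (rows = true class, columns = predicted class)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))   # diagonal of the matrix
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        actual = sum(cm[i])                           # all true class-i cases
        predicted = sum(cm[j][i] for j in range(n))   # all predicted as class i
        r = tp / actual if actual else 0.0
        p = tp / predicted if predicted else 0.0
        recalls.append(r)
        precisions.append(p)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "overall_accuracy": correct / total,
        "balanced_accuracy": sum(recalls) / n,   # macro recall
        "macro_precision": sum(precisions) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy, imbalanced matrix for Class 1, Class 2, Class 3 (not study data).
toy_cm = [[90, 5, 5],
          [1, 3, 1],
          [2, 0, 8]]
report = summarize(toy_cm)
print({k: round(v, 4) for k, v in report.items()})
```

In this toy case the overall accuracy (about 0.88) exceeds the balanced accuracy (about 0.77) because the majority class dominates the former while each class counts equally in the latter.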

Results

Maxillary sinus segmentation performance

The 3D U-Net model effectively segmented the maxillary sinus regions. The model achieved a Dice loss of 0.0326 and a DSC of 0.9674, showing a high level of agreement between the predicted segmentation and the ground truth masks. Expert visual assessment of the actual segmentation results confirmed that the model produced highly accurate segmentation. Segmentation of the space between the maxillary sinus and the nasal cavity was performed, and the model also achieved relatively high accuracy in delineating the bone contours. Even in areas with complex topology and ambiguous boundaries, the model showed excellent performance. The results of the maxillary sinus area segmentation are presented in Fig 3.

Fig 3. Predicted results of the maxillary sinus segmentation model.

The model accurately delineated the area of interest, successfully segmenting challenging regions with ambiguous boundaries due to topological changes.

https://doi.org/10.1371/journal.pone.0340832.g003

Detection and classification model for calcifications in fungal ball

The YOLO v5-based detection model effectively identified regions with suspected calcifications within the segmented maxillary sinus areas. The model achieved a CIoU loss of 0.0350 and a BCE loss of 0.0023, reflecting strong performance in both detection and localization tasks. The mAP50 (mean Average Precision) value was recorded at 0.6801, with mAP50-95 at 0.3557. The sample result of the detected bounding box in the target area is presented in Fig 4.

Fig 4. Predicted results of the fungal calcifications detection model.

Regions suspected of calcifications were identified and measured using bounding boxes. While the model accurately localized the areas of interest, there was a slight tendency for higher false positive cases, such as implantation artifacts and small bone structures.

https://doi.org/10.1371/journal.pone.0340832.g004

During training, the box loss was 0.3386, classification loss was 0.8624, and distribution focal loss (DFL) was 0.2818. Evaluation metrics indicated a precision of 0.79495 and a recall of 0.92143. While the high recall suggests that the model successfully detected most of the true positives, the relatively lower precision points to a higher occurrence of false positives. The mAP50 was 0.6927, and mAP50-95 was 0.3909. In the validation phase, the box loss was 0.5653, classification loss was 0.9312, and DFL was 0.2818.

The classification model performed multi-class classification based on calcification patterns. During training, the classification model achieved an accuracy of 0.9325 and a loss of 0.1342, while in validation, it recorded an accuracy of 0.9193 and a loss of 0.1557, demonstrating high performance on unseen data. The overall training results are presented in Table 2.

Table 2. Performance summary of the segmentation, detection, and classification models obtained during the training of each model. The table shows the metrics for segmentation (Dice loss, DSC), detection (CIoU loss, BCE loss), and classification (training/validation accuracy and loss), providing an overview of the models’ performance.

https://doi.org/10.1371/journal.pone.0340832.t002

External validation and generalization

In internal validation, the algorithm achieved a macro-averaged precision of 0.9270, recall of 0.9748, and F1 score of 0.9473 across all classes. The overall accuracy was 0.9748. These values reflect the model’s ability to balance the detection of true positives against the avoidance of false negatives on the internal validation set. The high recall of 0.9748 indicates that the algorithm identified most positive cases, while the precision of 0.9270 shows that false positives were kept comparatively low. The F1 score of 0.9473 confirms that the model balanced precision and recall in the internal validation phase. The overall performance metrics of each validation set are presented in Table 3.

Table 3. Performance metrics of the calcification pattern classification model across internal validation set, temporal external test set, and geographic external test set. Overall accuracy and balanced accuracy (macro recall) are reported to provide a comprehensive evaluation of the model’s classification performance across fungal and non-fungal sinusitis cases.

https://doi.org/10.1371/journal.pone.0340832.t003

For external validation, the algorithm demonstrated a balanced accuracy of 95.07% for the temporal external test set and 96.19% for the geographic external test set. Confusion matrices were computed at the study level for both test sets, and F1 scores were derived using precision and recall. In the geographic external test set, the macro-averaged F1 score was 0.9401, and in the temporal external test set, it was 0.9060, reflecting robust classification performance across classes. The precision for the temporal test set was slightly lower compared to the geographic test, but recall remained high in both test sets, showing the model’s consistency in detecting positive cases across different datasets.

In evaluation of the discriminative performance, the AUROC for Class 1 (0.54) was relatively lower compared to the other classes, and Class 2 and Class 3 achieved AUROC values of 0.98 and 0.93, respectively, indicating strong performance. The ROC plot is presented in Fig 5, and the confusion matrix results for each validation dataset are shown in Fig 6.

Fig 5. Receiver Operating Characteristic (ROC) curves for multi-class classification across three experiments.

a) internal validation, b) temporal external test set, and c) geographic external test set. High discrimination performance was observed for Class 2 and Class 3, while the discrimination for Class 1 was relatively low.

https://doi.org/10.1371/journal.pone.0340832.g005

Fig 6. Confusion matrices for a) internal validation, b) temporal external test set, and c) geographic external test set.

https://doi.org/10.1371/journal.pone.0340832.g006

Discussion

Most of the intrasinus calcifications in fungal sinusitis or fungal ball are centrally located in the maxillary sinus, in contrast to the calcifications in non-fungal sinusitis, which are peripherally located near the walls of the maxillary sinuses. Fungal ball usually shows fine punctate calcifications, while non-fungal sinusitis shows smooth-margined, round, or eggshell calcifications [17,18]. These different patterns of intrasinus calcifications may result from the different pathogenesis. The calcifications in fungal sinusitis are formed from metabolic deposits of calcium within the necrotic area of the mycelial mass [18,26–28]. These calcifications are located centrally, as the mycelial mass is usually located in the center of the maxillary sinus [26,27]. On the other hand, intrasinus calcifications of non-fungal sinusitis are dystrophic calcifications or ossifications caused by chronic inflammatory processes. This kind of calcification seems to occur near the thickened mucosal layer of the sinus repeatedly affected by chronic inflammation [29,30].

Because intrasinus calcifications are common in fungal ball and their shape and location are different from those of non-fungal sinusitis, detecting calcifications and classifying their patterns on PNS CT is important in the early diagnosis and treatment of fungal sinusitis.

The 3D U-Net demonstrated superior performance over the 2D U-Net for maxillary sinus segmentation. Unlike the 2D U-Net, which processes CT slices independently and may overlook spatial relationships, the 3D U-Net segments the sinus in three dimensions, capturing its structure more accurately. By leveraging information across consecutive slices, it improves boundary delineation and segmentation accuracy. These findings confirm the 3D U-Net’s suitability for precise anatomical segmentation in medical imaging.

Based on the mAP and box loss values, it can be concluded that the detection performance of the YOLO v5 model was reasonably high for identifying calcified regions. However, upon visually comparing the detection result tiles with the ground truth labels, it was observed that the detection bounding boxes tended to be larger than the actual labeled regions.

We adopted a two-stage detection–classification scheme in which YOLO v5 serves as a high-recall proposal generator and a separate CNN performs fine-grained class refinement (Classes 1–3). This allowed us to run YOLO v5 with a low detection threshold to maximize sensitivity for small calcifications, while delegating false-positive reduction and subtle discrimination, especially between non-fungal and fungal calcifications, to the CNN. In preliminary checks, the single-stage YOLO head was less discriminative for these nuanced patterns, which is consistent with YOLO’s emphasis on speed and objectness rather than fine-grained class separation. This design aligns with prior cascaded deep-learning frameworks for lesion detection and classification in medical imaging [31–34].
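The control flow of such a cascade can be sketched in a few lines. This is a schematic illustration only: the threshold value of 0.1, the dictionary layout of a proposal, and the stand-in `toy_classifier` are hypothetical, and the real second stage is the CNN described in the Methods.

```python
def two_stage(proposals, classify, det_threshold=0.1):
    """Keep low-confidence detector proposals, then let a second-stage
    classifier discard false positives (Class 1) and label the rest."""
    refined = []
    for proposal in proposals:
        if proposal["conf"] < det_threshold:
            continue                      # below even the permissive threshold
        predicted_class = classify(proposal)
        if predicted_class in (2, 3):     # Class 1 means no calcification
            refined.append((proposal["box"], predicted_class))
    return refined


def toy_classifier(proposal):
    """Stand-in for the CNN: reads a hypothetical precomputed pattern tag."""
    return {"none": 1, "peripheral": 2, "central": 3}[proposal["pattern"]]


toy_proposals = [
    {"box": (10, 10, 20, 20), "conf": 0.15, "pattern": "central"},
    {"box": (30, 30, 40, 40), "conf": 0.60, "pattern": "none"},
    {"box": (50, 50, 60, 60), "conf": 0.05, "pattern": "peripheral"},
]
result = two_stage(toy_proposals, toy_classifier)
print(result)  # [((10, 10, 20, 20), 3)]
```

The first proposal survives despite its low detector confidence, the second is rejected by the classifier even though the detector was confident, and the third falls below the detection threshold. This is the division of labor the cascade is meant to provide.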

In terms of classification, the model’s classification loss and precision indicated that the classification capability of the current YOLO model was not as robust as we expected. Specifically, the initial false positive rate of the YOLO model reached approximately 21.1%. The majority of these misclassifications were related to Class 2 and Class 3 images, which the model struggled to distinguish correctly. A key contributing factor to this issue was the relatively small amount of Class 2 data compared to other classes, leading to imbalanced training. Additionally, the ambiguity between Class 1 and the conventional background class further exacerbated the model’s misclassification. To address these issues, a separate CNN model was trained using augmentation techniques to improve classification accuracy. This approach successfully reduced the false positive rate, as reflected in the higher precision and recall values in the results, indicating a significant improvement in model performance. The augmentation process allowed the model to better distinguish between similar classes and improve its overall classification accuracy.

In this study, the YOLO-based algorithm exhibited fair detection accuracy for intrasinus calcifications, with a BCE loss of 0.0023 and an mAP50 of 0.6801 during training. For the internal dataset, the algorithm achieved a macro-averaged precision of 92.70%, recall of 97.48%, and F1-score of 94.73% across all three classes. The overall accuracy for the internal dataset was 97.48%, while the balanced accuracy was 96.49%. The overall accuracies for the temporal and geographic external test sets were 86.87% and 94.01%, respectively. Our internal accuracy was higher than that reported by Kim et al., who achieved 88% accuracy in internal validation [25]. While Kim et al. primarily focused on distinguishing different types of sinusitis, such as fungal sinusitis, chronic sinusitis, and healthy controls, without specifically detecting calcifications, our research emphasized the detection of intrasinus calcifications.

Kim et al. [25] could not determine the cause of misclassification in their study. In some cases, incorrect regions were analyzed, such as areas outside the sinuses, while in other instances the correct regions were analyzed but still misclassified. Because we focused on calcifications in the sinuses, we could trace the steps the algorithm performed: it first detected calcifications and then classified them based on their patterns. This method might also increase the accuracy of diagnosing fungal sinusitis. Although the calcification detection rate was 91%, the classification accuracy was 97.5%, because fungal sinusitis typically produces multiple calcifications in the sinuses. Furthermore, when we evaluated the false positive cases in Class 2, we found that the peripheral portion of larger calcifications or tiny calcific foci surrounding dense dystrophic calcifications might be misinterpreted as Class 3 (S1 Fig).
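The reasoning that multiple calcifications per sinus can lift sinus-level performance above the per-calcification detection rate can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that the 91% figure applies to each calcification and that detections are independent; neither assumption is stated in the study, and the function name is hypothetical.

```python
def prob_at_least_one_detected(per_lesion_rate: float, n_lesions: int) -> float:
    """Probability that at least one of n_lesions calcifications is detected,
    assuming each is found independently with the same probability."""
    if not 0.0 <= per_lesion_rate <= 1.0:
        raise ValueError("per_lesion_rate must lie in [0, 1]")
    if n_lesions < 0:
        raise ValueError("n_lesions must be non-negative")
    return 1.0 - (1.0 - per_lesion_rate) ** n_lesions


# Illustrative values only: one, two, and three calcifications in a sinus
for n in (1, 2, 3):
    print(n, round(prob_at_least_one_detected(0.91, n), 4))
```

Under these assumptions the chance of missing every calcification in a sinus falls quickly as the number of calcifications grows.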

Detecting calcifications and classifying them by pattern is important in interpreting medical images, not only for fungal sinusitis but also for many other diseases with characteristic calcifications. Some cancers, including breast and thyroid cancers, show characteristic microcalcifications, and vascular diseases such as atherosclerosis and coronary artery disease show typical vascular wall calcification patterns [35–38]. Our deep learning approach could potentially be applied to those diseases as well. Furthermore, this technique could be adapted for use in radiographic imaging to identify and classify regions with specific attenuation characteristics other than calcifications. For example, by modifying the algorithm to detect areas with low attenuation, it could be applied to a wider range of conditions that involve fat, air, or other substances with low attenuation, thereby broadening its clinical utility.

Class 1 (no abnormal calcification) showed a lower AUROC (0.54) than the calcification classes. This does not contradict the confusion-matrix accuracy but reflects two design and data characteristics. First, we operated the proposal stage at a low detection threshold to reduce missed calcifications, which increases the number of non-calcification candidates forwarded to the classifier and can raise the false-positive rate for the negative class at some thresholds, thereby depressing ROC discrimination. Second, negative patches are heterogeneous, including mucosal thickening, small bony structures, and dental or metallic artifacts, so the “no-calcification” boundary is intrinsically broader than for calcification-positive classes. In practice, class-specific operating points and probability calibration are likely to stabilize Class-1 behavior.
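As an illustration of what a class-specific operating point could look like, the sketch below computes a one-vs-rest AUROC from its rank-based definition and selects the threshold that maximizes Youden's J statistic. Youden's J is a choice made here for illustration; the study does not state how operating points would be selected, and the function names are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive scores above a random negative
    (ties count as 0.5). labels are 1 for the class of interest, else 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    where a sample is called positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Applying this separately to each class (treating that class as positive and the others as negative) yields one threshold per class rather than a single global cut-off.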

As summarized in Table 1, Class 2 (non-fungal calcification) is less prevalent than Classes 1 and 3 across cohorts (Training, Internal, Temporal, Geographic). Such imbalance can inflate overall or micro-averaged metrics while obscuring class-specific errors. The relative scarcity and greater phenotypic heterogeneity of non-fungal calcifications likely contribute to lower recall and PPV for Class 2 in our results. Because labels were assigned at the sinus level with left and right counted separately, a single case can contribute different classes across sides; this design increases statistical power but complicates direct comparison with case-level summaries. For prospective deployment, cohort-specific calibration and operating-point selection may be warranted to accommodate prevalence shifts between the Internal, Temporal, and Geographic cohorts.
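One simple way to accommodate a prevalence shift between cohorts is to rescale the predicted class probabilities by the ratio of target to training class priors and renormalize. The sketch below shows this standard prior-shift correction; it is not a method used in the study, and the function name and the example priors are hypothetical.

```python
def adjust_for_prevalence(probs, train_prior, target_prior):
    """Rescale class probabilities for a cohort whose class prevalence
    differs from the training set: p'(k) is proportional to
    p(k) * target_prior[k] / train_prior[k]."""
    if not (len(probs) == len(train_prior) == len(target_prior)):
        raise ValueError("all inputs must have one entry per class")
    weighted = [p * t / s for p, s, t in zip(probs, train_prior, target_prior)]
    z = sum(weighted)
    if z == 0:
        raise ValueError("adjusted probabilities sum to zero")
    return [w / z for w in weighted]


# Hypothetical example: Class 2 is twice as prevalent in the target cohort
adjusted = adjust_for_prevalence(
    probs=[0.5, 0.2, 0.3],
    train_prior=[0.4, 0.1, 0.5],
    target_prior=[0.4, 0.2, 0.4],
)
```

The correction assumes that only class prevalence changes between cohorts and that the appearance of each class stays the same, which may not hold across scanners or institutions.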

When analyzing the multi-class confusion matrix in terms of precision and recall, it was observed that the predictive performance for internal validation data, which constituted a relatively large portion of the training dataset, was relatively strong. The model achieved high precision and recall across all classes, indicating robust performance on data similar to what it was trained on. Although the performance on geographic or temporal external test sets was slightly lower than that on the internal validation data set, it still demonstrated high accuracy.

However, there is a concern that the recall values were unusually high, especially on the internal validation set, which may indicate a potential overfitting issue. The model’s tendency to classify a large number of cases as positive could indicate that it has learned patterns specific to the training data, potentially reducing its ability to generalize. This is particularly troubling given that the robustness of segmentation was not thoroughly established, yet the recall figures were still significantly elevated. The high recall might mean the model is too aggressive in identifying positive cases, which could lead to overfitting. The concern about overfitting might be acceptable to some extent since the process identifies additional regions within a segmented area rather than measuring volumetric data directly. When viewed as a pre-processing step to narrow down the search area, high recall could be justified. This approach frames segmentation as part of a broader strategy to refine target areas for further analysis. External validation using temporal and geographic test sets showed consistent performance, providing evidence against overfitting concerns. These results suggest the model captures meaningful patterns beyond the training data. Future work could address overfitting further by applying cross-validation with diverse datasets or using stricter regularization techniques during training.

There are several limitations in this study. First, 554 images represent a relatively small dataset for training a deep learning algorithm. To enhance the robustness and generalization of the model, multiple data augmentation techniques were employed, such as random rotation, scaling, and slight shifts of the original images, ensuring that the anatomical structures of the maxillary sinus remained intact. Although data augmentation techniques were employed to address data limitations, they may not fully replace an adequately large and balanced dataset. As more data is better for training an algorithm, further study with a large number of cases is needed. Second, we performed segmentation of the maxillary sinuses to facilitate calcification detection and other sinus areas were not investigated. To overcome this limitation, further research is needed, possibly involving the development or application of different models tailored to detect fungal sinusitis in additional areas. Expanding the scope of this model would provide a more comprehensive approach to diagnosing and managing fungal sinusitis across all affected sinus regions. Third, we included only non-invasive fungal sinusitis, specifically fungal ball, which typically exhibits characteristic calcification patterns on radiographic images. Allergic fungal sinusitis and invasive fungal sinusitis (both acute and chronic forms) were not included in this study. Allergic fungal sinusitis usually presents with hyperintense sinus contents, which largely overlap with findings in other inflammatory sinus diseases that contain high levels of proteinous or mucinous material. Invasive fungal sinusitis, on the other hand, is characterized by bony destruction and direct invasion into adjacent structures. These features were not assessed in the present study. Future research incorporating these findings will be necessary to achieve a more comprehensive evaluation of all types of fungal sinus diseases. 
Last, the number of Class 2 cases was relatively small, especially in the temporal external test set, which likely lowered the accuracy of the algorithm in the external tests.
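The augmentation described in the first limitation (random rotation, scaling, and slight shifts) can be sketched as a single affine transform about the image center. The parameter ranges and nearest-neighbour resampling below are assumptions chosen for illustration, since the study does not report them, and all function names are hypothetical.

```python
import math
import random


def sample_params(rng, max_deg=10.0, scale_range=(0.9, 1.1), max_shift=5.0):
    """Draw a small random rotation (degrees), scale, and x/y shift (pixels).
    The default ranges are assumed values, not taken from the paper."""
    angle = rng.uniform(-max_deg, max_deg)
    scale = rng.uniform(*scale_range)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    return angle, scale, tx, ty


def make_affine(angle_deg, scale, tx, ty, width, height):
    """Coefficients (a, b, c, d, e, f) with x' = a*x + b*y + c and
    y' = d*x + e*y + f: rotate and scale about the center, then shift."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    th = math.radians(angle_deg)
    a, b = scale * math.cos(th), -scale * math.sin(th)
    d, e = scale * math.sin(th), scale * math.cos(th)
    c = cx + tx - a * cx - b * cy
    f = cy + ty - d * cx - e * cy
    return a, b, c, d, e, f


def warp_nearest(image, m, fill=0):
    """Apply the transform to a 2D list by inverse mapping each output
    pixel to its nearest source pixel; out-of-bounds pixels get `fill`."""
    a, b, c, d, e, f = m
    det = a * e - b * d
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for yo in range(h):
        for xo in range(w):
            xs = (e * (xo - c) - b * (yo - f)) / det
            ys = (-d * (xo - c) + a * (yo - f)) / det
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[yo][xo] = image[yi][xi]
    return out
```

Keeping the ranges small, as in this sketch, is one way to vary the training images while leaving the sinus anatomy recognizable.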

Conclusion

This study demonstrated the effectiveness of a deep learning-based algorithm for detecting and classifying intrasinus calcifications on PNS CT to diagnose fungal sinusitis, specifically fungal ball. The developed framework achieved high accuracy in segmentation, detection, and classification of intrasinus calcifications, with consistent performance across internal and external test datasets. By focusing on calcification patterns, the proposed algorithm enhanced diagnostic precision for fungal sinusitis, highlighting its potential for broader application in medical imaging. Further optimization of this approach could support more efficient and accurate diagnostic workflows in clinical settings.

Supporting information

S1 Fig. Two cases diagnosed as chronic sinusitis with dystrophic calcifications (Class 2).

The algorithm misinterpreted these cases as fungal calcifications (Class 3). a) The lowest portion of a large calcification appears as punctate calcifications (red arrows). b) A small calcification was found adjacent to a large dense calcification (red arrows).

https://doi.org/10.1371/journal.pone.0340832.s001

(TIF)

References

  1. Hinton G, Deng L, Yu D, Dahl G, Mohamed A, Jaitly N, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag. 2012;29(6):82–97.
  2. Leung MKK, Xiong HY, Lee LJ, Frey BJ. Deep learning of the tissue-regulated splicing code. Bioinformatics. 2014;30(12):i121-9. pmid:24931975
  3. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137–49. pmid:27295650
  4. Zhao C, Shao M, Carass A, Li H, Dewey BE, Ellingsen LM, et al. Applications of a deep learning method for anti-aliasing and super-resolution in MRI. Magn Reson Imaging. 2019;64:132–41. pmid:31247254
  5. Aldoj N, Lukas S, Dewey M, Penzkofer T. Semi-automatic classification of prostate cancer on multi-parametric MR imaging using a multi-channel 3D convolutional neural network. Eur Radiol. 2020;30(2):1243–53. pmid:31468158
  6. Fujima N, Andreu-Arasa VC, Onoue K, Weber PC, Hubbell RD, Setty BN, et al. Utility of deep learning for the diagnosis of otosclerosis on temporal bone CT. Eur Radiol. 2021;31(7):5206–11. pmid:33409781
  7. Halder A, Dey D, Sadhu AK. Lung nodule detection from feature engineering to deep learning in thoracic CT images: a comprehensive review. J Digit Imaging. 2020;33(3):655–77. pmid:31997045
  8. Hata A, Yanagawa M, Yamagata K, Suzuki Y, Kido S, Kawata A, et al. Deep learning algorithm for detection of aortic dissection on non-contrast-enhanced CT. Eur Radiol. 2021;31(2):1151–9. pmid:32857203
  9. Li L, Wei M, Liu B, Atchaneeyasakul K, Zhou F, Pan Z, et al. Deep learning for hemorrhagic lesion detection and segmentation on brain CT images. IEEE J Biomed Health Inform. 2021;25(5):1646–59. pmid:33001810
  10. Lee KJ, Ryoo I, Choi D, Sunwoo L, You SH, Jung HN. Performance of deep learning to detect mastoiditis using multiple conventional radiographs of mastoid. PLOS ONE. 2020;15:e0241796.
  11. Malik H, Anees T, Din M, Naeem A. CDC_Net: multi-classification convolutional neural network model for detection of COVID-19, pneumothorax, pneumonia, lung Cancer, and tuberculosis using chest X-rays. Multimed Tools Appl. 2023;82(9):13855–80. pmid:36157356
  12. Malik H, Naeem A, Naqvi RA, Loh W-K. DMFL_Net: a federated learning-based framework for the classification of COVID-19 from multiple chest diseases using X-rays. Sensors (Basel). 2023;23(2):743. pmid:36679541
  13. Kamel SI, Levin DC, Parker L, Rao VM. Utilization trends in noncardiac thoracic imaging, 2002-2014. J Am Coll Radiol. 2017;14:337–42.
  14. Lee SM, Seo JB, Yun J, Cho Y-H, Vogel-Claussen J, Schiebler ML, et al. Deep learning applications in chest radiography and computed tomography: current state of the art. J Thorac Imaging. 2019;34(2):75–85. pmid:30802231
  15. Choi D, Sunwoo L, You S-H, Lee KJ, Ryoo I. Application of symmetry evaluation to deep learning algorithm in detection of mastoiditis on mastoid radiographs. Sci Rep. 2023;13(1):5337. pmid:37005429
  16. Flint PW, Haughey BH, Lund VJ, Niparko JK, Robbins KT, Thomas JR, et al. Cummings otolaryngology: head and neck surgery, vol. 3; 2015;1.
  17. Chang T, Teng MM, Wang SF, Li WY, Cheng CC, Lirng JF. Aspergillosis of the paranasal sinuses. Neuroradiology. 1992;34(6):520–3. pmid:1436464
  18. Yoon JH, Na DG, Byun HS, Koh YH, Chung SK, Dong HJ. Calcification in chronic maxillary sinusitis: comparison of CT findings with histopathologic results. AJNR Am J Neuroradiol. 1999;20(4):571–4. pmid:10319962
  19. Zinreich SJ, Kennedy DW, Malat J, Curtin HD, Epstein JI, Huff LC, et al. Fungal sinusitis: diagnosis with CT and MR imaging. Radiology. 1988;169(2):439–44. pmid:3174990
  20. Som PM, Lidov M. The significance of sinonasal radiodensities: ossification, calcification, or residual bone? AJNR Am J Neuroradiol. 1994;15(5):917–22. pmid:8059661
  21. Du W, Kang W, Lai S, Cai Z, Chen Y, Zhang X, et al. Deep learning in computed tomography to predict endotype in chronic rhinosinusitis with nasal polyps. BMC Med Imaging. 2024;24(1):25. pmid:38267881
  22. Kwon KW, Kim J, Kang D. Automated detection of maxillary sinus opacifications compatible with sinusitis from CT images. Dentomaxillofac Radiol. 2024;53(8):549–57. pmid:39107903
  23. Massey CJ, Ramos L, Beswick DM, Ramakrishnan VR, Humphries SM. Clinical validation and extension of an automated, deep learning-based algorithm for quantitative sinus CT analysis. AJNR Am J Neuroradiol. 2022;43(9):1318–24. pmid:36538385
  24. Zou C, Ji H, Cui J, Qian B, Chen Y-C, Zhang Q, et al. Preliminary study on AI-assisted diagnosis of bone remodeling in chronic maxillary sinusitis. BMC Med Imaging. 2024;24(1):140. pmid:38858631
  25. Kim K-S, Kim BK, Chung MJ, Cho HB, Cho BH, Jung YG. Detection of maxillary sinus fungal ball via 3-D CNN-based artificial intelligence: fully automated system and clinical validation. PLoS One. 2022;17(2):e0263125. pmid:35213545
  26. Kopp W, Fotter R, Steiner H, Beaufort F, Stammberger H. Aspergillosis of the paranasal sinuses. Radiology. 1985;156(3):715–6. pmid:4023231
  27. Stammberger H, Jakse R, Beaufort F. Aspergillosis of the paranasal sinuses x-ray diagnosis, histopathology, and clinical aspects. Ann Otol Rhinol Laryngol. 1984;93(3 Pt 1):251–6. pmid:6375518
  28. Stammberger H, Jakse R, Raber J. Aspergillus mycoses of the paranasal sinuses. Detection and analysis of roentgen opaque structures in fungal concretions. HNO. 1983;31(5):161–7. pmid:6347993
  29. Fehr P, Diener PA, Benz D, Lorenz U. Ossification of the endometrium: an unusual finding in secondary sterility. Gynakol Geburtshilfliche Rundsch. 1993;33(1):31–3. pmid:8471882
  30. Kim KM. Apoptosis and calcification. Scanning Microsc. 1995;9(4):1137–75; discussion 1175-8. pmid:8819895
  31. Ahmadyar Y, Kamali-Asl A, Arabi H, Samimi R, Zaidi H. Hierarchical approach for pulmonary-nodule identification from CT images using YOLO model and a 3D neural network classifier. Radiol Phys Technol. 2024;17(1):124–34. pmid:37980315
  32. Tan M, Wu F, Yang B, Ma J, Kong D, Chen Z, et al. Pulmonary nodule detection using hybrid two-stage 3D CNNs. Med Phys. 2020;47(8):3376–88. pmid:32239521
  33. El-Bana S, Al-Kabbany A, Sharkas M. A two-stage framework for automated malignant pulmonary nodule detection in CT scans. Diagnostics (Basel). 2020;10(3):131. pmid:32121281
  34. Sekeroglu K, Soysal ÖM. Multi-perspective hierarchical deep-fusion learning framework for lung nodule classification. Sensors (Basel). 2022;22(22):8949. pmid:36433541
  35. Kim D, Kim DW, Heo YJ, Baek JW, Lee YJ, Park YM, et al. Computed tomography features of benign and malignant calcified thyroid nodules: a single-center study. J Comput Assist Tomogr. 2017;41(6):937–40. pmid:28448414
  36. Konijn LCD, Takx RAP, Mali W, Veger HTC, van Overhagen H. Different lower extremity arterial calcification patterns in patients with chronic limb-threatening ischemia compared with asymptomatic controls. J Med. 2021;11. Available from: https://www.ncbi.nlm.nih.gov/pubmed/34072908
  37. Mercado CL. BI-RADS update. Radiol Clin North Am. 2014;52(3):481–7. pmid:24792650
  38. Mustapha JA, Diaz-Sandoval LJ, Saab F. Infrapopliteal calcification patterns in critical limb ischemia: diagnostic, pathologic and therapeutic implications in the search for the endovascular holy grail. J Cardiovasc Surg (Torino). 2017;58(3):383–401. pmid:28240525