Table 1.
Demographic study.
Fig 1.
The architecture of the custom U-Net with dropout and its performance curves.
Fig 2.
Segmentation workflow showing U-Net-based mask generation and lung ROI cropping.
Fig 3.
The workflow of the proposed repeated CXR-specific pretraining and fine-tuning.
Table 2.
Datasets and their distribution used in various stages of learning.
Fig 4.
A custom wide residual network (WRN) with dropout regularization.
Fig 5.
The architecture of the CNNs used in the first stage of repeated CXR-specific pretraining.
I/P = Input, I-PCNN = truncated ImageNet-pretrained CNNs, ZP = Zero-padding, CONV = Extra convolution layer, GAP = Global Average Pooling, DO = Dropout, D = Final dense layer with Softmax activation.
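To illustrate the tail of this architecture (GAP, DO, and D with Softmax), the following is a minimal pure-Python sketch of the forward pass over a stack of feature maps. It assumes channel-first nested lists, inverted dropout, and a hypothetical dropout rate of 0.5. The truncated backbone (I-PCNN), ZP, and CONV layers are omitted, and the weights are placeholders, not values from this study.

```python
import math
import random


def global_average_pool(feature_maps):
    """GAP: average each channel's H x W map (channel-first nested lists) to one value."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def dropout(values, rate, training, rng=random):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and rescale survivors by 1 / (1 - rate); identity at inference."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense_softmax(features, weights, biases):
    """Final dense layer (one weight row per class) followed by a stable softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    shift = max(logits)  # subtract the max logit to avoid overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(feature_maps, weights, biases, rate=0.5, training=False):
    """GAP -> DO -> D (softmax); `rate=0.5` is a hypothetical dropout rate."""
    pooled = global_average_pool(feature_maps)
    return dense_softmax(dropout(pooled, rate, training), weights, biases)
```

At inference (`training=False`), dropout acts as the identity, so the class probabilities depend only on the pooled features and the dense-layer parameters.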
Fig 6.
The architecture of the CNNs used in the second stage of pretraining.
I/P = Input, CXR-Pre-CNN = CXR-specific CNNs from the first stage of pretraining, truncated at their deepest convolutional layer, GAP = Global Average Pooling, DO = Dropout, D = Final dense layer with Softmax activation.
Fig 7.
The architecture of the CNNs fine-tuned toward COVID-19 detection.
I/P = Input, CXR-Pre-CNN = CXR-pretrained CNNs from the second stage of pretraining, truncated at their deepest convolutional layer, GAP = Global Average Pooling, DO = Dropout, D = Final dense layer with Softmax activation.
Fig 8.
Examples showing inter-reader variability in annotating COVID-19 disease ROI.
(A) and (B) show the annotations (bounding boxes in blue) of Rad-1 and Rad-2, respectively, for a given COVID-19 disease-labeled image; (C) and (D) show the GT annotations of Rad-1 and Rad-2, respectively, for another COVID-19 disease-labeled image.
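Agreement between two readers' bounding boxes, such as those in panels (A) and (B), is commonly quantified with intersection over union (IoU). The following is a minimal sketch. It assumes boxes are given as (x_min, y_min, x_max, y_max) in continuous pixel coordinates, which may differ from the coordinate convention used in this study.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 40 x 40 boxes offset by 20 pixels in each direction share a 20 x 20 region, which gives IoU = 400 / 2800 ≈ 0.143.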
Table 3.
Performance metrics achieved during the first stage of CXR-specific pretraining.
Fig 9.
Performance achieved using the VGG-19 model during the first stage of CXR-specific pretraining.
(A) Confusion matrix; (B) ROC curves; (C) Normalized Sankey flow diagram.
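The area under a ROC curve, as plotted in panel (B), equals the probability that a randomly chosen positive image scores higher than a randomly chosen negative one. The following is a minimal sketch of this rank-based (Mann-Whitney) computation for one class versus the rest. It is quadratic in the number of samples and is meant only for illustration; it is not the evaluation code used in this study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative);
    ties count as one half (Mann-Whitney U normalized by #pairs)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.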
Table 4.
Performance metrics achieved by the models during the second stage of CXR-specific pretraining.
Fig 10.
Performance achieved using the DenseNet-121 model during the second stage of CXR-specific pretraining.
(A) Confusion matrix; (B) ROC curves; (C) Normalized Sankey flow diagram.
Table 5.
Performance metrics achieved with fine-tuning the second-stage pretrained models for COVID-19 detection.
Fig 11.
Performance achieved using the ResNet-18 model during fine-tuning for COVID-19 detection.
(A) Confusion matrix; (B) ROC curves; (C) Normalized Sankey flow diagram.
Table 6.
Comparison of the performance metrics achieved by fine-tuning the second-stage pretrained models for COVID-19 detection with those of the baseline models.
Fig 12.
CRM-based localization of the COVID-19 viral disease ROI achieved using the fine-tuned models and their baseline counterparts.
(A) Original CXR with the STAPLE-generated consensus ROI (blue box); (B) Baseline VGG-16; (C) Baseline VGG-19; (D) Baseline MobileNet-V2; (E) Baseline ResNet-18; (F) Baseline Inception-V3; (G) Fine-tuned VGG-16; (H) Fine-tuned VGG-19; (I) Fine-tuned MobileNet-V2; (J) Fine-tuned ResNet-18; (K) Fine-tuned Inception-V3.
Fig 13.
Performance achieved through weighted averaging of the top-3 fine-tuned CNNs toward COVID-19 detection.
(A) Confusion matrix; (B) ROC curves; (C) Normalized Sankey flow diagram.
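Weighted averaging combines the class-probability outputs of the member models into a single prediction. The following is a minimal sketch. It assumes each model outputs a softmax probability vector and that non-negative weights are supplied. The caption does not state how the weights were chosen in this study, so the values used below are hypothetical.

```python
def weighted_average_ensemble(model_probs, weights):
    """Combine per-model class-probability vectors with non-negative weights,
    normalizing by the weight sum so the result is still a probability vector."""
    if len(model_probs) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_classes = len(model_probs[0])
    combined = [0.0] * n_classes
    for probs, w in zip(model_probs, weights):
        for k in range(n_classes):
            combined[k] += w * probs[k] / total
    return combined


def predict(combined):
    """Index of the class with the highest combined probability."""
    return max(range(len(combined)), key=combined.__getitem__)
```

With model outputs [0.6, 0.4], [0.3, 0.7], and [0.8, 0.2] and hypothetical weights [0.5, 0.3, 0.2], the combined vector is approximately [0.55, 0.45], so class 0 is predicted.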
Table 7.
Performance achieved with an ensemble of top-3, top-5, and top-7 fine-tuned models toward COVID-19 detection.
Table 8.
Performance achieved in terms of CRM-based IoU and mAP values by the individual fine-tuned CNNs using the radiologists’ annotations and the STAPLE-generated consensus ROI annotation.
Table 9.
IoU and mAP values obtained with the top-3, top-5, and top-7 ensembles using the annotations of Rad-1 and Rad-2 and the STAPLE-generated consensus ROI annotation.
Fig 14.
Sample CXRs from two different patients (rows A-D and E-H, respectively) showing the generated ROI annotations.
(A) and (E) Rad-1 (in blue); (B) and (F) Rad-2 (in green); (C) and (G) Top-3 ensemble using STAPLE-generated consensus ROI (program) (in yellow); (D) and (H) STAPLE-generated consensus ROI annotation (in red).
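STAPLE (Simultaneous Truth and Performance Level Estimation) uses expectation-maximization to estimate a consensus segmentation together with each reader's sensitivity and specificity. The following is a minimal binary sketch. It rests on several illustrative assumptions that are not necessarily the settings used to generate the consensus ROI in this study:

- each reader's bounding box has been rasterized into a flat 0/1 mask of equal length;
- the foreground prior is fixed at the overall fraction of foreground votes;
- sensitivity and specificity both start at 0.99;
- the posterior is thresholded at 0.5 to give the consensus mask.

```python
def staple(decisions, max_iter=100, tol=1e-7, eps=1e-6):
    """Binary STAPLE via EM.
    decisions: one flat 0/1 list per reader (same length).
    Returns (posterior foreground probability per pixel, sensitivities, specificities)."""
    n_readers, n_pixels = len(decisions), len(decisions[0])
    # Assumed fixed prior: global fraction of foreground votes over all readers.
    prior = sum(map(sum, decisions)) / (n_readers * n_pixels)
    p = [0.99] * n_readers  # initial sensitivity estimates (assumption)
    q = [0.99] * n_readers  # initial specificity estimates (assumption)
    w = [0.0] * n_pixels
    for _ in range(max_iter):
        # E-step: posterior probability that each pixel is truly foreground.
        for i in range(n_pixels):
            a, b = prior, 1.0 - prior
            for j in range(n_readers):
                d = decisions[j][i]
                a *= p[j] if d else 1.0 - p[j]
                b *= (1.0 - q[j]) if d else q[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each reader's sensitivity and specificity.
        sum_w = sum(w)
        sum_not_w = n_pixels - sum_w
        new_p, new_q = [], []
        for j in range(n_readers):
            tp = sum(wi for wi, d in zip(w, decisions[j]) if d)
            tn = sum(1.0 - wi for wi, d in zip(w, decisions[j]) if not d)
            # Clamp away from 0 and 1 so later E-steps never divide by zero.
            new_p.append(min(max(tp / max(sum_w, eps), eps), 1.0 - eps))
            new_q.append(min(max(tn / max(sum_not_w, eps), eps), 1.0 - eps))
        delta = max(abs(x - y) for x, y in zip(new_p + new_q, p + q))
        p, q = new_p, new_q
        if delta < tol:
            break
    return w, p, q


def consensus_mask(w, threshold=0.5):
    """Binarize the STAPLE posterior (0.5 threshold is an assumption)."""
    return [1 if wi >= threshold else 0 for wi in w]
```

Pixels on which all readers agree receive posteriors near 1 (foreground) or near 0 (background). Pixels on which readers disagree receive intermediate values, governed by the estimated reader performance and the prior.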
Fig 15.
Examples of ensemble CRMs obtained by combining the ROI predictions of the top-N models.
(A) top-3 CNNs using STAPLE-generated consensus ROI annotation; (B) top-5 CNNs using Rad-2 annotations. The green box denotes reference ROI annotation and the blue box denotes ensemble CRM localization.
Fig 16.
(A) Mean plot of the mAP scores obtained by the top-N ensembles using Rad-1, Rad-2, and STAPLE-generated consensus ROI annotations; error bars represent standard errors, and the differences are not statistically significant. (B) Residual plot showing that the data follow a normal distribution.
Table 10.
Consolidated results of Shapiro–Wilk, Levene, and one-way ANOVA analyses.
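The one-way ANOVA F statistic compares between-group and within-group variability. Levene's test for equal variances is the same ANOVA applied to the absolute deviations from each group's center. The pure-Python sketch below computes both statistics for lists of per-ensemble mAP values (any inputs shown are hypothetical). It omits p-values, which require the F distribution (for example, `scipy.stats.f.sf`). It also omits the Shapiro-Wilk normality test, which SciPy provides as `scipy.stats.shapiro`. In practice, `scipy.stats.f_oneway` and `scipy.stats.levene` return these statistics together with p-values.

```python
from statistics import mean, median


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("need >= 2 groups, more than k values, and within-group spread")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def levene_w(groups, center=median):
    """Levene's statistic: ANOVA F on |x - center(group)|.
    center=median is the Brown-Forsythe variant; pass center=mean for the original."""
    deviations = [[abs(x - center(g)) for x in g] for g in groups]
    return one_way_anova_f(deviations)
```

For the groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], the between-group sum of squares is 54 and the within-group sum of squares is 6, giving F = (54 / 2) / (6 / 6) = 27 with (2, 6) degrees of freedom.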
Fig 17.
Assessing inter-reader variability and program performance.
The following performance metrics are measured and plotted at 10 different IoU thresholds in the range 0.1–0.7: (A) Kappa statistic; (B) Sensitivity; (C) Specificity; (D) PPV.
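Each point in these plots summarizes the agreement between paired binary decisions at one IoU threshold. The sketch below computes Cohen's kappa, sensitivity, specificity, and PPV from two lists of 0/1 labels, treating one list as the reference. The caption does not specify how the paired decisions are derived from the ROIs at each IoU threshold, so the sketch assumes they are already available.

```python
def agreement_metrics(reference, predicted):
    """Cohen's kappa, sensitivity, specificity, and PPV for paired 0/1 labels."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from each rater's marginal positive and negative rates.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "kappa": ratio(observed - expected, 1 - expected),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }
```

For reference labels [1, 1, 1, 0, 0, 0, 0, 1] and predictions [1, 1, 0, 0, 0, 1, 0, 1], there are 3 TP, 3 TN, 1 FP, and 1 FN. Observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5, while sensitivity, specificity, and PPV are each 0.75.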
Table 11.
Performance level assessment and inter-reader variability analysis using STAPLE-generated consensus ROI.