COVID-19 pneumonia on chest X-rays: Performance of a deep learning-based computer-aided detection system

Chest X-rays (CXRs) can help triage patients with coronavirus disease (COVID-19) in resource-constrained environments, and a computer-aided detection system (CAD) that can identify pneumonia on CXRs may help triage patients in environments where expert radiologists are not available. However, the performance of existing CADs for identifying COVID-19 and associated pneumonia on CXRs has been scarcely investigated. In this study, CXRs of patients with and without COVID-19 confirmed by reverse transcriptase polymerase chain reaction (RT-PCR) were retrospectively collected from four and one institution, respectively, and a commercialized, regulatory-approved CAD that can identify various abnormalities including pneumonia was used to analyze each CXR. Performance of the CAD was evaluated using areas under the receiver operating characteristic curves (AUCs), with reference standards of the RT-PCR results and the presence of findings of pneumonia on chest CTs obtained within 24 hours from the CXR. For comparison, 5 thoracic radiologists and 5 non-radiologist physicians independently interpreted the CXRs. Afterward, they re-interpreted the CXRs with the corresponding CAD results. The performance of the CAD (AUCs, 0.714 and 0.790 against RT-PCR and chest CT, respectively; same order hereinafter) was similar to that of thoracic radiologists (AUCs, 0.701 and 0.784), and higher than that of non-radiologist physicians (AUCs, 0.584 and 0.650). Non-radiologist physicians showed significantly improved performance when assisted by the CAD (AUCs, 0.584 to 0.664 and 0.650 to 0.738). In addition, inter-reader agreement among physicians was improved in the CAD-assisted interpretation (Fleiss' kappa coefficient, 0.209 to 0.322).
In conclusion, the radiologist-level performance of the CAD in identifying COVID-19 and associated pneumonia on CXRs, together with the enhanced performance of non-radiologist physicians with CAD assistance, suggests that the CAD can support physicians in interpreting CXRs and facilitate image-based triage of COVID-19 patients in resource-constrained environments.

Introduction
Hospital, Seoul, Korea) with the following inclusion criteria: a) COVID-19 patients confirmed by RT-PCR between January 20th and March 20th, 2020; and b) patients who underwent CXR and chest CT within 24 hours (Fig 1).
Patients without COVID-19 were also retrospectively included from a single institution (Seoul National University Hospital) with the following inclusion criteria: a) patients with a negative RT-PCR result for COVID-19; b) patients who underwent CXR and chest CT within 24 hours; and c) patients without any abnormality suggesting pneumonia on chest CT (Fig 1).

Deep learning-based CAD
We used a commercialized deep learning-based CAD (Lunit INSIGHT CXR 2, Lunit Inc., Seoul, Korea) for evaluating the CXRs. The CAD was designed to detect pulmonary nodules or masses, pulmonary infiltrates, and pneumothoraxes on CXRs. The CAD was initially trained using 54,221 normal CXRs and 35,613 abnormal CXRs (including 6,903 CXRs with pneumonia) [21], and was not specifically trained with CXRs from COVID-19 patients.
The CAD provided a probability score between 0 and 100% for the presence of abnormality on each CXR, with a heat map overlaid on the CXR for the localization of abnormality when the probability score was 15% or greater (Figs 2 and 3).

Definition of reference standards
For evaluation of CXR interpretation, we defined two different classes of reference standard: a) the diagnosis of COVID-19 by RT-PCR; and b) the presence of pneumonia on chest CT.
To define the reference standard for the presence of pneumonia, two thoracic radiologists (E.J.H and C.M.P, 9 and 21 years of experience in CXR and chest CT interpretation respectively) who were blinded to the CAD results reviewed the chest CT images, which were obtained within 24 hours from the CXRs. The two radiologists determined whether there was any finding of pneumonia on CT, in consensus.
Features of pneumonia on chest CT (the presence of consolidations, ground-glass opacities, and pleural effusion, and the bilaterality of pneumonia) were evaluated by one thoracic radiologist (E.J.H). For evaluation of the extent of pneumonia, a previously reported scoring system [30] based on the segmental involvement of the infiltrates (CT severity score, range 0-40) was adopted. For evaluation of the CT severity score, bilateral lungs were divided into 20 regions based on lobar and segmental anatomy. Opacities of all 20 lung regions were visually evaluated by the thoracic radiologist on CT, and scores of 0, 1, and 2 were assigned if parenchymal opacities involved 0%, <50%, and ≥50% of each region, respectively. Finally, the CT severity score was defined as the sum of the individual scores of the 20 lung regions [30].
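The scoring rule above can be illustrated with a minimal Python sketch. It is not the authors' software: in the study the regions were graded visually by a radiologist, whereas this sketch assumes, for illustration only, that an involved fraction between 0 and 1 is available for each of the 20 regions. The names `region_score` and `ct_severity_score` are hypothetical.

```python
from typing import Sequence

N_REGIONS = 20  # number of lung regions described in the text


def region_score(involved_fraction: float) -> int:
    """Map the involved fraction of one region to a score of 0, 1, or 2.

    Assumption: the fraction is given numerically; the study used visual grading.
    """
    if not 0.0 <= involved_fraction <= 1.0:
        raise ValueError("involved_fraction must lie in [0, 1]")
    if involved_fraction == 0.0:
        return 0  # no parenchymal opacity
    if involved_fraction < 0.5:
        return 1  # opacity involving <50% of the region
    return 2      # opacity involving >=50% of the region


def ct_severity_score(involved_fractions: Sequence[float]) -> int:
    """Sum the per-region scores over all 20 regions (range 0-40)."""
    if len(involved_fractions) != N_REGIONS:
        raise ValueError(f"expected {N_REGIONS} regions, got {len(involved_fractions)}")
    return sum(region_score(f) for f in involved_fractions)
```

For example, ten uninvolved regions, five regions with 30% involvement, and five regions with 50% involvement give a score of 0 + 5 + 10 = 15.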

Reader test
A total of 10 readers (5 thoracic radiologists [5-29 years of experience in CXR interpretation] and 5 non-radiologist physicians) participated in a reader test. All physicians independently interpreted the CXRs to determine whether there was any abnormality suggestive of pneumonia. Physicians were asked to provide a five-point scale score for each CXR regarding the presence or absence of an abnormality suggestive of pneumonia: a) score 1, definitely absent; b) score 2, probably absent; c) score 3, equivocal; d) score 4, probably present; and e) score 5, definitely present. All readers were informed that the CXRs were obtained from patients suspected of having COVID-19; however, the readers were blinded to other clinical information.
At first, each physician interpreted the CXRs at their discretion (reader-alone interpretation). Subsequently, physicians were provided with the CAD results and asked to modify their original decision as needed (CAD-assisted interpretation).

Subgroup analyses
To evaluate the performances of the CAD and readers for detection of pneumonia in patients with different clinical and radiologic findings, we compared the sensitivities of the CAD and readers for identification of findings of pneumonia in the following subgroups: a) patients with symptom duration >5 days versus ≤5 days; b) patients with versus without consolidation on their CTs; and c) patients with CT severity score >10 versus ≤10.
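As a rough illustration of the subgroup definitions above, the following Python sketch splits pneumonia-positive cases by the three criteria and computes the sensitivity in each subgroup. The `Case` record and its field names are hypothetical, and the sketch computes only the point estimates, not the statistical comparison between subgroups.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Case:
    """One pneumonia-positive patient (hypothetical record layout)."""
    symptom_days: float      # days since symptom onset
    has_consolidation: bool  # consolidation present on CT
    ct_severity: int         # CT severity score, 0-40
    detected: bool           # True if the CXR interpretation was positive


def sensitivity(cases: Iterable[Case]) -> Optional[float]:
    """Fraction of pneumonia-positive cases that were detected; None if the group is empty."""
    cases = list(cases)
    if not cases:
        return None
    return sum(c.detected for c in cases) / len(cases)


def subgroup_sensitivities(cases: Iterable[Case]) -> Dict[str, Optional[float]]:
    """Sensitivity in each of the six subgroups named in the text."""
    cases = list(cases)
    return {
        "symptom_gt_5d": sensitivity(c for c in cases if c.symptom_days > 5),
        "symptom_le_5d": sensitivity(c for c in cases if c.symptom_days <= 5),
        "consolidation": sensitivity(c for c in cases if c.has_consolidation),
        "no_consolidation": sensitivity(c for c in cases if not c.has_consolidation),
        "severity_gt_10": sensitivity(c for c in cases if c.ct_severity > 10),
        "severity_le_10": sensitivity(c for c in cases if c.ct_severity <= 10),
    }
```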

Statistical analyses
We evaluated the performance of positive interpretation results by the CAD and physicians for the prediction of positive reference standards: the diagnosis of COVID-19 by RT-PCR, and the presence of pneumonia on CT. Areas under the receiver operating characteristic curves (AUCs), sensitivities, and specificities were used for performance evaluations. For evaluating sensitivity and specificity, CAD results with a probability score ≥15% were considered positive results, while physicians' scores ≥3 were considered positive interpretations. Average AUCs of multiple readers were obtained and compared using multiple reader multiple cases receiver operating characteristic analyses, as suggested by Obuchowski and Rockette [31]. Average sensitivities and specificities of multiple readers were estimated using generalized estimating equations. Inter-reader agreements among physicians in five-point scale scores and in binary classifications were evaluated with Fleiss' kappa coefficient.
All statistical analyses were done with R (version 3.6.3, R project for statistical computing, Vienna, Austria). A P-value <0.05 was considered to indicate a statistically significant difference.

Fig 3 (caption, beginning not recovered): [...] show no pulmonary abnormality suggestive of pneumonia. The CAD system did not detect any abnormalities in the CXR and the probability score was 13% (C). In the reader-alone interpretation, four thoracic radiologists and four non-radiologist physicians misclassified the CXR as having findings of pneumonia. In the CAD-assisted interpretation, only one thoracic radiologist and two non-radiologist physicians made a false-positive classification of the CXR. Mediastinal window CT image (D) shows pulmonary embolism in the right descending pulmonary artery (arrow), the presumed cause of the patient's symptoms. https://doi.org/10.1371/journal.pone.0252440.g003

PLOS ONE
Deep learning for COVID-19 pneumonia on chest X-rays
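The basic quantities named in the statistical analyses can be sketched in a few lines of Python. This is an illustrative sketch only: the study used R, and the multi-reader AUC comparison (Obuchowski and Rockette) and the generalized estimating equations are not reproduced here. The function names are hypothetical; the thresholds (CAD score ≥15%, reader score ≥3) are taken from the text.

```python
from collections import Counter
from typing import Sequence, Tuple

CAD_THRESHOLD = 15.0  # CAD probability score (%) counted as positive, per the text
READER_THRESHOLD = 3  # five-point reader score counted as positive, per the text


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Empirical AUC: chance that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Treat score >= threshold as a positive call and return (sensitivity, specificity)."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    return tp / n_pos, tn / n_neg


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; ratings[i] lists the category each reader assigned to case i."""
    if not ratings:
        raise ValueError("no cases given")
    n_cases, n_raters = len(ratings), len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing reader pairs for this case.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_cases  # mean observed agreement
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

Calling `sensitivity_specificity(cad_scores, labels, CAD_THRESHOLD)` applies the CAD rule, and the same function with `READER_THRESHOLD` applies the reader rule to five-point scores.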

Table fragment: P values; patients with pneumonia on chest CTs (n = 67)

specificity of the CAD was significantly lower than that of the average thoracic radiologist (64.3% [95% CI, 59.9-68.6]; P = .032), and did not significantly differ from that of the average non-radiologist physician (58.9% [95% CI, 54.4-63.3%]; P = .236). Regarding individual readers, the CAD exhibited a significantly higher AUC than four non-radiologist physicians and significantly higher sensitivity than one thoracic radiologist and four non-radiologist physicians. However, the specificity of the CAD was significantly lower than those of three thoracic radiologists and two non-radiologist physicians (Table 2, Fig 4).

Reader-alone interpretations versus CAD-assisted interpretations
As for inter-reader agreement, the CAD-assisted interpretation exhibited better agreement

Sensitivities in different subgroups
In the reader-alone interpretations, the readers exhibited significantly higher sensitivities for patients with a symptom duration >5 days (77.7%), for patients with consolidations on their CTs (73.4%), and for patients with CT severity scores >10 (84.5%), compared to those whose symptom duration was ≤5 days (59.4%; P < .001), those who did not have consolidations on their CTs (63.4%; P = .005), and those with CT severity scores ≤10 (46.9%; P < .001), respectively (Table 4). The CAD exhibited trends similar to those of physicians, with a higher sensitivity for patients with a symptom duration >5 days (90.3%), for patients with consolidations on their CTs (90.6%), and for patients with CT severity scores >10 (89.5%), compared to those whose symptom duration was ≤5 days (71.4%; P = .068), those who did not have consolidations on their CTs (71.4%; P = .065), and those with CT severity scores ≤10 (69.0%; P = .059), although the differences did not reach statistical significance.

Discussion
Herein, we evaluated the performance of a commercialized deep learning-based CAD for identification of CXRs from RT-PCR positive COVID-19 patients and those with associated pneumonia proven by chest CT, and compared it with the performances of thoracic radiologists and non-radiologist physicians. The performance of the CAD (AUC, 0.714 and 0.790 for RT-PCR positive COVID-19 and associated pneumonia, respectively) was similar to that of thoracic radiologists (AUC, 0.712 and 0.784 for RT-PCR positive COVID-19 and associated pneumonia, respectively), and higher than that of non-radiologist physicians (AUC, 0.584 and 0.650 for RT-PCR positive COVID-19 and associated pneumonia, respectively).

Since CXRs and chest CTs of COVID-19 patients may appear normal, especially in the early stages of the disease [9,32,33], diagnosing COVID-19 using CXRs or CTs may be inappropriate [11-14]. In our study population, 16.3% of COVID-19 patients did not exhibit any findings of pneumonia on CTs. Not surprisingly, the performances of the CAD and readers against RT-PCR results were unsatisfactory. Sensitivities of the CAD (71.3%) and thoracic radiologists (56.3-68.8%) were comparable to the previously reported sensitivity (69%) of baseline CXRs by Wong et al [18]. In spite of the limited diagnostic performance compared to RT-PCR testing, identification of radiologic findings of pneumonia is still clinically important for the following reasons. First, in situations with limited medical resources due to the outbreak, timely diagnosis with an RT-PCR test can be limited. Since the results of radiologic examinations can be obtained faster than RT-PCR results, they can aid timely clinical decision-making in resource-constrained environments. Second, radiological findings of pneumonia may precede positive RT-PCR results [18,34-36]. In situations where there is a high pre-test probability, identification of pneumonia via radiologic examinations can help early diagnosis and allow for isolation in order to prevent further transmission. Third, since the extent of radiological findings of pneumonia can mirror the clinical severity of COVID-19 [30,37,38], radiological findings of pneumonia may aid hospitalization and intensive care decision-making, and may be utilized for monitoring the severity of the disease. The radiologist-level performance of the CAD for identification of findings of pneumonia suggests that it may facilitate the triage of CXRs with COVID-19 pneumonia, especially when interpretations from expert radiologists are limited or unavailable.
The CAD evaluated in the present study was not specifically trained for findings of COVID-19. Instead, it was trained for various types of abnormalities including pulmonary nodules and infiltrates. The reasonably high performance of the CAD indicates that an existing versatile CAD can be utilized for the detection of COVID-19 pneumonia. Indeed, radiographic findings of COVID-19 pneumonia include bilateral ground-glass opacities and consolidation [8-10,39], which overlap substantially with those of pneumonia from other etiologies. Although it is difficult to directly compare performance across studies because of differences in test datasets, recent studies in which deep learning-based CADs were specifically trained for COVID-19 pneumonia reported higher AUCs than our results (AUCs, 0.81-0.99, Table 5) for the identification of CXRs from COVID-19 patients [27,28,40,41]. Additional training of the CAD with COVID-19 CXRs may improve its performance.
In addition to stand-alone performance, we also observed that the CAD may enhance the performances of readers. Although thoracic radiologists did not exhibit significant improvement in performance in the CAD-assisted interpretation, less-experienced non-radiologist physicians exhibited significantly improved AUCs, sensitivities, and specificities in the CAD-assisted interpretations. Our results suggest that the CAD may help less-experienced readers to identify subtle findings of pneumonia on CXRs. In addition, the substantial improvement of inter-reader agreement using the CAD suggests that the CAD can also help reduce variability among physicians in CXR interpretation.
Regarding sensitivities in different subgroups, the CAD exhibited trends similar to those of physicians. Higher sensitivity in patients with consolidations and higher CT severity scores may be related to better visibility of pulmonary infiltrates on CXRs. A previous study reported that the entire burden of pneumonia, reflecting both the extent and density of pneumonia on CTs, was a significant factor for the visibility of pneumonia on CXRs [19]. Higher sensitivities in patients with longer time intervals since symptom onset (>5 days) may also be associated with the extent of pneumonia. Previous studies reported that the extent of pneumonia on CXR tended to increase until 2 weeks after symptom onset [30,39]. Considering that imaging-based triage is indicated in patients with moderate to severe clinical symptoms [11], and that disease extent on CT reflects the clinical severity of the disease [30], we believe the CAD can help triage COVID-19 patients with CXRs.
The present study has limitations. Despite being a multi-center study, it included a limited number of patients. In addition, we included only patients with an available chest CT, which served as the reference standard for findings of pneumonia. Therefore, the patients in our study did not fully reflect the actual clinical situation, and the generalizability of our results may be limited. Additionally, the reader tests performed in our study did not fully reflect actual clinical practice, where radiologists and clinicians may interpret CXRs in environments different from the reader test and can use additional clinical and laboratory information. Therefore, the performances of the physicians in our study may not be fully reproducible in actual practice.

Conclusion
A commercialized, clinically available deep learning-based CAD exhibited performance (AUC, 0.714 and 0.790 for RT-PCR positive COVID-19 and associated pneumonia, respectively) similar to that of thoracic radiologists (AUC, 0.712 and 0.784 for RT-PCR positive COVID-19 and associated pneumonia, respectively) in identifying COVID-19 and associated pneumonia on CXRs, and outperformed non-radiologist physicians (AUC, 0.584 and 0.650 for RT-PCR positive COVID-19 and associated pneumonia, respectively). It also enhanced the detection performance of non-radiologist physicians (AUC, 0.584 to 0.664 and 0.650 to 0.738 for RT-PCR positive COVID-19 and associated pneumonia, respectively) and inter-reader agreement (Fleiss' kappa coefficient, 0.510 to 0.688). We believe that the CAD system can help less-experienced physicians to identify findings of pneumonia associated with COVID-19 on CXRs and to reduce variability among physicians in CXR interpretations. The CAD may facilitate imaging-based triage of COVID-19 patients in resource-constrained environments.