Semi-supervised learning for an improved diagnosis of COVID-19 in CT images

Coronavirus disease 2019 (COVID-19) has spread all over the world. Although a real-time reverse-transcription polymerase chain reaction (RT-PCR) test has been used as the primary diagnostic tool for COVID-19, CT-based diagnostic tools have been suggested as a means to improve diagnostic accuracy and reliability. Herein we propose a semi-supervised deep neural network for an improved detection of COVID-19. The proposed method utilizes CT images in a supervised and unsupervised manner to improve the accuracy and robustness of COVID-19 diagnosis. Both labeled and unlabeled CT images are employed. Labeled CT images are used for supervised learning. Unlabeled CT images are utilized for unsupervised learning in such a way that the feature representations are invariant to perturbations in CT images. To systematically evaluate the proposed method, two COVID-19 CT datasets and three public CT datasets with no COVID-19 CT images are employed. In distinguishing COVID-19 from non-COVID-19 CT images, the proposed method achieves an overall accuracy of 99.83%, sensitivity of 0.9286, specificity of 0.9832, and positive predictive value (PPV) of 0.9192. The results are consistent between the COVID-19 challenge dataset and the public CT datasets. For discriminating between COVID-19 and common pneumonia CT images, the proposed method obtains 97.32% accuracy, 0.9971 sensitivity, 0.9598 specificity, and 0.9326 PPV. Moreover, the comparative experiments with respect to supervised learning and training strategies demonstrate that the proposed method is able to improve the diagnostic accuracy and robustness without exhaustive labeling. The proposed semi-supervised method, exploiting both supervised and unsupervised learning, facilitates an accurate and reliable diagnosis of COVID-19, leading to improved patient care and management.


Introduction
A global pandemic of coronavirus disease 2019 (COVID-19), declared by the World Health Organization (WHO), has been reported to affect over 19 million patients in >200 countries as of August 9, 2020. COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the 7th known coronavirus to infect humans [1], and is believed to be a zoonotic infection similar to severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [2]. Fever and dry cough are the most common clinical symptoms of the disease [3], similar to many other viral syndromes. Nonspecific symptoms, including dyspnea, headache, muscle soreness, and fatigue, have also been observed. About 20% of the patients experience severe symptoms, and mortality is approximately 3% [4]. Elderly patients, in particular, have a higher chance of experiencing severe symptoms with increased mortality [5]. Therefore, there is a high demand for the immediate diagnosis, treatment, and management of the disease. A real-time reverse-transcription polymerase chain reaction (RT-PCR) test has been developed and used in clinics for the diagnosis of COVID-19. Even though it serves as the reference standard for the diagnosis, there is a growing body of evidence that CT could replace or complement the RT-PCR test for improved patient care [6]. Several studies have identified imaging features for COVID-19 on CT [7]. Common CT features include bilateral and peripheral ground-glass and consolidative pulmonary opacities with or without vascular enlargement, interlobular septal thickening, and air bronchogram sign [8,9]. A longer time of infection has also been associated with more frequent CT findings, including greater total lung involvement, linear opacities, crazy-paving pattern, and the reverse halo sign [9], i.e., CT may be able to monitor and visualize the early, progressive, and severe status of COVID-19.
Moreover, CT findings have been shown to be effective in identifying COVID-19 in patients with an initial false-negative result from the RT-PCR test [10,11]. However, the enormous number of cases overwhelms the capacity of hospitals and radiologists, potentially leading to an inaccurate diagnosis. An automated, accurate, objective, and robust computerized system for COVID-19 diagnosis could aid in improving the diagnostic accuracy and yield as well as patient management and care.
Deep learning has been successfully applied to a wide range of problems in medical image processing and analysis [12,13]. For example, it can automatically detect abnormalities on head CT scans [14]; diabetic retinopathy and related eye disease can be identified in retinal images with high accuracy [15]; it is also able to detect lymph node metastases [16] as well as segment tissues [17] in pathology images. One of the immediate challenges in deep learning is to obtain a sufficient number of samples with pertinent labels. The capability of deep learning in dealing with a high quantity and quality of samples has already been proved in many applications [18]. Many previous deep learning methods for medical imaging, in particular, have been mainly developed in a supervised fashion, requiring extensive data annotations by experts. However, it is fairly hard to acquire a large-scale dataset for COVID-19 with accurate labels due to the shortage of radiologists in clinics, the urgency of the disease, etc. With a limited number of samples, the performance of such methods is in question. Prior knowledge of the disease could aid in extracting useful features for the diagnosis, thus alleviating the lack of labeled data. However, the clinical and scientific understanding of COVID-19 is still immature, making it difficult to incorporate prior knowledge in developing a CT diagnostic system for COVID-19.
Several research efforts have been made to develop deep learning methods that can identify patients with COVID-19 in CT images [19]. Previous works can be roughly grouped into two types: 1) slice-level classification and 2) patient-level classification. Slice-level classification approaches conduct the classification either per CT image or per region in CT images. For example, [20] proposed a deep learning framework for the classification of COVID-19, other pneumonia, and no pneumonia. [21] utilized an image processing technique to extract lung regions, a 3D CNN to select candidate regions, and a classification model to categorize the regions into COVID-19, Influenza-A-viral-pneumonia, and irrelevant-to-infection. Many patient-level classification approaches have adopted multiple neural networks for the diagnosis of COVID-19. For instance, two CNNs have been used for lung/lobe segmentation and COVID-19 classification [22][23][24], and an attention-based 3D multiple instance network [25] and a dual-sampling attention network [26] have been proposed for COVID-19 diagnosis. Moreover, [27] conducted slice-level infection segmentation for COVID-19, and [28] utilized a deep learning method to quantify lung burden changes in patients with COVID-19 from serial CT scans.
Semi-supervised learning is an approach where both labeled and unlabeled data are utilized in a cooperative manner [29]. The amount of unlabeled data keeps increasing in medical imaging. Semi-supervised learning provides a way to utilize the unlabeled data without the cost of data annotation, which is a bottleneck for technical advances. Its effectiveness in medical imaging has recently been demonstrated in several applications, including cardiac segmentation [30], lung nodule classification [31], and sclerosis lesion segmentation [32]. As for CT images of COVID-19, although the amount of labeled data is insufficient, a great number of lung CT datasets are available to the public. Such lung CT datasets may contain CT images obtained from healthy subjects and patients with various types of diseases, including lung cancer and pulmonary nodules. By utilizing CT images from both the COVID-19 dataset and the lung CT datasets, the diagnostic system for COVID-19 could explore more diverse and complicated patterns of lesions in the lung, leading to an improved ability to characterize and analyze CT images. Therefore, semi-supervised learning could be beneficial to the development of the diagnostic system for COVID-19.
Herein, we present a deep learning-based system for COVID-19 diagnosis in CT images. The system is built upon advanced deep convolutional neural networks (CNNs) to extract and characterize latent features of infected lesions in CT images and to detect COVID-19 infection. During training, both COVID-19 and public lung CT datasets are used to train the deep learning system in a semi-supervised fashion. To assess the importance of semi-supervised learning, we conduct a number of comparative experiments with supervised and semi-supervised learning. The experimental results demonstrate the effectiveness and efficiency of the proposed method in identifying COVID-19 in CT images.
The main contributions of our work are summarized as follows:
• We propose a semi-supervised learning-based deep neural network for COVID-19 diagnosis in CT images that can extensively explore a wide range of CT images with and without the ground truth label.
• We employ two COVID-19 CT datasets and three public lung CT datasets without COVID-19 to train, validate, and test the proposed diagnostic system in a rigorous manner.
• Utilizing the advanced deep learning techniques and semi-supervised learning, we achieve an accuracy of 99.83%, sensitivity of 0.9286, specificity of 0.9832, and positive predictive value (PPV) of 0.9192 in distinguishing COVID-19 from non-COVID-19 CT images. In the classification between COVID-19 and common pneumonia CT images, 96.98% accuracy, 0.9968 sensitivity, 0.9548 specificity, and 0.9248 PPV are obtained.
The network is trained and tested using the labeled COVID-19 dataset in a supervised fashion. To improve the stability and precision of the network, additional public CT datasets as well as an unsupervised learning approach have been adopted.

Materials and methods
where $f_l$ and $\theta_l$ indicate the $l$th operation and the corresponding parameters, respectively. We also note that $\psi_m$ represents the set of the first $m$ stages, i.e., $\psi_m = \{f_l, \theta_l\}_{l=1}^{m}$.
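The definitions above are consistent with viewing the network as a composition of its stages. As a sketch only, assuming the $L$ stages are applied sequentially to an input image $x$ (the compositional form is an assumption, not taken from the text), the full network could be written as:

```latex
\psi(x; \theta_1, \dots, \theta_L)
  = f_L\bigl(\cdots f_2\bigl(f_1(x; \theta_1); \theta_2\bigr) \cdots ; \theta_L\bigr),
\qquad
\psi_m = \{f_l, \theta_l\}_{l=1}^{m}, \quad 1 \le m \le L .
```

Under this reading, $\psi_{L-1}$ is the network without its last stage, and $\psi_L$ coincides with the full network $\psi$.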

Network architecture
The proposed network adopts the architecture of EfficientNet-B0 [33]. The network is composed of 20 stages (Fig 2). The first stage ($f_1$) simply conducts a convolution operation with a kernel size of 3x3. From $f_3$ to $f_{17}$, each stage utilizes a mobile inverted bottleneck convolution (MBConv) block, which consists of an expand point-wise convolution (a 1x1 convolution that expands the number of channels), a depth-wise separable convolution (a single convolution per channel), and a project point-wise convolution (a 1x1 convolution that projects features to a low-dimensional space). $f_2$ omits the expand point-wise convolution. Each MBConv block has a shortcut connection between the input and the output of the block, except for the $f_3$, $f_5$, $f_7$, and $f_{13}$ blocks. A squeeze-and-excitation scheme is also utilized to incorporate the channel-wise interdependencies. The squeeze-and-excitation scheme aims to aggregate the global information of feature maps via a global average pooling (squeeze) and to recalibrate the feature maps via a combination of a sigmoid function, a ReLU function, and convolutions (excitation). The last stage, including a 1x1 convolution layer, an average pooling layer, and a fully-connected layer, conducts the classification. The Swish function, instead of ReLU, is adopted as the activation function of the network. We can divide the whole network into two parts: a feature extractor (up to the 19th stage) and a classifier (the 20th stage).
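The squeeze-and-excitation scheme and the Swish activation described above can be illustrated with a minimal sketch in plain Python. It assumes a feature map stored as nested lists and treats the 1x1 convolutions on the pooled 1x1 map as matrix products without biases. The function names and weight shapes are hypothetical and are not taken from the EfficientNet implementation.

```python
import math


def sigmoid(z):
    """Logistic function used for the channel gates."""
    return 1.0 / (1.0 + math.exp(-z))


def swish(z):
    """Swish activation: z * sigmoid(z)."""
    return z * sigmoid(z)


def squeeze_excite(fmap, w_reduce, w_expand):
    """Recalibrate a feature map channel by channel.

    fmap:     list of C channels, each an H x W list of rows.
    w_reduce: R x C weights (1x1 convolution that reduces channels).
    w_expand: C x R weights (1x1 convolution that restores channels).
    """
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap
    ]
    # Excitation: reduce -> ReLU -> expand -> sigmoid.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce
    ]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Recalibration: scale every value of a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]
```

Each gate lies strictly between 0 and 1, so the block can only attenuate a channel relative to the others; it never changes the spatial layout of the feature map.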

Loss functions
The set of the parameters of the proposed network $\{\theta_l\}_{l=1}^{L}$ is optimized by utilizing the loss function $\mathcal{L}$ defined as $\mathcal{L} = \mathcal{L}_{CE} + \lambda \mathcal{L}_{Cons}$, where $\mathcal{L}_{CE}$ is a standard cross entropy loss function for supervised learning, $\mathcal{L}_{Cons}$ is a consistency loss function for unsupervised learning, and $\lambda$ is a weighting factor for $\mathcal{L}_{Cons}$. For $\mathcal{L}_{CE}$, only the labeled data is utilized. As for $\mathcal{L}_{Cons}$, both labeled and unlabeled data can be used, ignoring the ground truth label for the labeled data.

PLOS ONE
$\mathcal{L}_{CE}$ is adopted to calculate the total entropy between $p$ and $y$, where $p_i$ is the output of the CNN $\psi$ given the input image $x_i$, i.e., $p_i = \psi(x_i; \theta_1, \dots, \theta_L)$. During training, $\mathcal{L}_{CE}$ is minimized using the labeled CT images in a supervised manner.
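For reference, one common form of the cross entropy loss is given below. It is a sketch under stated assumptions rather than the exact expression used in this work: $N$ labeled images, $C$ classes, a one-hot ground truth label $y_i$, and averaging over the batch are all assumed.

```latex
\mathcal{L}_{CE}
  = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log p_{i,c},
\qquad
p_i = \psi(x_i; \theta_1, \dots, \theta_L).
```

With a sum instead of the average, the expression differs only by the constant factor $N$, which does not change the minimizer.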
To improve the consistency and robustness of the CNN $\psi$, we force the CNN $\psi$ to be invariant to a perturbation in an input image. Given an unlabeled input image $\tilde{x}_i$, $\psi_{L-1}$, i.e., a feature extractor, produces a high-dimensional feature vector $\psi_{L-1}(\tilde{x}_i; \theta_1, \dots, \theta_{L-1})$. Suppose that there exists a set of augmentation functions $T = \{t_i : i = 1, \dots, k\}$ that can deform the image $\tilde{x}_i$ while maintaining its intrinsic structural and functional characteristics, where $k$ is the number of augmentation functions. Then, $\tilde{x}_i$ and $T(\tilde{x}_i)$ carry the same or similar latent information. There should be no or minimal difference between the two high-dimensional feature vectors $\psi_{L-1}(\tilde{x}_i; \theta_1, \dots, \theta_{L-1})$ and $\psi_{L-1}(T(\tilde{x}_i); \theta_1, \dots, \theta_{L-1})$. We define a consistency loss that penalizes the difference between these two feature vectors.
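The idea of penalizing feature differences between an image and its perturbed version can be sketched as follows. The mean squared difference used here is an assumption chosen for illustration; the text does not confirm the distance measure, and the function names and the default value of `lam` are hypothetical.

```python
def consistency_loss(feats_orig, feats_aug):
    """Mean squared difference between paired feature vectors.

    feats_orig[i] and feats_aug[i] are the feature vectors of the i-th
    image and of its perturbed version (assumed distance measure).
    """
    total = 0.0
    for f, g in zip(feats_orig, feats_aug):
        total += sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)
    return total / len(feats_orig)


def total_loss(ce_value, cons_value, lam=1.0):
    """Combine the supervised and unsupervised terms: CE + lam * Cons."""
    return ce_value + lam * cons_value
```

The consistency term needs no labels, so it can be evaluated on any CT slice, while the cross entropy value passed to `total_loss` would come from the labeled slices only.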

Implementation details
Given an input image, we resize its spatial size to 256 x 256 pixels. Then, a number of data augmentation techniques are applied to transform the shape of the image as follows: 1) a random scaling in a range of [0.8, 1.2]; 2) a random translation within 1% of the width and height of the image; 3) a random shearing in a range of [-5˚, 5˚]; 4) a random horizontal flip with a probability of 0.5. Following the center-crop of size 224 x 224 pixels, one of three operations (a Gaussian blur, a median blur, and an additive Gaussian noise) is randomly selected and applied to make slight changes in the intensities of the image: 1) a Gaussian blur uses a Gaussian kernel with σ whose value is randomly chosen in a range of [0.0, 3.0]; 2) a median blur has a random kernel size of 3, 4, or 5; 3) an additive Gaussian noise is added to each channel, randomly sampled from a Gaussian probability density function with μ = 0 and a randomly selected σ.
We extend the two COVID-19 CT datasets by introducing three publicly available lung CT datasets. For these three public datasets, referred to as Data PUB, we manually select a number of slices that clearly visualize a lung. In the training set (Train PUB), we randomly select 5 slices per scan for efficient training and balance with Data COV. In the test set (Test PUB), all the slices are utilized. The details of the training set, validation set, and test set are available in Table 1. Some exemplary CT images are shown in Fig 3.
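The augmentation steps listed under the implementation details can be sketched as a parameter sampler plus a center crop. This is a minimal illustration, not the pipeline used in this work. It assumes the scaling range reads [0.8, 1.2], and `noise_sigma_max` is a hypothetical placeholder because the range of the noise σ is not given in the text. Applying the sampled geometric and intensity operations to an image is left to an image library.

```python
import random


def sample_augmentation(rng, noise_sigma_max=0.05):
    """Draw one set of augmentation parameters.

    noise_sigma_max is a hypothetical value; the noise range is not
    stated in the text.
    """
    geometric = {
        "scale": rng.uniform(0.8, 1.2),        # assumed range [0.8, 1.2]
        "translate_frac": (rng.uniform(-0.01, 0.01),
                           rng.uniform(-0.01, 0.01)),
        "shear_deg": rng.uniform(-5.0, 5.0),
        "hflip": rng.random() < 0.5,
    }
    # Exactly one intensity operation is selected per image.
    op = rng.choice(["gaussian_blur", "median_blur", "gaussian_noise"])
    if op == "gaussian_blur":
        intensity = {"op": op, "sigma": rng.uniform(0.0, 3.0)}
    elif op == "median_blur":
        intensity = {"op": op, "kernel": rng.choice([3, 4, 5])}
    else:
        intensity = {"op": op, "mean": 0.0,
                     "sigma": rng.uniform(0.0, noise_sigma_max)}
    return geometric, intensity


def center_crop(image, size=224):
    """Crop the central size x size window of a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]
```

Passing a seeded `random.Random` instance as `rng` makes the sampled parameters reproducible, which is convenient when the same perturbation has to be replayed for debugging.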

Experimental setup
To evaluate the proposed diagnostic system for COVID-19, we conduct two types of experiments. The first type of experiments, employing Data COV and Data PUB, is to test if the proposed system could distinguish COVID-19 CT slices from non-COVID-19 CT slices. The second type of experiments, including Data COV-CP and Data PUB, attempts to assess whether the proposed system could discriminate between COVID-19 CT slices and CP CT slices. For the first type of experiments, we train the proposed diagnostic system using both Train COV and Train PUB, select the best model based upon the performance on Validation COV, and test the best model on Test COV and Test PUB. During the training phase, Train COV is used to compute both $\mathcal{L}_{CE}$ (as labeled data) and $\mathcal{L}_{Cons}$ (as unlabeled data), and Train PUB is used to calculate $\mathcal{L}_{Cons}$ only. In the testing phase, all the slices from Test PUB are treated as negative. To further assess the effectiveness of semi-supervised learning, we conduct a series of comparative experiments. First, the diagnostic system is trained in a supervised manner using Train COV only. Second, both Train COV and Train PUB are used for supervised learning only. Third, we train the diagnostic system using Train COV in a supervised and unsupervised fashion. Last, both Train COV and Train PUB are used for supervised and unsupervised training. In summary, the first two experiments are to compare semi-supervised learning with supervised learning. The latter two experiments are to investigate the relationship between semi-supervised learning and the datasets. For all the experiments, the best model is chosen using Validation COV and is tested on Test COV and Test PUB.
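The training configurations for the first type of experiments can be summarized as a small lookup table that records which training sets feed each loss term. The dictionary keys and the helper name below are hypothetical labels introduced for illustration only.

```python
# Which training sets contribute to each loss term, per configuration.
# "ce" = supervised cross entropy, "cons" = unsupervised consistency.
EXPERIMENTS = {
    "proposed":           {"ce": ["Train_COV"],
                           "cons": ["Train_COV", "Train_PUB"]},
    "supervised_cov":     {"ce": ["Train_COV"], "cons": []},
    "supervised_cov_pub": {"ce": ["Train_COV", "Train_PUB"], "cons": []},
    "semi_cov":           {"ce": ["Train_COV"], "cons": ["Train_COV"]},
    "semi_cov_pub":       {"ce": ["Train_COV", "Train_PUB"],
                           "cons": ["Train_COV", "Train_PUB"]},
}


def needs_labels_for(name):
    """Training sets that must be labeled for a given configuration."""
    return set(EXPERIMENTS[name]["ce"])
```

Only the configurations that list Train_PUB under "ce" require labels for the public datasets; the proposed configuration uses Train_PUB for the consistency term alone.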
Similarly, Data COV-CP and Data PUB are used to train the diagnostic system, select the best model, and test on the test set in the second type of experiments. For the training purpose, $\mathcal{L}_{CE}$ is computed using Train COV-CP and $\mathcal{L}_{Cons}$ is calculated using both Train COV-CP and Train PUB. In the testing phase, Test PUB is omitted since it includes CT scans from healthy subjects and patients with other diseases such as cancer. As for the comparative experiments, we conduct only the first and third comparative experiments since Test PUB cannot be adopted as negative samples in this type of experiments.

Detection of COVID-19 in CT images
In distinguishing COVID-19 CT slices from non-COVID-19 slices, the proposed semi-supervised learning method obtained an overall accuracy of 99.83%, sensitivity of 0.9286, specificity of 0.9991, and positive predictive value (PPV) of 0.9192 on Test COV and Test PUB, as shown in Table 2. On the COVID-19 challenge dataset only (Test COV), 94.67% accuracy, 0.9286 sensitivity, 0.9808 specificity, and 0.9891 PPV were achieved. On the public dataset (Test PUB), where only negative (non-COVID-19) slices are included, the proposed method achieved a specificity of 0.9835. Moreover, the proposed method was able to discriminate between COVID-19 CT slices and CP CT slices. The method achieved an accuracy of 97.32%, sensitivity of 0.9971, specificity of 0.9598, and PPV of 0.9326 on Test COV-CP (Table 3).
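The four reported measures follow the standard definitions based on the slice-level confusion matrix. A minimal sketch is given below; the function name is hypothetical, and the counts in the usage example are made up for illustration and are not taken from Table 2 or Table 3.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, and PPV.

    tp / fn: COVID-19 slices predicted positive / negative.
    tn / fp: non-COVID-19 slices predicted negative / positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
    }


# Illustrative counts only (not the study's data).
example = diagnostic_metrics(tp=90, fp=10, tn=890, fn=10)
```

Because accuracy weights all slices equally, it is dominated by the negative class when negatives far outnumber positives, which is why sensitivity and PPV are reported alongside it.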

Comparative experiments: Supervised learning
In comparison to the COVID-19 detection model (first type of experiments), two models were built via supervised learning: one uses Train COV only and the other uses both Train COV and Train PUB. Using Train COV only, the performance of COVID-19 detection decreased by >0.6% in accuracy, >0.03 in sensitivity, >0.07 in specificity, and >0.44 in PPV on Test COV and Test PUB. When it was tested on Test COV only, the performance degradation was even more severe: >5% in accuracy, >0.03 in sensitivity, and >0.09 in specificity. Utilizing unsupervised learning, the ability of the classification model was substantially improved, in particular for the detection of COVID-19 samples.
Using both Train COV and Train PUB in a supervised fashion, there was a marginal improvement in the overall classification performance; for instance, ~0.05% in accuracy, ~0.02 in sensitivity, and ~0.02 in PPV. However, the performance on Test COV was inferior to the proposed semi-supervised method: a decrease of 2%, ~0.1, and ~0.05 in accuracy, specificity, and PPV, respectively, leading to a substantial increase in false positive cases.

Comparative experiments: Semi-supervised vs. supervised learning
In distinguishing COVID-19 from non-COVID-19 CT slices, each of the supervised methods was compared to its semi-supervised counterpart. Using Train COV in a semi-supervised manner, the accuracy, specificity, and PPV were improved by >0.3%, ~0.004, and ~0.15, respectively, on Test COV and Test PUB, in comparison to the supervised method. On Test COV only, improvements of 2%, >0.05, and >0.03 in accuracy, specificity, and PPV, respectively, were observed. On Test PUB only, the semi-supervised method showed better specificity by >0.03. Similarly, using both Train COV and Train PUB, the semi-supervised method, by and large, outperformed its supervised counterpart. Moreover, we attained a performance gain of >0.03% in accuracy, >0.005 in specificity, and >0.008 in PPV on Test COV-CP when utilizing Train COV-CP in a semi-supervised manner. These results demonstrate the effectiveness of semi-supervised learning in comparison to supervised learning for both classification tasks.
For both classification tasks, the semi-supervised method that uses either Train COV or Train COV-CP for both supervised and unsupervised learning was compared to the proposed

Discussions
A rapid, reliable, and accurate detection of COVID-19 has a direct bearing on global healthcare due to the prevalence of the disease and its impact on society. Although CT images have been shown to be useful in detecting COVID-19, the limited understanding of the disease and the lack of relevant COVID-19 data pose a difficulty in developing diagnostic tools. The proposed semi-supervised learning approach demonstrates an alternative way to develop an accurate and robust diagnostic tool for COVID-19 that can utilize both COVID-19 CT images and non-COVID-19 CT images. The proposed semi-supervised method was able to detect COVID-19 CT images with high accuracy. The method was also successfully applied to the public CT images, including both healthy lungs and lungs with other types of diseases. Even though the public CT images do not include COVID-19 cases, the good classification performance indicates that there would be no or minimal misdiagnosis of healthy subjects with respect to COVID-19, lowering the burden and cost of healthcare. The additive value of the proposed method will be apparent when it is implemented as a pre-screening mechanism. For instance, the proposed method will mis-classify <1% of non-COVID-19 CT images as positive (at specificity of 0.9991), significantly lowering the workload, whereas 7.14% of COVID-19 CT images (at sensitivity of 0.9286) will be missed and ~8% of the positive CT images will be non-COVID-19 CT images (at PPV of 0.9192). In regard to CP, <1% of COVID-19 CT images will be missed, while <5% of the CP CT images will make it through the screening, and these CP CT images will occupy <8% of the positive CT images. We note that a direct comparison with the RT-PCR test may be misleading since the proposed method is applied per CT slice, not per patient. Each CT scan per patient includes a number of CT images, and a lesion may be present in multiple CT images. The chance of mis-classifying multiple CT images, i.e., missing COVID-19 patients, is likely to be lower than what is reported here.
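The screening figures quoted above follow directly from the reported sensitivity, specificity, and PPV. The sketch below reproduces that arithmetic and adds a rough patient-level illustration. The assumption that slice-level errors are independent is a strong simplification made only for this example, since slices from the same patient are correlated, and the function names are hypothetical.

```python
def screening_rates(sensitivity, specificity, ppv):
    """Error rates implied by the reported slice-level measures."""
    return {
        "missed_positive_rate": 1.0 - sensitivity,   # COVID-19 slices missed
        "false_positive_rate": 1.0 - specificity,    # negatives flagged
        "false_discovery_rate": 1.0 - ppv,           # flagged but negative
    }


def prob_all_slices_missed(sensitivity, n_slices):
    """Chance that every one of n positive slices is missed.

    Assumes independent slice-level errors (illustrative only).
    """
    return (1.0 - sensitivity) ** n_slices


covid_vs_non_covid = screening_rates(0.9286, 0.9991, 0.9192)
covid_vs_cp = screening_rates(0.9971, 0.9598, 0.9326)
```

Under the independence assumption, a patient whose lesion appears in three slices would be missed entirely with probability of about 0.0714 cubed, which is well below the per-slice miss rate; correlated errors would make the true value larger than this estimate.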
In comparison to the supervised methods, the effectiveness and efficiency of the proposed semi-supervised method have been demonstrated. The supervised method, trained on either Train COV or Train COV-CP, was inferior to the proposed method. The proposed semi-supervised method is capable of not only improving the diagnostic accuracy on COVID-19 CT images but also lowering the misdiagnosis rate for non-COVID-19 CT images. However, the supervised method, utilizing both the COVID-19 dataset (Train COV) and the public dataset (Train PUB), was generally superior to the proposed semi-supervised method by a small margin. Since supervised learning requires labeled data and labeling is, in general, costly and time-consuming, it is hard to further extend the approach. There is a trade-off between the performance and the labeling cost. The proposed semi-supervised method allows us to bring the classification performance close to that of the supervised method without the high cost of data labeling.
The effect of semi-supervised learning was not dependent on the training dataset. For both supervised methods (one uses Train COV only and the other uses both Train COV and Train PUB), the adoption of the unsupervised learning approach, i.e., calculating $\mathcal{L}_{Cons}$, generally gives rise to an improvement in the classification performance. This indicates that the proposed method enhances the utility of the available dataset, leading to an improved performance in disease diagnosis.
In semi-supervised learning, the classification performance varies depending on the training strategy. Although the same dataset (Train COV or Train COV-CP) is used for supervised training, the addition of the extra dataset (Train PUB) for unsupervised training results in an improved ability of COVID-19 detection, emphasizing the importance of the amount and diversity of the dataset in unsupervised training. Furthermore, we obtained a slight performance gain by adding the labeled dataset (Train PUB) in supervised training while the training dataset (Train COV and Train PUB) remains the same for unsupervised training. As we discussed above, this requires extensive labeling of all available datasets, which is impractical and inefficient in many medical imaging domains.
This study has several limitations. A limited number of CT images with COVID-19 was utilized in this study. The reported performance should be understood with respect to the characteristics of the dataset used here. A follow-up study should be conducted to further validate the accuracy and utility of the proposed method. The proposed method detects COVID-19 per CT slice. Localization of COVID-19 in a CT slice could aid in analyzing the disease and the classification results. The improved learning ability via the proposed semi-supervised approach should be applicable to the localization of COVID-19 on CT images.

Conclusions
Herein, we propose an advanced diagnostic tool for COVID-19 in CT images. The approach adopts a semi-supervised learning scheme that exploits both supervised and unsupervised learning. The experimental results show that the proposed semi-supervised method is able to detect CT images with COVID-19 in an accurate and robust manner. In particular, by adopting the public CT images in an unsupervised fashion, the proposed method achieves a substantial performance gain, indicating that it is able to utilize non-COVID-19 CT images for an improved COVID-19 diagnosis. The approach is generic and can be applied to other types of diseases and datasets in other domains to enhance the learning capability of the classification model and to extend the utility of the dataset.