Vulnerability of deep neural networks for detecting COVID-19 cases from chest X-ray images to universal adversarial attacks

Owing to the epidemic of the novel coronavirus disease 2019 (COVID-19), chest X-ray computed tomography imaging is being used for effectively screening COVID-19 patients. The development of computer-aided systems based on deep neural networks (DNNs) has advanced, including as open-source projects, to rapidly and accurately detect COVID-19 cases, because the need for expert radiologists, who are limited in number, forms a bottleneck for screening. However, thus far, the vulnerability of DNN-based systems has been poorly evaluated, although realistic and high-risk attacks using universal adversarial perturbation (UAP), a single (input image agnostic) perturbation that can induce DNN failure in most classification tasks, are available. Thus, we focus on representative DNN models for detecting COVID-19 cases from chest X-ray images and evaluate their vulnerability to UAPs. We consider non-targeted UAPs, which cause a task failure, resulting in an input being assigned an incorrect label, and targeted UAPs, which cause the DNN to classify an input into a specific class. The results demonstrate that the models are vulnerable to non-targeted and targeted UAPs, even in the case of small UAPs. In particular, UAPs whose norm is 2% of the average norm of an image in the image dataset achieve >85% and >90% success rates for the non-targeted and targeted attacks, respectively. Owing to the non-targeted UAPs, the DNN models judge most chest X-ray images as COVID-19 cases. The targeted UAPs cause the DNN models to classify most chest X-ray images into a specified target class. The results indicate that careful consideration is required in practical applications of DNNs to COVID-19 diagnosis; in particular, they emphasize the need for strategies to address security concerns. As an example, we show that iterative fine-tuning of DNN models using UAPs improves the robustness of DNN models against UAPs.


Introduction
Coronavirus disease 2019 (COVID-19) [1] is an infectious disease caused by severe acute respiratory syndrome coronavirus 2. The COVID-19 epidemic started in Wuhan, China [2], and has had a severe impact on public health and the economy globally [3]. To reduce the spread of this epidemic, effective screening of COVID-19 patients is required. [...] the generalization ability of DNNs, reduces model interpretability, and limits the applications of deep learning in safety- and security-critical environments [25]. Specifically, vulnerability is a severe problem in medical diagnosis [26]. Thus, it is important to evaluate the vulnerability of the proposed DNN-based systems to adversarial attacks (attacks based on UAPs, in particular) in practical applications. In addition, defense strategies against adversarial attacks (i.e., adversarial defense [22]) are required.
In this study, we focus on the COVID-Net models, which are representative models for detecting COVID-19 cases from chest X-ray images, and aim to evaluate the vulnerability of DNNs to adversarial attacks. Specifically, the vulnerability to non-targeted and targeted attacks, based on UAPs, is investigated. Moreover, adversarial defense is considered; in particular, we evaluate to what extent the robustness of COVID-Net models to non-targeted and targeted UAPs increases using adversarial retraining [23,27] (i.e., fine-tuning with adversarial images).

Universal adversarial perturbations
The UAPs for non-targeted and targeted attacks were generated using simple iterative algorithms, whose details are described in [23,28]. We used the non-targeted UAP algorithm available in the Adversarial Robustness 360 Toolbox (ART) [29] (version 1.0; github.com/IBM/adversarial-robustness-toolbox). The targeted UAP algorithm was implemented in our previous study [24] by modifying the non-targeted UAP algorithm in ART (github.com/hkthirano/targeted_UAP_CIFAR10).
The algorithms consider a classifier, $C(x)$, which returns the class or label with the highest confidence score for an input image, $x$. The algorithm starts with $\rho = 0$ (no perturbation) and iteratively updates the UAP, $\rho$, under the constraint that the $L_p$ norm of the perturbation is equal to or less than a small value $\xi$ (i.e., $\|\rho\|_p \le \xi$), by additively obtaining an adversarial perturbation for an input image, $x$, which is randomly selected from an input image set, $X$, without replacement. These iterative updates continue until the number of iterations reaches a maximum, $i_{\max}$.
We used the fast gradient sign method (FGSM) [21] to obtain an adversarial perturbation for the input image, instead of the DeepFool method [30] used in the original UAP algorithm [23]. This is because FGSM can be used for both non-targeted and targeted attacks, whereas DeepFool requires a higher computational cost than FGSM and only generates a non-targeted adversarial example for the input image. FGSM generates the adversarial perturbation, $\hat{\rho}$, for $x$ using the gradient $\nabla_x L(x, y)$ of the loss function at the specified image $x$ and class $y$ with respect to the pixels [21]. For the $L_\infty$ norm, a non-targeted perturbation that causes misclassification is computed as $\hat{\rho} = \epsilon \cdot \mathrm{sign}(\nabla_x L(x, C(x)))$, whereas a targeted perturbation that causes $C$ to classify an image $x$ into class $y$ is obtained as $\hat{\rho} = -\epsilon \cdot \mathrm{sign}(\nabla_x L(x, y))$, where $\epsilon$ (> 0) is the attack strength. For the $L_1$ and $L_2$ norms, a non-targeted perturbation is computed as $\hat{\rho} = \epsilon \cdot \nabla_x L(x, C(x)) / \|\nabla_x L(x, C(x))\|_p$, whereas a targeted perturbation is obtained as $\hat{\rho} = -\epsilon \cdot \nabla_x L(x, y) / \|\nabla_x L(x, y)\|_p$.
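The FGSM step described above can be sketched as follows. This is a minimal illustration rather than the ART implementation: the function name `fgsm_perturbation`, the flattened-list representation of the gradient, and the string `"inf"` for the norm are assumptions made here, and the sign-based rule is treated as the L-infinity case. The gradient itself is assumed to be supplied by the deep learning framework in use.

```python
def fgsm_perturbation(grad, eps, p="inf", targeted=False):
    """Return an FGSM perturbation from a flattened loss gradient.

    grad     : list of floats, gradient of the loss w.r.t. the pixels
               (evaluated at label C(x) for non-targeted attacks and at
               the target class y for targeted attacks).
    eps      : attack strength (> 0).
    p        : "inf" (sign rule), 1, or 2 (normalized-gradient rule).
    targeted : if True, step against the gradient (towards class y).
    """
    if p == "inf":
        # sign(g): +1, -1, or 0 for each pixel
        direction = [float((g > 0) - (g < 0)) for g in grad]
    elif p in (1, 2):
        norm = sum(abs(g) ** p for g in grad) ** (1.0 / p)
        norm = max(norm, 1e-12)  # guard against a zero gradient
        direction = [g / norm for g in grad]
    else:
        raise ValueError("p must be 'inf', 1, or 2")
    factor = -eps if targeted else eps
    return [factor * d for d in direction]


# Toy gradient (hypothetical values, not from the study)
toy_grad = [3.0, -4.0, 0.0]
print(fgsm_perturbation(toy_grad, eps=0.001))            # sign rule
print(fgsm_perturbation(toy_grad, eps=0.5, p=2))         # normalized gradient
print(fgsm_perturbation(toy_grad, eps=0.5, p=2, targeted=True))
```

The only difference between the two attack types in this sketch is the sign of the step and the label at which the gradient is evaluated, which is why a single routine can serve both.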
In the algorithms, FGSM is performed based on the output $C(x + \rho)$ of the classifier for the perturbed image $x + \rho$ at each iteration step. For non-targeted (targeted) attacks, an adversarial perturbation, $\hat{\rho}$, for $x + \rho$ is obtained using FGSM if $C(x + \rho) = C(x)$ ($C(x + \rho) \neq y$). After generating the adversarial example at this step (i.e., $x_{adv} \leftarrow x + \rho + \hat{\rho}$), the perturbation $\rho$ is updated if $C(x_{adv}) \neq C(x)$ ($C(x_{adv}) = y$) for non-targeted (targeted) attacks. When updating $\rho$, a projection function, $\mathrm{project}(x, p, \xi)$, is used to satisfy the constraint $\|\rho\|_p \le \xi$: $\rho \leftarrow \mathrm{project}(x_{adv} - x, p, \xi)$, where $\mathrm{project}(x, p, \xi) = \arg\min_{x'} \|x - x'\|_2$ subject to $\|x'\|_p \le \xi$.
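The update loop and projection described above can be sketched as below. This is an illustrative outline under stated assumptions, not the ART code: `project`, `generate_uap`, and the callables `classify` and `perturb` are hypothetical names; images are flattened lists of floats; one "iteration" is assumed to be one pass over the image set in random order; and only the p = 2 and p = infinity projections are shown (rescaling onto the ball and clipping, respectively).

```python
import math
import random


def project(v, p, xi):
    """Euclidean projection of v onto the L_p ball of radius xi."""
    if p == "inf":
        return [max(-xi, min(xi, c)) for c in v]       # clip each entry
    if p == 2:
        norm = math.sqrt(sum(c * c for c in v))
        scale = min(1.0, xi / norm) if norm > 0 else 1.0
        return [scale * c for c in v]                  # shrink if outside
    raise ValueError("p must be 'inf' or 2")


def generate_uap(images, classify, perturb, p, xi, i_max, target=None, seed=0):
    """Iteratively build a UAP; target=None gives a non-targeted attack.

    classify(x)       -> predicted label for image x
    perturb(x, label) -> FGSM perturbation for x w.r.t. the given label
    """
    rng = random.Random(seed)
    rho = [0.0] * len(images[0])                       # start with no perturbation
    for _ in range(i_max):
        order = list(range(len(images)))
        rng.shuffle(order)                             # selection without replacement
        for k in order:
            x = images[k]
            clean_label = classify(x)
            x_pert = [a + b for a, b in zip(x, rho)]
            label = classify(x_pert)
            # Skip images that the current UAP already fools
            fooled = (label != clean_label) if target is None else (label == target)
            if fooled:
                continue
            rho_hat = perturb(x_pert, clean_label if target is None else target)
            x_adv = [a + b for a, b in zip(x_pert, rho_hat)]
            adv_label = classify(x_adv)
            success = (adv_label != clean_label) if target is None else (adv_label == target)
            if success:
                rho = project([a - b for a, b in zip(x_adv, x)], p, xi)
    return rho
```

In practice `classify` and `perturb` would wrap the trained model and an FGSM routine; they are passed in as arguments here only to keep the sketch independent of any particular framework.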
The non-targeted and targeted UAPs were generated using 13,569 training images in the COVIDx dataset. Parameter $\epsilon$ was set to 0.001; the cases where p = 2 and p = ∞ were considered. Meanwhile, parameter $\xi$ was determined based on the ratio z of the $L_p$ norm of the UAP to the average $L_p$ norm of an image in the COVIDx dataset. Cases in which z = 1% and 2% (i.e., almost imperceptible perturbations) were considered. The average $L_\infty$ and $L_2$ norms were 237 and 32,589, respectively; $i_{\max}$ was set to 15.
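As a worked example of how the norm bound follows from the ratio z, the short sketch below multiplies z by the reported average image norms. It reads the value 237 as the average L-infinity norm, since it is smaller than the reported L2 value, which an L1 norm could not be; the names `AVERAGE_NORM` and `xi_from_ratio` are illustrative only.

```python
# Average image norms reported for the COVIDx dataset
# (237 is interpreted here as the L-infinity norm; see the note above)
AVERAGE_NORM = {"inf": 237.0, 2: 32589.0}


def xi_from_ratio(z_percent, p):
    """Norm bound xi for a UAP whose norm is z_percent % of the average image norm."""
    return z_percent / 100.0 * AVERAGE_NORM[p]


for p in ("inf", 2):
    for z in (1, 2):
        print(f"p = {p}, z = {z}%: xi = {xi_from_ratio(z, p):.2f}")
# z = 1% gives xi = 2.37 (p = inf) and 325.89 (p = 2);
# z = 2% gives xi = 4.74 (p = inf) and 651.78 (p = 2).
```

For p = ∞, a bound of 2.37 or 4.74 on a pixel scale whose average maximum is 237 means that no pixel changes by more than about 1% or 2% of the brightest pixel value, which is consistent with the perturbations being described as almost imperceptible.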
To compare the performance of the generated UAPs with that of random controls, we also generated random vectors (random UAPs) sampled uniformly from the sphere of a specified radius [23].
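One common way to draw such a random control for the p = 2 case is to normalize a Gaussian vector and rescale it to the desired radius, which yields a direction that is uniform on the sphere. The sketch below assumes this approach; the function name `random_uap_l2` is hypothetical, the exact sampling routine used in the study is not specified here, and the p = ∞ case is not covered.

```python
import math
import random


def random_uap_l2(dim, xi, seed=None):
    """Sample a vector uniformly from the L2 sphere of radius xi in `dim` dimensions."""
    rng = random.Random(seed)
    # An isotropic Gaussian has a rotation-invariant direction,
    # so normalizing it gives a uniform point on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [xi * c / norm for c in v]


# Example: a random control with the same L2 norm as a z = 2% UAP
control = random_uap_l2(dim=1000, xi=651.78, seed=0)
print(math.sqrt(sum(c * c for c in control)))  # approximately 651.78
```

Because the control has the same norm as the generated UAP but a random direction, any gap between their success rates reflects the direction found by the algorithm rather than the perturbation size alone.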

Vulnerability evaluation
To evaluate the vulnerability of the DNN models to UAPs, we used the fooling rate, $R_f$, and the targeted attack success rate, $R_s$, for non-targeted and targeted attacks, respectively. The $R_f$ of an image set is defined as the proportion of images that were not classified into their associated actual labels to all images in the set. The $R_s$ of an image set is the proportion of adversarial images classified into the target class to all images in the set. Additionally, we obtained the confusion matrices to evaluate the change in prediction owing to the UAPs for each class (infection type).
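The two rates and the confusion matrix defined above can be computed from lists of labels as in the following sketch. The function names and the toy labels are illustrative assumptions; the predictions are assumed to be those of the model on the perturbed images.

```python
from collections import Counter


def fooling_rate(actual_labels, adv_predictions):
    """R_f: fraction of images not classified into their actual label."""
    wrong = sum(a != p for a, p in zip(actual_labels, adv_predictions))
    return wrong / len(actual_labels)


def targeted_success_rate(adv_predictions, target):
    """R_s: fraction of adversarial images classified into the target class."""
    return sum(p == target for p in adv_predictions) / len(adv_predictions)


def confusion_matrix(actual_labels, adv_predictions):
    """Counts keyed by (actual label, predicted label)."""
    return Counter(zip(actual_labels, adv_predictions))


# Toy example (hypothetical labels, not results from the study)
actual = ["normal", "pneumonia", "COVID-19", "normal"]
predicted = ["COVID-19", "COVID-19", "COVID-19", "normal"]
print(fooling_rate(actual, predicted))                  # 0.5
print(targeted_success_rate(predicted, "COVID-19"))     # 0.75
print(confusion_matrix(actual, predicted)[("normal", "COVID-19")])  # 1
```

Note that, under these definitions, images already belonging to the target class count towards $R_s$, and images the model misclassifies even without a perturbation count towards $R_f$.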

Adversarial retraining
We performed adversarial retraining to increase the robustness of the COVID-Net models to UAPs [23,27]; in particular, the models were fine-tuned with adversarial images, following the procedure described in a previous study [23]. A brief description is provided below. 1) Ten UAPs against a DNN model were generated using the algorithm (for generating a non-targeted or targeted UAP) (see Materials and methods section) with the (clean) training image set. 2) A modified training image set was obtained by randomly selecting half of the training images and combining them with the remaining half, in which each image was perturbed by a UAP randomly selected from the 10 UAPs. 3) The model was fine-tuned by performing five extra epochs of training on the modified training image set. 4) A new UAP (against the fine-tuned model) was generated using the algorithm with the training image set. 5) $R_f$ and $R_s$ of the UAP for the test images were then computed. Steps 1)-5) were repeated five times.
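Step 2) of the procedure above, the construction of the modified training set, can be sketched as follows. This is a simplified illustration: `build_retraining_set` is a hypothetical name, images and UAPs are flattened lists of floats, and clipping of perturbed pixel values to the valid range is omitted. The fine-tuning and UAP generation steps depend on the training framework and are not shown.

```python
import random


def build_retraining_set(images, uaps, seed=0):
    """Keep a random half of the images clean and perturb the rest.

    Each perturbed image receives one UAP drawn at random from `uaps`.
    The returned list preserves the original image order.
    """
    rng = random.Random(seed)
    indices = list(range(len(images)))
    rng.shuffle(indices)
    clean_idx = set(indices[: len(images) // 2])   # random half stays clean
    mixed = []
    for i, x in enumerate(images):
        if i in clean_idx:
            mixed.append(list(x))
        else:
            rho = rng.choice(uaps)                  # one of the pre-generated UAPs
            mixed.append([a + b for a, b in zip(x, rho)])
    return mixed


# Toy example: 10 two-pixel images and 2 UAPs (hypothetical values)
toy_images = [[0.0, 0.0] for _ in range(10)]
toy_uaps = [[1.0, 1.0], [2.0, 2.0]]
mixed = build_retraining_set(toy_images, toy_uaps)
print(sum(img == [0.0, 0.0] for img in mixed))  # 5 images remain clean
```

Keeping half of the images clean is what allows the fine-tuned model to retain its accuracy on unperturbed inputs while learning to ignore the perturbations.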

Performance of COVID-Net models
The test accuracies of the COVIDNet-CXR Small and COVIDNet-CXR Large models were 92.6% and 94.4%, respectively, and their training accuracies were 95.8% and 94.1%, respectively. As shown in the COVID-Net study [7], we also confirmed that the COVID-Net models achieved good accuracies.

Vulnerability to non-targeted universal adversarial perturbations
However, we found that both COVIDNet-CXR Small and COVIDNet-CXR Large models were vulnerable to non-targeted UAPs (Table 1). Specifically, the fooling rate, $R_f$, of the UAPs with z = 1% for the test image set was 81.0% at most. A higher z led to a higher $R_f$. We observed that the $R_f$ of the UAP with z = 2% for the test image set was between 85.7% and 87.4%. Furthermore, the random UAPs with z = 2% also caused the models to misclassify some images; specifically, their $R_f$ values were up to 22.1%. $R_f$ did not exhibit significant dependence on the norm type (p = 2 or ∞). The difference in $R_f$ for the test image set between p = 2 and p = ∞ was up to 7%, the model and the other parameters being equal. The $R_f$ of the UAP against the COVIDNet-CXR Small model was lower than that against the COVIDNet-CXR Large model in the case of z = 1%, the other parameters being equal; however, no remarkable difference in $R_f$ between these models was observed in the case of z = 2%. The $R_f$ for the training image set was higher than that for the test image set because the UAPs were generated based on the training image set.
Owing to non-targeted UAPs, the models classified most images into COVID-19. Fig 1 shows the confusion matrices for the COVID-Net models attacked using non-targeted UAPs with p = ∞. For the UAPs with z = 1%, the COVIDNet-CXR Small model classified >70% of the normal and pneumonia test images into COVID-19. Moreover, the COVIDNet-CXR Large model classified approximately 90% of the normal and pneumonia images into COVID-19. For a higher z, this tendency was more pronounced. In particular, the COVIDNet-CXR Small and Large models evaluated almost all normal and pneumonia test images as COVID-19 cases when z = 2%. Additionally, the tendency of adversarial images to be classified into COVID-19 was observed when considering UAPs with p = 2 and the training image set.
The non-targeted UAPs with z = 1% and z = 2% were almost imperceptible. Fig 2 shows the non-targeted UAPs with p = ∞ against the COVID-Net models and their adversarial images. The models correctly classified the original X-ray images (left panels in Fig 2) into their actual classes; however, they evaluated all adversarial images as COVID-19 cases owing to the non-targeted UAPs. Similarly, the non-targeted UAPs with p = 2 were almost imperceptible.

Vulnerability to targeted universal adversarial perturbations
Furthermore, we found that both the COVIDNet-CXR Small model (Table 2) and COVIDNet-CXR Large model (Table 3) were vulnerable to targeted UAPs. Subsequently, we considered the effect of the targeted attacks using UAPs in each class: normal, pneumonia, and [...] target classes. For the targeted attacks to normal and pneumonia, the $R_s$ of random UAPs for the test images were also relatively high; in particular, they were between approximately 35% and 45% and between approximately 30% and 45% for the COVIDNet-CXR Small model and COVIDNet-CXR Large model, respectively. It was difficult to classify the COVID-19 images into another targeted class (normal or pneumonia) when the UAPs were relatively weak (i.e., z = 1%). [...] normal (pneumonia) images were classified into the target class pneumonia (normal). However, for a higher z (i.e., z = 2%), the targeted attacks on the COVID-19 images were successful; in particular, almost all COVID-19 images were classified into the target class (normal or pneumonia) because of the UAP. The classification of the images into COVID-19 using targeted UAPs was easier than that into the other classes. Owing to the UAP with z = 1%, the model judged approximately 80% of normal and pneumonia images as COVID-19 cases. Similar tendencies were observed in the COVIDNet-CXR Large model, for targeted UAPs with p = 2, and on the training image set.
The targeted UAPs were also almost imperceptible. Fig 4 shows the targeted UAPs with p = ∞ and z = 2% against the COVIDNet-CXR Small model and their adversarial images. The model correctly classified the original images (left panels in Fig 4) into their actual classes (source classes); however, it classified the adversarial images into each target class because of the targeted UAPs. The UAPs with z = 1% were also imperceptible. Additionally, imperceptibility was confirmed in the UAPs with p = 2 and those against the COVIDNet-CXR Large model.

Effect of adversarial retraining
Adversarial retraining is often used to defend against adversarial attacks. In this study, we investigated the extent to which adversarial retraining increases the robustness of the COVIDNet-CXR Small model to non-targeted and targeted UAPs with p = ∞. Adversarial retraining did not affect the test accuracy in either the non-targeted or the targeted case; specifically, the accuracy on the (clean) test images remained constant at approximately 90% (Fig 5A and 5B).
For non-targeted attacks using UAPs with z = 2%, $R_f$ for the test images declined with the iterations for adversarial retraining; in particular, it was 22.1% after five iterations (Fig 5A). The confusion matrix (Fig 5C) for the fine-tuned model obtained after five iterations indicates that the normal and COVID-19 images were almost correctly classified despite the non-targeted UAPs. However, 45% of the pneumonia images were still misclassified. For targeted attacks to COVID-19 using UAPs with z = 1%, the $R_s$ for the test images decreased with the iterations for adversarial retraining (Fig 5B); specifically, it was 16.5% after five iterations. The confusion matrix (Fig 5D) for the fine-tuned model obtained after five iterations indicates that the normal and COVID-19 images were almost correctly classified despite the targeted UAPs. However, 15% of the pneumonia images were still misclassified as COVID-19.

Discussion
The COVID-Net models were vulnerable to small UAPs; moreover, they were slightly less robust to random UAPs. The results indicated that the DNN-based systems were easy to mislead. Adversaries can cause the DNN-based systems to fail at low cost (i.e., using a single perturbation); specifically, they do not need to consider the distribution and diversity of input images when attacking the DNNs using UAPs, as UAPs are image agnostic. Considering that vulnerability to UAPs is observed in various DNN architectures [23,24], such vulnerability is expected to exist universally in DNN-based systems for detecting COVID-19 cases.
For non-targeted attacks with UAPs, the COVID-Net models predicted most of the chest X-ray images as COVID-19 cases because of the UAPs (Fig 1), although the UAPs were almost imperceptible (Fig 2). This result is consistent with the tendency of DNN models to classify most inputs into a few specific classes because of non-targeted UAPs (i.e., existence of dominant labels in non-targeted attacks based on UAPs) [23]. Moreover, this indicates that the models provide false positives in COVID-19 diagnosis, which may cause unwanted mental stress to patients and complicate the estimation of the number of COVID-19 cases. The dominant label of COVID-19 observed in this study may be because the COVIDx dataset was imbalanced: the COVID-19 class contained far fewer images than the normal and pneumonia classes. The algorithm seeks to maximize the fooling rate; thus, a relatively large fooling rate is achieved when all inputs are classified into COVID-19 because of UAPs. In addition, the observed dominant label may be because the losses were computed by weighting the COVID-19 class to account for the imbalanced dataset. The decision for the COVID-19 class might be more susceptible to changes in pixel values than that for the other classes.
The relatively easy targeted attacks on COVID-19 (Fig 3) may be because COVID-19 was the dominant label. Moreover, targeted attacks to normal and pneumonia were possible, despite almost imperceptible UAPs (Fig 4). The results imply that adversaries can control DNN-based systems, which may lead to security concerns. The targeted attacks cause both false positives and negatives, and thus, can be used to adjust the number of COVID-19 cases. Moreover, they may affect individual and social awareness of COVID-19 (e.g., voluntary restraint and social distancing). These may lead to problems in terms of public health (i.e., minimizing the spread of the pandemic) and the economy. More generally, complex classifiers, including DNNs, are currently used for high-stakes decision making in healthcare; however, they can potentially cause catastrophic harm to society because they are often difficult to interpret [31].
Fig 5. $R_f$ and $R_s$ are for the test images. The accuracies (%) on the set of clean test images are also shown. The confusion matrices for the fine-tuned models were obtained after five iterations of adversarial retraining using the (C) non-targeted UAPs and (D) targeted UAPs. Note that these confusion matrices belong to the fine-tuned models attacked using non-targeted and targeted UAPs, respectively. https://doi.org/10.1371/journal.pone.0243963.g005
The COVID-Net models, with tailored network architecture, seem to be more vulnerable to adversarial attacks than representative DNN models (e.g., VGG [32] and ResNet [33] models) for classifying ideal natural images (e.g., CIFAR-10 [34] and ImageNet datasets [35]). For these representative DNNs, UAPs with z = 5% and higher are required to achieve >80% success rates for non-targeted and targeted attacks [23,28]. Conversely, for the COVID-Net models, UAPs with z = 2% achieved >85% and >90% success rates for the non-targeted and targeted attacks, respectively. Several factors may explain the vulnerability of the COVID-Net models. For example, the variance (visual difference) in chest X-ray images is much less than that in natural images. In this case, data points may aggregate around decision boundaries, indicating that the outputs of the DNN models are susceptible to changes in pixel values. As a result, adversarial examples are easy to generate. In addition, the adversarial vulnerability of DNNs is known to increase with input dimension [36], which may be one of the causes.
The UAPs used in this study are a type of white-box attack, which assumes that adversaries can access the model parameters (the gradient of the loss function, in this case) and training images; thus, they are security threats for open-source software projects, such as COVID-Net. A simple solution to prevent these adversarial attacks is to make DNN-based systems closed-source and publicly unavailable; however, this conflicts with the purpose of accelerating the development of computer-based systems for detecting COVID-19 cases and COVID-19 treatment. An alternative may be to consider black-box systems, such as closed application programming interfaces (APIs) and closed-source software in which only queries on inputs are allowed and outputs are accessible. Such closed APIs are preferable because they are at least publicly available. However, such APIs may still be vulnerable to adversarial attacks. This is because UAPs have generalizability [23] (i.e., UAPs for a DNN can mislead another DNN). That is, adversarial attacks on black-box DNN-based systems may be possible using the UAPs generated based on white-box DNNs. Moreover, several methods for adversarial attacks on black-box DNN-based systems, which estimate adversarial perturbations using only model outputs (e.g., confidence scores), have been proposed [37][38][39].
Therefore, defense strategies against adversarial attacks should be considered. A simple defense strategy is to fine-tune DNN models using adversarial images [22,23,27]. In fact, we demonstrated that iterative fine-tuning of a DNN model using UAPs improved the robustness of the DNN model to non-targeted and targeted UAPs (Fig 5). However, the iterative fine-tuning method required high computational costs, and it did not completely eliminate vulnerability to UAPs. In addition, several methods for breaching defenses based on adversarial retraining have already been proposed [27]. Alternatively, dimensionality reduction (e.g., principal component analysis), distributional detection (e.g., maximum mean discrepancy), and normalization detection (e.g., dropout randomization) may be useful for adversarial defenses; however, adversarial examples are not easily detected using these approaches [27]. Defending against adversarial attacks is a cat-and-mouse game [26]; thus, it may be difficult to completely avoid security concerns caused by adversarial attacks. However, the development of methods for defending against adversarial attacks has advanced. For example, detecting adversarial attacks based on robustness to random noise [40], the use of a discontinuous activation function that purposely invalidates the DNN's gradient at densely distributed input data points [41], and DNNs for purifying adversarial examples [42] may help reduce the concerns.
In conclusion, we demonstrated the vulnerability of DNNs for detecting COVID-19 cases to non-targeted and targeted attacks based on UAPs. However, many studies have developed DNN-based systems for detecting COVID-19 while ignoring this vulnerability. Our findings emphasize that careful consideration is required in developing DNN-based systems for detecting COVID-19 cases and in their practical applications. Facile applications of DNNs to COVID-19 detection could lead to problems in terms of public health and the economy. Our study is the first to show the vulnerability of DNNs for COVID-19 detection and to warn against such facile applications of DNNs. The code used in this study is available from our GitHub repository: github.com/hkthirano/UAP-COVID-Net. The chest X-ray images used in this study are publicly available online (see github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md for details).